K-means clustering with my own distance function
I have defined a distance function as follows:
jaccard.rules.dist <- function(x, y) {
  # Implements the feature distance. Feature "Airline" (column Aerolinea) gets different
  # treatment; the rest are booleans coded as 1/0.
  # Airline column distance = 0 if same airline, 1 otherwise.
  # The distance for the rest of the attributes is zero iff both are 1, 1 otherwise.
  airline.column <- which(colnames(x) == "Aerolinea")
  xmod <- x
  ymod <- y
  xmod[airline.column] <- ifelse(x[airline.column] == y[airline.column], 1, 0)
  ymod[airline.column] <- 1 # if they are the same, both are ones; otherwise they differ
  andval <- sum(xmod & ymod)
  orval <- sum(xmod | ymod)
  return(1 - andval / orval)
}
This slightly modifies the Jaccard distance for data frames of the form:
t <- data.frame(Aerolinea=c("A","B","C","A"),atr2=c(1,1,0,0),atr3=c(0,0,0,1))
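Now I would like to run k-means clustering on my data set using the distance I just defined. If I try to use the function kmeans, there is no way to specify my own distance function. I also tried hclust, which accepts a distance matrix. I computed the matrix as follows: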
distmat <- matrix(nrow = nrow(t), ncol = nrow(t))
for (i in 1:nrow(t))
  for (j in i:nrow(t))
    distmat[j, i] <- jaccard.rules.dist(t[j, ], t[i, ])
distmat <- as.dist(distmat)
and then called hclust:
hclust(distmat)
Error in if (is.na(n) || n > 65536L) stop("size cannot be NA nor exceed 65536") :
missing value where TRUE/FALSE needed
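What exactly am I doing wrong? Is there another clustering method that accepts an arbitrary distance function as its input?
Thanks in advance.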
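For reference, here is a minimal sketch of the distance-matrix step done on plain vectors instead of one-row data frames, so that every entry can be checked for missing values before it reaches hclust. It is not a diagnosis of the error above. The names `flights` (used in place of `t`, which is also the name of base R's transpose function), `jaccard_rules_vec`, `flag_cols` and `flags` are hypothetical, and the sketch assumes Aerolinea is stored as character rather than factor.

```r
# Toy data from the question. stringsAsFactors = FALSE keeps Aerolinea as
# character (this is already the default from R 4.0.0 onward).
flights <- data.frame(
  Aerolinea = c("A", "B", "C", "A"),
  atr2 = c(1, 1, 0, 0),
  atr3 = c(0, 0, 0, 1),
  stringsAsFactors = FALSE
)

# Hypothetical helper: the same rule as jaccard.rules.dist, on plain vectors.
jaccard_rules_vec <- function(airline_x, airline_y, flags_x, flags_y) {
  x <- c(airline_x == airline_y, flags_x == 1) # airline slot: TRUE only if same airline
  y <- c(TRUE, flags_y == 1)                   # airline slot on the other side is always TRUE
  1 - sum(x & y) / sum(x | y)
}

flag_cols <- setdiff(names(flights), "Aerolinea")
flags <- as.matrix(flights[flag_cols]) # numeric matrix of the 0/1 attributes
n <- nrow(flights)

# Fill the full symmetric matrix so no cell is left as NA.
distmat <- matrix(0, nrow = n, ncol = n)
for (i in seq_len(n)) {
  for (j in seq_len(n)) {
    distmat[i, j] <- jaccard_rules_vec(
      flights$Aerolinea[i], flights$Aerolinea[j], flags[i, ], flags[j, ]
    )
  }
}

stopifnot(!anyNA(distmat))  # fail early if any distance is missing
hc <- hclust(as.dist(distmat))
groups <- cutree(hc, k = 2) # cut the tree into 2 clusters
```

Checking `anyNA(distmat)` before calling `as.dist` is a cheap way to separate problems in the distance function from problems in the clustering call.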
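Are there any missing values in your distance matrix? – 2013-05-06 20:41:45
Or is your matrix size larger than 65536? – 2013-05-06 21:04:23
No, there are no missing values; the matrix is (in the example above) 4x4 – user2345448 2013-05-06 23:39:27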
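On the second part of the question: k-means itself is tied to Euclidean means, but k-medoids works directly from a dissimilarity matrix. A minimal sketch, assuming the `cluster` package is installed (it is one of R's recommended packages) and using its `pam` function with `diss = TRUE`. The matrix `m` below is hypothetical illustration data: the pairwise distances the rule in the question is meant to give for the four toy rows, worked out by hand.

```r
library(cluster) # provides pam(), a k-medoids implementation

# Distances for the four toy rows, worked out by hand from the stated rule
# (assumption: these are the values the distance function is intended to return).
m <- matrix(c(
  0,   0.5, 1, 2/3,
  0.5, 0,   1, 1,
  1,   1,   0, 1,
  2/3, 1,   1, 0
), nrow = 4, byrow = TRUE)
d <- as.dist(m)

# diss = TRUE tells pam that d is already a dissimilarity object.
fit <- pam(d, k = 2, diss = TRUE)
fit$clustering # cluster label for each row
fit$id.med     # row indices of the chosen medoids
```

Because pam only ever looks at pairwise dissimilarities, any distance function can be used as long as the matrix it produces is complete and symmetric.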