K-means clustering with my own distance function

I have defined a distance function as follows:

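jaccard.rules.dist <- function(x, y) {
    # Implements a feature distance between two one-row data frames. Feature
    # "Aerolinea" (airline) gets a different treatment; the rest are booleans
    # coded as 1/0. Airline column distance = 0 if same airline, 1 otherwise.
    # The remaining attributes count as a match only when both are 1; attributes
    # that are 0 in both rows are ignored, as in the usual Jaccard distance.
    airline.column <- which(colnames(x) == "Aerolinea")
    xmod <- x
    ymod <- y
    # compare as character so factor columns with different level sets still compare
    xmod[airline.column] <- ifelse(as.character(x[[airline.column]]) ==
                                   as.character(y[[airline.column]]), 1, 0)
    ymod[airline.column] <- 1  # if they are the same, both are 1; otherwise they differ

    andval <- sum(xmod & ymod)  # attributes that are "on" in both rows
    orval <- sum(xmod | ymod)   # attributes that are "on" in at least one row
    return(1 - andval / orval)
}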
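As a quick worked check of the rule (an illustration, not part of the original question), the sketch below restates it on plain vectors: an airline label plus a vector of 0/1 attributes. The helper name rule_dist is hypothetical and assumes the same coding as jaccard.rules.dist; the three calls use rows of the example data frame t from this question.

```r
# Hypothetical helper: same rule as jaccard.rules.dist, but on plain vectors.
# air_x, air_y: airline labels; bits_x, bits_y: 0/1 attribute vectors.
rule_dist <- function(air_x, bits_x, air_y, bits_y) {
  xm <- c(as.integer(air_x == air_y), bits_x)  # airline slot: 1 if same airline, else 0
  ym <- c(1L, bits_y)                          # airline slot of y is always 1
  1 - sum(xm & ym) / sum(xm | ym)              # 1 - |both on| / |either on|
}

rule_dist("A", c(1, 0), "A", c(0, 1))  # rows 1 vs 4: 1 - 1/3 = 2/3
rule_dist("A", c(1, 0), "B", c(1, 0))  # rows 1 vs 2: 1 - 1/2 = 0.5
rule_dist("A", c(1, 0), "C", c(0, 0))  # rows 1 vs 3: 1 - 0/2 = 1
```

Because the airline slot of y is always 1, the union is never empty, so the ratio cannot become 0/0.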

This is a slightly modified Jaccard distance for data frames of the form

t <- data.frame(Aerolinea=c("A","B","C","A"),atr2=c(1,1,0,0),atr3=c(0,0,0,1))

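Now I would like to perform k-means clustering on my dataset using the distance I just defined. However, the function kmeans does not let me specify my own distance function. I tried hclust, which accepts a distance matrix, which I computed as follows: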

distmat <- matrix(nrow = nrow(t), ncol = nrow(t))
for (i in 1:nrow(t))
    for (j in i:nrow(t))
        distmat[j, i] <- jaccard.rules.dist(t[j, ], t[i, ])  # lower triangle and diagonal only
distmat <- as.dist(distmat)  # as.dist() reads the lower triangle

Then I called hclust:

hclust(distmat) 

Error in if (is.na(n) || n > 65536L) stop("size cannot be NA nor exceed 65536") : 
missing value where TRUE/FALSE needed

What am I doing wrong? Is there another method that accepts an arbitrary distance function as its input?

Thanks in advance.
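On the second question, a hedged pointer rather than a definitive answer: kmeans updates each centre as the coordinate-wise mean of its points, so there is no place to plug in an arbitrary dissimilarity. k-medoids (PAM) is a different algorithm that uses actual observations as centres and works directly on a dissimilarity, so it can take a distance like the one above. The sketch below uses pam() from the cluster package (shipped with standard R installations as a recommended package) on a dist object; the matrix entries are the distances jaccard.rules.dist gives for the four rows of t, worked out by hand and hard-coded so the block runs on its own.

```r
library(cluster)  # recommended package; provides pam()

# Pairwise distances for the 4 rows of t under jaccard.rules.dist,
# computed by hand and entered as a symmetric matrix.
m <- matrix(c(0,   0.5, 1, 2/3,
              0.5, 0,   1, 1,
              1,   1,   0, 1,
              2/3, 1,   1, 0), nrow = 4, byrow = TRUE)
d <- as.dist(m)

fit <- pam(d, k = 2, diss = TRUE)  # k-medoids on the dissimilarities
fit$medoids                        # the observations chosen as cluster centres
fit$clustering                     # cluster assignment for each row
```

With these distances, rows 1, 2 and 4 should end up in one cluster and row 3 on its own (medoids rows 1 and 3). Note that PAM optimizes a medoid objective, not the k-means objective, so it is a substitute rather than k-means with a custom distance.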


2013-05-06 user2345448

Are there missing values in your distance matrix? – 2013-05-06 20:41:45

Or is your matrix larger than 65536? – 2013-05-06 21:04:23

No, there are no missing values, and the matrix is 4x4 (in the example above). – user2345448 2013-05-06 23:39:27

I think distmat (from your code) needs to be a distance structure, which is different from a plain matrix. Try this:

require(proxy) 
d <- dist(t, jaccard.rules.dist) 
clust <- hclust(d=d) 
[email protected] 

    [,1]   [,2] 
[1,] 0.044128322 -0.039518142 
[2,] -0.986798495 0.975132418 
[3,] -0.006441892 0.001099211 
[4,] 1.487829642 1.000431146
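For comparison, a base-R sketch of the same idea without proxy: compute all pairwise distances into a full symmetric matrix, convert it with as.dist(), and pass the resulting dist object to hclust. The data frame is called df here (a hypothetical rename to avoid confusion with base R's t() function), and pair_dist is a hypothetical helper that restates the rule from jaccard.rules.dist for rows i and j, assuming the column layout of t.

```r
# Example data in the same layout as t (airline first, then 0/1 attributes).
df <- data.frame(Aerolinea = c("A", "B", "C", "A"),
                 atr2 = c(1, 1, 0, 0), atr3 = c(0, 0, 0, 1))

# Hypothetical pairwise distance following the same rule as jaccard.rules.dist.
pair_dist <- function(i, j) {
  same <- as.character(df$Aerolinea[i]) == as.character(df$Aerolinea[j])
  xm <- c(as.numeric(same), df$atr2[i], df$atr3[i])  # airline slot: 1 if same airline
  ym <- c(1, df$atr2[j], df$atr3[j])                 # airline slot of y is always 1
  1 - sum(xm & ym) / sum(xm | ym)
}

n <- nrow(df)
m <- matrix(0, n, n)
for (i in seq_len(n))
  for (j in seq_len(n))
    m[i, j] <- pair_dist(i, j)   # fill both triangles

stopifnot(isSymmetric(m))        # the rule is symmetric, so this should hold
d <- as.dist(m)                  # class "dist", carries a "Size" attribute
hc <- hclust(d, method = "average")
cutree(hc, k = 2)                # two-group cut of the tree
```

Average linkage is used here only because complete linkage produces tied merge heights on this tiny example; any method accepted by hclust works on the same dist object.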


2013-05-06 21:06:31 Carson

I have already converted my matrix to a distance object with distmat <- as.dist(distmat). – user2345448 2013-05-06 23:40:18
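A quick way to confirm that the object reaching hclust really is a dist object, sketched in base R. As far as I can tell from its source, hclust reads the number of observations from the Size attribute of its input; a plain matrix has no such attribute, which is consistent with the error shown in the question. The small matrix m below is a hypothetical stand-in for distmat.

```r
m <- matrix(c(0,   0.5, 1,
              0.5, 0,   1,
              1,   1,   0), nrow = 3)  # stand-in for a plain distance matrix

attr(m, "Size")        # NULL: a plain matrix carries no "Size" attribute

d <- as.dist(m)        # keeps the lower triangle, adds class "dist" and "Size"
inherits(d, "dist")    # TRUE
attr(d, "Size")        # 3

hc <- hclust(d)        # works on the dist object
```

Running inherits(distmat, "dist") immediately before the failing hclust() call, in the same session, would show whether the conversion actually took effect.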
