Constructing a similarity matrix for spectral clustering via the Jaccard coefficient

-1

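I have a categorical dataset on which I am performing spectral clustering, but I am not getting good output. I select the eigenvectors corresponding to the largest eigenvalues as the centroids for k-means.

Below is the process I follow: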

1. Create a symmetric similarity matrix (m*m) using the Jaccard coefficient.
    For example, for a data set,
    a,b,c,d
    a,b,x,y
    the similarity matrix I compute would look like:
    |1     0.33|
    |0.33  1   |
2. Compute the first k eigenvectors corresponding to the largest eigenvalues, where k is the number of clusters.
3. Normalize the symmetric similarity matrix.
4. Perform the clustering on the normalized similarity matrix, using the eigenvectors as initial centroids for k-means.
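
Step 1 can be sketched in a few lines of plain Python. This is only an illustration: it assumes each record is treated as a set of items, and the helper names `jaccard` and `similarity_matrix` are made up for the example. For the two records above the shared items are {a, b} and the union has six items, so the off-diagonal entry is 2/6 ≈ 0.33.

```python
from typing import List, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard coefficient |a & b| / |a | b|; two empty sets count as identical."""
    union = a | b
    if not union:
        return 1.0
    return len(a & b) / len(union)


def similarity_matrix(records: List[Set[str]]) -> List[List[float]]:
    """Symmetric m*m matrix of pairwise Jaccard coefficients."""
    return [[jaccard(r, s) for s in records] for r in records]


# The two example records from step 1.
records = [{"a", "b", "c", "d"}, {"a", "b", "x", "y"}]
for row in similarity_matrix(records):
    print(" ".join(f"{value:.2f}" for value in row))
```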

My questions are:

Is computing a Jaccard similarity matrix the right choice for spectral clustering?

Is selecting eigenvectors as cluster centroids the right approach for spectral clustering? I don't see other options for a categorical dataset.

Is there anything wrong with the procedure I follow?


2015-06-10 Sam

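As far as I can tell, you have mixed and shuffled a number of methods. No wonder it doesn't work...

- You could simply use the Jaccard distance (a simple inversion of the Jaccard similarity) plus hierarchical clustering.
- You could use MDS to project your data and then run k-means (which is probably what you are trying to do).
- Affinity propagation and similar methods are also worth a try.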
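
The first suggestion (Jaccard distance plus hierarchical clustering) could look roughly like the sketch below. It is a naive, dependency-free illustration only: it assumes records are sets of items, uses average linkage as one possible linkage choice, expects `k >= 1`, and uses made-up records and function names. For real data a library implementation such as `scipy.cluster.hierarchy` would be the more practical route.

```python
from itertools import combinations
from typing import List, Set


def jaccard_distance(a: Set[str], b: Set[str]) -> float:
    """1 - |a & b| / |a | b|; two empty sets have distance 0."""
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0


def average_linkage(records: List[Set[str]], k: int) -> List[List[int]]:
    """Merge the two closest clusters until k remain; returns lists of record indices."""
    clusters = [[i] for i in range(len(records))]

    def cluster_distance(c1: List[int], c2: List[int]) -> float:
        # Average of all pairwise Jaccard distances between the two clusters.
        total = sum(jaccard_distance(records[i], records[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > k:
        p, q = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: cluster_distance(clusters[pair[0]], clusters[pair[1]]),
        )
        clusters[p] = clusters[p] + clusters[q]  # p < q, so deleting q keeps p valid
        del clusters[q]
    return clusters


# Hypothetical records: three that share a/b items and two that share p/q/r items.
records = [
    {"a", "b", "c", "d"},
    {"a", "b", "x", "y"},
    {"a", "b", "c", "y"},
    {"p", "q", "r", "s"},
    {"p", "q", "r", "t"},
]
print(average_linkage(records, k=2))
```

Because the clustering works directly on pairwise distances, no centroids or vector-space means are needed, which is what makes it a natural fit for categorical records.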


2015-06-10 20:49:46

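Thanks for your reply. I am just a beginner in cluster analysis and am trying out different approaches. One more question: is there any benefit in creating a similarity matrix (m*m) with the Jaccard coefficient and then running k-means on that matrix? Is that a feasible approach? I tried it on some datasets from http://archive.ics.uci.edu/ml/datasets.html (congressional, mushroom) and it gave promising results. Thanks – Sam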

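k-means should be run on the raw data. It assumes a linear, Euclidean vector space. **Don't run methods just because you can.** Understand the algorithm *and* the requirements and goals of your problem. If you can get them to align (which often requires a lot of preprocessing), then try it. –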
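
As one illustration of the kind of preprocessing that puts categorical records into a vector space, the sketch below one-hot encodes each record over the vocabulary of all items. This encoding is an assumption made for the example, not something prescribed in the thread, and the function names are invented. In this encoding the squared Euclidean distance between two records equals the number of items present in exactly one of them.

```python
from typing import List, Set


def one_hot(records: List[Set[str]]) -> List[List[int]]:
    """Encode each record as a 0/1 vector over the sorted vocabulary of all items."""
    vocabulary = sorted(set().union(*records))
    return [[1 if item in record else 0 for item in vocabulary] for record in records]


def squared_euclidean(u: List[int], v: List[int]) -> int:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(u, v))


records = [{"a", "b", "c", "d"}, {"a", "b", "x", "y"}]
vectors = one_hot(records)
print(vectors)
print(squared_euclidean(vectors[0], vectors[1]))
```

Whether means of such 0/1 vectors are meaningful for a given problem is exactly the kind of question the comment above asks you to think through before running k-means.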
