How to cluster a heatmap of NumPy correlation coefficients

I want to hierarchically cluster a 2D numpy array so that it looks good when I plot it as a correlation matrix in d3.js.

My data looks like this:

[[ 1. 0.091 0.147 ..., -0.239 0.113 -0.012 ] 
[ 0.091 1. -0.153 ..., -0.004 -0.244 -0.00520801] 
[ 0.147 -0.153 1. ..., -0.157 0.013 0.133] 
..., 
[-0.239 -0.004 -0.157 ..., -0.265 -0.362 1. ]]

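These are Pearson correlation coefficients that I computed, ranging from -1 to 1. As you can see, there is a correlation of 1 running down the diagonal from the top left of the array to the bottom right.

If I plot these values, my correlation matrix looks like this: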

[Figure: correlation matrix before clustering]

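After clustering, I would like it to look somewhat like this, where red represents positive correlation and blue represents negative correlation: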

[Figure: clustered heatmap]

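Using matplotlib and SciPy, I can cluster the coefficients so that they look like a heatmap, but the values get changed. I want my values to stay the same.

I used this answer to graph the heatmap in Python, but it's not quite what I want, since it changes my values. All I need is to cluster the data and output it to a csv/json file.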

from scipy.spatial.distance import pdist, squareform 
from scipy.cluster.hierarchy import linkage, dendrogram 

data_dist = pdist(final_correlation, 'correlation')
# If I use this, it gives me an array of distances that is about half the
# size of my original correlation matrix. How do I use these distances to
# re-order my correlation matrix as a clustered matrix?


Out[1]: # The size is 9730, as opposed to the original size of 19,600 
[ 0.612 0.503 1.653 ..., 0.792 1.577 
0.829]
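
The size of 9730 is what one would expect from `pdist`: it returns a condensed array with one distance per unordered pair of rows, so a 140 x 140 input gives 140 * 139 / 2 = 9730 entries. A minimal sketch of this, assuming a randomly generated stand-in for `final_correlation` (the real data is not shown in the question):

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

# Hypothetical stand-in for the 140 x 140 correlation matrix in the question.
rng = np.random.default_rng(0)
samples = rng.normal(size=(140, 50))
final_correlation = np.corrcoef(samples)   # shape (140, 140)

n = final_correlation.shape[0]
data_dist = pdist(final_correlation, 'correlation')

# Condensed form: one entry per unordered pair of rows, n * (n - 1) / 2 in total.
print(data_dist.shape)

# squareform expands the condensed array into a symmetric n x n matrix
# with zeros on the diagonal.
square = squareform(data_dist)
print(square.shape)
```

The condensed array can be passed directly to `scipy.cluster.hierarchy.linkage`, so expanding it with `squareform` is only needed for inspection.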

UPDATE: If anyone knows R, what I'm trying to do might look like this code.


2014-12-07 achabacha322

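A complete and minimal example with dummy data would be a great help – YXD 2014-12-08 15:07:32

Sorry that I didn't give a complete example, but I found a way to cluster the data, although not as nicely as I wanted:

Suppose you have a csv file with the correlations and a header row. You can copy the contents of the CSV file and use this code: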

import scipy.cluster.hierarchy as hc
import pandas
from matplotlib import pyplot

# copy the data to the clipboard first
d = pandas.read_clipboard(sep=",", index_col=0)
d.columns = [int(x) for x in d.columns]

# hierarchical clustering of the rows; leaves_list gives the leaf order
link = hc.linkage(d.values, method='centroid')
o1 = hc.leaves_list(link)

# reorder the rows by the leaf order, then the columns by the reversed order
mat = d.iloc[o1, :]
mat = mat.iloc[:, o1[::-1]]
pyplot.imshow(mat)
pyplot.show()

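This produces something like this: [image hosted on Imgur]

The correlation values in the CSV contain duplicate values, so you have to reverse the second part of the array.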
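
As an alternative to the code above, here is a minimal sketch that applies the same leaf order to both rows and columns, so the values are only permuted and the diagonal of ones stays on the main diagonal. It rests on several assumptions that are not in the original post: the dummy data, the use of `1 - r` as the dissimilarity, and `method='average'` are all illustrative choices.

```python
import numpy as np
import pandas as pd
from scipy.cluster.hierarchy import linkage, leaves_list
from scipy.spatial.distance import squareform

# Hypothetical dummy data: 6 variables built from two underlying signals.
rng = np.random.default_rng(0)
base = rng.normal(size=(200, 2))
cols = [base[:, i % 2] + 0.5 * rng.normal(size=200) for i in range(6)]
corr = pd.DataFrame(np.column_stack(cols)).corr()   # 6 x 6 Pearson matrix

# Assumption: treat 1 - r as the dissimilarity between two variables.
dist = 1.0 - corr.values
np.fill_diagonal(dist, 0.0)
condensed = squareform(dist, checks=False)          # condensed form for linkage

# Hierarchical clustering, then the order of the leaves in the dendrogram.
link = linkage(condensed, method='average')
order = leaves_list(link)

# Permute rows and columns with the same order; no value is changed.
clustered = corr.iloc[order, order]

# With no path argument these return strings; pass a file path to write a file.
csv_text = clustered.to_csv()
json_text = clustered.to_json()
print(clustered.round(2))
```

Because only the order of rows and columns changes, the exported CSV or JSON holds exactly the original coefficients, and the row and column labels record which variable ended up where.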

2015-01-22 19:12:38 achabacha322