2017-03-16

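Can Pearson correlation be used as a metric in sklearn?

I have a matrix X and I am trying to run KNN with a Pearson correlation metric. Is it possible to use Pearson correlation as a sklearn metric? I have tried something like this: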

def pearson_calc(M): 
    P = (1 - np.array([[pearsonr(a,b)[0] for a in M] for b in M])) 
    return P 

nbrs = NearestNeighbors(n_neighbors=4, metric=pearson_calc) 
nbrs.fit(X) 
knbrs = nbrs.kneighbors(X) 

However, this does not work; I get the following error:

pearson_affinity() takes 1 positional argument but 2 were given

I assume the pearson_calc function is wrong. Perhaps it needs to take two arguments a, b rather than a matrix.

Answer

Here is what the documentation says about this:

If metric is a callable function, it is called on each pair of instances (rows) and the resulting value recorded. The callable should take two arrays as input and return one value indicating the distance between them.

In addition, the valid values for metric are:

from scikit-learn:

['cityblock', 'cosine', 'euclidean', 'l1', 'l2', 'manhattan']

from scipy.spatial.distance:

['braycurtis', 'canberra', 'chebyshev', 'correlation', 'dice', 'hamming', 'jaccard', 'kulsinski', 'mahalanobis', 'matching', 'minkowski', 'rogerstanimoto', 'russellrao', 'seuclidean', 'sokalmichener', 'sokalsneath', 'sqeuclidean', 'yule']
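
Among the scipy metrics listed above, 'correlation' is the correlation distance, i.e. 1 minus the Pearson coefficient between two rows. The following is a minimal check of that relationship, assuming made-up random data (the array X and its shape are illustrative only, not the asker's matrix):

```python
import numpy as np
from sklearn.metrics import pairwise_distances

# Hypothetical toy data: 6 samples (rows) with 5 features each.
rng = np.random.default_rng(0)
X = rng.normal(size=(6, 5))

# Pairwise correlation distance between rows, selected by name.
D = pairwise_distances(X, metric="correlation")

# np.corrcoef treats each row as a variable, so this is the row-by-row Pearson matrix.
expected = 1.0 - np.corrcoef(X)

print(np.allclose(D, expected))
```

If the two matrices agree, the built-in string metric can stand in for a hand-written Pearson distance.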

Two things:

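  • Your function needs to take two arguments (the two rows between which the distance is computed), which explains why the error says that two arguments were passed to it.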

  • You can use scipy.spatial.distance.correlation as the metric.

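    from sklearn.neighbors import NearestNeighbors
    # The string 'correlation' selects the correlation distance (1 - Pearson r); no scipy import is needed.
    nbrs = NearestNeighbors(n_neighbors=4, metric='correlation')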
    

    Source: sklearn NearestNeighbors
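
Both points can be sketched together. The example below is only an illustration under stated assumptions: pearson_distance is a hypothetical name for a two-argument version of the asker's function, X is random toy data, and algorithm="brute" is chosen here because 1 - r is not guaranteed to satisfy the triangle inequality that tree-based searches rely on.

```python
import numpy as np
from scipy.stats import pearsonr
from sklearn.neighbors import NearestNeighbors


def pearson_distance(a, b):
    """Distance between two rows: 1 - Pearson correlation coefficient."""
    return 1.0 - pearsonr(a, b)[0]


# Hypothetical toy data: 10 samples (rows) with 6 features each.
rng = np.random.default_rng(0)
X = rng.normal(size=(10, 6))

# Option 1: a callable that receives two rows and returns one distance value.
nbrs_callable = NearestNeighbors(
    n_neighbors=4, metric=pearson_distance, algorithm="brute"
).fit(X)
dist_callable, idx_callable = nbrs_callable.kneighbors(X)

# Option 2: the built-in correlation distance, selected by name.
nbrs_builtin = NearestNeighbors(
    n_neighbors=4, metric="correlation", algorithm="brute"
).fit(X)
dist_builtin, idx_builtin = nbrs_builtin.kneighbors(X)

# Both approaches should report the same neighbour distances.
print(np.allclose(dist_callable, dist_builtin))
```

The string metric is usually the better choice in practice, since a Python callable is invoked once per pair of rows and is therefore much slower on large matrices.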