Sklearn: Finding the mean centroid location of a cluster?

import numpy as np
import sklearn.feature_extraction.text as text
from sklearn import decomposition

descs = ["You should not go there", "We may go home later", "Why should we do your chores", "What should we do"]

# Build the document-term matrix (4 documents x 14 distinct words)
vectorizer = text.CountVectorizer()
dtm = vectorizer.fit_transform(descs).toarray()

# get_feature_names() was removed in scikit-learn 1.2; use get_feature_names_out()
vocab = np.array(vectorizer.get_feature_names_out())

# Factorize the matrix into 3 non-negative components
nmf = decomposition.NMF(3, random_state=1)
topic = nmf.fit_transform(dtm)

Printing topic gives me:

>>> print(topic) 
[0.  , 1.403 , 0.  ], 
[0.  , 0.  , 1.637 ], 
[1.257 , 0.  , 0.  ], 
[0.874 , 0.056 , 0.065 ]

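These are vectors of the likelihood of each element in descs belonging to a certain cluster. How can I get the coordinates of the centroid of each cluster? Ultimately, I want to develop a function that calculates the distance between each element in descs and the centroid of the cluster it was assigned to.

Would it be best to just calculate the average of each descs element's topic values for each cluster?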


2016-07-27 blacksite

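The docs of sklearn.decomposition.NMF explain how to get the coordinates of the centroid of each cluster:

Attributes: components_ : array, [n_components, n_features]
Non-negative components of the data.

The basis vectors are arranged row-wise, as shown in the following interactive session: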

In [995]: np.set_printoptions(precision=2) 

In [996]: nmf.components_ 
Out[996]: 
array([[ 0.54, 0.91, 0. , 0. , 0. , 0. , 0. , 0.89, 0. , 0.89, 0.37, 0.54, 0. , 0.54], 
     [ 0. , 0.01, 0.71, 0. , 0. , 0. , 0.71, 0.72, 0.71, 0.01, 0.02, 0. , 0.71, 0. ], 
     [ 0. , 0.01, 0.61, 0.61, 0.61, 0.61, 0. , 0. , 0. , 0.62, 0.02, 0. , 0. , 0. ]])

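As to your second question, I don't see the point of "calculating the average of each descs element's topic values for each cluster". In my opinion it makes more sense to classify each element by its calculated likelihoods, i.e. to assign it to the cluster with the largest value in its row of topic.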
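A minimal sketch of that idea, under a few assumptions: the numbers are copied (rounded) from the printed topic and nmf.components_ above rather than recomputed, the document-term matrix is rebuilt with a regex that mimics CountVectorizer's defaults, and the helper name distance_to_assigned_centroid is hypothetical. Treating each row of components_ as a centroid in word-count space is one possible reading; NMF factors are only defined up to scaling, so the distances are indicative rather than definitive.

```python
import re
import numpy as np

descs = ["You should not go there", "We may go home later",
         "Why should we do your chores", "What should we do"]

# Rebuild the document-term matrix without scikit-learn, mimicking
# CountVectorizer's defaults (lowercase, tokens of 2+ word characters,
# vocabulary sorted alphabetically).
tokens = [re.findall(r"\b\w\w+\b", d.lower()) for d in descs]
vocab = sorted({t for doc in tokens for t in doc})
dtm = np.array([[doc.count(w) for w in vocab] for doc in tokens], dtype=float)

# Assumption: values copied (rounded) from the printed output in this thread.
topic = np.array([[0.0, 1.403, 0.0],
                  [0.0, 0.0, 1.637],
                  [1.257, 0.0, 0.0],
                  [0.874, 0.056, 0.065]])
components = np.array([
    [0.54, 0.91, 0.0, 0.0, 0.0, 0.0, 0.0, 0.89, 0.0, 0.89, 0.37, 0.54, 0.0, 0.54],
    [0.0, 0.01, 0.71, 0.0, 0.0, 0.0, 0.71, 0.72, 0.71, 0.01, 0.02, 0.0, 0.71, 0.0],
    [0.0, 0.01, 0.61, 0.61, 0.61, 0.61, 0.0, 0.0, 0.0, 0.62, 0.02, 0.0, 0.0, 0.0],
])

def distance_to_assigned_centroid(dtm, topic, components):
    """Assign each document to its strongest component and return
    (labels, Euclidean distance to that component's row)."""
    labels = topic.argmax(axis=1)      # cluster with the largest weight
    centroids = components[labels]     # one 14-dimensional row per document
    return labels, np.linalg.norm(dtm - centroids, axis=1)

labels, dists = distance_to_assigned_centroid(dtm, topic, components)
print(labels)  # [1 2 0 0]
print(dists)
```

With a fitted model, nmf.components_ and the result of nmf.fit_transform(dtm) would take the place of the hard-coded arrays.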


2016-07-28 02:16:27 Tonechas

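I'm assuming you created three centroids. How does each element in `nmf.components_` represent the coordinates of each centroid? The number of non-zero elements in that array seems to indicate a high dimensionality. – blacksite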

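The shape of `nmf.components_` is 3 rows by 14 columns, which correspond to the 3 clusters and the 14 distinct words, i.e. the vectors that represent the cluster centroids are linear combinations of the vocabulary basis. – Tonechas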
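To make that correspondence concrete, here is a small sketch that pairs each column with its word. It assumes the vocabulary is the alphabetically sorted list that CountVectorizer's defaults would produce for descs, and it reuses the rounded values printed in the session above; the 0.5 cutoff is an arbitrary choice for illustration.

```python
import numpy as np

# Assumption: alphabetical vocabulary, as CountVectorizer's defaults would give.
vocab = np.array(["chores", "do", "go", "home", "later", "may", "not",
                  "should", "there", "we", "what", "why", "you", "your"])

# Assumption: values copied (rounded) from the printed nmf.components_.
components = np.array([
    [0.54, 0.91, 0.0, 0.0, 0.0, 0.0, 0.0, 0.89, 0.0, 0.89, 0.37, 0.54, 0.0, 0.54],
    [0.0, 0.01, 0.71, 0.0, 0.0, 0.0, 0.71, 0.72, 0.71, 0.01, 0.02, 0.0, 0.71, 0.0],
    [0.0, 0.01, 0.61, 0.61, 0.61, 0.61, 0.0, 0.0, 0.0, 0.62, 0.02, 0.0, 0.0, 0.0],
])

# Each row is one centroid; each column is the weight of one word.
heavy_words = [vocab[row > 0.5].tolist() for row in components]
for k, words in enumerate(heavy_words):
    print(k, words)
```

Each centroid is therefore a point in a 14-dimensional word space, one coordinate per vocabulary entry.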

So how can I find the x-y coordinates of the centroids themselves? Or is this a misguided question? – blacksite
