PCA: Getting the top 20 most important dimensions

I'm doing some machine learning and trying to use PCA to find the important dimensions. Here's what I've done so far:

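from sklearn.decomposition import PCA
# df_normalized: my normalized feature matrix (2208 rows)
pca = PCA(n_components=0.98)  # keep enough components to explain 98% of the variance
X_reduced = pca.fit_transform(df_normalized)
print(X_reduced.shape)  # (2208, 1961)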

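So after running PCA I have 2208 rows by 1961 columns, which together explain 98% of the variance in my dataset. However, I'm worried that the dimensions with the least explanatory power may actually hurt my predictions (my model might just be finding spurious correlations in the data).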
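A float n_components such as 0.98 tells scikit-learn to keep the smallest number of components whose cumulative explained variance ratio exceeds that fraction. The sketch below reproduces that choice by hand from a full fit; it assumes synthetic correlated data as a stand-in for df_normalized, and it sets svd_solver="full" explicitly (an assumption made so both fits use the same deterministic solver).

```python
import numpy as np
from sklearn.decomposition import PCA

# Hypothetical stand-in for df_normalized: 500 samples, 40 correlated features.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 40)) @ rng.normal(size=(40, 40))

# Float n_components: keep just enough components to explain more than 98% of the variance.
pca = PCA(n_components=0.98, svd_solver="full")
X_reduced = pca.fit_transform(X)

# Reproduce the same choice by hand from a full (untruncated) fit.
full = PCA(svd_solver="full").fit(X)
cumulative = np.cumsum(full.explained_variance_ratio_)
# First position where the cumulative ratio is strictly greater than 0.98, as a count.
n_needed = int(np.searchsorted(cumulative, 0.98, side="right")) + 1

print(pca.n_components_, n_needed, X_reduced.shape)
```

The number of columns in X_reduced is therefore decided by the variance threshold, not by a fixed count.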

Does scikit-learn sort the columns by importance? If so, can I just do this:

X_final = X_reduced[:, :20]

Is that right?

Thanks for your help!


2017-07-06 bclayman

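The documentation says the output is sorted by explained variance. So yes, you should be able to do what you suggest and simply keep the first N dimensions of the output. You can also print the explained_variance_ attribute (or explained_variance_ratio_) along with components_ to double-check the order.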
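Both claims can be checked on a small example. This is a minimal sketch on hypothetical random data (not the asker's dataset): it confirms that explained_variance_ is non-increasing, and that keeping the first k columns of X_reduced gives the same scores as fitting k components directly. The scores are compared in absolute value because the sign of each principal component is arbitrary; svd_solver="full" is an assumption chosen so both fits use the same solver.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(1)
X = rng.normal(size=(300, 25)) @ rng.normal(size=(25, 25))  # hypothetical data

pca = PCA(n_components=0.98, svd_solver="full")
X_reduced = pca.fit_transform(X)

# explained_variance_ is sorted in decreasing order, so column 0 is the most important.
ordered = bool(np.all(np.diff(pca.explained_variance_) <= 0))

# Keep the first k columns and compare with a direct k-component fit.
k = min(5, pca.n_components_)
X_first_k = X_reduced[:, :k]
X_k_direct = PCA(n_components=k, svd_solver="full").fit_transform(X)
same = np.allclose(np.abs(X_first_k), np.abs(X_k_direct))

print(ordered, same)
```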

Here is the example from the documentation showing how to access the explained variance ratio:

import numpy as np 
from sklearn.decomposition import PCA 
X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]]) 
pca = PCA(n_components=2) 
pca.fit(X)
print(pca.explained_variance_ratio_)  # approximately [0.9924 0.0076]

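So in your case, you can print(pca.components_) and print(pca.explained_variance_ratio_) to see both. Note that these attributes live on the fitted pca object; X_reduced is the plain NumPy array returned by fit_transform and has neither. Then, once you have found the N components that together explain y% of the variance, simply keep the first N: the first N rows of pca.components_, or the first N columns of X_reduced.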
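One way to pick N is from the cumulative sum of explained_variance_ratio_. A minimal sketch, assuming hypothetical random data and a hypothetical target y = 0.90 (the 0.90 is illustrative, not from the question):

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(2)
X = rng.normal(size=(400, 30)) @ rng.normal(size=(30, 30))  # hypothetical data

pca = PCA(n_components=0.98, svd_solver="full")
X_reduced = pca.fit_transform(X)

# Share of the total variance explained by the first 1, 2, ... components.
cumulative = np.cumsum(pca.explained_variance_ratio_)

# Smallest N whose first N components explain at least y of the variance.
y = 0.90  # hypothetical target
N = int(np.argmax(cumulative >= y)) + 1

X_final = X_reduced[:, :N]     # transformed data: shape (n_samples, N)
axes = pca.components_[:N, :]  # principal axes: shape (N, n_features)
print(N, round(float(cumulative[N - 1]), 3), X_final.shape, axes.shape)
```

Because the ratios are sorted in decreasing order, the cumulative sum is increasing, so the first index that reaches y is also the smallest valid N.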

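Watch the dimensions, though! pca.components_ has shape [n_components, n_features], so if you want the first 20 principal axes themselves, you should use pca.components_[:20, :]. X_reduced, on the other hand, has shape [n_samples, n_components], so X_reduced[:, :20] selects the first 20 components of the transformed data.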
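The relationship between the two arrays can be seen directly: the first k columns of the transformed data are the centered input projected onto the first k rows of components_. A small sketch on hypothetical random data (250 samples, 12 features, 8 components, all chosen only for illustration):

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(3)
X = rng.normal(size=(250, 12))  # hypothetical data: 250 samples, 12 features

pca = PCA(n_components=8, svd_solver="full")
X_reduced = pca.fit_transform(X)

print(pca.components_.shape)  # (8, 12): one row per component
print(X_reduced.shape)        # (250, 8): one column per component

# Project the centered data onto the first k principal axes by hand.
k = 3
manual = (X - pca.mean_) @ pca.components_[:k, :].T
print(np.allclose(X_reduced[:, :k], manual))  # True
```

This holds for the default whiten=False; with whitening enabled, the transformed columns are additionally rescaled.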


2017-07-06 17:37:52 Taako
