scikit-learn: getting the selected features for prediction data

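I have a set of training data. The Python script that builds the model also computes the attributes as a numpy array (a bit vector). I then want to use VarianceThreshold to eliminate all features with zero variance (e.g. all 0 or all 1), and then run get_support(indices=True) to get the indices of the selected columns.

My problem now is how to get only the selected features for the data I want to predict. I first compute all the features and then use array indexing, but it does not work: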

x_predict_all = getAllFeatures(suppl_predict) 
x_predict = x_predict_all[indices] #only selected features

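indices is a numpy array.

The returned array x_predict has the correct length len(x_predict) but the wrong shape: x_predict.shape[1] is still the original length. My classifier then throws an error because of the wrong shape: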

prediction = gbc.predict(x_predict) 

    File "C:\Python27\lib\site-packages\sklearn\ensemble\gradient_boosting.py", line 1032, in _init_decision_function
    self.n_features, X.shape[1]))
ValueError: X.shape[1] should be 1855, not 2090.

How can I fix this?

Source

2015-01-21 beginner_

You can do it like this:

Test data

import numpy as np
from sklearn.feature_selection import VarianceThreshold

# Columns 0 and 3 are constant, so they have zero variance
X = np.array([[0, 2, 0, 3],
              [0, 1, 4, 3],
              [0, 1, 1, 3]])
selector = VarianceThreshold()

Alternative 1

>>> selector.fit(X) 
>>> idxs = selector.get_support(indices=True) 
>>> X[:, idxs] 
array([[2, 0], 
     [1, 4], 
     [1, 1]])
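
The indexing in the question, x_predict_all[indices], most likely fails because a single index array on a 2-D numpy array picks rows, not columns. A minimal sketch of the difference, reusing the test data above (the names rows and cols are only for illustration):

```python
import numpy as np

X = np.array([[0, 2, 0, 3],
              [0, 1, 4, 3],
              [0, 1, 1, 3]])
idxs = np.array([1, 2])  # indices of the columns we want to keep

rows = X[idxs]     # picks rows 1 and 2, all 4 columns are still there
cols = X[:, idxs]  # picks columns 1 and 2 for every row

print(rows.shape)  # (2, 4)
print(cols.shape)  # (3, 2)
```

So for the prediction data the column form, x_predict_all[:, indices], is the one that should give the expected shape.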

Alternative 2

>>> selector.fit_transform(X) 
array([[2, 0], 
     [1, 4], 
     [1, 1]])
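
Note that fit_transform learns the selection from the data it is given. For prediction data you would normally fit the selector on the training data only and then call transform on the new data, so that the same columns are kept. A rough sketch, assuming made-up arrays X_train and X_new that stand in for the training and prediction features:

```python
import numpy as np
from sklearn.feature_selection import VarianceThreshold

# Hypothetical training data: columns 0 and 3 are constant
X_train = np.array([[0, 2, 0, 3],
                    [0, 1, 4, 3],
                    [0, 1, 1, 3]])
# Hypothetical data to predict on, with the same 4 original features
X_new = np.array([[1, 5, 2, 7],
                  [0, 3, 3, 3]])

selector = VarianceThreshold()
selector.fit(X_train)                       # learn which columns to keep
X_new_selected = selector.transform(X_new)  # apply the same selection

print(X_new_selected.shape)  # (2, 2)
```

Keeping the fitted selector next to the classifier avoids having to carry the index array around separately.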

Source

2015-01-21 14:33:22 elyase

Thanks. Alternative 1 is what I was looking for. – 2015-01-22 05:16:33
