Using a pipeline with custom classes in sklearn

I have a problem with predict in a pipeline where every pipeline step is a custom class.
from sklearn.decomposition import PCA
from sklearn.feature_selection import RFE
from sklearn.linear_model import LinearRegression

class MyFeatureSelector():
    def __init__(self, features=5, method='pca'):
        self.features = features
        self.method = method

    def fit(self, X, Y):
        return self

    def transform(self, X, Y=None):
        try:
            if self.features < X.shape[1]:
                if self.method == 'pca':
                    selector = PCA(n_components=self.features)
                elif self.method == 'rfe':
                    selector = RFE(estimator=LinearRegression(n_jobs=-1),
                                   n_features_to_select=self.features,
                                   step=1)
                selector.fit(X, Y)
                return selector.transform(X)
        except Exception as err:
            print('MyFeatureSelector.transform(): {}'.format(err))
        return X

    def fit_transform(self, X, Y=None):
        self.fit(X, Y)
        return self.transform(X, Y)

model = Pipeline([
    ("DATA_CLEANER", MyDataCleaner(demo='', mode='strict')),
    ("DATA_ENCODING", MyEncoder(encoder_name='code')),
    ("FEATURE_SELECTION", MyFeatureSelector(features=15, method='rfe')),
    ("HUBER_MODELLING", HuberRegressor())
])

So the code above works perfectly well for:
model.fit(X, _Y)
However, I get an error here:
prediction = model.predict(XT)
ERROR: shapes (672,107) and (15,) not aligned: 107 (dim 1) != 15 (dim 0)
Debugging shows that the problem is in selector.fit(X, Y): a new MyFeatureSelector instance is created during the predict() step, and Y does not exist at that moment.

Where am I going wrong?
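
Yes. The pipeline calls fit() during training, but during prediction it only calls transform(). You are calling selector.fit() inside transform(), so the selector is refitted at prediction time, and that changes the data.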

First check the shapes of X, XT and _Y, and show the full stack trace of the error. Then I may be able to suggest changes to your custom class.
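
To illustrate the point about calling selector.fit() inside transform(), here is a minimal sketch of one possible change, not necessarily what the commenter had in mind. It assumes the selector should be fitted once in fit(), where Y is available, and only reused in transform(). The class name FittedFeatureSelector, the attribute selector_ and the random demo data are hypothetical, and the cleaning and encoding steps are left out because their code is not shown in the question.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.decomposition import PCA
from sklearn.feature_selection import RFE
from sklearn.linear_model import HuberRegressor, LinearRegression
from sklearn.pipeline import Pipeline


class FittedFeatureSelector(BaseEstimator, TransformerMixin):
    """Hypothetical variant of MyFeatureSelector that learns in fit()."""

    def __init__(self, features=5, method='pca'):
        self.features = features
        self.method = method

    def fit(self, X, y=None):
        # Learn the selection once, on the training data, where y exists.
        self.selector_ = None
        if self.features < X.shape[1]:
            if self.method == 'pca':
                self.selector_ = PCA(n_components=self.features)
            elif self.method == 'rfe':
                self.selector_ = RFE(estimator=LinearRegression(),
                                     n_features_to_select=self.features,
                                     step=1)
            else:
                raise ValueError('unknown method: {}'.format(self.method))
            self.selector_.fit(X, y)
        return self

    def transform(self, X):
        # Reuse the fitted selector; never refit here, because y is
        # not passed to transform() at prediction time.
        if self.selector_ is None:
            return X
        return self.selector_.transform(X)


# Demo on made-up data (assumption: 8 input features reduced to 3).
rng = np.random.RandomState(0)
X_train = rng.rand(60, 8)
y_train = 2.0 * X_train[:, 0] + X_train[:, 1]
X_test = rng.rand(10, 8)

pipe = Pipeline([
    ("FEATURE_SELECTION", FittedFeatureSelector(features=3, method='rfe')),
    ("HUBER_MODELLING", HuberRegressor()),
])
pipe.fit(X_train, y_train)
pred = pipe.predict(X_test)  # transform() now returns 3 columns, as in training
```

Because fit_transform() comes from TransformerMixin, it calls fit(X, y) followed by transform(X), so the regressor is trained and queried with the same number of columns. Errors are also no longer swallowed by a broad except, so a failed fit surfaces immediately instead of silently returning the unreduced X.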