2017-04-05
Using a pipeline with custom classes in sklearn

I have a problem with predict() in a pipeline where every pipeline step is a custom class.

class MyFeatureSelector():
    def __init__(self, features=5, method='pca'):
        self.features = features
        self.method = method

    def fit(self, X, Y):
        return self

    def transform(self, X, Y=None):
        try:
            if self.features < X.shape[1]:
                if self.method == 'pca':
                    selector = PCA(n_components=self.features)
                elif self.method == 'rfe':
                    selector = RFE(estimator=LinearRegression(n_jobs=-1),
                                   n_features_to_select=self.features,
                                   step=1)
                selector.fit(X, Y)
                return selector.transform(X)
        except Exception as err:
            print('MyFeatureSelector.transform(): {}'.format(err))
        return X

    def fit_transform(self, X, Y=None):
        self.fit(X, Y)
        return self.transform(X, Y)


model = Pipeline([ 
    ("DATA_CLEANER", MyDataCleaner(demo='', mode='strict')), 
    ("DATA_ENCODING", MyEncoder(encoder_name='code')), 
    ("FEATURE_SELECTION", MyFeatureSelector(features=15, method='rfe')), 
    ("HUBER_MODELLING", HuberRegressor()) 
]) 

So the code above works perfectly well at this point:

model.fit(X, _Y) 

However, I get an error here:

prediction = model.predict(XT) 

ERROR: shapes (672,107) and (15,) not aligned: 107 (dim 1) != 15 (dim 0)

Debugging shows that the problem is here: selector.fit(X, Y), because a new MyFeatureSelector instance is created in the predict() step, and Y does not exist at that moment.
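The error message itself can be reproduced in isolation. The following is a minimal sketch, assuming the final regressor ended up with 15 coefficients while the test matrix reached it with all 107 columns; the zero-filled arrays are placeholders, not the real data.

```python
import numpy as np

# Placeholder arrays (assumptions): only the shapes matter here.
X_test = np.zeros((672, 107))   # test matrix that was never reduced to 15 columns
coef = np.zeros(15)             # weights learned on the 15 selected features

message = None
try:
    # A linear model's prediction boils down to a dot product like this one.
    np.dot(X_test, coef)
except ValueError as err:
    message = str(err)
    print(message)
```

If this is what happens, the selector step returned XT unchanged at predict time, which matches the `return X` fallback after the exception is caught and printed.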

Where am I going wrong?

Yes. The pipeline calls fit() during training and only transform() during prediction, and you are calling selector.fit() inside transform(), which changes the data.
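The call order described in this comment can be checked with a small experiment. This is only a sketch: `CallLogger` is a hypothetical pass-through step, the data is random, and it relies on `TransformerMixin` providing a `fit_transform` that calls `fit` and then `transform`.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline

calls = []  # records which methods the pipeline invokes, in order


class CallLogger(BaseEstimator, TransformerMixin):
    """Hypothetical pass-through step that only records how it is called."""

    def fit(self, X, y=None):
        calls.append("fit with y" if y is not None else "fit without y")
        return self

    def transform(self, X):
        calls.append("transform")
        return X


rng = np.random.RandomState(0)
X = rng.rand(20, 3)
y = rng.rand(20)

pipe = Pipeline([("logger", CallLogger()), ("model", LinearRegression())])

pipe.fit(X, y)
fit_calls = list(calls)      # what the step saw during training

calls.clear()
pipe.predict(X)
predict_calls = list(calls)  # what the step saw during prediction

print(fit_calls, predict_calls)
```

During training the step receives the targets; during prediction only transform(X) is called, so anything that needs Y has to be learned and stored beforehand.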

First check the shapes of X, XT and _Y, and show the full stack trace of the error. Then I may be able to suggest changes to your custom class.

Answer

A working version is posted below:

from sklearn.decomposition import PCA
from sklearn.feature_selection import RFE
from sklearn.linear_model import LinearRegression


class MyFeatureSelector():
    def __init__(self, features=5, method='pca'):
        self.features = features
        self.method = method
        self.selector = None
        self.init_selector()

    def init_selector(self):
        # Build the selector once so the fitted instance is reused at predict time.
        if self.method == 'pca':
            self.selector = PCA(n_components=self.features)
        elif self.method == 'rfe':
            self.selector = RFE(estimator=LinearRegression(n_jobs=-1),
                                n_features_to_select=self.features,
                                step=1)

    def fit(self, X, Y):
        return self

    def transform(self, X, Y=None):
        try:
            if self.features < X.shape[1]:
                # Refit only when targets are supplied (training); when Y is
                # None (prediction) the already fitted selector is reused.
                if Y is not None:
                    self.selector.fit(X, Y)
                return self.selector.transform(X)
        except Exception as err:
            print('MyFeatureSelector.transform(): {}'.format(err))
        return X

    def fit_transform(self, X, Y=None):
        self.fit(X, Y)
        return self.transform(X, Y)
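
The version above still fits the selector inside transform(). A possibly more conventional layout, shown here only as a sketch, does the fitting in fit() and leaves transform() free of any dependence on Y. The class name `FittedFeatureSelector`, the `selector_` attribute, the random data, and the shortened two-step pipeline are assumptions for illustration; the real MyDataCleaner and MyEncoder steps are not reproduced.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.decomposition import PCA
from sklearn.feature_selection import RFE
from sklearn.linear_model import HuberRegressor, LinearRegression
from sklearn.pipeline import Pipeline


class FittedFeatureSelector(BaseEstimator, TransformerMixin):
    """Hypothetical variant: learn the selection in fit(), reuse it in transform()."""

    def __init__(self, features=5, method='pca'):
        self.features = features
        self.method = method

    def fit(self, X, y=None):
        self.selector_ = None  # None means "pass the data through unchanged"
        if self.features < X.shape[1]:
            if self.method == 'pca':
                self.selector_ = PCA(n_components=self.features)
            elif self.method == 'rfe':
                self.selector_ = RFE(estimator=LinearRegression(),
                                     n_features_to_select=self.features,
                                     step=1)
            else:
                raise ValueError('unknown method: {}'.format(self.method))
            self.selector_.fit(X, y)
        return self

    def transform(self, X):
        # No fitting here: the same columns chosen during training are applied.
        if self.selector_ is None:
            return X
        return self.selector_.transform(X)


# Synthetic stand-in data (assumption), only to exercise fit() and predict().
rng = np.random.RandomState(0)
X_train = rng.rand(200, 30)
y_train = X_train[:, :5].sum(axis=1) + 0.01 * rng.rand(200)
X_new = rng.rand(50, 30)

model = Pipeline([
    ("FEATURE_SELECTION", FittedFeatureSelector(features=15, method='rfe')),
    ("HUBER_MODELLING", HuberRegressor()),
])
model.fit(X_train, y_train)
prediction = model.predict(X_new)
print(prediction.shape)
```

Inheriting from BaseEstimator and TransformerMixin supplies get_params(), set_params() and fit_transform(), so the step should also work with clone() and grid search. Errors are allowed to propagate instead of being printed, so a failing selector does not silently hand unreduced data to the regressor.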