Using multiple custom classes with sklearn Pipeline (Python)

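I'm trying to put together a tutorial on Pipeline for students, but I'm stuck. I'm not an expert, but I'm trying to improve, so thank you for your indulgence. I want to prepare a DataFrame for a classifier by running the following steps in a pipeline:

Step 1: Describe the DataFrame
Step 2: Fill NaN values
Step 3: Convert categorical values to numbers

Here is my code: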

from sklearn.preprocessing import LabelEncoder

class Descr_df(object):

    def transform(self, X):
        print("Structure of the data: \n {}".format(X.head(5)))
        print("Features names: \n {}".format(X.columns))
        print("Target: \n {}".format(X.columns[0]))
        print("Shape of the data: \n {}".format(X.shape))

    def fit(self, X, y=None):
        return self

class Fillna(object):

    def transform(self, X):
        non_numerics_columns = X.columns.difference(X._get_numeric_data().columns)
        for column in X.columns:
            if column in non_numerics_columns:
                # most frequent value for non-numeric columns
                X[column] = X[column].fillna(X[column].value_counts().idxmax())
            else:
                # mean for numeric columns
                X[column] = X[column].fillna(X[column].mean())
        return X

    def fit(self, X, y=None):
        return self

class Categorical_to_numerical(object):

    def transform(self, X):
        non_numerics_columns = X.columns.difference(X._get_numeric_data().columns)
        le = LabelEncoder()
        for column in non_numerics_columns:
            X[column] = X[column].fillna(X[column].value_counts().idxmax())
            le.fit(X[column])
            X[column] = le.transform(X[column]).astype(int)
        return X

    def fit(self, X, y=None):
        return self

It works if I run steps 1 and 2, or steps 1 and 3. But if I run steps 1, 2 and 3 together, I get this error:

pipeline = Pipeline([('df_intropesction', Descr_df()), ('fillna',Fillna()), ('Categorical_to_numerical', Categorical_to_numerical())]) 
pipeline.fit(X, y) 
AttributeError: 'NoneType' object has no attribute 'columns'


2017-04-19 Jeremie Guez

Maybe one of them is None: 'X' or 'y'. Full stack trace, please. – sergzach

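This error occurs because, in a Pipeline, the output of the first estimator becomes the input of the second, the output of the second becomes the input of the third, and so on.

From the documentation of Pipeline: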

Fit all the transforms one after the other and transform the data, then fit the transformed data using the final estimator.

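So, for your pipeline, the steps executed are as follows:

1. Descr_df.fit(X) -> does nothing and returns self.
2. newX = Descr_df.transform(X) -> should return a value that is assigned to newX and passed on to the next estimator, but your definition does not return anything (it only prints). So None is returned implicitly.
3. Fillna.fit(newX) -> does nothing and returns self.
4. Fillna.transform(newX) -> calls newX.columns. But newX is None from step 2. Hence the error.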
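The handoff described above can be imitated without scikit-learn. The sketch below is a simplified stand-in, not Pipeline's actual implementation; `PrintOnly`, `NeedsData` and `fit_like_pipeline` are hypothetical names invented for this illustration. It also suggests why a two-step pipeline appears to work: the last step is only fitted, so its `transform` never sees the `None`.

```python
class PrintOnly:
    """Mimics Descr_df: transform prints but has no return statement."""

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        print("rows:", len(X))  # no return, so None is returned implicitly


class NeedsData:
    """Mimics Fillna: fit ignores X, transform actually touches the data."""

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        return [0 if value is None else value for value in X]


def fit_like_pipeline(steps, X, y=None):
    """Fit and transform every step except the last, then only fit the last."""
    *transformers, final = steps
    for step in transformers:
        X = step.fit(X, y).transform(X)  # output of one step feeds the next
    final.fit(X, y)
    return X  # the data the final step was fitted on


data = [1, None, 3]

# Two steps: the None from PrintOnly only reaches NeedsData.fit, which ignores it.
seen_by_last_step = fit_like_pipeline([PrintOnly(), NeedsData()], data)
print(seen_by_last_step)

# Three steps: the middle step's transform now receives None and fails.
try:
    fit_like_pipeline([PrintOnly(), NeedsData(), NeedsData()], data)
except TypeError as exc:
    print("failed:", exc)
```

In the question the failure surfaces as an AttributeError on `.columns` rather than a TypeError, but the cause is the same: a step received `None` instead of data.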

Solution: change the transform method of Descr_df so that it returns the DataFrame unchanged:

def transform (self, X): 
    print ("Structure of the data: \n {}".format(X.head(5))) 
    print ("Features names: \n {}".format(X.columns)) 
    print ("Target: \n {}".format(X.columns[0])) 
    print ("Shape of the data: \n {}".format(X.shape)) 
    return X

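Suggestion: make your classes inherit from scikit-learn's BaseEstimator and TransformerMixin classes, to conform to good practice.

That is, change class Descr_df(object) to class Descr_df(BaseEstimator, TransformerMixin), Fillna(object) to Fillna(BaseEstimator, TransformerMixin), and so on.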
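As a rough sketch of that suggestion, here are two of the steps rewritten on top of the scikit-learn base classes. The class names `DescribeFrame` and `FillMissing` and the toy DataFrame are made up for illustration. Two choices here are mine rather than the question's: `select_dtypes` replaces the private `_get_numeric_data`, and each transform works on a copy. The sketch lists TransformerMixin first, which, as far as I understand scikit-learn's developer guide, is the recommended order; with either order the mixin supplies `fit_transform`.

```python
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.pipeline import Pipeline


class DescribeFrame(TransformerMixin, BaseEstimator):
    """Prints a short summary and passes the DataFrame through unchanged."""

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        print("Shape of the data:", X.shape)
        return X  # returning X is what lets the next step receive data


class FillMissing(TransformerMixin, BaseEstimator):
    """Fills NaN with the column mean (numeric) or the most frequent value."""

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        X = X.copy()  # avoid mutating the caller's DataFrame
        numeric_columns = X.select_dtypes(include="number").columns
        for column in X.columns:
            if column in numeric_columns:
                X[column] = X[column].fillna(X[column].mean())
            else:
                X[column] = X[column].fillna(X[column].value_counts().idxmax())
        return X


# Hypothetical toy data, only for illustration
toy = pd.DataFrame({"age": [20.0, None, 40.0], "city": ["Paris", None, "Paris"]})

pipeline = Pipeline([("describe", DescribeFrame()), ("fillna", FillMissing())])
result = pipeline.fit_transform(toy)
print(result)
```

Because every `transform` returns a DataFrame, further steps such as the categorical encoder can be appended without running into the `None` problem.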

See this example for more details on custom classes in a Pipeline:

http://scikit-learn.org/stable/auto_examples/hetero_feature_union.html#sphx-glr-auto-examples-hetero-feature-union-py


2017-04-19 16:19:38

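I'll take a look and let you know. Your answer seems very interesting and helpful. Thanks! –

@JeremieGuez Try the solution, and if it works for you, please consider accepting this answer. –

Looks good! Thanks –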
