Image arrays for classification in an sklearn pipeline - ValueError: setting an array element with a sequence

I have images that I want to classify as A or B. To do this, I load them and resize them to 160x160, convert the 2D arrays to 1D, and add them to a pandas DataFrame.
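As a rough sketch of the flattening step (not the asker's actual loading code, which is not shown): assuming each resized image is a 160x160 array with three color channels, flattening it gives 160 * 160 * 3 = 76800 values, which would be consistent with the row length printed further down. The `fake_image` array is a hypothetical stand-in for a loaded image.

```python
import numpy as np

# Stand-in for one resized image: 160x160 pixels, 3 color channels (assumption).
fake_image = np.full((160, 160, 3), 255, dtype=np.uint8)

# Flatten to a 1-D feature vector, one entry per pixel channel.
flat = fake_image.ravel()

print(len(flat), flat.dtype, flat.shape)  # 76800 uint8 (76800,)
```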

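Later I want to use more than just the images for classification (for example, product descriptions), so I am using a Pipeline with a FeatureUnion, even though it only contains the images for now. The ItemSelector is taken from here: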

http://scikit-learn.org/stable/auto_examples/hetero_feature_union.html

It takes the values in the "image" column. Alternatively, I could do train_X = df.iloc[train_indices]["image"].values, but I want to add other columns later.

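from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import FeatureUnion, Pipeline


def randomforest_image_pipeline():
    """Returns a RandomForest pipeline."""
    return Pipeline([
        ("union", FeatureUnion(
            transformer_list=[
                # ItemSelector comes from the linked scikit-learn example.
                ("image", Pipeline([
                    ("selector", ItemSelector(key="image")),
                ])),
            ],
            transformer_weights={
                "image": 1.0,
            },
        )),
        ("classifier", RandomForestClassifier()),
    ])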

Then I classify with KFold:

from sklearn.model_selection import KFold


def kfold(tested_pipeline=None, df=None, splits=6):
    k_fold = KFold(n_splits=splits)
    for train_indices, test_indices in k_fold.split(df):
        # training set
        train_X = df.iloc[train_indices]
        train_y = df.iloc[train_indices]['class'].values
        # test set
        test_X = df.iloc[test_indices]
        test_y = df.iloc[test_indices]['class'].values
        for val in train_X["image"]:
            print(len(val), val.dtype, val.shape)
            # 76800 uint8 (76800,) for all
        tested_pipeline.fit(train_X, train_y)  # crashes in this call
        pipeline_predictions = tested_pipeline.predict(test_X)
        ...


kfold(tested_pipeline=randomforest_image_pipeline(), df=df)

However, the call to .fit raises the following error:

Traceback (most recent call last): 
    File "<path>/project/classifier/classify.py", line 362, in <module> 
    best = best_pipeline(dataframe=data, f1_scores=f1_dict, get_fp=True) 
    File "<path>/project/classifier/classify.py", line 351, in best_pipeline 
    confusion_list=confusion_list, get_fp=get_fp) 
    File "<path>/project/classifier/classify.py", line 65, in kfold 
    tested_pipeline.fit(train_X, train_y) 
    File "/usr/local/lib/python3.5/dist-packages/sklearn/pipeline.py", line 270, in fit 
    self._final_estimator.fit(Xt, y, **fit_params) 
    File "/usr/local/lib/python3.5/dist-packages/sklearn/ensemble/forest.py", line 247, in fit 
    X = check_array(X, accept_sparse="csc", dtype=DTYPE) 
    File "/usr/local/lib/python3.5/dist-packages/sklearn/utils/validation.py", line 382, in check_array 
    array = np.array(array, dtype=dtype, order=order, copy=copy) 
ValueError: setting an array element with a sequence.

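I found other people with the same error, and in their case the rows did not all have the same length. That does not seem to apply here, since every row is 1-D with length 76800: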

for val in train_X["image"]:
    print(len(val), val.dtype, val.shape)
    # 76800 uint8 (76800,) for all

At the line that crashes, array looks like this (copied from the debugger):

[array([ 255., 255., 255., ..., 255., 255., 255.]) 
array([ 255., 255., 255., ..., 255., 255., 255.]) 
array([ 255., 255., 255., ..., 255., 255., 255.]) ..., 
array([ 255., 255., 255., ..., 255., 255., 255.]) 
array([ 255., 255., 255.

What can I do to fix this?

Source

2017-08-24 Lomtrur

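The error occurs because you store all the data for an image (i.e. 76800 features) in a list, and that list is stored in a single column of the DataFrame.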

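So when you use ItemSelector to select that column, its output is a one-dimensional array of shape (Train_len,). The inner dimension of 76800 is not visible to FeatureUnion or to the estimators that follow it.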
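The shape problem can be reproduced with plain NumPy. This is only a small sketch: the object array `column` is a stand-in for what selecting a DataFrame column of arrays yields, and the sizes 3 and 8 are placeholders for Train_len and 76800.

```python
import numpy as np

n_images, n_features = 3, 8  # tiny stand-ins for Train_len and 76800

# Object array holding one 1-D array per row, similar to a selected
# DataFrame column whose cells each contain an array.
column = np.empty(n_images, dtype=object)
for i in range(n_images):
    column[i] = np.full(n_features, 255, dtype=np.uint8)

print(column.shape)  # (3,): the inner dimension is hidden

# Converting to a float array fails, as it does inside check_array.
try:
    np.array(column, dtype=np.float32)
    conversion_failed = False
except ValueError as exc:
    conversion_failed = True
    print("ValueError:", exc)

# Stacking the rows explicitly exposes both dimensions.
stacked = np.stack(list(column))
print(stacked.shape)  # (3, 8)
```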

Change the transform() function of ItemSelector so that it returns a proper two-dimensional data array of shape (Train_len, 76800). Only then will it work.

Change it to:

def transform(self, data_dict): 
    return np.array([np.array(x) for x in data_dict[self.key]])
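For context, here is a minimal sketch of how the whole selector might look with that change. The class is modeled on the ItemSelector from the linked scikit-learn example; using `np.stack` instead of `np.array` is a substitution made here so that rows of unequal length raise an error immediately. The dict `data` is a toy stand-in for the DataFrame, since both support lookup by column key.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin


class ItemSelector(BaseEstimator, TransformerMixin):
    """Select one column and return it as a 2-D array (n_samples, n_features)."""

    def __init__(self, key):
        self.key = key

    def fit(self, x, y=None):
        # Nothing to learn; the selector is stateless.
        return self

    def transform(self, data_dict):
        # np.stack raises if the rows do not all have the same shape.
        return np.stack([np.asarray(row) for row in data_dict[self.key]])


# Toy data: 4 "images" of 6 values each (stand-ins for 76800-value rows).
images = np.empty(4, dtype=object)
for i in range(4):
    images[i] = np.full(6, 255, dtype=np.uint8)
data = {"image": images}

features = ItemSelector(key="image").fit(data).transform(data)
print(features.shape)  # (4, 6)
```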

Feel free to ask if anything is unclear.

Source

2017-08-24 10:32:35

Incredible, thank you so much! It works! – Lomtrur

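@Lomtrur Great! Now make sure that any other transformers you add to the FeatureUnion also return a two-dimensional array. Only then can they be combined correctly. –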
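To illustrate why every branch needs to be two-dimensional: FeatureUnion joins the outputs of its transformers column-wise, which for dense arrays amounts to a horizontal stack. The sketch below imitates that step with plain NumPy rather than calling FeatureUnion itself. `desc_length` is a hypothetical second feature (for example, the length of a product description), not something from the question.

```python
import numpy as np

n_samples = 4

# Output of the image branch: already 2-D, shape (n_samples, 6).
image_block = np.full((n_samples, 6), 255.0)

# Hypothetical second feature, returned as 1-D with shape (n_samples,).
desc_length = np.array([12.0, 40.0, 7.0, 23.0])

# Mixing a 2-D block with a 1-D block cannot be stacked column-wise.
try:
    np.hstack([image_block, desc_length])
    mixed_dims_ok = True
except ValueError:
    mixed_dims_ok = False

# Reshape the 1-D feature into a single column, then combine.
combined = np.hstack([image_block, desc_length.reshape(-1, 1)])
print(mixed_dims_ok, combined.shape)  # False (4, 7)
```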
