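Image arrays for classification in an sklearn pipeline - ValueError: setting an array element with a sequence

I have images that I want to classify as A or B. To do this, I load them, resize them to 160x160, convert each 2D array to 1D, and add them to a pandas DataFrame.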
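Later I want to use more than just the images for classification (for example, product descriptions), so I use a Pipeline with a FeatureUnion, even though it only contains the image for now. ItemSelector is taken from here:
http://scikit-learn.org/stable/auto_examples/hetero_feature_union.html
It takes the values in the "image" column. Alternatively, I could do train_X = df.iloc[train_indices]["image"].values, but I want to add other columns later.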
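For reference, here is a minimal sketch of what such a selector does, written along the lines of the ItemSelector in the linked example rather than copied from it. The toy DataFrame and its values are made up for illustration. Selecting a column of arrays gives back a one-dimensional column whose elements are arrays, not a 2D matrix:

```python
import numpy as np
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin


class ItemSelector(TransformerMixin, BaseEstimator):
    """Select one column (by key) from a DataFrame or dict-like object."""

    def __init__(self, key):
        self.key = key

    def fit(self, X, y=None):
        # Nothing to learn; the selector is stateless.
        return self

    def transform(self, X):
        return X[self.key]


# Hypothetical toy data: three "images" of 4 pixels each.
df = pd.DataFrame({
    "image": [np.zeros(4, dtype=np.uint8) for _ in range(3)],
    "class": ["A", "B", "A"],
})

selected = ItemSelector(key="image").fit(df).transform(df)
# One entry per row; each entry is itself an array.
print(type(selected).__name__, selected.shape, selected.dtype)
```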
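from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import FeatureUnion, Pipeline

# ItemSelector is the helper class from the example linked above.

def randomforest_image_pipeline():
    """Returns a RandomForest pipeline."""
    return Pipeline([
        ("union", FeatureUnion(
            transformer_list=[
                ("image", Pipeline([
                    ("selector", ItemSelector(key="image")),
                ]))
            ],
            transformer_weights={
                "image": 1.0
            },
        )),
        ("classifier", RandomForestClassifier()),
    ])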
Then I classify with KFold:
from sklearn.model_selection import KFold

def kfold(tested_pipeline=None, df=None, splits=6):
    k_fold = KFold(n_splits=splits)
    for train_indices, test_indices in k_fold.split(df):
        # training set
        train_X = df.iloc[train_indices]
        train_y = df.iloc[train_indices]['class'].values
        # test set
        test_X = df.iloc[test_indices]
        test_y = df.iloc[test_indices]['class'].values
        for val in train_X["image"]:
            print(len(val), val.dtype, val.shape)
            # 76800 uint8 (76800,) for all
        tested_pipeline.fit(train_X, train_y)  # crashes in this call
        pipeline_predictions = tested_pipeline.predict(test_X)
        ...

kfold(tested_pipeline=randomforest_image_pipeline(), df=df)
However, the call to .fit gives me the following error:
Traceback (most recent call last):
  File "<path>/project/classifier/classify.py", line 362, in <module>
    best = best_pipeline(dataframe=data, f1_scores=f1_dict, get_fp=True)
  File "<path>/project/classifier/classify.py", line 351, in best_pipeline
    confusion_list=confusion_list, get_fp=get_fp)
  File "<path>/project/classifier/classify.py", line 65, in kfold
    tested_pipeline.fit(train_X, train_y)
  File "/usr/local/lib/python3.5/dist-packages/sklearn/pipeline.py", line 270, in fit
    self._final_estimator.fit(Xt, y, **fit_params)
  File "/usr/local/lib/python3.5/dist-packages/sklearn/ensemble/forest.py", line 247, in fit
    X = check_array(X, accept_sparse="csc", dtype=DTYPE)
  File "/usr/local/lib/python3.5/dist-packages/sklearn/utils/validation.py", line 382, in check_array
    array = np.array(array, dtype=dtype, order=order, copy=copy)
ValueError: setting an array element with a sequence.
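I found that other people have had the same problem, and in their case the rows were not all the same length. That does not seem to apply to me, since all of my rows are one-dimensional with length 76800: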
for val in train_X["image"]:
    print(len(val), val.dtype, val.shape)
    # 76800 uint8 (76800,) for all
At the crashing line, array looks like this (copied from the debugger):
[array([ 255., 255., 255., ..., 255., 255., 255.])
array([ 255., 255., 255., ..., 255., 255., 255.])
array([ 255., 255., 255., ..., 255., 255., 255.]) ...,
array([ 255., 255., 255., ..., 255., 255., 255.])
array([ 255., 255., 255.
What can I do to fix this?
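A small sketch that appears to reproduce the situation outside the pipeline, assuming the selected column reaches the classifier as a 1-D object array whose elements are equal-length arrays (which is what the debugger dump suggests). The array `column` is a made-up stand-in for df["image"].values. Converting it straight to a numeric dtype fails even though every row has the same length, while stacking the rows gives a regular 2D array:

```python
import numpy as np

# Hypothetical stand-in for df["image"].values: a 1-D object array
# whose elements are themselves 1-D arrays of equal length.
n_images, n_pixels = 5, 76800
column = np.empty(n_images, dtype=object)
for i in range(n_images):
    column[i] = np.full(n_pixels, 255, dtype=np.uint8)

print(column.shape, column.dtype)

# Converting the object column directly to a numeric dtype fails,
# because each element is a sequence rather than a scalar.
try:
    np.array(column, dtype=np.float32)
    conversion_failed = False
except ValueError as exc:
    conversion_failed = True
    print("ValueError:", exc)

# Stacking the rows builds a regular 2D array instead.
stacked = np.vstack(list(column))
print(stacked.shape, stacked.dtype)
```

If this matches what happens in your pipeline, the rows being the same length is not enough; the container itself has to be a 2D array of shape (n_samples, n_features) before it reaches the classifier.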
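Incredible, thank you so much! It works! – Lomtrur
@Lomtrur Great! Now make sure that the other transformers you add to the FeatureUnion also return a 2D array. Only then can they be combined correctly. –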
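To illustrate the point about 2D output, here is one possible sketch of a selector that returns a 2D array from inside the FeatureUnion. StackedColumnSelector is a hypothetical helper name, not part of scikit-learn, and the toy data is random; this is not necessarily the exact fix that was applied above.

```python
import numpy as np
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import FeatureUnion, Pipeline


class StackedColumnSelector(TransformerMixin, BaseEstimator):
    """Hypothetical helper: pick a column of equal-length 1-D arrays
    and return it as a 2D array of shape (n_samples, n_features)."""

    def __init__(self, key):
        self.key = key

    def fit(self, X, y=None):
        # Nothing to learn; the selector is stateless.
        return self

    def transform(self, X):
        return np.vstack(list(X[self.key]))


# Made-up toy data: 12 tiny "images" of 16 pixels each.
rng = np.random.RandomState(0)
df = pd.DataFrame({
    "image": [rng.randint(0, 256, size=16).astype(np.uint8) for _ in range(12)],
    "class": ["A", "B"] * 6,
})

pipeline = Pipeline([
    ("union", FeatureUnion(
        transformer_list=[
            ("image", StackedColumnSelector(key="image")),
        ],
        transformer_weights={"image": 1.0},
    )),
    ("classifier", RandomForestClassifier(n_estimators=10, random_state=0)),
])

pipeline.fit(df, df["class"].values)
predictions = pipeline.predict(df)
print(predictions.shape)
```

Any further branch added to the union (for example a text vectorizer for product descriptions) would need to produce the same number of rows in a 2D result so that the blocks can be concatenated column-wise.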