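Include feature extraction in a sklearn pipeline

For a text classification project I built a pipeline for feature selection and the classifier. My question now is whether the feature extraction module can also be included in the pipeline, and if so, how. I have read a few things about it, but they don't seem to fit my current code.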
This is what I have now:
```python
# Feature extraction
from sklearn.preprocessing import LabelEncoder, StandardScaler
from sklearn.feature_extraction import DictVectorizer
import numpy as np

vec = DictVectorizer()
X = vec.fit_transform(instances)  # instances: one feature dict per document
scaler = StandardScaler(with_mean=False)  # with_mean=False because X is a sparse matrix
X_scaled = scaler.fit_transform(X)  # put all features on the same scale
enc = LabelEncoder()
y = enc.fit_transform(labels)

# Feature selection and classification pipeline
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn import model_selection
from sklearn.metrics import classification_report
from sklearn.naive_bayes import MultinomialNB
from sklearn.svm import LinearSVC
from sklearn import linear_model
from sklearn.pipeline import Pipeline

feat_sel = SelectKBest(mutual_info_classif, k=200)
clf = linear_model.LogisticRegression()
pipe = Pipeline([('mutual_info', feat_sel), ('logistregress', clf)])
y_pred = model_selection.cross_val_predict(pipe, X_scaled, y, cv=10)
```
How can I move everything from the DictVectorizer up to the LabelEncoder into the pipeline?
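A minimal sketch of one way to do it, assuming `instances` is a list of feature dicts and `labels` a list of class names (the toy data below is made up purely for illustration). DictVectorizer and StandardScaler become the first two pipeline steps, so `cross_val_predict` receives the raw dicts and refits the vectorizer and scaler on each training fold, which also keeps statistics from the held-out fold out of the model. The LabelEncoder stays outside, because a Pipeline only transforms X, never y (LogisticRegression also accepts string labels directly, so the encoder is optional). `k` and `cv` are lowered only so that the tiny toy dataset is large enough.

```python
from sklearn.feature_extraction import DictVectorizer
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import LabelEncoder, StandardScaler

# Hypothetical toy data: one dict of token counts per document.
instances = [
    {"goal": 2, "match": 1}, {"goal": 1, "team": 2}, {"match": 2, "team": 1},
    {"goal": 1, "match": 1, "team": 1}, {"team": 3}, {"goal": 3},
    {"vote": 2, "party": 1}, {"vote": 1, "law": 2}, {"party": 2, "law": 1},
    {"vote": 1, "party": 1, "law": 1}, {"law": 3}, {"vote": 3},
]
labels = ["sport"] * 6 + ["politics"] * 6

# Labels are encoded outside the pipeline: Pipeline steps only transform X.
enc = LabelEncoder()
y = enc.fit_transform(labels)

pipe = Pipeline([
    ("vec", DictVectorizer()),                                 # dicts -> sparse matrix
    ("scaler", StandardScaler(with_mean=False)),               # scaling that keeps X sparse
    ("mutual_info", SelectKBest(mutual_info_classif, k=2)),    # e.g. k=200 on real data
    ("logistregress", LogisticRegression()),
])

# The raw dicts go in; every step is refit on the training folds only.
y_pred = cross_val_predict(pipe, instances, y, cv=3)
print(enc.inverse_transform(y_pred))
```

With the real data you would pass `instances` and `y` exactly as above and restore `k=200` and `cv=10`; `X_scaled` is no longer needed.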
Yes, the instances are dicts. So I no longer need to call `fit_transform` in the feature extraction step? – Bambi
Correct, you don't need to call `fit_transform` yourself at all. The pipeline does that automatically. –
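To illustrate what "the pipeline does that automatically" means: `Pipeline.fit` calls `fit_transform` on every step except the last and `fit` on the final estimator, while `predict` only calls `transform` on the intermediate steps. The sketch below, again on hypothetical made-up data, builds the same chain once as a Pipeline and once by hand and checks that both give the same predictions. It also shows that the vectorizer's vocabulary comes from the training data alone, so a key seen only at prediction time (the invented `"referee"`) is ignored. The `functools.partial` wrapper just fixes `random_state` so the mutual-information scores are reproducible.

```python
from functools import partial

import numpy as np
from sklearn.feature_extraction import DictVectorizer
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical train/test data: token-count dicts.
train = [{"goal": 2, "match": 1}, {"team": 3}, {"goal": 1, "team": 1},
         {"vote": 2, "law": 1}, {"party": 3}, {"vote": 1, "party": 1}]
y_train = np.array([0, 0, 0, 1, 1, 1])
test = [{"goal": 1, "referee": 4}, {"law": 2}]  # "referee" never occurs in training

# Fixed random_state keeps the mutual-information scores reproducible.
score = partial(mutual_info_classif, random_state=0)

def make_steps():
    """Fresh, unfitted copies of the four steps."""
    return [DictVectorizer(), StandardScaler(with_mean=False),
            SelectKBest(score, k=3), LogisticRegression()]

# 1) Pipeline: fit() calls fit_transform on the first three steps, fit on the last.
pipe = Pipeline(list(zip(["vec", "scaler", "sel", "clf"], make_steps())))
pipe.fit(train, y_train)
pipe_pred = pipe.predict(test)  # predict() only calls transform on those steps

# 2) The same chain written out by hand.
vec, scaler, sel, clf = make_steps()
X = sel.fit_transform(scaler.fit_transform(vec.fit_transform(train)), y_train)
clf.fit(X, y_train)
manual_pred = clf.predict(sel.transform(scaler.transform(vec.transform(test))))

print(pipe_pred, manual_pred)
print(sorted(pipe.named_steps["vec"].vocabulary_))  # learned from train only
```

Writing the steps by hand makes it easy to accidentally call `fit_transform` on test data; the Pipeline rules that mistake out.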