2016-03-01 72 views
1

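Scikit-learn: getting the probability that an example belongs to a class

Hello, I am trying to classify text into 4 categories, and along with the prediction I want to print the probability that the text belongs to each category.
After reading the scikit-learn documentation, I think I should use predict_proba. So far my code looks like this: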

#!/usr/bin/env python
# -*- coding: utf-8 -*-
import sys
import os
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline
from sklearn.metrics import confusion_matrix, f1_score
from sklearn.datasets import load_files
from sklearn.svm import SVC
from sklearn.feature_extraction.text import TfidfTransformer
from sklearn.metrics import accuracy_score
from sklearn.metrics import classification_report

string = sys.argv[1]  # text to classify, passed on the command line
sets = load_files('scikit')  # load the training set

count_vect = CountVectorizer(analyzer='char_wb', ngram_range=(0, 3), min_df=1)
X_train_counts = count_vect.fit_transform(sets.data)

tf_transformer = TfidfTransformer(use_idf=False).fit(X_train_counts)
X_train_tf = tf_transformer.transform(X_train_counts)

tfidf_transformer = TfidfTransformer()
X_train_tfidf = tfidf_transformer.fit_transform(X_train_counts)

clf = MultinomialNB().fit(X_train_tfidf, sets.target)
docs_new = [string]
X_new_counts = count_vect.transform(docs_new)
X_new_tfidf = tfidf_transformer.transform(X_new_counts)
predicted = clf.predict(X_new_tfidf)
for doc, category in zip(docs_new, predicted):
    print('%r => %s' % (doc, sets.target_names[category]))  # prints the prediction, and it is correct
    print(clf.predict_proba(sets.target_names))  # trying to get the probabilities for all classes

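Sadly, the output is: ValueError: objects are not aligned. I have tried many different ways to get this working and searched online a lot, but nothing seems to work. Any advice would be greatly appreciated. Thanks, Nico.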

+0

Where does the error occur? When fitting the 'MNB' classifier, or somewhere else? If so, what kind of object is 'sets.target'? – tttthomasssss

+1

You will get the probabilities with clf.predict_proba(X_new_tfidf) – Stergios

+0

@Stergios Those are the correct probabilities, feel free to post that as an answer. –

Answer

0

The input to predict_proba() should be exactly the same as the input given to the predict() method. So you will get the probabilities with:

clf.predict_proba(X_new_tfidf)
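
To make that concrete, here is a minimal sketch of the whole flow. It is not the asker's actual setup: the four training sentences, the labels, and the two class names ("animals", "finance") are made up to stand in for load_files('scikit'), and ngram_range=(1, 3) is an assumption chosen for the toy data. The point it illustrates is that predict() and predict_proba() both receive the same vectorized matrix, and that the columns of the result follow the order of clf.classes_.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.naive_bayes import MultinomialNB

# Hypothetical toy corpus standing in for load_files('scikit')
train_docs = [
    "the cat sat on the mat",
    "dogs bark at the mailman",
    "stocks fell sharply today",
    "the market rallied after earnings",
]
train_labels = [0, 0, 1, 1]
target_names = ["animals", "finance"]  # made-up class names

# Same vectorize -> tf-idf -> Naive Bayes steps as in the question
count_vect = CountVectorizer(analyzer="char_wb", ngram_range=(1, 3))
tfidf_transformer = TfidfTransformer()
X_train_tfidf = tfidf_transformer.fit_transform(count_vect.fit_transform(train_docs))
clf = MultinomialNB().fit(X_train_tfidf, train_labels)

# New text goes through the already-fitted transformers (transform, not fit)
docs_new = ["the dog chased the cat"]
X_new_tfidf = tfidf_transformer.transform(count_vect.transform(docs_new))

predicted = clf.predict(X_new_tfidf)
probabilities = clf.predict_proba(X_new_tfidf)  # shape: (n_docs, n_classes)

for doc, category, row in zip(docs_new, predicted, probabilities):
    print("%r => %s" % (doc, target_names[category]))
    # clf.classes_ gives the label that each probability column refers to
    for label, p in zip(clf.classes_, row):
        print("  %s: %.3f" % (target_names[label], p))
```

Each row of the returned array sums to 1, so with 4 categories you would get 4 numbers per document. Passing sets.target_names, which is a list of class-name strings rather than a feature matrix, is what caused the error in the question.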