Scikit-learn - Getting NaN values when measuring accuracy

I am classifying sample data into positive and negative sentiment, using the code snippet below.
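Everything looks OK up to line 20, which prints the expected predictions.
However, when I try to measure accuracy with the metrics module (ROC AUC), it gives me a "NaN" value. Could you please review my code and help me find the problem?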
from sklearn.naive_bayes import MultinomialNB
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn import metrics
import csv
# Read in the training and test data.
with open("/Users/max/train.csv", 'r') as f:
    reviews = list(csv.reader(f))
with open("/Users/max/test.csv", 'r') as f:
    test_reviews = list(csv.reader(f))
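# Learn the TF-IDF vocabulary from the training text; reuse it for the test text.
vectorizer = TfidfVectorizer(min_df=1)
train_features = vectorizer.fit_transform([review[0] for review in reviews])
test_features = vectorizer.transform([test_review[0] for test_review in test_reviews])
# Train Multinomial Naive Bayes on the integer labels (1 / -1) and predict the test set.
nb = MultinomialNB()
nb.fit(train_features, [int(review[1]) for review in reviews])
predictions = nb.predict(test_features)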
print("prediction : {0}".format(predictions))
# True test labels; the ROC curve treats label 1 as the positive class.
actual = [int(r[1]) for r in test_reviews]
fpr, tpr, threshold = metrics.roc_curve(actual, predictions, pos_label=1)
print("Multinomial naive bayes AUC: {0}".format(metrics.auc(fpr, tpr)))
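A side note on the code above: roc_curve is meant for continuous scores (probabilities or decision values), not hard labels, so passing the output of predict yields a curve with at most three points. Below is a minimal sketch that feeds nb.predict_proba into roc_curve instead. The toy texts and labels are hypothetical stand-ins for train.csv/test.csv. This alone does not remove the NaN; that still requires positive (label 1) examples in the test labels.

```python
from sklearn import metrics
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB

# Hypothetical toy data in the same "text, label" shape as the question (labels 1 / -1).
train_texts = ["i like google", "i dont really like microsoft",
               "google is great", "microsoft is awful"]
train_labels = [1, -1, 1, -1]
test_texts = ["i like google a lot", "microsoft is really awful"]
test_labels = [1, -1]  # both classes present, so ROC AUC is defined

vectorizer = TfidfVectorizer(min_df=1)
train_features = vectorizer.fit_transform(train_texts)
test_features = vectorizer.transform(test_texts)

nb = MultinomialNB()
nb.fit(train_features, train_labels)

# predict_proba columns follow nb.classes_ (here [-1, 1]),
# so look up the column that belongs to the positive class 1.
pos_column = list(nb.classes_).index(1)
scores = nb.predict_proba(test_features)[:, pos_column]

# Use the probability of class 1 as the score for the ROC curve.
fpr, tpr, thresholds = metrics.roc_curve(test_labels, scores, pos_label=1)
auc_value = metrics.auc(fpr, tpr)
print("Multinomial naive bayes AUC: {0}".format(auc_value))
```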
The sample data is in this format:
i like google , 1
i dont really like microsoft , -1
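With this format, csv.reader keeps the spaces around the comma (for example ['i like google ', ' 1']). int() tolerates the space in the label, but the text keeps a trailing space. Below is a minimal sketch that parses the two rows above from an in-memory string, a hypothetical stand-in for the real files, and counts the labels. A missing label in the test file is exactly the situation in which ROC AUC becomes undefined. Note that a review containing its own comma would split into more than two fields, so such rows would need quoting.

```python
import csv
import io
from collections import Counter

# Hypothetical stand-in for train.csv / test.csv, using the format shown above.
sample = "i like google , 1\ni dont really like microsoft , -1\n"

rows = []
for text, label in csv.reader(io.StringIO(sample)):
    # Strip the spaces that surround the comma in this format.
    rows.append((text.strip(), int(label)))

# Count how many examples carry each label; a test file with no 1s
# leaves ROC AUC undefined.
label_counts = Counter(label for _, label in rows)
print(rows)
print(label_counts)
```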
Here is the console output:
prediction : [1 -1]
/Library/Python/2.7/site-packages/sklearn/metrics/ranking.py:496: UndefinedMetricWarning: No positive samples in y_true, true positive value should be meaningless
  UndefinedMetricWarning)
Multinomial naive bayes AUC: nan

You don't have any actual positive instances in your data.
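The warning explains the NaN. The true positive rate is TP / (TP + FN), and when the true labels contain no 1s, TP + FN is 0, so every TPR value is undefined and so is the area under the curve. Below is a minimal reproduction. It pairs hypothetical test labels [-1, -1] with the [1 -1] predictions printed above. It also calls accuracy_score, which stays well defined when y_true holds a single class.

```python
import warnings

import numpy as np
from sklearn import metrics
from sklearn.exceptions import UndefinedMetricWarning

# Hypothetical test labels with no positive (1) examples,
# paired with the predictions printed in the question.
actual = [-1, -1]
predictions = [1, -1]

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    fpr, tpr, thresholds = metrics.roc_curve(actual, predictions, pos_label=1)

# Keep only the warning scikit-learn emits for the missing positives.
undefined = [w for w in caught if issubclass(w.category, UndefinedMetricWarning)]

# TP + FN == 0, so every TPR entry is NaN, and the area is NaN too.
auc_value = metrics.auc(fpr, tpr)
print(tpr, auc_value, len(undefined))

# Plain accuracy is still defined: 1 of the 2 predictions is correct.
print(metrics.accuracy_score(actual, predictions))  # 0.5
```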
Have you tried using 'roc_auc_score' instead? http://scikit-learn.org/stable/modules/generated/sklearn.metrics.roc_auc_score.html#sklearn.metrics.roc_auc_score – Dair
@Dair, it seems to work. What is the difference between them? – Max
I'm not sure, but the documentation lists it as an alternative. – Dair
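On the difference: for binary labels, roc_auc_score computes the same area as roc_curve followed by auc, but in a single call. The two differ in the single-class case. Older scikit-learn releases raise a ValueError ("Only one class present in y_true. ROC AUC score is not defined in that case.") instead of returning nan, and newer releases may warn and return nan instead, so check the behavior of your version. The sketch below uses hypothetical scores to compare the two routes. It also adds a hypothetical helper, safe_roc_auc, that checks for both classes before calling roc_auc_score.

```python
from sklearn import metrics

# Hypothetical labels (1 / -1) and classifier scores with both classes present.
actual = [1, -1, 1, -1]
scores = [0.9, 0.4, 0.35, 0.1]

# Two-step route from the question: build the curve, then integrate it.
fpr, tpr, _ = metrics.roc_curve(actual, scores, pos_label=1)
two_step = metrics.auc(fpr, tpr)

# One-step route suggested in the comments (for labels -1 / 1,
# the greater label, 1, is taken as the positive class).
one_step = metrics.roc_auc_score(actual, scores)
print(two_step, one_step)


def safe_roc_auc(y_true, y_score):
    """Hypothetical helper: return None when y_true has fewer than two classes."""
    if len(set(y_true)) < 2:
        return None
    return metrics.roc_auc_score(y_true, y_score)


print(safe_roc_auc([-1, -1], [0.2, 0.3]))
```

Here 3 of the 4 positive/negative pairs are ranked correctly, so both routes should report an AUC of 0.75.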