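Scikit-learn: converting metrics.classification_report output to CSV / tab-delimited format

I am doing multi-class text classification in Scikit-Learn. The dataset is being trained using a Multinomial Naive Bayes classifier with hundreds of labels. Below is an extract from the Scikit-Learn script used to fit the MNB model: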

from __future__ import print_function 

# Read file.csv into a pandas DataFrame

import pandas as pd 
path = 'data/file.csv' 
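# on_bad_lines='skip' (pandas >= 1.3) replaces error_bad_lines=False, which was removed in pandas 2.0
merged = pd.read_csv(path, on_bad_lines='skip', low_memory=False)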

# define X and y using the original DataFrame 
X = merged.text 
y = merged.grid 

# split X and y into training and testing sets
# (sklearn.cross_validation was removed in scikit-learn 0.20; model_selection replaces it)
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1) 

# import and instantiate CountVectorizer 
from sklearn.feature_extraction.text import CountVectorizer 
vect = CountVectorizer() 

# create document-term matrices using CountVectorizer 
X_train_dtm = vect.fit_transform(X_train) 
X_test_dtm = vect.transform(X_test) 

# import and instantiate MultinomialNB 
from sklearn.naive_bayes import MultinomialNB 
nb = MultinomialNB() 

# fit a Multinomial Naive Bayes model 
nb.fit(X_train_dtm, y_train) 

# make class predictions 
y_pred_class = nb.predict(X_test_dtm) 

# generate classification report 
from sklearn import metrics 
print(metrics.classification_report(y_test, y_pred_class))

A simplified version of the metrics.classification_report output on the command-line screen looks like this:

             precision    recall  f1-score   support

         12      0.84      0.48      0.61      2843
         13      0.00      0.00      0.00        69
         15      1.00      0.19      0.32       232
         16      0.75      0.02      0.05       965
         33      1.00      0.04      0.07       155
          4      0.59      0.34      0.43      5600
         41      0.63      0.49      0.55      6218
         42      0.00      0.00      0.00       102
         49      0.00      0.00      0.00        11
          5      0.90      0.06      0.12      2010
         50      0.00      0.00      0.00         5
         51      0.96      0.07      0.13      1267
         58      1.00      0.01      0.02       180
         59      0.37      0.80      0.51      8127
          7      0.91      0.05      0.10       579
          8      0.50      0.56      0.53      7555

avg / total      0.59      0.48      0.45     35919

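I was wondering whether there is any way to get the report output into a standard CSV file with regular column headers.

When I send the command-line output to a CSV file, or try to copy/paste the screen output into a spreadsheet (OpenOffice Calc or Excel), the results are lumped into a single column. It looks like this: [screenshot missing from source]

Any help is appreciated. Thanks!

Source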

2016-09-23 Seun AJAO

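I will try to recreate the results as I type this, but have you tried using pandas to turn the table into a DataFrame and then sending the DataFrame to CSV with 'dataframe_name_here.to_csv()'? Could you also show the code you use to write the results to CSV? – MattR

@MattR I have edited the question and provided the full Python code... I am passing the script's output from the Linux command line to a CSV file: $ python3 script.py > result.csv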


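The way I always solve output problems is, as I mentioned in my earlier comment, to convert the output to a DataFrame. Not only is it incredibly easy to send to a file (see here), but pandas also makes it really easy to manipulate the data structure. The other way I have solved this is to write the output line by line using the csv module, specifically with writerow.

If you can get the output into a DataFrame, it would be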

dataframe_name_here.to_csv()

or, using csv, it would look like the examples provided at the csv link.
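As a rough illustration of the writerow approach mentioned above, here is a minimal sketch that uses only the standard csv module. The rows are copied by hand from the report in the question, the helper name write_report_rows is hypothetical, and an in-memory buffer stands in for a real output file.

```python
import csv
import io

# Hypothetical rows, copied by hand from the report in the question:
# (class label, precision, recall, f1-score, support)
rows = [
    ("12", 0.84, 0.48, 0.61, 2843),
    ("13", 0.00, 0.00, 0.00, 69),
    ("15", 1.00, 0.19, 0.32, 232),
]


def write_report_rows(handle, rows, delimiter="\t"):
    """Write a header line and one delimited line per class to an open text handle."""
    writer = csv.writer(handle, delimiter=delimiter, lineterminator="\n")
    writer.writerow(["class", "precision", "recall", "f1-score", "support"])
    for row in rows:
        writer.writerow(row)


# An in-memory buffer stands in for open("result.tsv", "w", newline="").
buffer = io.StringIO()
write_report_rows(buffer, rows)
print(buffer.getvalue())
```

Passing delimiter="," instead would give a comma-separated file; either way each metric ends up in its own spreadsheet column.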

Source

2016-09-23 15:53:00 MattR

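Thanks, I tried to use a DataFrame: 'Result = metrics.classification_report(y_test, y_pred_class); df = pd.DataFrame(Result); df.to_csv(results.csv, sep='\t')' but got the error _pandas.core.common.PandasError: DataFrame constructor not properly called!_

This does not really answer the question. The output of classification_report cannot be converted directly into a DataFrame. – CentAu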
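The error in the comment above comes from the fact that classification_report returns a single formatted string by default. As a hedged sketch, and assuming scikit-learn 0.20 or later (where the function accepts output_dict=True), the same numbers can be requested as a nested dict and handed to pandas. The toy labels and variable names below are illustrative only, not taken from the question's data.

```python
import io

import pandas as pd
from sklearn import metrics

# Toy labels for illustration only.
true_label = [1, 1, 2, 2, 3, 3]
pred_label = [1, 2, 2, 3, 3, 1]

# Default behaviour: one formatted string, which pd.DataFrame cannot parse.
text_report = metrics.classification_report(true_label, pred_label)

# Assumption: scikit-learn >= 0.20, where output_dict=True is available.
report_dict = metrics.classification_report(
    true_label, pred_label, output_dict=True
)
# Newer releases add a scalar "accuracy" entry; drop it so that every
# remaining value is a dict of the same four metrics.
report_dict.pop("accuracy", None)

# Outer keys (class labels, averages) become rows after the transpose.
report_df = pd.DataFrame(report_dict).transpose()

# Tab-delimited output; a file path could be passed instead of the buffer.
buffer = io.StringIO()
report_df.to_csv(buffer, sep="\t")
print(buffer.getvalue())
```

Note that the class labels come back as strings in the dict, so the row for class 1 is indexed as "1".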

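If you want the individual (per-class) scores, this should do the job just fine.

import pandas as pd
from sklearn.metrics import classification_report

def classifaction_report_csv(report):
    report_data = []
    lines = report.split('\n')
    # skip the header and the blank line at the top, and the summary at the bottom
    for line in lines[2:-3]:
        # split on any run of whitespace; a fixed-width separator leaves empty strings
        row_data = line.split()
        if len(row_data) != 5:
            continue  # blank line or a row that is not "class + four numbers"
        row = {}
        row['class'] = row_data[0]
        row['precision'] = float(row_data[1])
        row['recall'] = float(row_data[2])
        row['f1_score'] = float(row_data[3])
        row['support'] = float(row_data[4])
        report_data.append(row)
    dataframe = pd.DataFrame(report_data)
    dataframe.to_csv('classification_report.csv', index=False)

report = classification_report(y_true, y_pred)
classifaction_report_csv(report)

Source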

2016-12-08 16:33:13 kindjacket

row['precision'] = float(row_data[1]) ValueError: could not convert string to float: – user3806649
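A ValueError with an empty message like the one in this comment is what float('') produces. One plausible cause (an assumption, since the commenter's report is not shown) is that the fixed separator passed to split does not match the spacing in the report, which leaves empty strings in the row. The standard-library sketch below shows the difference; parse_report_line is a hypothetical helper, not part of scikit-learn.

```python
# One data line in the style of the report above, leading spaces included.
line = "         12       0.84      0.48      0.61      2843"

# Splitting on a fixed two-space separator leaves empty strings behind,
# and float("") raises ValueError.
fixed = line.split("  ")

# Splitting on any run of whitespace yields exactly the five fields.
fields = line.split()


def parse_report_line(line):
    """Parse one report row; return None for blank, header, or malformed lines."""
    # Split from the right so labels that contain spaces stay in one piece.
    parts = line.strip().rsplit(maxsplit=4)
    if len(parts) != 5:
        return None
    label, precision, recall, f1_score, support = parts
    try:
        return {
            "class": label,
            "precision": float(precision),
            "recall": float(recall),
            "f1_score": float(f1_score),
            "support": int(support),
        }
    except ValueError:
        return None


print(fixed)
print(fields)
print(parse_report_line(line))
```

Splitting from the right keeps a multi-word label such as "avg / total" together as the class name.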

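We can get the actual values from the precision_recall_fscore_support function and then put them into a DataFrame. The code below gives the same result, but now in a pandas DataFrame :).

import pandas as pd
from sklearn import metrics

clf_rep = metrics.precision_recall_fscore_support(true, pred)
out_dict = {
    "precision": clf_rep[0].round(2),
    "recall": clf_rep[1].round(2),
    "f1-score": clf_rep[2].round(2),
    "support": clf_rep[3],
}
out_df = pd.DataFrame(out_dict, index=nb.classes_)
# summary row: mean of each score column, sum of the support column
avg_tot = (out_df.apply(lambda x: round(x.mean(), 2) if x.name != "support" else round(x.sum(), 2)).to_frame().T)
avg_tot.index = ["avg/total"]
# DataFrame.append was removed in pandas 2.0; pd.concat is the replacement
out_df = pd.concat([out_df, avg_tot])
print(out_df)

Source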

2017-02-26 10:02:12 jaknap32

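import numpy as np

def to_table(report):
    report = report.splitlines()
    res = []
    # header row, with an empty cell above the class labels
    res.append([''] + report[0].split())
    for row in report[2:-2]:
        res.append(row.split())
    # last line is "avg / total ..."; rejoin its first three tokens into one label
    lr = report[-1].split()
    res.append([' '.join(lr[:3])] + lr[3:])
    return np.array(res)

This returns a NumPy array, which can be turned into a pandas DataFrame or simply saved as a CSV file.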
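To illustrate the last point, here is a small sketch that saves such an array with np.savetxt. The array is typed in by hand in the shape to_table would return for the older report layout (it is not produced by running the function), and the in-memory buffer stands in for a file path.

```python
import io

import numpy as np

# Hand-written stand-in for the array returned by to_table (illustrative values).
table = np.array([
    ["", "precision", "recall", "f1-score", "support"],
    ["1", "0.50", "0.50", "0.50", "2"],
    ["2", "0.50", "0.50", "0.50", "2"],
    ["3", "0.50", "0.50", "0.50", "2"],
    ["avg / total", "0.50", "0.50", "0.50", "6"],
])

# fmt="%s" writes the string cells unchanged; delimiter="\t" would give a
# tab-separated file instead of a comma-separated one.
buffer = io.StringIO()
np.savetxt(buffer, table, fmt="%s", delimiter=",")
print(buffer.getvalue())
```

Because every cell is a string, the numbers are written exactly as they appear in the report, without any reformatting.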

Source

2017-07-24 14:51:53 Sipan17

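While the previous answers probably all work, I found them a bit verbose. The following stores the individual class results as well as the summary line in a single DataFrame. It is not very sensitive to changes in the report, but it did the trick for me.

# init snippet and fake data
from io import StringIO
import re
import pandas as pd
from sklearn import metrics
true_label = [1,1,2,2,3,3]
pred_label = [1,2,2,3,3,1]

def report_to_df(report):
    # collapse runs of spaces, join "avg / total" into a single token,
    # and drop the leading space on each line (written for the single-summary-row layout)
    report = re.sub(r" +", " ", report).replace("avg / total", "avg/total").replace("\n ", "\n")
    report_df = pd.read_csv(StringIO("Classes" + report), sep=' ', index_col=0)
    return report_df

# txt report to df
report = metrics.classification_report(true_label, pred_label)
report_df = report_to_df(report)

# store, print, copy...
print(report_df)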

This gives the desired output:

Classes precision recall f1-score support 
1 0.5 0.5 0.5 2 
2 0.5 0.5 0.5 2 
3 0.5 0.5 0.5 2 
avg/total 0.5 0.5 0.5 6

Source

2017-09-27 12:27:48
