1

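Text analysis - unable to write the output of a Python program to a csv or xls file

Hello, I am trying to do sentiment analysis with a Naive Bayes classifier in Python 2.x. It reads sentiments from a txt file and then gives a positive or negative output based on the sentiments in the sample txt file. I want the output to have the same form as the input: for example, if I have a text file with, say, 1000 raw sentiments, I want the output to show positive or negative against each sentiment. Please help. Below is the code I am using.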

import math 
import string 

def Naive_Bayes_Classifier(positive, negative, total_negative, total_positive, test_string):
    y_values = [0, 1]
    prob_values = [None, None]

    for y_value in y_values:
        posterior_prob = 1.0

        for word in test_string.split():
            word = word.lower().translate(None, string.punctuation).strip()
            if y_value == 0:
                if word not in negative:
                    posterior_prob *= 0.0
                else:
                    posterior_prob *= negative[word]
            else:
                if word not in positive:
                    posterior_prob *= 0.0
                else:
                    posterior_prob *= positive[word]

        if y_value == 0:
            prob_values[y_value] = posterior_prob * float(total_negative)/(total_negative + total_positive)
        else:
            prob_values[y_value] = posterior_prob * float(total_positive)/(total_negative + total_positive)

    total_prob_values = 0
    for i in prob_values:
        total_prob_values += i

    for i in range(0, len(prob_values)):
        prob_values[i] = float(prob_values[i])/total_prob_values

    print prob_values

    if prob_values[0] > prob_values[1]:
        return 0
    else:
        return 1


if __name__ == '__main__':
    sentiment = open(r'C:/Users/documents/sample.txt')

    # Preprocessing of training set
    vocabulary = {}
    positive = {}
    negative = {}
    training_set = []
    TOTAL_WORDS = 0
    total_negative = 0
    total_positive = 0

    for line in sentiment:
        words = line.split()
        y = words[-1].strip()
        y = int(y)

        if y == 0:
            total_negative += 1
        else:
            total_positive += 1

        for word in words:
            word = word.lower().translate(None, string.punctuation).strip()
            if word not in vocabulary and word.isdigit() is False:
                vocabulary[word] = 1
                TOTAL_WORDS += 1
            elif word in vocabulary:
                vocabulary[word] += 1
                TOTAL_WORDS += 1

            # Training
            if y == 0:
                if word not in negative:
                    negative[word] = 1
                else:
                    negative[word] += 1
            else:
                if word not in positive:
                    positive[word] = 1
                else:
                    positive[word] += 1

    for word in vocabulary.keys():
        vocabulary[word] = float(vocabulary[word])/TOTAL_WORDS

    for word in positive.keys():
        positive[word] = float(positive[word])/total_positive

    for word in negative.keys():
        negative[word] = float(negative[word])/total_negative

    test_string = raw_input("Enter the review: \n")

    classifier = Naive_Bayes_Classifier(positive, negative, total_negative, total_positive, test_string)
    if classifier == 0:
        print "Negative review"
    else:
        print "Positive review"
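Hi APAC CTO Matt, from what I understand, you want the output to be a CSV/xls file containing the words of the sentence that the user enters as input. For each word, you want the relative sentiment (positive or negative) computed by the classifier. Is that right? Can you provide an example of the csv/xls file you want? Thanks – Giordano

I will paste the contents of the csv file below: – hitesh

A good product - your work is interesting! Have enjoyed a good experience using it for many years. Good product Good results I don't use it any more It has always been a stable product Overall a very good product compared to the rest The product works fine, but others have told me some other products are superior. Robust Slow Best of all Unable to install User friendly Very bad Hard to understand the logs, and cumbersome to set up and deploy correctly. Below is – hitesh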

Answers

1

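I have checked the code in the GitHub repository you posted in the comments. I tried to run the project, but I got some errors.

Anyway, I have looked at the project structure and the files used to train the Naive Bayes algorithm, and I think the following code snippet can be used to write the result data to an Excel file (i.e. .xls)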

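import xlsxwriter

# Classify each line of the input file and collect (sentence, label) pairs.
data_to_be_written = []
with open("test11.txt") as f:
    for line in f:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        classifier = Naive_Bayes_Classifier(positive, negative, total_negative, total_positive, line)
        # The classifier returns 0 for negative and 1 for positive.
        result = 'Negative' if classifier == 0 else 'Positive'
        data_to_be_written.append((line, result))

# Create a workbook and add a worksheet.
# xlsxwriter produces the .xlsx format, so the file is named accordingly.
workbook = xlsxwriter.Workbook('test.xlsx')
worksheet = workbook.add_worksheet()

# Start from the first cell. Rows and columns are zero indexed.
row = 0
col = 0

# Iterate over the data and write it out row by row.
for item, result in data_to_be_written:
    worksheet.write(row, col, item)
    worksheet.write(row, col + 1, result)
    row += 1

workbook.close()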

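In short, for each line of the file containing the sentences to test, I call the classifier and build the structure that will be written to the output file.
Then I loop over that structure and write it to the Excel file.
For this I used a Python site-package called xlsxwriter.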
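
If a plain CSV file is enough, the same loop can probably be written with the standard-library csv module instead of xlsxwriter. The following is only a minimal sketch, written in Python 3 syntax (the question uses Python 2.x) and not tested against the original project. The helper name `write_predictions`, its `classify` parameter, the column headers and the file names are assumptions made for illustration; `classify` stands in for a call to the trained classifier that returns 0 for negative and 1 for positive.

```python
import csv


def write_predictions(sentences, classify, out_path):
    """Write one CSV row per sentence: the sentence and its predicted label.

    `classify` is a hypothetical callable returning 0 (negative) or 1 (positive).
    """
    rows = []
    for sentence in sentences:
        sentence = sentence.strip()
        if not sentence:
            continue  # skip blank lines
        label = 'Negative' if classify(sentence) == 0 else 'Positive'
        rows.append((sentence, label))

    # newline='' lets the csv module control line endings itself.
    with open(out_path, 'w', newline='', encoding='utf-8') as out:
        writer = csv.writer(out)
        writer.writerow(['sentence', 'sentiment'])  # header row (assumed layout)
        writer.writerows(rows)
    return rows
```

With the trained dictionaries in scope, `sentences` could be the open `test11.txt` file object and `classify` a small lambda wrapping `Naive_Bayes_Classifier(positive, negative, total_negative, total_positive, s)`.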

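As I told you before, I had some problems running the project, so this code has not been tested either. In any case, let me know if you run into trouble.

Regards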

@Giordano - Thanks. I tried to run it, but got some errors. – hitesh

Changed the code to the below - – hitesh

What kind of errors? Can you post them? – Giordano

0
> with open("test11.txt") as f:
>     for line in f:
>         classifier = Naive_Bayes_Classifier(positive, negative, total_negative, total_positive, line)
>         if classifier == 0:
>             f.write(line + 'Negative')
>         else:
>             f.write(line + 'Positive')
>
> #  result = 'Positive' if classifier == 0 else 'Negative'
> #  data_to_be_written += ([line, result],)
>
> # Create a workbook and add a worksheet.
> workbook = xlsxwriter.Workbook('test.xls')
> worksheet = workbook.add_worksheet()
>
> # Start from the first cell. Rows and columns are zero indexed.
> row = 0
> col = 0
>
> # Iterate over the data and write it out row by row.
> for item, cost in f:
>     worksheet.write(row, col, item)
>     worksheet.write(row, col + 1, cost)
>     row += 1
>
> workbook.close()
Still getting a zero error :( – hitesh
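
One possible source of a "zero error" in the code from the question: a word that is missing from a class multiplies that class's score by 0.0, so when a sentence contains a word that is missing from both classes, both scores are 0 and the normalization step divides by zero (a ZeroDivisionError). Whether that is the error reported here is an assumption, since no traceback was posted. A common remedy is add-one (Laplace) smoothing combined with log-probabilities. The following is a minimal sketch of that technique in Python 3 syntax, not the original project's code: it swaps in a standard multinomial word-count model, skips punctuation stripping, and the names `train_counts` and `classify_smoothed` are hypothetical.

```python
import math
from collections import Counter


def train_counts(labelled_sentences):
    """Count words per class from (sentence, label) pairs; 0 = negative, 1 = positive."""
    word_counts = {0: Counter(), 1: Counter()}
    doc_counts = {0: 0, 1: 0}
    for sentence, label in labelled_sentences:
        doc_counts[label] += 1
        word_counts[label].update(sentence.lower().split())
    return word_counts, doc_counts


def classify_smoothed(sentence, word_counts, doc_counts):
    """Return 0 or 1 using add-one smoothing, so unseen words never zero out a class."""
    vocabulary = set(word_counts[0]) | set(word_counts[1])
    total_docs = doc_counts[0] + doc_counts[1]
    scores = {}
    for label in (0, 1):
        total_words = sum(word_counts[label].values())
        # Log prior; assumes both classes appear in the training data.
        score = math.log(doc_counts[label] / total_docs)
        for word in sentence.lower().split():
            # +1 in the numerator; the denominator adds the vocabulary size
            # plus one extra slot reserved for words never seen in training.
            score += math.log((word_counts[label][word] + 1)
                              / (total_words + len(vocabulary) + 1))
        scores[label] = score
    # Ties go to positive, matching the comparison in the question's code.
    return 0 if scores[0] > scores[1] else 1
```

Because the scores are sums of logarithms of strictly positive numbers, there is no normalization division that can fail, and a sentence made entirely of unseen words still gets a label.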