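BLEU score: can I use nltk.translate.bleu_score.sentence_bleu to calculate the BLEU score for Chinese?

If I have lists of Chinese words, like reference = ['我', '是', '好', '人'] and hypothesis = ['我', '是', '善良的', '人'], can I use nltk.translate.bleu_score.sentence_bleu(reference, hypothesis) for Chinese translation? Is it the same as for English? What about Japanese? I mean, if I have word lists like the ones for English (for Chinese and Japanese). Thanks!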

2017-09-27 tktktk0711

Why don't you try it yourself? =( – alvas

TL;DR

Yes.

In Long

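The BLEU score measures n-gram overlap and is agnostic to language, but it relies on the fact that sentences in the language can be split into tokens. So yes, it can compare Chinese/Japanese...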
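
Since everything hinges on tokenization, here is a minimal sketch of two ways to get token lists for text that has no spaces between words. The helper names `char_tokens` and `whitespace_tokens` are made up for illustration, and character-level splitting is only one simple assumption; a proper word segmenter would give word-level tokens instead.

```python
def char_tokens(sentence):
    """Split a sentence into single characters, dropping any whitespace."""
    return [ch for ch in sentence if not ch.isspace()]


def whitespace_tokens(sentence):
    """Split a sentence that has already been segmented with spaces."""
    return sentence.split()


# Unsegmented text: fall back to one token per character.
ref_chars = char_tokens('我是好人')
hyp_chars = char_tokens('我是善良的人')

# Pre-segmented text: one token per word, as in the question.
ref_words = whitespace_tokens('我 是 好 人')
hyp_words = whitespace_tokens('我 是 善良的 人')
```

Either pair of lists can be passed to a BLEU implementation, but the scores are not comparable across tokenizations, so the reference and hypothesis should always be tokenized the same way.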

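Do note the caveats of using the BLEU score at sentence level. BLEU was never created with sentence-level comparison in mind; there is a good discussion here: https://github.com/nltk/nltk/issues/1838

Most probably, you'll see the warning when you have really short sentences.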

>>> from nltk.translate import bleu 
>>> ref = '我 是 好 人'.split() 
>>> hyp = '我 是 善良的 人'.split() 
>>> bleu([ref], hyp) 
/usr/local/lib/python2.7/site-packages/nltk/translate/bleu_score.py:490: UserWarning: 
Corpus/Sentence contains 0 counts of 3-gram overlaps. 
BLEU scores might be undesirable; use SmoothingFunction(). 
    warnings.warn(_msg) 
0.7071067811865475
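
To see where the warning comes from, the clipped n-gram counts for this pair can be worked out by hand. The following is a from-scratch sketch using only the standard library, not NLTK's code; `ngrams` and `modified_precision` are illustrative names. The last step is only an observation about the arithmetic: the printed score is consistent with a 0.25-weighted geometric mean taken over just the orders that have a nonzero overlap.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def modified_precision(reference, hypothesis, n):
    """Return (clipped matches, total hypothesis n-grams) for one reference."""
    hyp_counts = Counter(ngrams(hypothesis, n))
    ref_counts = Counter(ngrams(reference, n))
    # Each hypothesis n-gram is credited at most as often as it occurs in the reference.
    clipped = sum(min(count, ref_counts[gram]) for gram, count in hyp_counts.items())
    return clipped, sum(hyp_counts.values())


ref = '我 是 好 人'.split()
hyp = '我 是 善良的 人'.split()

# Orders 1 to 4: 3/4, 1/3, 0/2 and 0/1, so the 3-gram and 4-gram overlaps are zero.
precisions = {n: modified_precision(ref, hyp, n) for n in range(1, 5)}

# Geometric mean with weight 0.25 per order, skipping the zero-overlap orders.
score_nonzero = math.exp(
    sum(0.25 * math.log(c / t) for c, t in precisions.values() if c > 0)
)
```

With only four tokens per sentence there are just two trigrams and one 4-gram to match, which is why such short sentences so easily end up with zero counts at the higher orders.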

You can use the smoothing functions in https://github.com/alvations/nltk/blob/develop/nltk/translate/bleu_score.py#L425 to overcome short sentences.

>>> from nltk.translate.bleu_score import SmoothingFunction 
>>> smoothie = SmoothingFunction().method4 
>>> bleu([ref], hyp, smoothing_function=smoothie) 
0.2866227639866161
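
As a rough illustration of what smoothing does, here is a generic add-epsilon sketch: any order with zero matches gets a small numerator instead of zero. This is an assumption-laden toy, not `method4`; the function name `smoothed_bleu` and the value `epsilon=0.1` are chosen for illustration, and it will not reproduce the score above.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def clipped_counts(reference, hypothesis, n):
    """Return (clipped matches, total hypothesis n-grams) for one reference."""
    hyp_counts = Counter(ngrams(hypothesis, n))
    ref_counts = Counter(ngrams(reference, n))
    clipped = sum(min(count, ref_counts[gram]) for gram, count in hyp_counts.items())
    return clipped, sum(hyp_counts.values())


def smoothed_bleu(reference, hypothesis, max_n=4, epsilon=0.1):
    """Sentence-level BLEU for one reference with add-epsilon smoothing (illustrative)."""
    log_sum = 0.0
    for n in range(1, max_n + 1):
        matched, total = clipped_counts(reference, hypothesis, n)
        if total == 0:
            return 0.0  # hypothesis has fewer than n tokens
        # Assumption: replace a zero match count with a small epsilon.
        numerator = matched if matched > 0 else epsilon
        log_sum += math.log(numerator / total) / max_n
    # Brevity penalty: penalize hypotheses shorter than the reference.
    r, c = len(reference), len(hypothesis)
    bp = 1.0 if c >= r else math.exp(1 - r / c)
    return bp * math.exp(log_sum)


ref = '我 是 好 人'.split()
hyp = '我 是 善良的 人'.split()
score = smoothed_bleu(ref, hyp)
```

The point is only that the zero-overlap orders now pull the score down smoothly instead of being dropped or collapsing it to zero; different smoothing methods will give different numbers for the same pair.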

2017-09-27 10:39:45 alvas

Thanks @alvas, you're so kind! According to your answer, corpus_bleu is the same. – tktktk0711
