2016-10-22 66 views

I want to check the spelling of a sentence in Python using NLTK. The built-in spell checker is not working correctly: it reports "with" and "and" as misspelled.

def tokens(sent):
    return nltk.word_tokenize(sent)

def SpellChecker(line):
    for i in tokens(line):
        strip = i.rstrip()
        if not WN.synsets(strip):
            print("Wrong spellings : " + i)
        else:
            print("No mistakes :" + i)

def removePunct(str):
    return "".join(c for c in str if c not in ('!', '.', ':', ','))

l = "Attempting artiness With black & white and clever camera angles, the movie disappointed - became even more ridiculous - as the acting was poor and the plot and lines almost non-existent. "
noPunct = removePunct(l.lower())
if(SpellChecker(noPunct)):
    print(l)
    print(noPunct)

Can anyone tell me why this happens?

Answer


These words are reported as misspelled because they are stopwords, which are not contained in WordNet (check the FAQs).

So you can use the stopwords list from the NLTK corpus to check for such words instead.

# Add these lines:
import nltk
from nltk.corpus import wordnet as WN
from nltk.corpus import stopwords

# Requires the NLTK data for the tokenizer, WordNet, and stopwords
# (installable via nltk.download()).
stop_words_en = set(stopwords.words('english'))

def tokens(sent):
    return nltk.word_tokenize(sent)

def SpellChecker(line):
    # Prints a verdict for each token and returns True if nothing was flagged.
    all_correct = True
    for i in tokens(line):
        strip = i.rstrip()
        if not WN.synsets(strip):
            if strip in stop_words_en:  # <--- Check whether it's in stopword list
                print("No mistakes :" + i)
            else:
                print("Wrong spellings : " + i)
                all_correct = False
        else:
            print("No mistakes :" + i)
    return all_correct

def removePunct(text):  # renamed from `str` to avoid shadowing the builtin
    return "".join(c for c in text if c not in ('!', '.', ':', ','))

l = "Attempting artiness With black & white and clever camera angles, the movie disappointed - became even more ridiculous - as the acting was poor and the plot and lines almost non-existent. "

noPunct = removePunct(l.lower())
# Print the sentences only when SpellChecker reports no misspellings.
if SpellChecker(noPunct):
    print(l)
    print(noPunct)
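
The same idea can also be written so that the checker returns the suspicious tokens instead of printing them, which makes the result easier to reuse. The following is only a minimal sketch, not part of the original answer: the function name `find_unknown_words`, the `has_synsets` callable, and the tiny demo word sets are hypothetical stand-ins chosen so the example runs without any NLTK data. It also skips punctuation-only tokens such as "&" and "-", which `removePunct` above does not strip and which would otherwise be flagged.

```python
import string


def find_unknown_words(tokens, has_synsets, stop_words):
    """Return the tokens that look misspelled.

    A token is accepted if it is punctuation only, a stopword, or known to
    the dictionary lookup `has_synsets` (hypothetical callable: str -> bool).
    """
    unknown = []
    for token in tokens:
        word = token.strip().lower()
        # Skip empty strings and punctuation-only tokens such as "&" or "-".
        if not word or all(ch in string.punctuation for ch in word):
            continue
        # Stopwords are accepted without consulting the dictionary lookup.
        if word in stop_words:
            continue
        if not has_synsets(word):
            unknown.append(token)
    return unknown


# Stand-in data so the sketch runs without NLTK corpora (assumption, demo only).
demo_vocabulary = {"movie", "acting", "poor", "plot"}
demo_stop_words = {"the", "and", "with", "was"}

sample = ["The", "acting", "was", "poor", "and", "&", "the", "plott", "-"]
result = find_unknown_words(sample, lambda w: w in demo_vocabulary, demo_stop_words)
print(result)  # ['plott']
```

With NLTK installed and its data downloaded, the stand-ins could presumably be swapped for `lambda w: bool(WN.synsets(w))` and `set(stopwords.words('english'))`, with the token list coming from `nltk.word_tokenize`.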