Extracting words from a file

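I am using Python to open a file and check whether any of a predefined set of words appears in it. I have put the predefined words in a list and opened the file I need to test. Is there a way in Python to extract the file's contents word by word rather than line by line? That would make my work much easier.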

2011-02-10 nikhil

import re

def get_words_from_string(s):
    # Lowercase the text and return the set of distinct words (runs of \w characters).
    return set(re.findall(r'\w+', s.lower()))

def get_words_from_file(fname):
    # Open in text mode so read() returns a string the regex can search.
    with open(fname, 'r') as inf:
        return get_words_from_string(inf.read())

def all_words(needle, haystack):
    # True if every word in needle also appears in haystack.
    return set(needle).issubset(set(haystack))

def any_words(needle, haystack):
    # The set of words common to both; empty (falsy) if there are none.
    return set(needle).intersection(set(haystack))

search_words = get_words_from_string("This is my test")
find_in = get_words_from_string("If this were my test, I is passing")

print(any_words(search_words, find_in))

print(all_words(search_words, find_in))

Output (as printed by Python 2; the order of the set elements may vary):

set(['this', 'test', 'is', 'my']) 
True

2011-02-10 22:53:24

A perfect solution... but what if the file is too large? Any solution? – nikhil 2011-02-10 23:09:38
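
For a file too large to read in one go, one option is to walk it line by line and stop as soon as every search word has turned up. The following is only a minimal sketch of that idea, not part of the answer above: `find_words_in_stream` is a hypothetical helper name, and `io.StringIO` stands in for a real file so the example runs on its own.

```python
import io
import re

WORD_RE = re.compile(r'\w+')

def find_words_in_stream(search_words, lines):
    """Return the subset of search_words seen in an iterable of text lines."""
    wanted = set(w.lower() for w in search_words)
    found = set()
    for line in lines:
        # Only one line is held in memory at a time.
        found.update(wanted.intersection(WORD_RE.findall(line.lower())))
        if found == wanted:
            break  # every word has been seen; no need to read further
    return found

# Demo: an in-memory text stream used in place of an open file (assumption).
sample = io.StringIO(u"If this were my test,\nI is passing\n")
result = find_words_in_stream(["This", "is", "my", "test"], sample)
print(sorted(result))  # ['is', 'my', 'test', 'this']
```

With a real file, the same function can be passed the open file object, for example `with open(fname) as inf: find_words_in_stream(words, inf)`, since iterating over a file yields one line at a time.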

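You can do a couple of things:

Call file.readlines() and split each line on the delimiter you want, if the text isn't large.
Call read() with a size argument and process the file a piece at a time (down to a single byte).

Check out the Python docs on file objects - http://docs.python.org/release/2.5.2/lib/bltin-file-objects.html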
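
The second option might look roughly like the sketch below, which reads a fixed number of characters at a time and holds back any partial word at the end of a chunk so that a word split across two reads is not counted as two. This is an illustration under assumptions: `iter_words_chunked` and the `chunk_size` default are hypothetical, and `io.StringIO` again stands in for an open file.

```python
import io
import re

WORD_RE = re.compile(r'\w+')
TRAILING_WORD_RE = re.compile(r'\w+\Z')  # word characters at the very end

def iter_words_chunked(fileobj, chunk_size=64 * 1024):
    """Yield lowercase words from fileobj, reading chunk_size characters at a time."""
    tail = u''
    while True:
        chunk = fileobj.read(chunk_size)
        if not chunk:
            break
        text = tail + chunk
        # A word at the end of this chunk may continue in the next one,
        # so carry it over instead of yielding it now.
        m = TRAILING_WORD_RE.search(text)
        if m:
            tail = text[m.start():]
            text = text[:m.start()]
        else:
            tail = u''
        for word in WORD_RE.findall(text.lower()):
            yield word
    if tail:
        yield tail.lower()  # the last word in the file

# Demo with a deliberately tiny chunk size to force words to be split.
sample = io.StringIO(u"Hello world, testing one two")
found = set(iter_words_chunked(sample, chunk_size=4))
print(sorted(found.intersection(['hello', 'world', 'testing'])))
# ['hello', 'testing', 'world']
```

Reading by line is usually simpler; chunked reads are mainly useful when the file has very long lines or no line breaks at all.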

2011-02-10 22:37:04 dfb

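This code will show which of the words are present in the file, provided that a word matches exactly: it must not be preceded or followed by punctuation or other characters, and it must be in the same case. With a few small tweaks the code could be made more forgiving.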

words = set(['hello', 'world', 'testing'])
# split() with no arguments splits on any run of whitespace.
with open('testfile.txt', 'r') as f:
    data = set(f.read().split())
print(words.intersection(data))
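
One way the matching could be made more forgiving is to strip surrounding punctuation from each token and compare in lowercase. The sketch below is just one possible set of tweaks, not the answerer's own code; `normalize` and `forgiving_matches` are hypothetical names, and only ASCII punctuation from `string.punctuation` is handled.

```python
import string

def normalize(token):
    """Lowercase a token and strip leading/trailing ASCII punctuation."""
    return token.strip(string.punctuation).lower()

def forgiving_matches(words, text):
    """Return the search words found in text, ignoring case and edge punctuation."""
    wanted = set(normalize(w) for w in words)
    seen = set(normalize(t) for t in text.split())
    seen.discard('')  # tokens that consisted only of punctuation
    return wanted.intersection(seen)

text = "Hello, world! This is only a TEST..."
matches = forgiving_matches(['hello', 'world', 'testing'], text)
print(sorted(matches))  # ['hello', 'world']
```

Note that "TEST..." normalizes to "test", which still does not match "testing"; partial or stemmed matches would need a different approach.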

2011-02-10 22:53:04 zchtodd
