Unable to follow NLTK corpus structure

I'm getting started with NLTK and Python, but I'm really confused by the NLTK corpus structure. For example:

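First, I can't follow why we need to append words twice to the nltk.corpus module:

wordlist = [w for w in nltk.corpus.words.words('en') if w.islower()]

Second, the types of nltk.corpus.words and nltk.corpus.words.words are different. Why is that?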

type(nltk.corpus)
nltk.corpus
type(nltk.corpus.words)
nltk.corpus.words
type(nltk.corpus.words.words)
nltk.corpus.words.words
... 'C:\\Documents and Settings\\Administrator\\nltk_data\\corpora\\words'>>

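Third, how is one supposed to know that words has to be appended twice to nltk.corpus in order to generate a word list? In other words, what is the difference between nltk.corpus.words and nltk.corpus.words.words?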

Could someone please elaborate? As it stands, I'm finding it hard to get through the third chapter of the NLTK book.

Thanks a ton


2013-05-31 vinita

It's quite simple, really: words is the name of a class instance contained in nltk.corpus. The relevant code:

words = LazyCorpusLoader('words', WordListCorpusReader, r'(?!README|\.).*')

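All this says is that words is an instance of LazyCorpusLoader.

So nltk.corpus.words gives you a reference to that instance.

But wait!

If you look at the code for LazyCorpusLoader, you'll see that it is constructed with WordListCorpusReader as its reader class and later goes on to call it.

WordListCorpusReader happens to have a method called words, which looks like this: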

def words(self, fileids=None): 
    return line_tokenize(self.raw(fileids))
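
To make the "instance named words with a method named words" idea concrete, here is a minimal sketch that does not use NLTK at all. TinyWordListReader and its sample text are hypothetical, and the assumption that line_tokenize splits the raw text into lines and discards blank ones is mine, not something the snippet above states.

```python
class TinyWordListReader:
    """Hypothetical stand-in for a word-list corpus reader (not NLTK's class)."""

    def __init__(self, text):
        self._text = text  # raw corpus contents, one word per line

    def raw(self):
        return self._text

    def words(self):
        # Split the raw text into lines and drop blank ones; this is
        # assumed to approximate what line_tokenize does.
        return [line for line in self.raw().splitlines() if line.strip()]


# The object is called `words`, and it also has a method called `words`.
words = TinyWordListReader("Apple\nbanana\n\ncherry\n")
lowercase = [w for w in words.words() if w.islower()]
print(lowercase)  # ['banana', 'cherry']
```

Here words on its own is just the reader object, while words.words() is the call that actually returns the list, which mirrors nltk.corpus.words versus nltk.corpus.words.words().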

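And LazyCorpusLoader does this:

corpus = self.__reader_cls(root, *self.__args, **self.__kwargs)

Essentially, self.__reader_cls is WordListCorpusReader, so this makes corpus an instance of WordListCorpusReader (which has its own words method).

Then it does this: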

self.__dict__ = corpus.__dict__ 
self.__class__ = corpus.__class__

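According to the Python docs, "__dict__ is the module's namespace as a dictionary object"; for an instance it is likewise the dictionary that holds its attributes. So the loader swaps its own namespace for that of corpus. Similarly, the docs say "__class__ is the instance's class", so it changes its class as well. Therefore nltk.corpus.words.words refers to the instance method words belonging to the instance named words. Does that make sense? This code illustrates the same behavior: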

class Bar(object):
    def foo(self):
        return "I am a method of Bar"

class Foo(object):
    def __init__(self, newcls):
        # Build the real object, then take over its class and attributes
        newcls = newcls()
        self.__class__ = newcls.__class__
        self.__dict__ = newcls.__dict__

foo = Foo(Bar)
print(foo.foo())
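
Going one step further than the toy above, the "lazy" part can be sketched by deferring the swap until an attribute is first requested. This is only an illustration of the pattern under my own assumptions: Reader and LazyLoader are hypothetical names, and the real LazyCorpusLoader also locates the corpus on disk, which is left out here.

```python
class Reader:
    """Hypothetical reader; stands in for WordListCorpusReader."""

    def __init__(self, items):
        self.items = items

    def words(self):
        return list(self.items)


class LazyLoader:
    """Hypothetical loader that builds the reader on first attribute access."""

    def __init__(self, reader_cls, *args):
        self._reader_cls = reader_cls
        self._args = args

    def __getattr__(self, name):
        # Only reached when normal lookup fails, e.g. for `words`.
        reader = self._reader_cls(*self._args)
        # Take over the reader's attributes and class.
        self.__dict__ = reader.__dict__
        self.__class__ = reader.__class__
        return getattr(self, name)


words = LazyLoader(Reader, ["alpha", "Beta"])
before = type(words).__name__   # still the loader
result = words.words()          # first access triggers the swap
after = type(words).__name__    # now the reader
print(before, after, result)    # LazyLoader Reader ['alpha', 'Beta']
```

This would also explain why type(nltk.corpus.words) can report the reader class rather than the loader once the corpus has been touched.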

And here are links to the source, so you can see for yourself:

http://nltk.googlecode.com/svn/trunk/doc/api/nltk.corpus-pysrc.html

http://nltk.googlecode.com/svn/trunk/doc/api/nltk.corpus.reader.wordlist-pysrc.html#WordListCorpusReader


2013-05-31 06:44:08 Wes
