CountVectorizer(): 'StreamBackedCorpusView' object has no attribute 'lower'

I am trying to instantiate CountVectorizer() and run it on the NLTK movie reviews corpus, using the code below:

>>> import nltk
>>> import nltk.corpus
>>> from sklearn.feature_extraction.text import CountVectorizer
>>> from nltk.corpus import movie_reviews
>>> neg_rev = movie_reviews.fileids('neg')
>>> pos_rev = movie_reviews.fileids('pos')
>>> rev_list = []  # empty list
>>> for rev in neg_rev:
...     rev_list.append(nltk.corpus.movie_reviews.words(rev))
...
>>> for rev_pos in pos_rev:
...     rev_list.append(nltk.corpus.movie_reviews.words(rev_pos))
...
>>> count_vect = CountVectorizer()
>>> X_count_vect = count_vect.fit_transform(rev_list)

I get the following error:

AttributeError       Traceback (most recent call last) 
<ipython-input-37-00e9047daa67> in <module>() 
----> 1 X_count_vect = count_vect.fit_transform(rev_list) 

C:\ProgramData\Anaconda3\lib\site-packages\sklearn\feature_extraction\text.py in fit_transform(self, raw_documents, y) 
    837 
    838   vocabulary, X = self._count_vocab(raw_documents, 
--> 839           self.fixed_vocabulary_) 
    840 
    841   if self.binary: 

C:\ProgramData\Anaconda3\lib\site-packages\sklearn\feature_extraction\text.py in _count_vocab(self, raw_documents, fixed_vocab) 
    760   for doc in raw_documents: 
    761    feature_counter = {} 
--> 762    for feature in analyze(doc): 
    763     try: 
    764      feature_idx = vocabulary[feature] 

C:\ProgramData\Anaconda3\lib\site-packages\sklearn\feature_extraction\text.py in <lambda>(doc) 
    239 
    240    return lambda doc: self._word_ngrams(
--> 241     tokenize(preprocess(self.decode(doc))), stop_words) 
    242 
    243   else: 

C:\ProgramData\Anaconda3\lib\site-packages\sklearn\feature_extraction\text.py in <lambda>(x) 
    205 
    206   if self.lowercase: 
--> 207    return lambda x: strip_accents(x.lower()) 
    208   else: 
    209    return strip_accents 

AttributeError: 'StreamBackedCorpusView' object has no attribute 'lower'

nltk.corpus.movie_reviews.words(rev_pos) returns the review already split into tokens, for example:

['films', 'adapted', 'from', 'comic', 'books', 'have', ...]

Can anyone tell me what I am doing wrong? I assume I made a mistake somewhere while building the list of movie reviews (rev_list).

TIA

Source

2017-09-04 chhibbz

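You should check the type of the nltk.corpus.movie_reviews.words(rev_pos) value you are appending to the list. CountVectorizer expects each document to be a string, and I don't think that is what you currently have.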
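A quick way to run that check is sketched below. The helper name find_non_string_docs is made up for illustration, and the sample data is a stand-in (a plain list of tokens) rather than the real corpus, so this runs without NLTK installed. With the real rev_list, the reported type would be StreamBackedCorpusView, as in the traceback above.

```python
def find_non_string_docs(docs):
    """Return (index, type name) for every item that is not a str."""
    return [(i, type(doc).__name__)
            for i, doc in enumerate(docs)
            if not isinstance(doc, str)]


# Stand-in data (an assumption, not the real corpus): the first item mimics
# the token sequence that movie_reviews.words(...) yields, the second is
# raw text of the kind CountVectorizer can lowercase and tokenize itself.
docs = [
    ["films", "adapted", "from", "comic", "books"],
    "films adapted from comic books",
]

print(find_non_string_docs(docs))  # [(0, 'list')]
```

Any index this reports points at an item that has no .lower() method, which is what the default CountVectorizer preprocessing calls on each document.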

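It looks like .words() does not actually give you back a plain list of tokens, or a string, but a StreamBackedCorpusView object. That class lets you retrieve the tokens, but it is a view over the corpus file rather than a complete representation of the tokens themselves. This is why the vectorizer fails: with lowercase enabled it calls .lower() on each document, and the view has no such method.

However, you can retrieve the tokens from the view. For more details on using StreamBackedCorpusView, see the following link.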

http://nltk.sourceforge.net/corpusview/corpusview.StreamBackedCorpusView-class.html
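One possible fix, sketched under a few assumptions: join the tokens of each view back into a single string before handing the list to CountVectorizer. The function names tokens_to_text and vectorize_movie_reviews are invented for this example, and it assumes the movie_reviews corpus has already been downloaded. This is a sketch, not necessarily the only or best approach.

```python
def tokens_to_text(tokens):
    """Join an iterable of tokens (e.g. a corpus view) into one string."""
    return " ".join(tokens)


def vectorize_movie_reviews():
    """Build a document-term matrix from the NLTK movie_reviews corpus.

    Assumes nltk.download('movie_reviews') has been run beforehand.
    """
    # Imported inside the function so that tokens_to_text above can be
    # used (and tested) without NLTK or scikit-learn installed.
    from nltk.corpus import movie_reviews
    from sklearn.feature_extraction.text import CountVectorizer

    fileids = movie_reviews.fileids('neg') + movie_reviews.fileids('pos')

    # One string per review instead of one StreamBackedCorpusView per review.
    rev_list = [tokens_to_text(movie_reviews.words(fid)) for fid in fileids]

    count_vect = CountVectorizer()
    X_count_vect = count_vect.fit_transform(rev_list)
    return count_vect, X_count_vect


# The helper turns a token sequence into something that has .lower():
print(tokens_to_text(["films", "adapted", "from", "comic", "books"]))
```

Two alternatives that should also work: movie_reviews.raw(fid) returns the review text as a string directly, and passing a callable such as CountVectorizer(analyzer=lambda doc: doc) makes the vectorizer use the NLTK tokens as they are, skipping its own lowercasing and tokenizing.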

Source

2017-09-04 15:36:58
