2015-12-02 51 views
Error when vectorizing a text file in Python

I want to import data from a text file and build a vector-space representation of its words:

from sklearn.feature_extraction.text import CountVectorizer 

vectorizer = CountVectorizer(input="file") 
f = open('D:\\test\\17.txt') 
bag_of_words = vectorizer.fit(f) 
bag_of_words = vectorizer.transform(f) 
print(bag_of_words) 

But I get this error:

Traceback (most recent call last):
  File "D:\test\test.py", line 5, in <module>
    bag_of_words = vectorizer.fit(f)
  File "C:\Anaconda3\lib\site-packages\sklearn\feature_extraction\text.py", line 776, in fit
    self.fit_transform(raw_documents)
  File "C:\Anaconda3\lib\site-packages\sklearn\feature_extraction\text.py", line 804, in fit_transform
    self.fixed_vocabulary_)
  File "C:\Anaconda3\lib\site-packages\sklearn\feature_extraction\text.py", line 739, in _count_vocab
    for feature in analyze(doc):
  File "C:\Anaconda3\lib\site-packages\sklearn\feature_extraction\text.py", line 236, in <lambda>
    tokenize(preprocess(self.decode(doc))), stop_words)
  File "C:\Anaconda3\lib\site-packages\sklearn\feature_extraction\text.py", line 110, in decode
    doc = doc.read()
AttributeError: 'str' object has no attribute 'read'

Any ideas?

Which line raises the error? –

Edited the post to include the full error report. – Masyaf

Answer

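The vectorizer.fit method expects an iterable of file objects or strings, not a single file object. When you pass f directly, the vectorizer iterates over it and receives one str per line; because input="file" tells it to call .read() on every item, this raises the AttributeError shown above. You should therefore call vectorizer.fit([f]).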
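To see why the original call fails, the following standard-library sketch compares what the vectorizer would receive in each case. It does not use scikit-learn; the temporary file and the variable names are assumptions made for illustration, standing in for D:\test\17.txt.

```python
import os
import tempfile

# Hypothetical sample file standing in for D:\test\17.txt
with tempfile.NamedTemporaryFile(
    "w", suffix=".txt", delete=False, encoding="utf-8"
) as tmp:
    tmp.write("first line of text\nsecond line of text\n")
    path = tmp.name

with open(path, encoding="utf-8") as f:
    # Iterating over the file object itself yields one str per line
    items_from_file = list(f)

with open(path, encoding="utf-8") as f:
    # Iterating over a list containing the file yields the file object
    items_from_list = [hasattr(item, "read") for item in [f]]

line_has_read = [hasattr(item, "read") for item in items_from_file]
os.remove(path)

print(line_has_read)    # the lines are plain strings without .read()
print(items_from_list)  # the file object does expose .read()
```

Each line is a plain string, which is what triggers 'str' object has no attribute 'read', while the list wrapper hands over the file object itself.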

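Also, you cannot reuse f in the second call, vectorizer.transform, because the file has already been read to the end by then. What you probably want is the following: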

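from sklearn.feature_extraction.text import CountVectorizer

vectorizer = CountVectorizer(input="file")
f = open('D:\\test\\17.txt')
# Wrap the file in a list so it is treated as one document, and read it only once
bag_of_words = vectorizer.fit_transform([f])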
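
If you do need separate fit and transform calls on the same open file, one option is to rewind it with f.seek(0) in between. The sketch below is a minimal illustration under assumptions: the temporary file and its sample sentence are made up, and the default CountVectorizer tokenizer settings are used.

```python
import os
import tempfile

from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical sample file standing in for D:\test\17.txt
with tempfile.NamedTemporaryFile(
    "w", suffix=".txt", delete=False, encoding="utf-8"
) as tmp:
    tmp.write("the cat sat on the mat")
    path = tmp.name

vectorizer = CountVectorizer(input="file")
with open(path, encoding="utf-8") as f:
    vectorizer.fit([f])                 # reads the file to the end
    f.seek(0)                           # rewind before reading it again
    counts = vectorizer.transform([f])  # one row for the one document

os.remove(path)

vocabulary = sorted(vectorizer.vocabulary_)  # learned terms
row = counts.toarray()[0].tolist()           # term counts for the document
print(vocabulary)
print(row)
```

Another option, if you would rather not manage file handles at all, is CountVectorizer(input="filename"), which takes a list of paths and opens each file itself on every call.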