Error when vectorizing a text file in Python

I want to import data from a text file and build a vector space representation of it:

from sklearn.feature_extraction.text import CountVectorizer 

vectorizer = CountVectorizer(input="file") 
f = open('D:\\test\\17.txt') 
bag_of_words = vectorizer.fit(f) 
bag_of_words = vectorizer.transform(f) 
print(bag_of_words)

But I get this error:

Traceback (most recent call last):
  File "D:\test\test.py", line 5, in <module>
    bag_of_words = vectorizer.fit(f)
  File "C:\Anaconda3\lib\site-packages\sklearn\feature_extraction\text.py", line 776, in fit
    self.fit_transform(raw_documents)
  File "C:\Anaconda3\lib\site-packages\sklearn\feature_extraction\text.py", line 804, in fit_transform
    self.fixed_vocabulary_)
  File "C:\Anaconda3\lib\site-packages\sklearn\feature_extraction\text.py", line 739, in _count_vocab
    for feature in analyze(doc):
  File "C:\Anaconda3\lib\site-packages\sklearn\feature_extraction\text.py", line 236, in <lambda>
    tokenize(preprocess(self.decode(doc))), stop_words)
  File "C:\Anaconda3\lib\site-packages\sklearn\feature_extraction\text.py", line 110, in decode
    doc = doc.read()
AttributeError: 'str' object has no attribute 'read'

Any ideas?


2015-12-02 Masyaf

Which line raises the error? –

Edited the post to include the full error report. – Masyaf

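With input="file", vectorizer.fit expects an iterable of file objects (one per document), not a single file object. When you pass f directly, the vectorizer iterates over it, which yields the file's lines as str objects, and then calls .read() on each line; that is the AttributeError in your traceback. Wrap the file in a list instead: vectorizer.fit([f]).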
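To see the failure mode in isolation, the sketch below uses io.StringIO as a stand-in for the file at D:\test\17.txt (an assumption so the example runs anywhere; the sample text is made up). It shows that iterating over a file-like object yields plain str lines, that fit(f) therefore fails on .read(), and that fit([f]) treats the whole file as one document.

```python
import io

from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical stand-in for open('D:\\test\\17.txt'); any object with .read() works
sample_text = "the cat sat\nthe dog sat\n"

# Iterating over a file-like object yields its lines, i.e. plain str objects
lines = list(io.StringIO(sample_text))
print(type(lines[0]))  # <class 'str'>

vectorizer = CountVectorizer(input="file")

# Passing the file itself: each line (a str) becomes a "document", and .read() fails on it
try:
    vectorizer.fit(io.StringIO(sample_text))
except AttributeError as exc:
    print("fit(f) failed:", exc)

# Wrapping the file in a list: a single document, read once via .read()
vectorizer.fit([io.StringIO(sample_text)])
print(sorted(vectorizer.vocabulary_))  # ['cat', 'dog', 'sat', 'the']
```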

Also, you can't reuse f in the second call, vectorizer.transform, because fit has already read the file to the end. What you probably want is this:

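from sklearn.feature_extraction.text import CountVectorizer

vectorizer = CountVectorizer(input="file")
with open('D:\\test\\17.txt') as f:
    bag_of_words = vectorizer.fit_transform([f])  # a list of one file object = one document
print(bag_of_words)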
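If you do want separate fit and transform calls on the same file object, it has to be rewound between them. A minimal sketch, again with io.StringIO as a hypothetical stand-in for the real file: without the seek(0), transform silently reads an empty document and returns an all-zero row rather than raising an error.

```python
import io

from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical stand-in for open('D:\\test\\17.txt')
f = io.StringIO("the cat sat\nthe dog sat\n")

vectorizer = CountVectorizer(input="file")
vectorizer.fit([f])  # reads f to the end while learning the vocabulary

stale = vectorizer.transform([f])  # f is already at EOF, so the document is empty
print(stale.toarray())             # [[0 0 0 0]]

f.seek(0)  # rewind before reusing the same file object
bag_of_words = vectorizer.transform([f])
print(bag_of_words.toarray())      # [[1 1 2 2]]  (columns: cat, dog, sat, the)
```

Calling fit_transform([f]) once, as in the answer, avoids the rewind entirely.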


2015-12-03 17:46:58
