RecursionError: maximum recursion depth exceeded

from __future__ import print_function
import os, codecs, nltk.stem

english_stemmer = nltk.stem.SnowballStemmer('english')
for root, dirs, files in os.walk("/Users/Documents/corpus/source-document/test1"):
    for file in files:
        if file.endswith(".txt"):
            posts = codecs.open(os.path.join(root, file), "r", "utf-8-sig")
from sklearn.feature_extraction.text import CountVectorizer
class StemmedCountVectorizer(CountVectorizer):
    def build_analyzer(self):
        analyzer = super(StemmedCountVectorizer, self.build_analyzer())
        return lambda doc: (english_stemmer.stem(w) for w in analyzer(doc))

vectorizer = StemmedCountVectorizer(min_df=1, stop_words='english')
X_train = vectorizer.fit_transform(posts)
num_samples, num_features = X_train.shape
print("#samples: %d, #features: %d" % (num_samples, num_features))  #samples: 5, #features: 25
print(vectorizer.get_feature_names())

When I run the code above on a directory containing all of my text files, it throws the following error: RecursionError: maximum recursion depth exceeded.

I tried to solve the problem with sys.setrecursionlimit, but to no avail. When I supply a large value such as 20000, the kernel crashes.


2016-08-04 An student

Try 'super(StemmedCountVectorizer, self).build_analyzer()'

Replacing 'super(StemmedCountVectorizer, self.build_analyzer())' worked for me.

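What is the point of opening the files like that? If there is more than one, you end up opening all of them, and only the last one opened is actually used. Put that code in a function that returns the opened file, or add a break; or, if you want to process several files, add them to a list; or open the file directly (if you know where it is) – Copperfield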
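One way to follow the "add them to a list" suggestion is sketched below. The function name read_text_files is hypothetical and not part of the original post; the sketch assumes each .txt file should be treated as one document and that reading whole files into memory is acceptable.

```python
import codecs
import os


def read_text_files(top_dir):
    """Return the contents of every .txt file under top_dir, one string per file."""
    documents = []
    for root, dirs, files in os.walk(top_dir):
        dirs.sort()  # visit subdirectories in a deterministic order
        for name in sorted(files):
            if name.endswith(".txt"):
                path = os.path.join(root, name)
                # utf-8-sig strips a leading byte-order mark if one is present
                with codecs.open(path, "r", "utf-8-sig") as handle:
                    documents.append(handle.read())
    return documents
```

The returned list could then be passed to vectorizer.fit_transform in place of posts. Note that this changes what counts as a sample: the original code iterates over the lines of a single open file, whereas here every file is one document.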

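Your error is in analyzer = super(StemmedCountVectorizer, self.build_analyzer()). Here you call build_analyzer on self before the super call is ever made, so the method keeps calling itself and recurses without end. Change it to analyzer = super(StemmedCountVectorizer, self).build_analyzer()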


2016-08-04 13:11:12 Copperfield
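The difference described in the answer above can be reproduced without scikit-learn or NLTK. In this minimal sketch, BaseVectorizer is a hypothetical stand-in for CountVectorizer, and lowercasing stands in for stemming; both are assumptions made only to keep the example self-contained.

```python
class BaseVectorizer(object):
    """Hypothetical stand-in for CountVectorizer: splits text on whitespace."""

    def build_analyzer(self):
        return lambda doc: doc.split()


class BrokenVectorizer(BaseVectorizer):
    def build_analyzer(self):
        # Bug: self.build_analyzer() is evaluated first, as an argument to
        # super(), so this method calls itself until the recursion limit is hit.
        analyzer = super(BrokenVectorizer, self.build_analyzer())
        return lambda doc: [w.lower() for w in analyzer(doc)]


class FixedVectorizer(BaseVectorizer):
    def build_analyzer(self):
        # Fix: super() receives self, and build_analyzer is then looked up
        # on the parent class.
        analyzer = super(FixedVectorizer, self).build_analyzer()
        return lambda doc: [w.lower() for w in analyzer(doc)]


try:
    BrokenVectorizer().build_analyzer()
    broken_outcome = "no error"
except RecursionError:
    broken_outcome = "RecursionError"

fixed_tokens = FixedVectorizer().build_analyzer()("Maximum Recursion Depth")
```

Because the broken version never reaches a base case, raising the limit with sys.setrecursionlimit only postpones the failure, which is consistent with the crash reported in the question.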
