2016-10-05 90 views

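Sentiment analysis: joining token lists

I'm doing sentiment analysis with scikit-learn in Python, and I am now using NLTK to lemmatize the words in order to speed up processing. For example: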

After the NLTK processing I get the following array:

array([ ['Really', 'a', 'terrible', 'course', u'lecture', u'be', 'so', 'boring', 'i', u'contemplate', 'suicide', 'on', 'numerous', u'occasion', 'and', 'the', 'tutes', u'go', 'for', 'two', u'hour', 'and', u'be', 'completely'], ['Management', 'accounting', u'require', 'sufficient', 'practice', 'to', 'get', 'a', 'hang', 'of', 'Made', 'easier', 'with', 'a', 'great', 'lecturer']], dtype=object) 

But scikit-learn requires an array like this:

array([ 'Really a terrible course lectures were so boring i contemplated suicide on numerous occasions and the tutes went for two hours and were completely ', 'Management accounting requires sufficient practice to get a hang of Made easier with a great lecturer '],dtype=object) 

So what is the best way to convert this array into the right form? I tried joining the lists, but the result was strange.

Answer


You could do:

second_array = [' '.join(each) for each in first_array] 
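As a minimal sketch of that join, assuming a small made-up `first_array` in place of the real NLTK output: each inner token list becomes one space-separated string. Note that the strings contain the lemmas (for example `require`), not the original word forms shown in the target array of the question.

```python
# Hypothetical stand-in for the lemmatized token lists from the question.
first_array = [
    ["Really", "a", "terrible", "course"],
    ["Management", "accounting", "require", "sufficient", "practice"],
]

# Join each token list into one space-separated string per document.
second_array = [" ".join(tokens) for tokens in first_array]

for text in second_array:
    print(text)
```

If a NumPy array is needed rather than a list, wrapping the result in `numpy.array(second_array, dtype=object)` should give the same layout as in the question.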

Or you can tell scikit-learn's CountVectorizer to use your tokens as they are:

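from sklearn.feature_extraction.text import CountVectorizer
# Identity preprocessor and tokenizer: each document is already a list of tokens.
vect = CountVectorizer(preprocessor=lambda x: x, tokenizer=lambda x: x)
X = vect.fit_transform(first_array)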
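A self-contained sketch of the pre-tokenized approach, assuming a tiny made-up corpus and a hypothetical helper named `identity` (neither is from the original post). Because the tokenizer is supplied, the default token pattern is not applied, so single-character tokens such as `a` stay in the vocabulary.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical pre-tokenized documents (already lemmatized).
first_array = [
    ["a", "terrible", "course", "a", "boring", "lecture"],
    ["a", "great", "lecturer"],
]


def identity(tokens):
    """Return the token list unchanged."""
    return tokens


# Skip scikit-learn's own preprocessing and tokenizing steps.
vect = CountVectorizer(preprocessor=identity, tokenizer=identity)
X = vect.fit_transform(first_array)

vocab = vect.vocabulary_          # maps token -> column index
count_a = X[0, vocab["a"]]        # occurrences of "a" in the first document
print(sorted(vocab), X.shape, count_a)
```

A named function is used here instead of a lambda so that the fitted vectorizer can be pickled; the lambda version behaves the same way for fitting and transforming.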

Wonderful! Thank you very much.