2017-04-16 111 views

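AttributeError when creating a document-term matrix

I am trying to create a document-term matrix represented as a pandas DataFrame. Here is my code so far: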

df_profession['Athlete_Clean'] = df_profession['Athlete Biographies'].str.lower() 
df_profession['Athlete_Clean'] = df_profession['Athlete_Clean'].apply(lambda x: ''.join([i for i in x if not i.isdigit()])) 
df_profession['Athlete_Clean'] = df_profession['Athlete_Clean'].str.split() 
df_profession['Athlete_Clean'] = [word for word in df_profession['Athlete_Clean'] if word not in punctuation] 
df_profession['Athlete_Clean'] = [word for word in df_profession['Athlete_Clean'] if word not in stopwords.words('english')] 

profession_dtm_athlete = pandas.DataFrame(countvec.fit_transform(df_profession['Athlete_Clean']).toarray(), columns=countvec.get_feature_names(), index = df.index) 
profession_dtm_athlete 

When I run this code, I get the following error:

'list' object has no attribute 'lower' 

How can I get rid of this error?

Answers


Wrap the list objects in str() to convert them to strings:

df_profession['Athlete_Clean'] = str(df_profession['Athlete Biographies']).lower() 
df_profession['Athlete_Clean'] = df_profession['Athlete_Clean'].apply(lambda x: ''.join([i for i in x if not i.isdigit()])) 
df_profession['Athlete_Clean'] = str(df_profession['Athlete_Clean']).split() 
df_profession['Athlete_Clean'] = [word for word in df_profession['Athlete_Clean'] if word not in punctuation] 
df_profession['Athlete_Clean'] = [word for word in df_profession['Athlete_Clean'] if word not in stopwords.words('english')] 

profession_dtm_athlete = pandas.DataFrame(countvec.fit_transform(df_profession['Athlete_Clean']).toarray(), columns=countvec.get_feature_names(), index = df.index) 
profession_dtm_athlete 
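As a possible alternative, the error message is consistent with the vectorizer receiving a list of tokens per row (the result of `.str.split()`) and calling `.lower()` on it. A minimal sketch, assuming the goal is one cleaned string per biography: the tokens are split only temporarily and joined back into a string. `clean_text` and `STOP_WORDS` are hypothetical names introduced here, and the small stop-word set merely stands in for `stopwords.words('english')`.

```python
import string

# Hypothetical stand-in for nltk's stopwords.words('english'); illustration only.
STOP_WORDS = {"a", "an", "the", "is", "in", "of", "and", "for"}


def clean_text(text):
    """Lowercase, drop digits and punctuation, remove stop words, return one string."""
    text = text.lower()
    # Drop digits and punctuation characters in a single pass.
    text = "".join(
        ch for ch in text if not ch.isdigit() and ch not in string.punctuation
    )
    # Tokenize only temporarily, then join back so the result is still a str.
    tokens = [word for word in text.split() if word not in STOP_WORDS]
    return " ".join(tokens)


cleaned = clean_text("The sprinter won 3 gold medals in 2016, and a silver!")
print(cleaned)
```

Under these assumptions, the function could be applied row by row with `df_profession['Athlete Biographies'].apply(clean_text)`, so every row stays a string before it reaches `countvec.fit_transform`. Note also that `CountVectorizer` lowercases by default and accepts `stop_words='english'`, so some of the manual cleaning may be unnecessary.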

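So this seems to have gotten past that problem, but now I get "ValueError: Length of values does not match length of index". Any suggestions as to why this is occurring? – Jberk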


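That error is internal to the pandas library, so I'm not sure. It might be worth a new question. If you do post it as a new question, I suggest using the dataframe tag. – JacobIRR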


OK, thanks JacobIRR. I'll go ahead and create a new question about this new error. – Jberk
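A likely explanation for the ValueError mentioned in the comments, sketched under assumptions: calling `str()` on a whole Series returns a single string (the printed form of the column), so `.split()` yields one flat token list whose length has nothing to do with the number of rows. The two-row frame below is a hypothetical stand-in for `df_profession`, not the original data.

```python
import pandas as pd

# Hypothetical two-row frame standing in for df_profession.
df = pd.DataFrame(
    {"Athlete Biographies": ["Won gold in 2016", "Retired after 10 seasons"]}
)

# str() on a Series gives ONE string for the whole column (its printed form),
# so .split() returns a flat token list unrelated to the number of rows.
tokens = str(df["Athlete Biographies"]).lower().split()
print(len(df), len(tokens))

try:
    df["Athlete_Clean"] = tokens  # list length differs from the index length
    mismatch_error = None
except ValueError as err:
    mismatch_error = err
print(mismatch_error)

# Element-wise string methods keep exactly one value per row, so lengths match.
df["Athlete_Clean"] = df["Athlete Biographies"].str.lower()
print(df["Athlete_Clean"].tolist())
```

The `.str` accessor (or `.apply` with a per-row function) operates on each row separately, which is why it avoids the length mismatch.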