AttributeError when creating a document-term matrix

I'm trying to create a document-term matrix in the form of a pandas DataFrame. Here is my code so far:

df_profession['Athlete_Clean'] = df_profession['Athlete Biographies'].str.lower() 
df_profession['Athlete_Clean'] = df_profession['Athlete_Clean'].apply(lambda x: ''.join([i for i in x if not i.isdigit()])) 
df_profession['Athlete_Clean'] = df_profession['Athlete_Clean'].str.split() 
df_profession['Athlete_Clean'] = [word for word in df_profession['Athlete_Clean'] if word not in punctuation] 
df_profession['Athlete_Clean'] = [word for word in df_profession['Athlete_Clean'] if word not in stopwords.words('english')] 

profession_dtm_athlete = pandas.DataFrame(countvec.fit_transform(df_profession['Athlete_Clean']).toarray(), columns=countvec.get_feature_names(), index = df.index) 
profession_dtm_athlete

When I run this code, I get the following error:

'list' object has no attribute 'lower'

How can I get rid of this error?

2017-04-16 Jberk
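
A minimal reproduction of the error. It assumes pandas and scikit-learn are installed and that countvec is a default CountVectorizer (the question does not show how it was created). The two-row Series is hypothetical sample data. After .str.split(), each row holds a list of words. Iterating over a Series yields whole rows, so the list comprehensions operate on rows, not on individual words. CountVectorizer lowercases each document by default, so it ends up calling .lower() on a list.

```python
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical stand-in for df_profession['Athlete Biographies'].
bios = pd.Series(["Won the final", "Set a new record"])

# As in the question: after .str.split(), every row holds a list of words.
tokens = bios.str.lower().str.split()

# Iterating a Series yields its rows, so each "word" below is a whole list.
rows = [word for word in tokens]
print(type(rows[0]).__name__)  # list

# CountVectorizer (lowercase=True by default) calls .lower() on each document,
# which fails when the document is a list rather than a string.
error_message = ""
try:
    CountVectorizer().fit_transform(tokens)
except AttributeError as exc:
    error_message = str(exc)
print(error_message)
```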

Wrap the lists in str() to convert them to strings:

df_profession['Athlete_Clean'] = str(df_profession['Athlete Biographies']).lower() 
df_profession['Athlete_Clean'] = df_profession['Athlete_Clean'].apply(lambda x: ''.join([i for i in x if not i.isdigit()])) 
df_profession['Athlete_Clean'] = str(df_profession['Athlete_Clean']).split() 
df_profession['Athlete_Clean'] = [word for word in df_profession['Athlete_Clean'] if word not in punctuation] 
df_profession['Athlete_Clean'] = [word for word in df_profession['Athlete_Clean'] if word not in stopwords.words('english')] 

profession_dtm_athlete = pandas.DataFrame(countvec.fit_transform(df_profession['Athlete_Clean']).toarray(), columns=countvec.get_feature_names(), index = df.index) 
profession_dtm_athlete

2017-04-16 20:14:02 JacobIRR
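
A caveat on this approach. str() applied to a whole Series does not convert each row. It returns the Series' printed representation as a single string, including index labels and a trailing name/dtype line, and it is truncated for long data. Assigning that one string to a column broadcasts the same text to every row. Splitting it then produces a token list whose length is unrelated to the number of rows, so pandas refuses the assignment with a "Length of values ... does not match length of index" ValueError. The small DataFrame below is hypothetical and only illustrates the mechanism.

```python
import pandas as pd

# Hypothetical stand-in for df_profession and its 'Athlete Biographies' column.
df = pd.DataFrame(
    {"Athlete Biographies": ["Won the final", "Set a new record", "Retired in 2016"]}
)

# str() on a Series returns its printed representation as ONE multi-line string.
as_text = str(df["Athlete Biographies"])
print(as_text)

# Assigning a single string broadcasts it: every row now holds the same text.
df["Athlete_Clean"] = as_text.lower()

# Splitting the whole representation gives a token list whose length is
# unrelated to the number of rows, so pandas rejects the assignment.
tokens = str(df["Athlete_Clean"]).split()
error_message = ""
try:
    df["Athlete_Clean"] = tokens
except ValueError as exc:
    error_message = str(exc)
print(len(tokens), len(df))
print(error_message)
```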

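So this seems to have gotten past the problem, but now I get "ValueError: Length of values does not match length of index". Any suggestions as to why this is happening? – Jberk

This error comes from inside the pandas library, so I'm not sure. It might be worth a new question. If you do post it as a new question, I suggest using the dataframe tag. – JacobIRR

OK, thanks JacobIRR. I'll go ahead and create a new question about this new error. – Jberk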
