Sample DataFrame:

df=pd.DataFrame({"Hashtags" : ["[u'AAPHealthCare4All']", "[]", "[u'NDTV']", "[u'CBI', u'PrannoyRoy', u'Delhi', u'Emergency']" , "[u'CBI']" ]})

Expected output:

({"Hashtags" : ["#AAPHealthCare4All", " ", "NDTV", "CBI", "PrannoyRoy", "Delhi", "Emergency", "CBI"]})

Here is my code:

# Splitting Hashtags 
import pandas as pd 
df = pd.read_csv("2.csv") 
df1 = df.drop('Hashtags', axis=1).join(
      df.Hashtags 
      .str 
      .split(expand=True) 
      .stack() 
      .reset_index(drop=True, level=1) 
      .rename('Hashtags')   
      ) 
df1.to_csv('string_HT.csv', index=False) 
# Cleaning HASHTAGS 
for index,row in df1.iterrows(): 
    df1['Hashtags'] =df1['Hashtags'].str.strip("u' ',") 

for index,row in df1.iterrows(): 
    df1['Hashtags'] = df1['Hashtags'].str.strip("',") 

for index,row in df1.iterrows(): 
    df1['Hashtags'] = df1['Hashtags'].str.strip("u'") 


df1['Hashtags'] = "#" + df1['Hashtags'] 
df1.rename(columns={'Favorite_Count' : 'Favorite Count','Retweet_Count' :'Retweet Count', 'User_Mentions':'User Mentions','User_Location' : 'User Location','No_of_Followers': 'No of Followers','Status_Count':'Status Count','Geo_Enabled':'Geo Enabled','Compound_Score':'Compound Score'}, inplace=True) #Rename column names to suit tableau file 
df1.to_csv('string_HT.csv', index=False)

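What I want to achieve:

After cleaning the column and removing the unnecessary brackets, characters, quotes and commas, I want to prepend '#' to each hashtag string in the column. My script performs many data cleaning and manipulation steps, and it fails at this one with: TypeError: ufunc 'add' did not contain a loop with signature matching types dtype('S32') dtype('S32') dtype('S32')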

Error:

File "C:/../filename.py", line 469, in <module> 
    df1['Hashtags'] = "#" + df1['Hashtags'] 

    File "C:\ANACONDA\lib\site-packages\pandas\core\ops.py", line 715, in wrapper 
    result = wrap_results(safe_na_op(lvalues, rvalues)) 

    File "C:\ANACONDA\lib\site-packages\pandas\core\ops.py", line 676, in safe_na_op 
    return na_op(lvalues, rvalues) 

    File "C:\ANACONDA\lib\site-packages\pandas\core\ops.py", line 662, in na_op 
    result[mask] = op(x[mask], y) 

    File "C:\ANACONDA\lib\site-packages\pandas\core\ops.py", line 70, in <lambda> 
    radd=arith_method(lambda x, y: y + x, names('radd'), op('+'), 

TypeError: ufunc 'add' did not contain a loop with signature matching types dtype('S32') dtype('S32') dtype('S32')

Source

2017-06-12 lightyagami96

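I think it is best to avoid iterrows loops whenever a faster vectorized solution exists. Here each loop also reassigns the entire column on every iteration, so a single call to str.strip does the same work.

It may help to replace: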

for index,row in df1.iterrows(): 
    df1['Hashtags'] =df1['Hashtags'].str.strip("u' ',") 

for index,row in df1.iterrows(): 
    df1['Hashtags'] = df1['Hashtags'].str.strip("',") 

for index,row in df1.iterrows(): 
    df1['Hashtags'] = df1['Hashtags'].str.strip("u'")

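with a chained str.strip, where the first call removes the characters [, u, comma, space and ] from both ends, and the second removes the ' quotes: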

df1['Hashtags'] = df1['Hashtags'].str.strip("[u, ]").str.strip("'") 
df1['Hashtags'] = "#" + df1['Hashtags']
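As a rough check, the chained strip can be tried on the sample DataFrame from the question. This is only a sketch under assumptions: it uses `str.split()` followed by `explode()` (available in pandas 0.25 and later) in place of the question's `split(expand=True).stack()` chain, and the variable names `tags` and `result` are made up for illustration.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({"Hashtags": ["[u'AAPHealthCare4All']", "[]", "[u'NDTV']",
                                "[u'CBI', u'PrannoyRoy', u'Delhi', u'Emergency']",
                                "[u'CBI']"]})

# Split each cell on whitespace, then give every token its own row.
# (Assumption: explode() stands in for the question's stack() chain.)
tags = df["Hashtags"].str.split().explode()

# First strip removes '[', 'u', ',', ' ' and ']' from both ends;
# the quote character stops it, so letters inside the quotes are untouched.
# Second strip removes the surrounding single quotes.
tags = tags.str.strip("[u, ]").str.strip("'")

# Vectorized prefix; works because every value is a string here.
tags = "#" + tags

result = tags.tolist()
print(result)
```

Note that the empty list `[]` ends up as a bare `#`, so rows without hashtags may need to be filtered out or handled separately, depending on what the output file should contain.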

Or add astype, in case the column does not hold strings at that point:

df1['Hashtags'] = "#" + df1['Hashtags'].astype(str)
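One plausible reading of the traceback is that the column is numeric rather than string-typed when the `+` runs. The question does not show the dtype, so this is an assumption. The sketch below uses a made-up integer column (`col`) to show the kind of failure and how `astype(str)` avoids it.

```python
import pandas as pd

# Hypothetical column that holds numbers instead of strings.
col = pd.Series([2017, 2018])

try:
    # Adding a str to a numeric Series is expected to raise a TypeError.
    "#" + col
except TypeError as exc:
    print("concatenation failed:", type(exc).__name__)

# Converting to str first makes the concatenation well defined.
tagged = "#" + col.astype(str)
print(tagged.tolist())
```

Checking `df1['Hashtags'].dtype` just before the failing line would confirm whether this is what happens in the real data.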

Source

2017-06-12 07:18:54 jezrael
