Python Pandas DataFrame: how to create columns from an existing list in the DataFrame?

I have a pandas DataFrame loaded from a CSV file that looks like this:

year,month,day,list 
2017,09,01,"[('United States of America', 12345), (u'Germany', 54321), (u'Switzerland', 13524), (u'Netherlands', 24135), ... ] 
2017,09,02,"[('United States of America', 6789), (u'Germany', 9876), (u'Switzerland', 6879), (u'Netherlands', 7968), ... ]

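The number of (country, count) pairs in the 4th column differs from row to row.
I want to expand the list in the 4th column and reshape the DataFrame so it looks like this: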

year,month,day,country,count 
2017,09,01,'United States of America',12345 
2017,09,01,'Germany',54321 
2017,09,01,'Switzerland',13524 
2017,09,01,'Netherlands',24135 
... 
2017,09,02,'United States of America',6789 
2017,09,02,'Germany',9876 
2017,09,02,'Switzerland',6879 
2017,09,02,'Netherlands',7968 
...

My idea is to generate 2 separate columns and then join them back to the original DataFrame. Maybe something like this:

country = df.apply(lambda x:[x['list'][0]]).stack().reset_index(level=1, drop=True) 
count = df.apply(lambda x:[x['list'][1]]).stack().reset_index(level=1, drop=True) 
df.drop('list', axis=1).join(country).join(count)

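The code above definitely does not work (I only hope it helps express my idea), and I also don't know how to expand the date columns along with it.
Any help or suggestions would be much appreciated.

Source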

2017-10-18 Dan Lwo

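The simplest way to solve this is probably to iterate over the tuples contained in the DataFrame and build a new DataFrame from them. You can do it with two nested for loops.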

import pandas as pd

rows = []
for row in df.itertuples():
    for pair in row.list:
        # pair is a (country, count) tuple
        rows.append([row.year, row.month, row.day, pair[0], pair[1]])

df_new = pd.DataFrame(rows, columns=['year', 'month', 'day', 'country', 'count'])

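If the fourth field is not an actual list but a string (the double quotes in the DataFrame example leave me in some doubt), you can use the literal_eval function from the ast library: Converting a string representation of a list into an actual list object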
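To make the string case concrete, here is a minimal sketch that parses the column with ast.literal_eval and then runs the nested loops. The inline CSV text and the shortened two-country lists are assumptions made for illustration; the real file has more countries per row.

```python
import ast
import io

import pandas as pd

# Two sample rows shaped like the question's CSV (lists shortened for illustration).
csv_text = '''year,month,day,list
2017,09,01,"[('United States of America', 12345), (u'Germany', 54321)]"
2017,09,02,"[('United States of America', 6789), (u'Germany', 9876)]"
'''

df = pd.read_csv(io.StringIO(csv_text))

# After read_csv the 'list' column holds plain strings, not Python lists.
assert isinstance(df.loc[0, 'list'], str)

# literal_eval parses Python literals (lists, tuples, strings, numbers)
# without executing arbitrary code.
df['list'] = df['list'].apply(ast.literal_eval)

# Same nested-loop idea: one output row per (country, count) pair.
rows = []
for row in df.itertuples(index=False):
    for country, count in row.list:
        rows.append([row.year, row.month, row.day, country, count])

df_new = pd.DataFrame(rows, columns=['year', 'month', 'day', 'country', 'count'])
print(df_new)
```

Note that read_csv parses the month and day as integers here, so "09" becomes 9; pass dtype=str for those columns if the leading zeros matter.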

Source

2017-10-18 09:04:05 Covix

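Thank you very much! I'll try it this way and see whether it works.

You're right: the fourth column is not an actual list but a string, and your approach does solve the date problem. Thanks!

Use: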

import ast
import pandas as pd

# convert strings to lists of tuples
df['list'] = df['list'].apply(ast.literal_eval)
# build one country -> count mapping per original row, then stack into long form
df1 = pd.DataFrame([dict(x) for x in df['list'].values.tolist()]).stack().reset_index(level=1)
df1.columns = ['country', 'count']
# join back to the original rows on the index
df = df.drop(columns='list').join(df1).reset_index(drop=True)
print(df)
   year  month  day                   country  count
0  2017      9    1                   Germany  54321
1  2017      9    1               Netherlands  24135
2  2017      9    1               Switzerland  13524
3  2017      9    1  United States of America  12345
4  2017      9    2                   Germany   9876
5  2017      9    2               Netherlands   7968
6  2017      9    2               Switzerland   6879
7  2017      9    2  United States of America   6789

Source

2017-10-18 09:06:16 jezrael

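Thanks! I tried it, and it's exactly what I need.

By the way, I found a problem with the dates; it may be related to the rejoin part. I'll post an update if I find out how to fix it.

So what you need is to convert a column that holds lists of values into multiple rows. One solution is to create a new DataFrame and do a left join: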

import pandas as pd

df = pd.DataFrame({'A': ['a', 'b'], 'B': ['x', 'y'],
                   'C': [['a1', 'a2'], ['b1', 'b2', 'b3']]})

df
#    A  B             C
# 0  a  x      [a1, a2]
# 1  b  y  [b1, b2, b3]

# one column per list element, then stack into a single column keyed by the original index
dfr = df['C'].apply(lambda k: pd.Series(k)).stack().reset_index(level=1, drop=True).to_frame('C')

dfr
#     C
# 0  a1
# 0  a2
# 1  b1
# 1  b2
# 1  b3

df[['A', 'B']].join(dfr, how='left')
#    A  B   C
# 0  a  x  a1
# 0  a  x  a2
# 1  b  y  b1
# 1  b  y  b2
# 1  b  y  b3

Finally, use reset_index() to renumber the rows:

df[['A', 'B']].join(dfr, how='left').reset_index(drop=True)
#    A  B   C
# 0  a  x  a1
# 1  a  x  a2
# 2  b  y  b1
# 3  b  y  b2
# 4  b  y  b3

Credit: https://stackoverflow.com/a/39955283/2314737
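As a side note, newer pandas releases (0.25 and later) ship DataFrame.explode, which does the same list-to-rows expansion in one call. The sketch below is an alternative to the stack-and-join approach above, not part of the original answer, and it reuses the same toy data.

```python
import pandas as pd

df = pd.DataFrame({'A': ['a', 'b'], 'B': ['x', 'y'],
                   'C': [['a1', 'a2'], ['b1', 'b2', 'b3']]})

# explode() emits one row per list element and repeats the other columns;
# the original index is repeated too, so renumber it afterwards.
exploded = df.explode('C').reset_index(drop=True)
print(exploded)
```

With the question's data, each exploded cell would be a (country, count) tuple, which would still need to be split into two columns.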

Source

2017-10-18 09:09:13 user2314737

Thanks! I'll try this way too.
