Adding columns to a pandas DataFrame from the data inside a dictionary

I have a pandas DataFrame p_df like this

 date_loc  timestamp 
id                  
1  2017-05-29 1496083649 
2  2017-05-29 1496089320 
3  2017-05-29 1496095148 
4  2017-05-30 1496100936 
...

and a dictionary like this

observations = {
    '1496089320': {
        'col_a': 'value_a',
        'col_b': 'value_b',
        'col_c': 'n/a'
    },
    '1496100936': {
        'col_b': 'value_b'
    },
    ...
}

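Whenever a key of the dictionary also exists in the timestamp column, I would like to add all the values contained in that key's observations sub-dictionary, using their respective keys as column names, so that the resulting DataFrame is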

 date_loc  timestamp  col_a col_b col_c 
id                  
1  2017-05-29 1496083649 
2  2017-05-29 1496089320 value_a value_b  n/a 
3  2017-05-29 1496095148 
4  2017-05-30 1496100936   value_b 
...

I have tried several approaches (agg(), apply(), iterrows()), but none of them worked. Here, for example, is my latest attempt:

p_df['col_a'] = '' 
p_df['col_b'] = '' 
p_df['col_c'] = '' 

for index, row in p_df.iterrows():
    ts = p_df.loc[index, 'timestamp']
    if ts in observations:
        # how to concat column values in this row?
        pass
    # end if
# end for

There is probably a better way than iterating over the DataFrame's rows, so I am open to better alternatives.

Source

2017-05-29 fcalderan

You can build a DataFrame from the dictionary and then merge it with the original DataFrame on the timestamp column:

import pandas as pd

# make sure the timestamp keys are of the same type (the dict keys are strings)
p_df['timestamp'] = p_df['timestamp'].astype(str)

# build a frame indexed by timestamp, then left-join it onto p_df
p_df.merge(pd.DataFrame.from_dict(observations, orient='index'),
           left_on='timestamp', right_index=True, how='left').fillna('')

#  date_loc timestamp col_b col_c col_a 
#id     
#1 2017-05-29 1496083649   
#2 2017-05-29 1496089320 value_b n/a value_a 
#3 2017-05-29 1496095148   
#4 2017-05-30 1496100936 value_b
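
For reference, here is a self-contained sketch of the merge above that can be run as-is. The sample rows are copied from the question; the extra key '9999999999' is a hypothetical entry added only to show that dictionary keys missing from the timestamp column do not produce extra rows with a left join.

```python
import pandas as pd

# Sample frame rebuilt from the rows shown in the question
p_df = pd.DataFrame(
    {
        "date_loc": ["2017-05-29", "2017-05-29", "2017-05-29", "2017-05-30"],
        "timestamp": [1496083649, 1496089320, 1496095148, 1496100936],
    },
    index=pd.Index([1, 2, 3, 4], name="id"),
)

observations = {
    "1496089320": {"col_a": "value_a", "col_b": "value_b", "col_c": "n/a"},
    "1496100936": {"col_b": "value_b"},
    "9999999999": {"col_a": "unmatched"},  # hypothetical key, not in p_df
}

# One row per dictionary key, one column per sub-dictionary key
obs_df = pd.DataFrame.from_dict(observations, orient="index")

# The dictionary keys are strings, so the join column must be strings too
p_df["timestamp"] = p_df["timestamp"].astype(str)

# Left join: keep every row of p_df, attach observations where the key matches
result = p_df.merge(
    obs_df, left_on="timestamp", right_index=True, how="left"
).fillna("")

print(result)
```

Rows without a matching key end up with empty strings in the new columns because of the final fillna('') call; dropping that call leaves them as NaN instead.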

Source

2017-05-29 16:00:36 Psidom

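It almost works, thank you, but 1) with 'fillna()' I get this error: 'raise AssertionError("Gaps in blk ref_locs")', and without it the code works; 2) my dictionary has many keys that are not contained in the DataFrame, so the merge gives me a lot of empty rows. – fcalderan

Sorry, I didn't read your question carefully. It looks like you need a left join rather than a full join. I'm not sure about the 'fillna()' problem; I haven't run into that error with 'fillna' before. – Psidom

Thanks, the left join works fine. – fcalderan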
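
The difference between the two join types discussed in these comments can be sketched on a tiny example. The frame below and the key '1500000000' are hypothetical values chosen only for illustration.

```python
import pandas as pd

p_df = pd.DataFrame(
    {"timestamp": ["1496083649", "1496089320"]},
    index=pd.Index([1, 2], name="id"),
)

observations = {
    "1496089320": {"col_b": "value_b"},
    "1500000000": {"col_b": "other"},  # hypothetical key absent from p_df
}
obs_df = pd.DataFrame.from_dict(observations, orient="index")

# how='left' keeps only the rows of p_df
left = p_df.merge(obs_df, left_on="timestamp", right_index=True, how="left")

# how='outer' also adds a row for every dictionary key that p_df lacks
outer = p_df.merge(obs_df, left_on="timestamp", right_index=True, how="outer")

print(len(left), len(outer))
```

Here the left join returns the two rows of p_df, while the outer join returns three, the extra one coming from the unmatched dictionary key. That is the source of the "empty rows" mentioned above.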
