Python pandas DataFrame: find rows containing specific values and return a boolean

I want to compare two DataFrames, df1 and df2. df1 is data that is refreshed every hour. df2 is the existing DataFrame. I want to append only the rows that are new in the update.

For example, here is df1:

(image: df1)

It contains 5 rows whose information already exists in df2.

And here is df2:

(image: df2)

We can see that Eric has been added, but df2 does not reflect that yet.

I could overwrite df2 with df1, but I shouldn't, because people will write remarks into the data after it has been updated.

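So I decided to loop over the rows, look each one up in df2 by its id, and delete it from df1 if it is found. After those deletions only Eric's row remains, which lets me simply append Eric to df2.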

So this is what I tried:

for index, row in df1.iterrows(): 
    id = row['id'] 
    if df2.loc[df1['id'].isin(id)] = True: 
     df1[df1.id != id)

and it returns a syntax error...

Am I on the right track? Is this the best solution to the problem? How should I change the code to achieve my goal?

Source

2017-10-09 Taewoo.Lim

Are you looking for `pd.concat([df2, df1[~df1.Id.isin(df2.Id)]], axis=0)`? – Wen

To fix your code...

l = []
for index, row in df1.iterrows():
    id = row['Id']
    # isin expects a list-like, so wrap the scalar id in a list
    if df2['Id'].isin([id]).any():
        l.append(id)
l
Out[334]: [0, 1, 2, 3, 4]  # those are the rows you need to remove

# remove them with `~` + .isin, matching on the Id column (l holds Id values)
df1.loc[~df1['Id'].isin(l)]
Out[339]:
   Id Name
5   5    F
6   6    G

Using pd.concat:

pd.concat([df2, df1[~df1.Id.isin(df2.Id)]], axis=0)
Out[337]:
   Id Name
0   0    A
1   1    B
2   2    C
3   3    D
4   4    E
5   5    F
6   6    G

Data input:

import pandas as pd

fake = {'Id': [0, 1, 2, 3, 4, 5, 6],
        'Name': ['A', 'B', 'C', 'D', 'E', 'F', 'G']}
df1 = pd.DataFrame(fake)

fake = {'Id': [0, 1, 2, 3, 4],
        'Name': ['A', 'B', 'C', 'D', 'E']}
df2 = pd.DataFrame(fake)
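
The loop above only works out which Ids already exist, and `Series.isin` gives that answer directly as a boolean Series. Here is a minimal sketch on the same sample data; the names `already_there`, `new_rows` and `result` are mine, not from the original answer:

```python
import pandas as pd

df1 = pd.DataFrame({'Id': [0, 1, 2, 3, 4, 5, 6],
                    'Name': ['A', 'B', 'C', 'D', 'E', 'F', 'G']})
df2 = pd.DataFrame({'Id': [0, 1, 2, 3, 4],
                    'Name': ['A', 'B', 'C', 'D', 'E']})

# True where a df1 row's Id is already present in df2
already_there = df1['Id'].isin(df2['Id'])

# Keep only the rows df2 does not have yet
new_rows = df1[~already_there]

# Append them to df2; ignore_index renumbers the result from 0
result = pd.concat([df2, new_rows], ignore_index=True)
```

Because df2 itself is passed through untouched, anything already stored in it (such as remarks) should survive the update.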

Source

2017-10-09 04:37:04 Wen

Pandas has several functions for merging and joining different DataFrames. One you can use here is merge: https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.merge.html

>>> merged = df1.merge(df2, how='left')
>>> merged
    id   name remark
0  234  james
1  212  steve
2  153   jack  smart
3  567    ted
4  432   eric    NaN
5  543    bob

If you don't want the inserted values to be NaN, you can always use fillna: https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.fillna.html.
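
A sketch of how the merge and fillna steps might fit together. The sample frames are an assumption, reconstructed from the output shown in this answer (the question's data was only posted as images). `indicator=True` is an optional extra that marks which rows came only from df1:

```python
import pandas as pd

# Assumed sample data, reconstructed from the output shown in this answer
df1 = pd.DataFrame({'id': [234, 212, 153, 567, 432, 543],
                    'name': ['james', 'steve', 'jack', 'ted', 'eric', 'bob']})
df2 = pd.DataFrame({'id': [234, 212, 153, 567, 543],
                    'name': ['james', 'steve', 'jack', 'ted', 'bob'],
                    'remark': ['', '', 'smart', '', '']})

# Left merge on the shared columns (id, name); indicator=True adds a
# `_merge` column saying where each row was found
merged = df1.merge(df2, how='left', indicator=True)

# Rows found only in df1 are the newly added ones
new_ids = merged.loc[merged['_merge'] == 'left_only', 'id'].tolist()

# Replace the NaN remark of the new rows with an empty string
merged['remark'] = merged['remark'].fillna('')
```

With this data, `new_ids` should contain only Eric's id, and the existing remark for jack is carried over from df2.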

Source

2017-10-09 04:18:59 AetherUnbound

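Let's suppose 'steve' has a remark in df1 that we want to keep, and 'jack' has a remark in df2 that we want to keep. We can set the index of each DataFrame to ['id', 'name'] and use pd.Series.combine_first.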

Setup

df1 = pd.DataFrame(dict(
    id=[12, 34, 56, 78, 90, 13], 
    name='james steve jack ted eric bob'.split(), 
    remark='', 
)) 
df1.at[1, 'remark'] = 'meh' 

df2 = pd.DataFrame(dict(
    id=[12, 34, 56, 78, 13], 
    name='james steve jack ted bob'.split(), 
    remark='', 
)) 
df2.at[2, 'remark'] = 'smart'

Solution

s1 = df1.set_index(['id', 'name']).remark 
s2 = df2.set_index(['id', 'name']).remark 

s1.mask(s1.eq('')).combine_first(s2.mask(s2.eq(''))).fillna('').reset_index() 

   id   name remark
0  12  james
1  13    bob
2  34  steve    meh
3  56   jack  smart
4  78    ted
5  90   eric
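
The one-liner packs three steps together. As a rough illustration, here they are one at a time on two small Series; the data and the names `a`, `b`, `combined` and `result` are made up for this sketch and are not part of the answer:

```python
import pandas as pd

# Hypothetical remarks indexed by name only, to keep the sketch short
s1 = pd.Series(['', 'meh', ''], index=['james', 'steve', 'eric'])
s2 = pd.Series(['', '', 'smart'], index=['james', 'steve', 'jack'])

# Step 1: treat empty strings as missing so they can be filled
a = s1.mask(s1.eq(''))
b = s2.mask(s2.eq(''))

# Step 2: keep a's values, fill its gaps from b, and take the union
# of both indexes (so rows that exist in only one Series survive)
combined = a.combine_first(b)

# Step 3: turn whatever is still missing back into empty strings
result = combined.fillna('')
```

Here steve keeps 'meh' from the first Series, jack picks up 'smart' from the second, and eric, who exists only in the first, ends up with an empty remark.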

However, supposing it is exactly as the OP presented!

Setup

df1 = pd.DataFrame(dict(
    id=[12, 34, 56, 78, 90, 13], 
    name='james steve jack ted eric bob'.split(), 
    remark='', 
)) 

df2 = pd.DataFrame(dict(
    id=[12, 34, 56, 78, 13], 
    name='james steve jack ted bob'.split(), 
    remark='', 
)) 
df2.at[2, 'remark'] = 'smart'

Solution

# DataFrame.append was removed in pandas 2.0; pd.concat is the equivalent
pd.concat([df2, df1]).drop_duplicates(['id', 'name']).reset_index(drop=True)

   id   name remark
0  12  james
1  34  steve
2  56   jack  smart
3  78    ted
4  13    bob
5  90   eric
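
The order of the two frames matters here, because drop_duplicates keeps the first occurrence by default. A small sketch comparing both orders on the same setup; it uses pd.concat, since DataFrame.append no longer exists in pandas 2.0 and later, and the names `df2_first` and `df1_first` are mine:

```python
import pandas as pd

df1 = pd.DataFrame({'id': [12, 34, 56, 78, 90, 13],
                    'name': 'james steve jack ted eric bob'.split(),
                    'remark': ''})
df2 = pd.DataFrame({'id': [12, 34, 56, 78, 13],
                    'name': 'james steve jack ted bob'.split(),
                    'remark': ''})
df2.at[2, 'remark'] = 'smart'

# df2 first: its rows (with their remarks) win when duplicates are dropped
df2_first = (pd.concat([df2, df1])
             .drop_duplicates(['id', 'name'], keep='first')
             .reset_index(drop=True))

# df1 first: the blank df1 row for jack wins and the remark is lost
df1_first = (pd.concat([df1, df2])
             .drop_duplicates(['id', 'name'], keep='first')
             .reset_index(drop=True))
```

So the existing frame (df2) should come first if its remarks are the ones to preserve.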

Source

2017-10-09 04:49:41 piRSquared
