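Merge 3 different dataframes based on conditions

The first two must be related primarily through ID1, since it is the matching key between the two dataframes.

From the third dataframe, the Hash must be added based on Address2.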

DF1:

Name1  Name2  Name3  Address   ID1  ID2  Own
Matt   John1  Jill   878 home  1    0    Deal
Matt   John2  Jack   879 home  2    1    Dael

DF2:

Name1  ID1  Address   Name4  Address2
Matt   1    878 home  face1  face\123
Matt   1    878 home  face2  face\345
Matt   1    878 home  face3  face\678
Matt   2    879 home  head1  head\123
Matt   2    879 home  head2  head\345
Matt   2    879 home  head3  head\678

DF3:

Address2  Hash 
face\123  abc123 
face\345  cde321 
face\678  efg123 
head\123  123efg 
head\345  efg321 
head\678  acd321

I am trying to combine the 3 dataframes into one like the following:

Name1  Name2  ID1  Address   Own   Name3  ID2  Name4  Address2  Hash
Matt   John1  1    878 home  Deal  Jill   0    face1  face\123  abc123
Matt   John1  1    878 home  Deal  Jill   0    face2  face\345  cde321
Matt   John1  1    878 home  Deal  Jill   0    face3  face\678  efg123
Matt   John2  2    879 home  Dael  Jack   1    head1  head\123  123efg
Matt   John2  2    879 home  Dael  Jack   1    head2  head\345  efg321
Matt   John2  2    879 home  Dael  Jack   1    head3  head\678  acd321

The key between DF1 and DF2 is ID1, and the key between DF2 and DF3 is Address2.

Thank you very much for your help.

Source

2017-03-09 johnnyb

Aren't you just merging on the column intersection here? 'df1.merge(df2).merge(df3)'? – miradulo

Take a look at the merge function; you can find some examples here. For your specific problem, try the following:

combined_df = df1.merge(df2, on="ID1", how="inner").merge(df3, on="Address2", how="inner")

Source

2017-03-09 17:38:36 StefP

I think this will work. The merge function handles this well once you tell it which columns you want to join on.

import numpy as np 
import pandas as pd 

data = np.array([['Name1','Name2','Name3','Address','ID1','ID2','Own'], 
       ['Matt','John1','Jill','878 home','1','0','Deal'], 
       ['Matt', 'John2', 'Jack', '879 home', '2', '1', 'Dael']]) 

data2 = np.array([['Name1','ID1','Address','Name4','Address2'], 
       ['Matt', '1','878 home','face1',"face.123"], 
       ['Matt', '1','878 home', 'face2','face.345'], 
        ['Matt', '1','878 home', 'face3', 'face.678'], 
        ['Matt', '2', '879 home', 'head1', 'head.123'], 
        ['Matt', '2', '879 home', 'head2', 'head.345'], 
        ['Matt', '2', '879 home', 'head3', 'head.678']]) 
#print(data) 
data3 = np.array([['Address2','Hash'], 
       ['face.123', 'abc123'], 
       ['face.345','cde321'], 
       ['face.678', 'efg123'], 
       ['head.123', '123efg'], 
       ['head.345', 'efg321'], 
       ['head.678', 'acd321']]) 

df1 = pd.DataFrame(data=data[1:,:], columns=data[0,:]) 
df2 = pd.DataFrame(data=data2[1:,:], columns=data2[0,:]) 
df3 = pd.DataFrame(data=data3[1:,:], columns=data3[0,:]) 


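# Join df1 and df2 on ID1, then attach the hashes from df3 via Address2.
# Name1 and Address exist in both df1 and df2, so merging on ID1 alone
# yields suffixed columns (Name1_x/Name1_y, Address_x/Address_y).
Cdf = pd.merge(df1, df2, on='ID1', how='inner')
Ddf = pd.merge(Cdf, df3, on='Address2', how='inner')
print(Ddf)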

Source

2017-03-09 17:56:55 Cesar

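From your expected output, it seems you don't need to specify anything beyond the column intersection, which merge uses automatically.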

>>> df1.merge(df2).merge(df3) 

   Name1  Name2  Name3   Address  ID1  ID2   Own  Name4  Address2    Hash
0   Matt  John1   Jill  878 home    1    0  Deal  face1  face\123  abc123
1   Matt  John1   Jill  878 home    1    0  Deal  face2  face\345  cde321
2   Matt  John1   Jill  878 home    1    0  Deal  face3  face\678  efg123
3   Matt  John2   Jack  879 home    2    1  Dael  head1  head\123  123efg
4   Matt  John2   Jack  879 home    2    1  Dael  head2  head\345  efg321
5   Matt  John2   Jack  879 home    2    1  Dael  head3  head\678  acd321
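
To make the "column intersection" explicit, here is a minimal sketch (not part of the original answer) that rebuilds the sample frames from the question and computes the shared column names by hand before passing them to `on=`. Storing ID1 and ID2 as integers and the variable names `keys_12`, `keys_23`, `step1` and `combined` are assumptions made for illustration.

```python
import pandas as pd

# Sample frames rebuilt from the question (integer ID columns are an assumption).
df1 = pd.DataFrame({
    "Name1": ["Matt", "Matt"],
    "Name2": ["John1", "John2"],
    "Name3": ["Jill", "Jack"],
    "Address": ["878 home", "879 home"],
    "ID1": [1, 2],
    "ID2": [0, 1],
    "Own": ["Deal", "Dael"],
})
df2 = pd.DataFrame({
    "Name1": ["Matt"] * 6,
    "ID1": [1, 1, 1, 2, 2, 2],
    "Address": ["878 home"] * 3 + ["879 home"] * 3,
    "Name4": ["face1", "face2", "face3", "head1", "head2", "head3"],
    "Address2": [r"face\123", r"face\345", r"face\678",
                 r"head\123", r"head\345", r"head\678"],
})
df3 = pd.DataFrame({
    "Address2": [r"face\123", r"face\345", r"face\678",
                 r"head\123", r"head\345", r"head\678"],
    "Hash": ["abc123", "cde321", "efg123", "123efg", "efg321", "acd321"],
})

# Column names present in both frames; merge joins on these when `on=` is omitted.
keys_12 = df1.columns.intersection(df2.columns).tolist()
step1 = df1.merge(df2, on=keys_12, how="inner")

# After the first merge, only Address2 is shared with df3.
keys_23 = step1.columns.intersection(df3.columns).tolist()
combined = step1.merge(df3, on=keys_23, how="inner")

print(keys_12)
print(keys_23)
print(combined)
```

Because every shared column is used as a key, no column appears twice and no suffixes are needed. The flip side is that rows only match when all shared columns agree, so this relies on Name1 and Address being consistent between the two frames, as they are in the sample data.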

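Specifying a single column to merge on, as the accepted answer does, will indeed cause a problem, because you end up with suffixed columns.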

>>> df1.merge(df2, on="ID1", how="inner").merge(df3, on="Address2", how="inner") 

   Name1_x  Name2  Name3  Address_x  ID1  ID2   Own  Name1_y  Address_y  Name4  \
0     Matt  John1   Jill   878 home    1    0  Deal     Matt   878 home  face1
1     Matt  John1   Jill   878 home    1    0  Deal     Matt   878 home  face2
2     Matt  John1   Jill   878 home    1    0  Deal     Matt   878 home  face3
3     Matt  John2   Jack   879 home    2    1  Dael     Matt   879 home  head1
4     Matt  John2   Jack   879 home    2    1  Dael     Matt   879 home  head2
5     Matt  John2   Jack   879 home    2    1  Dael     Matt   879 home  head3

   Address2    Hash
0  face\123  abc123
1  face\345  cde321
2  face\678  efg123
3  head\123  123efg
4  head\345  efg321
5  head\678  acd321
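
If you do want ID1 to be the only key between DF1 and DF2, one way to avoid the suffixed columns is to drop the duplicated columns from the second frame before merging. The following is only a sketch under the assumption that Name1 and Address in DF2 carry the same values as in DF1 for each ID1 (true for the sample data); `overlap`, `df2_extra` and `result` are names made up for this example, and the integer ID columns are likewise an assumption. The `validate` argument is optional and simply makes pandas raise an error if the keys are not unique on the side where they are expected to be.

```python
import pandas as pd

# Sample frames rebuilt from the question (integer ID columns are an assumption).
df1 = pd.DataFrame({
    "Name1": ["Matt", "Matt"],
    "Name2": ["John1", "John2"],
    "Name3": ["Jill", "Jack"],
    "Address": ["878 home", "879 home"],
    "ID1": [1, 2],
    "ID2": [0, 1],
    "Own": ["Deal", "Dael"],
})
addresses2 = [r"face\123", r"face\345", r"face\678",
              r"head\123", r"head\345", r"head\678"]
df2 = pd.DataFrame({
    "Name1": ["Matt"] * 6,
    "ID1": [1, 1, 1, 2, 2, 2],
    "Address": ["878 home"] * 3 + ["879 home"] * 3,
    "Name4": ["face1", "face2", "face3", "head1", "head2", "head3"],
    "Address2": addresses2,
})
df3 = pd.DataFrame({
    "Address2": addresses2,
    "Hash": ["abc123", "cde321", "efg123", "123efg", "efg321", "acd321"],
})

# Columns df2 shares with df1, other than the join key itself.
overlap = [c for c in df2.columns if c in df1.columns and c != "ID1"]
df2_extra = df2.drop(columns=overlap)

result = (
    df1.merge(df2_extra, on="ID1", how="inner", validate="one_to_many")
       .merge(df3, on="Address2", how="inner", validate="many_to_one")
)

print(result.columns.tolist())
print(result)
```

With the overlapping columns removed up front, the result keeps a single Name1 and a single Address column, both taken from DF1.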

Source

2017-03-09 18:12:23 miradulo
