2017-03-09 103 views
4

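Merging 3 different dataframes based on conditions: how can I combine the three dataframes shown below?

The main relationship between the first two must be based on ID1, since that is the matching key between the two dataframes.

For the third dataframe, the Hash must be added by matching on Address2.

DF1: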

Name1  Name2  Name3  Address   ID1  ID2  Own
Matt   John1  Jill   878 home  1    0    Deal
Matt   John2  Jack   879 home  2    1    Dael

DF2:

Name1  ID1  Address   Name4  Address2
Matt   1    878 home  face1  face\123
Matt   1    878 home  face2  face\345
Matt   1    878 home  face3  face\678
Matt   2    879 home  head1  head\123
Matt   2    879 home  head2  head\345
Matt   2    879 home  head3  head\678

DF3:

Address2  Hash 
face\123  abc123 
face\345  cde321 
face\678  efg123 
head\123  123efg 
head\345  efg321 
head\678  acd321 

I am trying to combine the 3 dataframes into one, like the following:

Name1  Name2  ID1  Address   Own   Name3  ID2  Name4  Address2  Hash
Matt   John1  1    878 home  Deal  Jill   0    face1  face\123  abc123
Matt   John1  1    878 home  Deal  Jill   0    face2  face\345  cde321
Matt   John1  1    878 home  Deal  Jill   0    face3  face\678  efg123
Matt   John2  2    879 home  Dael  Jack   1    head1  head\123  123efg
Matt   John2  2    879 home  Dael  Jack   1    head2  head\345  efg321
Matt   John2  2    879 home  Dael  Jack   1    head3  head\678  acd321

The key between DF1 and DF2 is ID1; the key between DF2 and DF3 is Address2.

Thank you very much for your help.

+1

Aren't you just merging on the column intersection here? `df1.merge(df2).merge(df3)`? – miradulo

Answers

1

Take a look at the merge function; some examples can be found here. For your specific problem, try the following:

combined_df = df1.merge(df2, on="ID1", how="inner").merge(df3, on="Address2", how="inner")
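Since DF1 and DF2 also share Name1 and Address, one variation is to list every shared column as a key so that pandas does not have to disambiguate them. The following is only a sketch built on the sample data from the question; the `validate` arguments are an optional addition of mine (not part of the answer above) that makes pandas raise an error if the keys are not unique where expected.

```python
import pandas as pd

# Sample frames rebuilt from the question
df1 = pd.DataFrame({
    "Name1": ["Matt", "Matt"],
    "Name2": ["John1", "John2"],
    "Name3": ["Jill", "Jack"],
    "Address": ["878 home", "879 home"],
    "ID1": [1, 2],
    "ID2": [0, 1],
    "Own": ["Deal", "Dael"],
})
df2 = pd.DataFrame({
    "Name1": ["Matt"] * 6,
    "ID1": [1, 1, 1, 2, 2, 2],
    "Address": ["878 home"] * 3 + ["879 home"] * 3,
    "Name4": ["face1", "face2", "face3", "head1", "head2", "head3"],
    "Address2": [r"face\123", r"face\345", r"face\678",
                 r"head\123", r"head\345", r"head\678"],
})
df3 = pd.DataFrame({
    "Address2": [r"face\123", r"face\345", r"face\678",
                 r"head\123", r"head\345", r"head\678"],
    "Hash": ["abc123", "cde321", "efg123", "123efg", "efg321", "acd321"],
})

combined_df = (
    # Each df1 row may match several df2 rows, but not the other way round
    df1.merge(df2, on=["Name1", "ID1", "Address"], how="inner",
              validate="one_to_many")
    # Each Address2 value should carry exactly one hash
       .merge(df3, on="Address2", how="inner", validate="one_to_one")
)
print(combined_df)
```

Because Name1 and Address are part of the key here, they appear once in the result instead of as `_x`/`_y` pairs.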
0

I think this will work. The merge function does pretty much exactly what you need for the columns you want to join on.

import numpy as np 
import pandas as pd 

data = np.array([['Name1', 'Name2', 'Name3', 'Address', 'ID1', 'ID2', 'Own'],
                 ['Matt', 'John1', 'Jill', '878 home', '1', '0', 'Deal'],
                 ['Matt', 'John2', 'Jack', '879 home', '2', '1', 'Dael']])

data2 = np.array([['Name1', 'ID1', 'Address', 'Name4', 'Address2'],
                  ['Matt', '1', '878 home', 'face1', 'face.123'],
                  ['Matt', '1', '878 home', 'face2', 'face.345'],
                  ['Matt', '1', '878 home', 'face3', 'face.678'],
                  ['Matt', '2', '879 home', 'head1', 'head.123'],
                  ['Matt', '2', '879 home', 'head2', 'head.345'],
                  ['Matt', '2', '879 home', 'head3', 'head.678']])

data3 = np.array([['Address2', 'Hash'],
                  ['face.123', 'abc123'],
                  ['face.345', 'cde321'],
                  ['face.678', 'efg123'],
                  ['head.123', '123efg'],
                  ['head.345', 'efg321'],
                  ['head.678', 'acd321']])

df1 = pd.DataFrame(data=data[1:,:], columns=data[0,:]) 
df2 = pd.DataFrame(data=data2[1:,:], columns=data2[0,:]) 
df3 = pd.DataFrame(data=data3[1:,:], columns=data3[0,:]) 


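# Join df1 and df2 on ID1 only; the other shared columns (Name1, Address)
# come back suffixed as Name1_x/Name1_y and Address_x/Address_y
Cdf = pd.merge(df1, df2, on='ID1', how='inner')
# Attach the hash that belongs to each Address2 value
Ddf = pd.merge(Cdf, df3, on='Address2', how='inner')
print(Ddf)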
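One thing to be aware of with this setup: a NumPy array holds a single dtype, so building the frames from string arrays leaves ID1 and ID2 as strings rather than integers. That is fine as long as the key has the same type in every frame being merged. If numeric IDs are wanted, a conversion along these lines might help (a small sketch on a cut-down version of the data, not the full frames):

```python
import numpy as np
import pandas as pd

# Cut-down version of the array above: header row plus two data rows
data = np.array([["ID1", "ID2", "Own"],
                 ["1", "0", "Deal"],
                 ["2", "1", "Dael"]])

df = pd.DataFrame(data=data[1:, :], columns=data[0, :])

# Convert only the ID columns; Own stays as text
df = df.astype({"ID1": int, "ID2": int})
print(df.dtypes)
```

If you convert the key in one frame, convert it in the others as well so the merge keys still compare equal.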
0

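Judging from your expected output, you don't seem to need to specify anything beyond merging on the column intersection, which merge does automatically.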

>>> df1.merge(df2).merge(df3) 

   Name1  Name2  Name3   Address  ID1  ID2   Own  Name4  Address2    Hash
0   Matt  John1   Jill  878 home    1    0  Deal  face1  face\123  abc123
1   Matt  John1   Jill  878 home    1    0  Deal  face2  face\345  cde321
2   Matt  John1   Jill  878 home    1    0  Deal  face3  face\678  efg123
3   Matt  John2   Jack  879 home    2    1  Dael  head1  head\123  123efg
4   Matt  John2   Jack  879 home    2    1  Dael  head2  head\345  efg321
5   Matt  John2   Jack  879 home    2    1  Dael  head3  head\678  acd321
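The merged columns come out in a different order from the one shown in the question. If the exact order matters, selecting the columns by name afterwards should do it. A minimal sketch, using a one-row stand-in for the merged frame (the names `merged` and `wanted` are just for illustration):

```python
import pandas as pd

# One-row stand-in for the merged result shown above
merged = pd.DataFrame(
    [["Matt", "John1", "Jill", "878 home", 1, 0, "Deal",
      "face1", r"face\123", "abc123"]],
    columns=["Name1", "Name2", "Name3", "Address", "ID1", "ID2", "Own",
             "Name4", "Address2", "Hash"],
)

# Column order taken from the expected output in the question
wanted = ["Name1", "Name2", "ID1", "Address", "Own", "Name3", "ID2",
          "Name4", "Address2", "Hash"]

# Selecting with a list reorders the columns and raises a KeyError
# if any of the requested names is missing
result = merged[wanted]
print(result)
```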

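Specifying a single column to merge on, as the accepted answer does, actually causes a problem, because you end up with suffixed columns.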

>>> df1.merge(df2, on="ID1", how="inner").merge(df3, on="Address2", how="inner") 

   Name1_x  Name2  Name3  Address_x  ID1  ID2   Own  Name1_y  Address_y  Name4 \
0     Matt  John1   Jill    878home    1    0  Deal     Matt    878home  face1
1     Matt  John1   Jill    878home    1    0  Deal     Matt    878home  face2
2     Matt  John1   Jill    878home    1    0  Deal     Matt    878home  face3
3     Matt  John2   Jack    879home    2    1  Dael     Matt    879home  head1
4     Matt  John2   Jack    879home    2    1  Dael     Matt    879home  head2
5     Matt  John2   Jack    879home    2    1  Dael     Matt    879home  head3

   Address2    Hash
0  face\123  abc123
1  face\345  cde321
2  face\678  efg123
3  head\123  123efg
4  head\345  efg321
5  head\678  acd321
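If you already have a frame with `_x`/`_y` columns like the one above, it can be cleaned up after the fact, provided the duplicated columns really hold the same values. A rough sketch on a reduced version of the data (the `dupes` and `cleaned` names are mine, for illustration only):

```python
import pandas as pd

# Reduced versions of DF1 and DF2 that still share Name1 and Address
df1 = pd.DataFrame({"Name1": ["Matt", "Matt"],
                    "Address": ["878 home", "879 home"],
                    "ID1": [1, 2],
                    "Own": ["Deal", "Dael"]})
df2 = pd.DataFrame({"Name1": ["Matt", "Matt"],
                    "ID1": [1, 2],
                    "Address": ["878 home", "879 home"],
                    "Name4": ["face1", "head1"]})

# Merging on ID1 alone produces Name1_x/Name1_y and Address_x/Address_y
merged = df1.merge(df2, on="ID1", how="inner")

# Base names of the columns that pandas suffixed
dupes = [c[:-2] for c in merged.columns if c.endswith("_x")]

# Only collapse a pair when both copies agree on every row
for col in dupes:
    if not (merged[f"{col}_x"] == merged[f"{col}_y"]).all():
        raise ValueError(f"{col}_x and {col}_y differ; keep both")

cleaned = (
    merged.drop(columns=[f"{c}_y" for c in dupes])
          .rename(columns={f"{c}_x": c for c in dupes})
)
print(cleaned)
```

The equality check is there because merging on ID1 alone does not guarantee that Name1 and Address match between the two frames; letting merge use the full column intersection avoids the issue in the first place.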