2017-10-09 23 views
1

Pandas: get values from different DataFrames for the same id

I have 3 DataFrames.

DF1

id val1 val2  
1 1.1  2.2 
2 3.3  6.6 

DF2

id val1 val2  
1 5.1  2.2 
3 3.3  6.6 
4 2.1  5.2 

DF3

id val1 val2  
1 9.1  3.2 
4 8.1  3.2 
5 1.3  4.5 

As you can see, the same id (1 and 4) has different val1 and val2 values in different DataFrames.

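What I am looking for is a final DataFrame covering the ids that occur more than once, with one column per source DataFrame holding that DataFrame's values: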

id df1   df2   df3 
1 [1.1,2.2] [5.1,2.2] [9.1,3.2] 
4 NA [2.1,5.2] [8.1,3.2] 

I was thinking of:

df.groupby(['id']).apply(list) 

Is this possible in pandas?

Answers

2

Use:

import pandas as pd

# list of all DataFrames
dfs = [df1, df2, df3]

# for each DataFrame, set id as the index and build a Series of [val1, val2] lists
L = [] 
for x in dfs: 
    x = x.set_index('id') 
    L.append(pd.Series(x.values.tolist(), index=x.index)) 

# join all Series side by side, aligned on id
df = pd.concat(L, axis=1, keys=('df1','df2','df3')) 
print (df) 
      df1   df2   df3 
id          
1 [1.1, 2.2] [5.1, 2.2] [9.1, 3.2] 
2 [3.3, 6.6]   NaN   NaN 
3   NaN [3.3, 6.6]   NaN 
4   NaN [2.1, 5.2] [8.1, 3.2] 
5   NaN   NaN [1.3, 4.5] 

#filter rows 
df = df[df.count(axis=1) > 1] 
print (df) 
      df1   df2   df3 
id          
1 [1.1, 2.2] [5.1, 2.2] [9.1, 3.2] 
4   NaN [2.1, 5.2] [8.1, 3.2] 

Thanks to Arthur Gouveia for the idea of using dropna:

df = df.dropna(thresh=2) 
print (df) 
      df1   df2   df3 
id          
1 [1.1, 2.2] [5.1, 2.2] [9.1, 3.2] 
4   NaN [2.1, 5.2] [8.1, 3.2] 
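
For anyone who wants to run the steps above end to end, here is a minimal sketch. The frames are rebuilt from the tables in the question, and the helper name `collect_by_id` and its `min_frames` parameter are my own, not part of pandas. It assumes each id appears at most once per DataFrame.

```python
import pandas as pd

# Sample frames rebuilt from the tables in the question.
df1 = pd.DataFrame({'id': [1, 2], 'val1': [1.1, 3.3], 'val2': [2.2, 6.6]})
df2 = pd.DataFrame({'id': [1, 3, 4], 'val1': [5.1, 3.3, 2.1], 'val2': [2.2, 6.6, 5.2]})
df3 = pd.DataFrame({'id': [1, 4, 5], 'val1': [9.1, 8.1, 1.3], 'val2': [3.2, 3.2, 4.5]})


def collect_by_id(frames, min_frames=2):
    """Hypothetical helper: `frames` maps a column name to a DataFrame
    with a unique 'id' column. Returns one column of value lists per frame,
    keeping only ids present in at least `min_frames` frames."""
    columns = []
    for frame in frames.values():
        indexed = frame.set_index('id')
        # One Python list of row values per id.
        columns.append(pd.Series(indexed.values.tolist(), index=indexed.index))
    wide = pd.concat(columns, axis=1, keys=list(frames))
    # Keep rows that have at least `min_frames` non-missing cells.
    return wide.dropna(thresh=min_frames)


result = collect_by_id({'df1': df1, 'df2': df2, 'df3': df3})
print(result)
```

Passing a dict keeps the column names next to the frames, so going from 3 to 7 DataFrames only changes the dict.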

EDIT: Solution for the id column

If the id values are not unique, group by id and flatten the values:

print (df3) 
    id val1 val2 
0 1 9.1 3.2 
1 4 8.1 3.2 
2 1 1.3 4.5 <-change value to 1 

dfs = [df1, df2, df3] 
# select the columns with a list (double brackets); tuple selection fails in recent pandas
L = [x.groupby('id')[['val1','val2']].apply(lambda g: g.values.ravel().tolist()) for x in dfs] 
df = pd.concat(L, axis=1, keys=('df1','df2','df3')) 
df = df[df.count(axis=1) > 1] 
print (df) 
      df1   df2     df3 
id            
1 [1.1, 2.2] [5.1, 2.2] [9.1, 3.2, 1.3, 4.5] 
4   NaN [2.1, 5.2]   [8.1, 3.2] 
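
The groupby variant flattens all rows of an id into one list, so the row boundaries are lost. If they matter, a small variation (a sketch, not part of the original answer) keeps one `[val1, val2]` pair per row. The names `nested` and `flat` are only for illustration.

```python
import pandas as pd

# df3 from the edit above, where id 1 appears twice.
df3 = pd.DataFrame({'id': [1, 4, 1], 'val1': [9.1, 8.1, 1.3], 'val2': [3.2, 3.2, 4.5]})

grouped = df3.groupby('id')[['val1', 'val2']]

# Flattened: all values of an id in one list, as in the answer.
flat = grouped.apply(lambda g: g.to_numpy().ravel().tolist())

# Nested: one [val1, val2] pair per original row.
nested = grouped.apply(lambda g: g.to_numpy().tolist())

print(flat)
print(nested)
```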
+1

To filter the rows I suggest using 'df = df.dropna(thresh=2)' –

+0

Super idea, it has been added to the solution. – jezrael

+0

@jezrael I tried this on another dataset with 7 DataFrames and changed the keys accordingly, but got this error: 'InvalidIndexError: Reindexing only valid with – Shubham
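
The error message in the comment is cut off, so the cause can only be guessed. One likely candidate is duplicate id values in at least one DataFrame, because `pd.concat(..., axis=1)` has to align the Series on their index and cannot do so when an index contains repeats. A quick check, sketched below with a hypothetical helper `frames_with_duplicate_ids` and made-up sample data, shows which frames are affected. If any are, the groupby variant from the edit above may be the better fit.

```python
import pandas as pd


def frames_with_duplicate_ids(frames, key='id'):
    """Hypothetical helper: return the names of frames whose `key`
    column contains repeated values."""
    return [name for name, frame in frames.items()
            if frame[key].duplicated().any()]


# Made-up sample: df3 repeats id 1, df1 does not repeat anything.
frames = {
    'df1': pd.DataFrame({'id': [1, 2], 'val1': [1.1, 3.3], 'val2': [2.2, 6.6]}),
    'df3': pd.DataFrame({'id': [1, 4, 1], 'val1': [9.1, 8.1, 1.3], 'val2': [3.2, 3.2, 4.5]}),
}

offenders = frames_with_duplicate_ids(frames)
print(offenders)
```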

0
df1['df1'] = list(df1[['val1', 'val2']].values) 
df2['df2'] = list(df2[['val1', 'val2']].values) 
df3['df3'] = list(df3[['val1', 'val2']].values) 

# outer-merge the three frames on id, one step at a time
df_result = pd.merge(df1[['id', 'df1']], df2[['id', 'df2']], on='id', how='outer')
df_result = pd.merge(df_result, df3[['id', 'df3']], on='id', how='outer')
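
With more than three DataFrames the nested merges get hard to read. A possible generalisation, sketched here with `functools.reduce`, works on copies so the input frames are not modified. The helper name `as_list_column` is hypothetical, and the sample frames are rebuilt from the question.

```python
from functools import reduce

import pandas as pd

# Sample frames rebuilt from the tables in the question.
df1 = pd.DataFrame({'id': [1, 2], 'val1': [1.1, 3.3], 'val2': [2.2, 6.6]})
df2 = pd.DataFrame({'id': [1, 3, 4], 'val1': [5.1, 3.3, 2.1], 'val2': [2.2, 6.6, 5.2]})
df3 = pd.DataFrame({'id': [1, 4, 5], 'val1': [9.1, 8.1, 1.3], 'val2': [3.2, 3.2, 4.5]})


def as_list_column(frame, name):
    """Hypothetical helper: return a copy with 'id' and a single column
    `name` that holds [val1, val2] for each row."""
    out = frame[['id']].copy()
    out[name] = pd.Series(frame[['val1', 'val2']].values.tolist(), index=frame.index)
    return out


parts = [as_list_column(frame, name)
         for name, frame in {'df1': df1, 'df2': df2, 'df3': df3}.items()]

# Outer-merge all parts on id, pairwise from left to right.
merged = reduce(lambda left, right: pd.merge(left, right, on='id', how='outer'), parts)

# Keep ids that occur in at least two of the source frames.
result = merged.sort_values('id').set_index('id').dropna(thresh=2)
print(result)
```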