Use:
import pandas as pd

# list of all DataFrames
dfs = [df1, df2, df3]
# for each DataFrame, set 'id' as the index and build a Series of row lists
L = []
for x in dfs:
    x = x.set_index('id')
    L.append(pd.Series(x.values.tolist(), index=x.index))
# join all Series side by side, one column per DataFrame
df = pd.concat(L, axis=1, keys=('df1','df2','df3'))
print (df)
            df1         df2         df3
id
1    [1.1, 2.2]  [5.1, 2.2]  [9.1, 3.2]
2    [3.3, 6.6]         NaN         NaN
3           NaN  [3.3, 6.6]         NaN
4           NaN  [2.1, 5.2]  [8.1, 3.2]
5           NaN         NaN  [1.3, 4.5]
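For anyone who wants to run the steps above end to end, here is a minimal sketch. The three input frames are not shown in this answer, so they are reconstructed from the printed output, and the column names `val1`/`val2` are assumed from the `df3` listing in the edit. The names `names`, `series` and `combined` are only illustrative.

```python
import pandas as pd

# Assumed inputs, reconstructed from the printed result above.
df1 = pd.DataFrame({'id': [1, 2], 'val1': [1.1, 3.3], 'val2': [2.2, 6.6]})
df2 = pd.DataFrame({'id': [1, 3, 4], 'val1': [5.1, 3.3, 2.1], 'val2': [2.2, 6.6, 5.2]})
df3 = pd.DataFrame({'id': [1, 4, 5], 'val1': [9.1, 8.1, 1.3], 'val2': [3.2, 3.2, 4.5]})

names = ('df1', 'df2', 'df3')
series = []
for frame in (df1, df2, df3):
    indexed = frame.set_index('id')
    # One list per row, keyed by id.
    series.append(pd.Series(indexed.values.tolist(), index=indexed.index))

# Align the three Series on id; ids missing from a frame become NaN.
combined = pd.concat(series, axis=1, keys=names)
print(combined)
```

Each cell holds a plain Python list, so the columns have object dtype and the missing ids show up as NaN.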
#filter rows
df = df[df.count(axis=1) > 1]
print (df)
           df1         df2         df3
id
1   [1.1, 2.2]  [5.1, 2.2]  [9.1, 3.2]
4          NaN  [2.1, 5.2]  [8.1, 3.2]
Thanks to Arthur Gouveia for the idea of using dropna:
df = df.dropna(thresh=2)
print (df)
           df1         df2         df3
id
1   [1.1, 2.2]  [5.1, 2.2]  [9.1, 3.2]
4          NaN  [2.1, 5.2]  [8.1, 3.2]
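The two filters give the same rows because `count(axis=1) > 1` and `dropna(thresh=2)` both keep a row only when it has at least two non-NaN cells. A small sketch on a made-up frame (`toy` and its values are hypothetical, not data from the question) illustrates this:

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame with a different number of NaN values per row.
toy = pd.DataFrame(
    {'a': [1.0, np.nan, np.nan],
     'b': [2.0, 3.0, np.nan],
     'c': [np.nan, 4.0, 5.0]},
    index=[10, 20, 30],
)

by_count = toy[toy.count(axis=1) > 1]   # more than one non-NaN cell per row
by_thresh = toy.dropna(thresh=2)        # at least two non-NaN cells per row

print(by_count.equals(by_thresh))
```

`dropna(thresh=2)` is the shorter spelling. The boolean mask is handy if the same condition is reused elsewhere.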
EDIT: If the values in the id column are not unique, the solution can be simplified with groupby:
print (df3)
   id  val1  val2
0   1   9.1   3.2
1   4   8.1   3.2
2   1   1.3   4.5   <- id changed to 1
dfs = [df1, df2, df3]
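# select the columns with a list ([['val1','val2']]); tuple-style selection is not supported in current pandas
L = [x.groupby('id')[['val1','val2']].apply(lambda g: g.values.ravel().tolist()) for x in dfs]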
df = pd.concat(L, axis=1, keys=('df1','df2','df3'))
df = df[df.count(axis=1) > 1]
print (df)
           df1         df2                   df3
id
1   [1.1, 2.2]  [5.1, 2.2]  [9.1, 3.2, 1.3, 4.5]
4          NaN  [2.1, 5.2]            [8.1, 3.2]
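A self-contained sketch of the duplicate-id variant follows. As before, `df1` and `df2` are reconstructed from the printed output rather than taken from the question, and the helper name `rows_to_list` is only illustrative. Grouping first also leaves every Series with a unique id index, which helps `pd.concat(..., axis=1)` align the rows.

```python
import pandas as pd

# Assumed inputs; df3 contains id 1 twice, as in the listing above.
df1 = pd.DataFrame({'id': [1, 2], 'val1': [1.1, 3.3], 'val2': [2.2, 6.6]})
df2 = pd.DataFrame({'id': [1, 3, 4], 'val1': [5.1, 3.3, 2.1], 'val2': [2.2, 6.6, 5.2]})
df3 = pd.DataFrame({'id': [1, 4, 1], 'val1': [9.1, 8.1, 1.3], 'val2': [3.2, 3.2, 4.5]})

def rows_to_list(group):
    # Flatten all rows that share an id into one list, row by row.
    return group.to_numpy().ravel().tolist()

series = [
    frame.groupby('id')[['val1', 'val2']].apply(rows_to_list)
    for frame in (df1, df2, df3)
]

combined = pd.concat(series, axis=1, keys=('df1', 'df2', 'df3'))
# Keep ids that appear in at least two of the frames.
filtered = combined[combined.count(axis=1) > 1]
print(filtered)
```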
To filter the rows I suggest using 'df = df.dropna(thresh=2)' –
Super idea, it has been added to the solution. – jezrael
@jezrael I tried this on another dataset with 7 DataFrames and changed the keys accordingly, but got this error: 'InvalidIndexError: Reindexing only valid with – Shubham