2017-04-15 131 views
2

Concatenating DataFrames with MultiIndex columns in pandas

I have two DataFrames, df1 and df2:

In [56]: df1.head() 
Out[56]: 
                     col7                col8                col9
                   alpha0        D0    alpha0        D0    alpha0        D0
F35_HC_531d.dat  1.103999  1.103999  1.364399  1.358938  3.171808  1.946894
F35_HC_532d.dat  0.000000  0.000000  1.636934  1.635594  4.359431  2.362530
F35_HC_533d.dat  0.826599  0.826599  1.463956  1.390134  3.860629  2.199387
F35_HC_534d.dat  1.055350  1.020555  3.112200  2.498257  3.394307  2.090668
F52_HC_472d.dat  3.808008  2.912733  3.594062  2.336720  3.027449  2.216112

In [62]: df2.head() 
Out[62]: 
                     col7                col8                col9
                   alpha1    alpha2    alpha1    alpha2    alpha1    alpha2
filename
F35_HC_532d.dat    1.0850     2.413    0.7914  6.072000    0.8418     5.328
M48_HC_551d.dat    0.7029     4.713    0.7309  2.922000    0.7823     3.546
M24_HC_458d.dat    0.7207     5.850    0.6772  5.699000    0.7135     5.620
M48_HC_552d.dat    0.7179     4.783    0.6481  4.131999    0.7010     3.408
M40_HC_506d.dat    0.7602     2.912    0.8420  5.690000    0.8354     1.910

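I want to concatenate these two DataFrames. Note that the outer column names are the same in both, so in the new DataFrame I only want to see the 4 sub-columns under each outer name. I tried using concat as follows: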

df = pd.concat([df1, df2], axis = 1, levels = 0) 

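But this produces a DataFrame in which the outer columns col7 through col9 each appear twice (so the DataFrame has 6 outer columns). How can I place all the level-1 columns under the same outer column names?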

Answers

2

You can sort the columns by adding sort_index:

df = pd.concat([df1, df2], axis = 1, levels=0).sort_index(axis=1) 
print(df)
                     col7                                    col8 \
                       D0    alpha0    alpha1    alpha2        D0    alpha0
F35_HC_531d.dat  1.103999  1.103999       NaN       NaN  1.358938  1.364399
F35_HC_532d.dat  0.000000  0.000000    1.0850     2.413  1.635594  1.636934
F35_HC_533d.dat  0.826599  0.826599       NaN       NaN  1.390134  1.463956
F35_HC_534d.dat  1.020555  1.055350       NaN       NaN  2.498257  3.112200
F52_HC_472d.dat  2.912733  3.808008       NaN       NaN  2.336720  3.594062
M24_HC_458d.dat       NaN       NaN    0.7207     5.850       NaN       NaN
M40_HC_506d.dat       NaN       NaN    0.7602     2.912       NaN       NaN
M48_HC_551d.dat       NaN       NaN    0.7029     4.713       NaN       NaN
M48_HC_552d.dat       NaN       NaN    0.7179     4.783       NaN       NaN

                     col8                col9
                   alpha1    alpha2        D0    alpha0    alpha1    alpha2
F35_HC_531d.dat       NaN       NaN  1.946894  3.171808       NaN       NaN
F35_HC_532d.dat    0.7914  6.072000  2.362530  4.359431    0.8418     5.328
F35_HC_533d.dat       NaN       NaN  2.199387  3.860629       NaN       NaN
F35_HC_534d.dat       NaN       NaN  2.090668  3.394307       NaN       NaN
F52_HC_472d.dat       NaN       NaN  2.216112  3.027449       NaN       NaN
M24_HC_458d.dat    0.6772  5.699000       NaN       NaN    0.7135     5.620
M40_HC_506d.dat    0.8420  5.690000       NaN       NaN    0.8354     1.910
M48_HC_551d.dat    0.7309  2.922000       NaN       NaN    0.7823     3.546
M48_HC_552d.dat    0.6481  4.131999       NaN       NaN    0.7010     3.408
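
For anyone who wants to try this without the original files, here is a minimal sketch of the same idea. The two small frames, their row labels (a.dat, b.dat, c.dat), and their values are made up for illustration; only the column layout mirrors the question. The levels argument is left out here on the assumption that it has no effect when keys is not given.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature stand-ins for df1 and df2 (values are invented).
cols1 = pd.MultiIndex.from_product([["col7", "col8"], ["alpha0", "D0"]])
cols2 = pd.MultiIndex.from_product([["col7", "col8"], ["alpha1", "alpha2"]])
df1 = pd.DataFrame(np.arange(8.0).reshape(2, 4),
                   index=["a.dat", "b.dat"], columns=cols1)
df2 = pd.DataFrame(np.arange(8.0).reshape(2, 4) * 10,
                   index=["b.dat", "c.dat"], columns=cols2)

# axis=1 places the frames side by side and aligns rows on the index;
# rows missing from one frame are filled with NaN.
combined = pd.concat([df1, df2], axis=1)

# Sorting the column MultiIndex puts all sub-columns of each outer
# label next to each other (col7 block first, then col8 block).
combined = combined.sort_index(axis=1)
print(combined)
```

Within each outer label the sub-columns come out as D0, alpha0, alpha1, alpha2, because the sort is lexicographic and uppercase letters sort before lowercase ones.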

Great! Could you explain how this works? – Peaceful

2

You can use join with the parameter how='outer':

df1.join(df2, how='outer').sort_index(axis=1)
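
A small sketch of the join approach, again on invented data (the row labels and values are hypothetical; only the column structure follows the question). It also builds the concat version so the two can be compared side by side. The axis is passed by keyword because recent pandas versions do not accept it positionally in sort_index.

```python
import pandas as pd

# Hypothetical stand-ins for df1 and df2 with two-level column labels.
cols1 = pd.MultiIndex.from_tuples(
    [("col7", "alpha0"), ("col7", "D0"), ("col8", "alpha0"), ("col8", "D0")])
cols2 = pd.MultiIndex.from_tuples(
    [("col7", "alpha1"), ("col7", "alpha2"), ("col8", "alpha1"), ("col8", "alpha2")])
df1 = pd.DataFrame([[1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0]],
                   index=["a.dat", "b.dat"], columns=cols1)
df2 = pd.DataFrame([[0.1, 0.2, 0.3, 0.4], [0.5, 0.6, 0.7, 0.8]],
                   index=["b.dat", "c.dat"], columns=cols2)

# An outer join keeps every row label found in either frame.
joined = df1.join(df2, how="outer").sort_index(axis=1)

# The concat route from the other answer, for comparison.
concatenated = pd.concat([df1, df2], axis=1).sort_index(axis=1)

print(joined)
print(concatenated)
```

On this toy data both routes give the same column layout; join is simply the shorter spelling when there are exactly two frames aligned on their index.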



Nice! What I don't understand is why `sort_index` gets rid of the repeated column names. Any comments? – Peaceful


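@Peaceful They are still there. When consecutive entries share the same value in the earlier levels of an index, pandas merges them in the displayed table for aesthetic purposes. – piRSquared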


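But why would `sort_index` do that? Or is this even true in general? For example, if there were a function called "merge_repeated_columns", it would be understandable. Am I missing something obvious? – Peaceful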
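
To make piRSquared's point concrete, here is a rough sketch on a made-up one-row frame (the data is hypothetical). As far as I can tell, sort_index does not merge anything: it only reorders the columns so that equal outer labels end up adjacent, and the display then prints each run of equal labels once. Turning off the display.multi_sparse option shows every label in full.

```python
import pandas as pd

# Hypothetical frame whose outer labels alternate, as they would
# if two frames were interleaved rather than grouped.
cols = pd.MultiIndex.from_tuples(
    [("col7", "alpha0"), ("col8", "alpha0"), ("col7", "alpha1"), ("col8", "alpha1")])
df = pd.DataFrame([[1, 2, 3, 4]], columns=cols)

unsorted_outer = df.columns.get_level_values(0).tolist()
sorted_df = df.sort_index(axis=1)
sorted_outer = sorted_df.columns.get_level_values(0).tolist()

print(unsorted_outer)  # ['col7', 'col8', 'col7', 'col8']
print(sorted_outer)    # ['col7', 'col7', 'col8', 'col8']

# Same number of columns before and after: nothing was merged.
print(len(df.columns), len(sorted_df.columns))

# With sparsified display disabled, the repeated outer labels are printed.
with pd.option_context("display.multi_sparse", False):
    print(sorted_df)
```

So the underlying columns are unchanged in number; only their order, and therefore how the header is rendered, differs.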