2016-09-28

Outer join with default/fill values

Below are tiny toy versions of the much larger and more complex dataframes I'm working with:

>>> A
  key         u         v         w         x
0   a  0.757954  0.258917  0.404934  0.303313
1   b  0.583382  0.504687       NaN  0.618369
2   c       NaN  0.982785  0.902166       NaN
3   d  0.898838  0.472143       NaN  0.610887
4   e  0.966606  0.865310       NaN  0.548699
5   f       NaN  0.398824  0.668153       NaN

>>> B
  key         y         z
0   a  0.867603       NaN
1   b       NaN  0.191067
2   c  0.238616  0.803179
3   p  0.080446       NaN
4   q  0.932834       NaN
5   r  0.706561  0.814467

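(FWIW, at the end of this post I provide the code that generates these dataframes.)

I want to produce an outer join of these dataframes on the key column, in such a way that the new positions created by the outer join get the default value 0.0. In other words, the desired result looks like this: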

  key         u         v         w         x         y         z
0   a  0.757954  0.258917  0.404934  0.303313  0.867603       NaN
1   b  0.583382  0.504687       NaN  0.618369       NaN  0.191067
2   c       NaN  0.982785  0.902166       NaN  0.238616  0.803179
3   d  0.898838  0.472143       NaN  0.610887  0.000000  0.000000
4   e  0.966606  0.865310       NaN  0.548699  0.000000  0.000000
5   f       NaN  0.398824  0.668153       NaN  0.000000  0.000000
6   p  0.000000  0.000000  0.000000  0.000000  0.080446       NaN
7   q  0.000000  0.000000  0.000000  0.000000  0.932834       NaN
8   r  0.000000  0.000000  0.000000  0.000000  0.706561  0.814467

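(Note that this desired output contains some NaNs, namely those that were already present in A or B.)

The merge method gets me part of the way there, but the filled-in default values are NaNs, not 0.0s: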

>>> C = pandas.DataFrame.merge(A, B, how='outer', on='key')
>>> C
  key         u         v         w         x         y         z
0   a  0.757954  0.258917  0.404934  0.303313  0.867603       NaN
1   b  0.583382  0.504687       NaN  0.618369       NaN  0.191067
2   c       NaN  0.982785  0.902166       NaN  0.238616  0.803179
3   d  0.898838  0.472143       NaN  0.610887       NaN       NaN
4   e  0.966606  0.865310       NaN  0.548699       NaN       NaN
5   f       NaN  0.398824  0.668153       NaN       NaN       NaN
6   p       NaN       NaN       NaN       NaN  0.080446       NaN
7   q       NaN       NaN       NaN       NaN  0.932834       NaN
8   r       NaN       NaN       NaN       NaN  0.706561  0.814467

The fillna method does not produce the desired output either, because it modifies some positions that should be left unchanged:

>>> C.fillna(0.0)
  key         u         v         w         x         y         z
0   a  0.757954  0.258917  0.404934  0.303313  0.867603  0.000000
1   b  0.583382  0.504687  0.000000  0.618369  0.000000  0.191067
2   c  0.000000  0.982785  0.902166  0.000000  0.238616  0.803179
3   d  0.898838  0.472143  0.000000  0.610887  0.000000  0.000000
4   e  0.966606  0.865310  0.000000  0.548699  0.000000  0.000000
5   f  0.000000  0.398824  0.668153  0.000000  0.000000  0.000000
6   p  0.000000  0.000000  0.000000  0.000000  0.080446  0.000000
7   q  0.000000  0.000000  0.000000  0.000000  0.932834  0.000000
8   r  0.000000  0.000000  0.000000  0.000000  0.706561  0.814467

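How can I get the desired output efficiently? (Performance matters here, because I intend to perform this operation on much larger dataframes than the ones shown here.)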


FWIW, below is the code that generates the example dataframes A and B.

from pandas import DataFrame 
from collections import OrderedDict 
from random import random, seed 

def make_dataframe(rows, colnames): 
    return DataFrame(OrderedDict([(n, [row[i] for row in rows]) 
           for i, n in enumerate(colnames)])) 

maybe_nan = lambda: float('nan') if random() < 0.4 else random() 

seed(0) 

A = make_dataframe([['a', maybe_nan(), maybe_nan(), maybe_nan(), maybe_nan()], 
        ['b', maybe_nan(), maybe_nan(), maybe_nan(), maybe_nan()], 
        ['c', maybe_nan(), maybe_nan(), maybe_nan(), maybe_nan()], 
        ['d', maybe_nan(), maybe_nan(), maybe_nan(), maybe_nan()], 
        ['e', maybe_nan(), maybe_nan(), maybe_nan(), maybe_nan()], 
        ['f', maybe_nan(), maybe_nan(), maybe_nan(), maybe_nan()]], 
        ('key', 'u', 'v', 'w', 'x')) 

B = make_dataframe([['a', maybe_nan(), maybe_nan()], 
        ['b', maybe_nan(), maybe_nan()], 
        ['c', maybe_nan(), maybe_nan()], 
        ['p', maybe_nan(), maybe_nan()], 
        ['q', maybe_nan(), maybe_nan()], 
        ['r', maybe_nan(), maybe_nan()]], 
        ('key', 'y', 'z')) 

For the case of outer joins on multiple keys, see here

Answers


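You can fill in the zeros after the merge:

import pandas as pd

res = pd.merge(A, B, how="outer")
# Zero A's columns in the rows whose key does not occur in A
res.loc[~res.key.isin(A.key), A.columns] = 0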

Edit

To skip the key column:

res.loc[~res.key.isin(A.key), A.columns.drop("key")] = 0 
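The line above zeroes only the columns that came from A. To reach the output requested in the question, the same mask trick presumably has to be applied to B's columns as well. Here is a minimal sketch, assuming a single join column named key and no other column names shared between the two frames; the small frames are illustrative stand-ins, not the question's data:

```python
import numpy as np
import pandas as pd

# Illustrative stand-ins for the question's A and B (values are made up).
A = pd.DataFrame({"key": ["a", "b", "d"],
                  "u": [0.1, np.nan, 0.3],
                  "v": [np.nan, 0.5, 0.6]})
B = pd.DataFrame({"key": ["a", "b", "p"],
                  "y": [0.7, np.nan, 0.9]})

res = pd.merge(A, B, how="outer", on="key")

# Rows whose key is absent from A were created by the join:
# zero A's value columns there (the key column itself is skipped).
res.loc[~res["key"].isin(A["key"]), A.columns.drop("key")] = 0.0

# Likewise, rows whose key is absent from B get zeros in B's value columns.
res.loc[~res["key"].isin(B["key"]), B.columns.drop("key")] = 0.0

print(res)
```

NaNs that were already present in A or B are left alone, because each assignment touches only the rows the other frame contributed.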

Question: how would one generalize this solution to the case of merging on multiple columns, e.g. `merge(..., on=('key1', 'key2', ...), ...)`? – kjo


I don't know. After calling `merge()`, you can't tell what the join columns were. – HYRY


If you use a private operation, you can get that information: `mo = pd.tools.merge._MergeOperation(A, B, how="outer"); print(mo.left_on)` – HYRY
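One possible way to handle the multi-column case without private internals is the documented `indicator=True` option of `merge`, which adds a `_merge` column marking each row as `left_only`, `right_only`, or `both`. The sketch below is not from the answer above: the helper name `outer_merge_fill` and the frames `A2` and `B2` are hypothetical, and it assumes the join columns are passed explicitly and that the frames share no other column names.

```python
import numpy as np
import pandas as pd

def outer_merge_fill(left, right, on, fill=0.0):
    """Outer-merge on `on`, filling only the cells created by the join.

    Hypothetical helper; assumes no non-key column names are shared.
    """
    on = [on] if isinstance(on, str) else list(on)
    res = pd.merge(left, right, how="outer", on=on, indicator=True)
    left_cols = left.columns.drop(on)
    right_cols = right.columns.drop(on)
    # "_merge" is "right_only" for rows with no match in `left`, and vice versa.
    res.loc[res["_merge"] == "right_only", left_cols] = fill
    res.loc[res["_merge"] == "left_only", right_cols] = fill
    return res.drop(columns="_merge")

# Made-up two-key frames for illustration.
A2 = pd.DataFrame({"key1": ["a", "a", "b"],
                   "key2": [1, 2, 1],
                   "u": [0.1, np.nan, 0.3]})
B2 = pd.DataFrame({"key1": ["a", "b", "c"],
                   "key2": [1, 2, 1],
                   "y": [np.nan, 0.8, 0.9]})

out = outer_merge_fill(A2, B2, on=("key1", "key2"))
print(out)
```

Because the masks come from the indicator column rather than from `isin` on a single key, the same code should work for one key or several, and pre-existing NaNs in matched rows are left untouched.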