Outer join with default/fill values
Below are tiny/toy versions of the much larger/more complex dataframes I'm working with:
>>> A
key u v w x
0 a 0.757954 0.258917 0.404934 0.303313
1 b 0.583382 0.504687 NaN 0.618369
2 c NaN 0.982785 0.902166 NaN
3 d 0.898838 0.472143 NaN 0.610887
4 e 0.966606 0.865310 NaN 0.548699
5 f NaN 0.398824 0.668153 NaN
>>> B
key y z
0 a 0.867603 NaN
1 b NaN 0.191067
2 c 0.238616 0.803179
3 p 0.080446 NaN
4 q 0.932834 NaN
5 r 0.706561 0.814467
(FWIW, at the end of this post I provide the code that generates these dataframes.)
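I want to produce an outer join of these dataframes on the key column, in such a way that the new positions created by the outer join get the default value 0.0. IOW, the desired result looks like this: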
key u v w x y z
0 a 0.757954 0.258917 0.404934 0.303313 0.867603 NaN
1 b 0.583382 0.504687 NaN 0.618369 NaN 0.191067
2 c NaN 0.982785 0.902166 NaN 0.238616 0.803179
3 d 0.898838 0.472143 NaN 0.610887 0.000000 0.000000
4 e 0.966606 0.865310 NaN 0.548699 0.000000 0.000000
5 f NaN 0.398824 0.668153 NaN 0.000000 0.000000
6 p 0.000000 0.000000 0.000000 0.000000 0.080446 NaN
7 q 0.000000 0.000000 0.000000 0.000000 0.932834 NaN
8 r 0.000000 0.000000 0.000000 0.000000 0.706561 0.814467
(Note that this desired output contains some NaNs, namely those that were already present in A or B.)
The merge method gets me part of the way there, but the default value it fills in is NaN, not 0.0:
>>> C = pandas.DataFrame.merge(A, B, how='outer', on='key')
>>> C
key u v w x y z
0 a 0.757954 0.258917 0.404934 0.303313 0.867603 NaN
1 b 0.583382 0.504687 NaN 0.618369 NaN 0.191067
2 c NaN 0.982785 0.902166 NaN 0.238616 0.803179
3 d 0.898838 0.472143 NaN 0.610887 NaN NaN
4 e 0.966606 0.865310 NaN 0.548699 NaN NaN
5 f NaN 0.398824 0.668153 NaN NaN NaN
6 p NaN NaN NaN NaN 0.080446 NaN
7 q NaN NaN NaN NaN 0.932834 NaN
8 r NaN NaN NaN NaN 0.706561 0.814467
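What the merge output above lacks is a record of which side each row came from. One way to recover it is merge's indicator=True option, which adds a categorical _merge column whose values are left_only, right_only or both; with that, only the cells created by the join can be filled. A minimal sketch, assuming a reasonably recent pandas and using small hypothetical stand-ins for A and B instead of the generated data:

```python
import numpy as np
import pandas as pd

# Hypothetical small stand-ins for A and B (not the generated data)
A = pd.DataFrame({"key": ["a", "b", "d"],
                  "u": [0.1, np.nan, 0.3],
                  "v": [0.4, 0.5, np.nan]})
B = pd.DataFrame({"key": ["a", "b", "p"],
                  "y": [0.7, np.nan, 0.8]})

# indicator=True adds a '_merge' column recording each row's origin
C = A.merge(B, how="outer", on="key", indicator=True)

left_cols = [c for c in A.columns if c != "key"]
right_cols = [c for c in B.columns if c != "key"]

# Rows found only in B have no values for A's columns, and vice versa;
# fill exactly those cells and leave every other NaN alone
C.loc[C["_merge"] == "right_only", left_cols] = 0.0
C.loc[C["_merge"] == "left_only", right_cols] = 0.0
C = C.drop(columns="_merge")
print(C)
```

Rows flagged both are never touched, so NaNs that were already in A or B (for example u in row b, or v in row d) survive.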
The fillna method does not produce the desired output, because it also changes some positions that should be left unchanged:
>>> C.fillna(0.0)
key u v w x y z
0 a 0.757954 0.258917 0.404934 0.303313 0.867603 0.000000
1 b 0.583382 0.504687 0.000000 0.618369 0.000000 0.191067
2 c 0.000000 0.982785 0.902166 0.000000 0.238616 0.803179
3 d 0.898838 0.472143 0.000000 0.610887 0.000000 0.000000
4 e 0.966606 0.865310 0.000000 0.548699 0.000000 0.000000
5 f 0.000000 0.398824 0.668153 0.000000 0.000000 0.000000
6 p 0.000000 0.000000 0.000000 0.000000 0.080446 0.000000
7 q 0.000000 0.000000 0.000000 0.000000 0.932834 0.000000
8 r 0.000000 0.000000 0.000000 0.000000 0.706561 0.814467
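How can I get the desired output efficiently? (Performance matters here, because I plan to perform this operation on dataframes much larger than the ones shown here.)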
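One possible approach that avoids post-merge masking altogether: index each frame by the key, pad both to the union of keys with reindex(..., fill_value=0.0), which fills only the newly introduced rows and keeps existing NaNs, then put the two padded frames side by side with pd.concat(axis=1). This is a sketch under stated assumptions: the key is unique within each frame (reindex raises on duplicate labels), A and B share no non-key columns, and outer_join_fill is a hypothetical helper name. Whether it beats the merge route on large data has not been measured here.

```python
import numpy as np
import pandas as pd


def outer_join_fill(left, right, on, fill_value=0.0):
    """Hypothetical helper: outer join on `on`, filling only join-created cells.

    Assumes unique keys in each frame and no shared non-key columns.
    """
    l = left.set_index(on)
    r = right.set_index(on)
    keys = l.index.union(r.index)  # sorted union of the keys of both frames
    # fill_value is used only for labels that reindex adds;
    # NaNs already present in left/right stay NaN
    l = l.reindex(keys, fill_value=fill_value)
    r = r.reindex(keys, fill_value=fill_value)
    return pd.concat([l, r], axis=1).reset_index()


# Hypothetical small stand-ins for A and B
A = pd.DataFrame({"key": ["a", "b", "d"],
                  "u": [0.1, np.nan, 0.3],
                  "v": [0.4, 0.5, np.nan]})
B = pd.DataFrame({"key": ["a", "b", "p"],
                  "y": [0.7, np.nan, 0.8]})

C = outer_join_fill(A, B, "key")
print(C)
```

Because set_index also accepts a list of column names, the same helper should work with several key columns, in which case the union is taken over a MultiIndex.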
FWIW, here is the code that generates the example dataframes A and B:
from pandas import DataFrame
from collections import OrderedDict
from random import random, seed


def make_dataframe(rows, colnames):
    # One column per name, built from the i-th entry of every row,
    # with the columns kept in the order given by colnames
    return DataFrame(OrderedDict([(n, [row[i] for row in rows])
                                  for i, n in enumerate(colnames)]))


# NaN with probability 0.4, otherwise a uniform random float in [0, 1)
maybe_nan = lambda: float('nan') if random() < 0.4 else random()

seed(0)
A = make_dataframe([['a', maybe_nan(), maybe_nan(), maybe_nan(), maybe_nan()],
                    ['b', maybe_nan(), maybe_nan(), maybe_nan(), maybe_nan()],
                    ['c', maybe_nan(), maybe_nan(), maybe_nan(), maybe_nan()],
                    ['d', maybe_nan(), maybe_nan(), maybe_nan(), maybe_nan()],
                    ['e', maybe_nan(), maybe_nan(), maybe_nan(), maybe_nan()],
                    ['f', maybe_nan(), maybe_nan(), maybe_nan(), maybe_nan()]],
                   ('key', 'u', 'v', 'w', 'x'))
B = make_dataframe([['a', maybe_nan(), maybe_nan()],
                    ['b', maybe_nan(), maybe_nan()],
                    ['c', maybe_nan(), maybe_nan()],
                    ['p', maybe_nan(), maybe_nan()],
                    ['q', maybe_nan(), maybe_nan()],
                    ['r', maybe_nan(), maybe_nan()]],
                   ('key', 'y', 'z'))
For the case of an outer join on multiple keys, see here.
Question: how would one generalize this solution to the case of merging on multiple columns, e.g. 'merge(..., on=('key1', 'key2', ...), ...)'? – kjo
I don't know; after calling 'merge()' you don't know what the join columns are. – HYRY
If you use the private operation, you can get that information: 'mo = pd.tools.merge._MergeOperation(A, B, how="outer"); print(mo.left_on)' – HYRY
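On the multi-column question: when on is omitted, DataFrame.merge joins on the columns the two frames have in common, so the join columns can be computed from the inputs with public API before merging rather than read back from the private _MergeOperation. A hedged sketch of the indicator-based fill generalized this way; merge_fill is a hypothetical helper name, and when an explicit on is passed it assumes the frames share no other columns (otherwise merge would add _x/_y suffixes and the column lists below would not match):

```python
import numpy as np
import pandas as pd


def merge_fill(left, right, on=None, fill_value=0.0):
    """Hypothetical helper: outer merge, filling only cells created by the join.

    If `on` is None, use the columns common to both frames, mirroring
    DataFrame.merge's own default.
    """
    if on is None:
        keys = [c for c in left.columns if c in right.columns]
    elif isinstance(on, str):
        keys = [on]
    else:
        keys = list(on)
    out = left.merge(right, how="outer", on=keys, indicator=True)
    left_vals = [c for c in left.columns if c not in keys]
    right_vals = [c for c in right.columns if c not in keys]
    # Left-frame columns are empty in right_only rows, and vice versa
    out.loc[out["_merge"] == "right_only", left_vals] = fill_value
    out.loc[out["_merge"] == "left_only", right_vals] = fill_value
    return out.drop(columns="_merge")


# Hypothetical two-key example frames
A = pd.DataFrame({"key1": ["a", "a", "b"], "key2": [1, 2, 1],
                  "u": [0.1, np.nan, 0.3]})
B = pd.DataFrame({"key1": ["a", "c"], "key2": [1, 1],
                  "y": [np.nan, 0.8]})

C = merge_fill(A, B)  # joins on ['key1', 'key2']
print(C)
```

Passing on=('key1', 'key2') explicitly gives the same result here, since those are exactly the shared columns.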