Merging DataFrames with unhashable columns

I want to merge two pandas DataFrames. If the item codes (e.g. A, B, C, D) are the same, their attributes a and b must be the same, but b is an unhashable NumPy array or list.

foo:

item  a  b
A     1  [2,0]
B     1  [3,0]
C     0  [4,0]

bar:

item  a  b
A     1  [2,0]
D     0  [6,1]

This is what I want:

code  a  b      Foo  Bar
A     1  [2,0]  1    1
B     1  [3,0]  1    0
C     0  [4,0]  1    0
D     0  [6,1]  0    1

Source

2017-08-31 niukasu

You can use df.merge and df.fillna:

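# Tag each frame with a marker column, outer-merge on the shared columns,
# and fill the missing markers with 0.
out = foo.assign(Foo=1).merge(bar.assign(Bar=1), how='outer').fillna(0)
print(out)

  item  a       b  Foo  Bar
0    A  1  (2, 0)  1.0  1.0
1    B  1  (3, 0)  1.0  0.0
2    C  0  (4, 0)  1.0  0.0
3    D  0  (6, 1)  0.0  1.0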

If b holds lists, you can convert it to tuples first and then merge.

foo.b = foo.b.apply(tuple)  # tuples are hashable, lists are not
bar.b = bar.b.apply(tuple)
out = foo.assign(Foo=1).merge(bar.assign(Bar=1), how='outer').fillna(0)
out.b = out.b.apply(list)   # convert back to lists after the merge

print(out)

  item  a       b  Foo  Bar
0    A  1  [2, 0]  1.0  1.0
1    B  1  [3, 0]  1.0  0.0
2    C  0  [4, 0]  1.0  0.0
3    D  0  [6, 1]  0.0  1.0

Source

2017-08-31 23:45:02

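When I merge, it gives me an error saying b is of unhashable type numpy.ndarray – niukasu

@niukasu Well, your question was about tuples. But you are working with arrays :) –

@niukasu Edited; the second solution should help. –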
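The error reported in the comments comes from hashing: merging on b requires hashing each value, and a NumPy array, like a list, cannot be hashed. The following is a minimal sketch, assuming b holds one-dimensional integer arrays; the helper name as_hashable is made up for this illustration and is not part of pandas or of the answer above. It suggests that the apply(tuple) conversion carries over to arrays:

```python
import numpy as np
import pandas as pd

# Sample data from the question, with b stored as NumPy arrays (an assumption).
foo = pd.DataFrame({'item': list('ABC'), 'a': [1, 1, 0]})
foo['b'] = [np.array([2, 0]), np.array([3, 0]), np.array([4, 0])]
bar = pd.DataFrame({'item': list('AD'), 'a': [1, 0]})
bar['b'] = [np.array([2, 0]), np.array([6, 1])]

# ndarrays cannot be hashed, which is what a merge on b trips over.
try:
    hash(foo['b'].iloc[0])
    array_is_hashable = True
except TypeError:
    array_is_hashable = False

def as_hashable(frame, col='b'):
    """Hypothetical helper: copy `frame` with each 1-D array in `col` turned into a tuple."""
    return frame.assign(**{col: frame[col].apply(tuple)})

# Merge on all shared columns (item, a, b) now that b is hashable.
out = as_hashable(foo).assign(Foo=1).merge(as_hashable(bar).assign(Bar=1), how='outer')
for col in ['Foo', 'Bar']:
    out[col] = out[col].fillna(0).astype(int)
out['b'] = out['b'].apply(list)  # back to lists, as in the answer
print(out)
```

Filling only the Foo and Bar columns, rather than the whole frame, keeps the marker columns as integers and leaves a and b untouched.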

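Here is a way to merge without converting the unhashables to tuples.

Since the item codes have a one-to-one correspondence with the values in the a and b columns, it suffices to merge on item alone. Since the values in the item column are hashable, the merge works without any problem: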

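import pandas as pd

foo = pd.DataFrame({'item': list('ABC'), 'a': [1, 1, 0], 'b': [[2, 0], [3, 0], [4, 0]]})
bar = pd.DataFrame({'item': list('AD'), 'a': [1, 0], 'b': [[2, 0], [6, 1]]})

# Merge on the hashable 'item' column only; bar's a and b arrive as a_y and b_y.
result = pd.merge(foo.assign(Foo=1), bar.assign(Bar=1), on='item', how='outer',
                  suffixes=('', '_y'))
for col in ['a', 'b']:
    # Take the non-NaN values from the *_y column via update, then assign the
    # column back so the result does not depend on a chained in-place update.
    filled = result[col].copy()
    filled.update(result[col + '_y'])
    result[col] = filled

for col in ['Foo', 'Bar']:
    result[col] = result[col].fillna(0)
result = result.drop(['a_y', 'b_y'], axis=1)
print(result)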

yields

     a       b item  Foo  Bar
0  1.0  [2, 0]    A  1.0  1.0
1  1.0  [3, 0]    B  1.0  0.0
2  0.0  [4, 0]    C  1.0  0.0
3  0.0  [6, 1]    D  0.0  1.0

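A bit of cleanup work is needed after the merge, however. Since we merge only on item, result ends up with two a columns and two b columns; the ones from bar are called a_y and b_y. The update method is used to fill the NaN values in a with the corresponding values from a_y, and the same is then done for b.

The clever idea of using foo.assign(Foo=1), bar.assign(Bar=1) to obtain the Foo and Bar columns is taken from cᴏʟᴅsᴘᴇᴇᴅ's solution.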
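Because the item code alone determines a and b, duplicates can also be dropped by looking at item only, which never touches the unhashable column. The following is a rough sketch of that variation and not part of the answer above; it assumes the one-to-one relationship stated in the question really holds, and the name combined is chosen here for illustration:

```python
import pandas as pd

foo = pd.DataFrame({'item': list('ABC'), 'a': [1, 1, 0], 'b': [[2, 0], [3, 0], [4, 0]]})
bar = pd.DataFrame({'item': list('AD'), 'a': [1, 0], 'b': [[2, 0], [6, 1]]})

# Stack both frames and keep the first row seen for each item code.
# subset='item' means only the hashable item column is compared.
combined = (pd.concat([foo, bar], ignore_index=True)
            .drop_duplicates(subset='item')
            .reset_index(drop=True))

# Flag which source frame(s) each item code appears in.
combined['Foo'] = combined['item'].isin(foo['item']).astype(int)
combined['Bar'] = combined['item'].isin(bar['item']).astype(int)
print(combined)
```

This avoids the a_y and b_y cleanup step entirely, at the cost of silently keeping foo's row if the two frames ever disagree about an item.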

Source

2017-09-01 00:25:02 unutbu

I knew there was a way; I did not know it was as simple as merging on 'item'. –

Or you can try this:

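foo.b = foo.b.apply(tuple)  # tuples are hashable, so drop_duplicates can compare rows
bar.b = bar.b.apply(tuple)
df = pd.concat([foo, bar], axis=0).drop_duplicates()
# Flag membership by item code; DataFrame.isin(foo) would instead compare cells
# by index and column label, which only matches here by coincidence.
df['foo'] = df['item'].isin(foo['item']).astype(int)
df['bar'] = df['item'].isin(bar['item']).astype(int)
df.b = df.b.apply(list)
df
Out[60]:
   a       b item  foo  bar
0  1  [2, 0]    A    1    1
1  1  [3, 0]    B    1    0
2  0  [4, 0]    C    1    0
1  0  [6, 1]    D    0    1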

Source

2017-09-01 00:57:44 Wen

cols = ['a', 'b', 'item']
# foo and bar are the two frames from the question; the keys label their rows.
(pd.concat([foo, bar], keys=['Foo', 'Bar'])
    .assign(c=1).pipe(lambda d: d.assign(b=d.b.apply(tuple)))  # make b hashable
    .set_index(cols, append=True)
    .c.unstack(0, fill_value=0).reset_index(cols)              # one column per key
    .pipe(lambda d: d.assign(b=d.b.apply(list))))              # back to lists

   a       b item  Bar  Foo
0  1  [2, 0]    A    1    1
1  0  [6, 1]    D    1    0
1  1  [3, 0]    B    0    1
2  0  [4, 0]    C    0    1

Source

2017-09-01 02:39:34 piRSquared
