Merge conditional row data into a new DataFrame

I have a DataFrame that was read from a CSV:

  time node txrx src dest txid hops 
0  34355146  2 TX 2  1  1 NaN 
1  34373907  1 RX 2  1  1 1.0 
2  44284813  2 TX 2  1  2 NaN 
3  44302557  1 RX 2  1  2 1.0 
4  44596500  3 TX 3  1  2 NaN 
5  44630682  1 RX 3  1  2 2.0 
6  50058251  2 TX 2  1  3 NaN 
7  50075994  1 RX 2  1  3 1.0 
8  51338658  3 TX 3  1  3 NaN 
9  51382629  1 RX 3  1  3 2.0
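
For anyone who wants to reproduce the table above, here is a minimal sketch. The original CSV file is not shown, so the `csv_text` below is an assumed reconstruction from the printed rows, with the missing `hops` values left empty on the TX lines.

```python
import io

import pandas as pd

# Assumed CSV content, rebuilt from the printed table (the real file is not shown).
csv_text = """time,node,txrx,src,dest,txid,hops
34355146,2,TX,2,1,1,
34373907,1,RX,2,1,1,1
44284813,2,TX,2,1,2,
44302557,1,RX,2,1,2,1
44596500,3,TX,3,1,2,
44630682,1,RX,3,1,2,2
50058251,2,TX,2,1,3,
50075994,1,RX,2,1,3,1
51338658,3,TX,3,1,3,
51382629,1,RX,3,1,3,2
"""

# Empty 'hops' fields are parsed as NaN, which makes the column float.
df = pd.read_csv(io.StringIO(csv_text))
print(df)
```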

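I need to create a new DataFrame that takes the values from each TX/RX pair of rows and combines them into a single row per pair:

1. Take the time from the 'time' column. If the value in 'txrx' is 'TX', put it in the 'tx_time' column; if the value is 'RX', put it in the 'rx_time' column (within the same row of the new DataFrame).
2. The value for 'hops' is taken from the RX row.
3. This is done for each ['src', 'dest', 'txid'] group.

The 'node' column is ignored.

The resulting df should then look like this: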

 tx_time rx_time src dest txid hops 
0 34355146 34373907 2  1  1  1 
1 44284813 44302557 2  1  2  1 
2 44596500 44630682 3  1  2  2 
3 50058251 50075994 2  1  3  1 
4 51338658 51382629 3  1  3  2

I understand how to do step (3), but I am stuck on how to approach (1) and (2). Any suggestions?


2017-10-10 mbadd

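I went with @Wen's _pivot_table_ solution, but the _defaultdict_ approach from piRSquared and the _concat_ approach from Vaishali also do the job. I'm sure there is a discussion to be had about which is more efficient :) – mbadd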

Using pivot_table

df.bfill().pivot_table(index=['src','dest','txid','hops'],columns=['txrx'],values='time').reset_index() 
Out[766]: 
txrx src dest txid hops  RX  TX 
0  2  1  1 1.0 34373907 34355146 
1  2  1  2 1.0 44302557 44284813 
2  2  1  3 1.0 50075994 50058251 
3  3  1  2 2.0 44630682 44596500 
4  3  1  3 2.0 51382629 51338658

Or using unstack

df.bfill().set_index(['src','dest','txid','hops','txrx']).time.unstack(-1).reset_index() 
Out[768]: 
txrx src dest txid hops  RX  TX 
0  2  1  1 1.0 34373907 34355146 
1  2  1  2 1.0 44302557 44284813 
2  2  1  3 1.0 50075994 50058251 
3  3  1  2 2.0 44630682 44596500 
4  3  1  3 2.0 51382629 51338658

PS: To rename the columns, use .rename(columns={}). I did not add it here because it would make the code too long.
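
As a sketch of what the renaming step might look like when added to the pivot_table approach: the DataFrame below is rebuilt by hand from the question's table, and the choices to back-fill only `hops`, to use `aggfunc='first'`, and to sort by `tx_time` are assumptions made for this illustration rather than part of the original answer.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the question's table.
df = pd.DataFrame({
    'time': [34355146, 34373907, 44284813, 44302557, 44596500,
             44630682, 50058251, 50075994, 51338658, 51382629],
    'node': [2, 1, 2, 1, 3, 1, 2, 1, 3, 1],
    'txrx': ['TX', 'RX'] * 5,
    'src':  [2, 2, 2, 2, 3, 3, 2, 2, 3, 3],
    'dest': [1] * 10,
    'txid': [1, 1, 2, 2, 2, 2, 3, 3, 3, 3],
    'hops': [np.nan, 1, np.nan, 1, np.nan, 2, np.nan, 1, np.nan, 2],
})

out = (
    # Assumption: back-fill only 'hops', so each TX row borrows the value of the RX row after it.
    df.assign(hops=df['hops'].bfill())
      .pivot_table(index=['src', 'dest', 'txid', 'hops'],
                   columns='txrx', values='time', aggfunc='first')
      .reset_index()
      .rename(columns={'TX': 'tx_time', 'RX': 'rx_time'})
      .rename_axis(columns=None)      # drop the leftover 'txrx' label on the column axis
      .astype({'hops': int})
      .sort_values('tx_time')
      .reset_index(drop=True)
)
# Put the columns in the order requested in the question.
out = out[['tx_time', 'rx_time', 'src', 'dest', 'txid', 'hops']]
print(out)
```

Sorting by `tx_time` is only there to match the row order in the question; the pivot itself orders rows by the index keys.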


2017-10-10 15:16:28 Wen

The default level for unstack is -1. There is no need to pass it. – piRSquared

@piRSquared Got it! :-) – Wen

pivot_table works very nicely, and thanks for the .rename() tip! – mbadd

Here is an approach using concat, although I think @Wen's pivot solution would be more efficient

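# Relies on the rows strictly alternating: even positions are TX, odd positions are RX
df_tx = df[::2].reset_index().drop(['index', 'txrx', 'node'], axis=1).rename(columns={'time': 'tx_time'})
df_rx = df[1::2].reset_index().drop(['index', 'txrx', 'node'], axis=1).rename(columns={'time': 'rx_time'})

# Put the halves side by side, drop the repeated key columns, then drop the all-NaN TX 'hops' column
pd.concat([df_tx, df_rx], axis=1).T.drop_duplicates().T.dropna(axis=1)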

You get

tx_time  src dest txid rx_time  hops 
0 34355146.0 2.0 1.0  1.0  34373907.0 1.0 
1 44284813.0 2.0 1.0  2.0  44302557.0 1.0 
2 44596500.0 3.0 1.0  2.0  44630682.0 2.0 
3 50058251.0 2.0 1.0  3.0  50075994.0 1.0 
4 51338658.0 3.0 1.0  3.0  51382629.0 2.0
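
The slicing above depends on the rows strictly alternating between TX and RX. If that ordering cannot be guaranteed, one alternative (a different technique from the concat answer, sketched here under the assumption that each ['src', 'dest', 'txid'] group has exactly one TX row and one RX row) is to split the frame by 'txrx' and merge the two halves on the group keys. The sample data is rebuilt by hand from the question's table.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the question's table.
df = pd.DataFrame({
    'time': [34355146, 34373907, 44284813, 44302557, 44596500,
             44630682, 50058251, 50075994, 51338658, 51382629],
    'node': [2, 1, 2, 1, 3, 1, 2, 1, 3, 1],
    'txrx': ['TX', 'RX'] * 5,
    'src':  [2, 2, 2, 2, 3, 3, 2, 2, 3, 3],
    'dest': [1] * 10,
    'txid': [1, 1, 2, 2, 2, 2, 3, 3, 3, 3],
    'hops': [np.nan, 1, np.nan, 1, np.nan, 2, np.nan, 1, np.nan, 2],
})

keys = ['src', 'dest', 'txid']

# TX rows contribute only the time; RX rows contribute the time and the hop count.
tx = df.loc[df['txrx'] == 'TX', keys + ['time']].rename(columns={'time': 'tx_time'})
rx = df.loc[df['txrx'] == 'RX', keys + ['time', 'hops']].rename(columns={'time': 'rx_time'})

# Inner merge pairs each TX row with the RX row of the same group.
out = tx.merge(rx, on=keys).astype({'hops': int})
out = out[['tx_time', 'rx_time', 'src', 'dest', 'txid', 'hops']]
print(out)
```

Because the pairing is done on the keys rather than on row position, this keeps the integer dtypes and does not need the transpose and drop_duplicates steps.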


2017-10-10 15:32:42 Vaishali

A defaultdict approach
This might actually be faster for the OP's purposes.
If speed matters, check it yourself. Your mileage may vary.

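from collections import defaultdict

import pandas as pd

# Outer key: output column name; inner key: the (src, dest, txid) group
d = defaultdict(lambda: defaultdict(dict))
cols = 'tx_time rx_time src dest txid hops'.split()

for t in df.itertuples():
    i = (t.src, t.dest, t.txid)
    # 'TX' -> 'tx_time', 'RX' -> 'rx_time'
    d[t.txrx.lower() + '_time'][i] = t.time
    # Only RX rows carry a hop count
    if pd.notnull(t.hops):
        d['hops'][i] = int(t.hops)

# The tuple keys become a MultiIndex; name its levels, then order the columns
pd.DataFrame(d).rename_axis(['src', 'dest', 'txid']) \
    .reset_index().reindex(columns=cols)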

    tx_time rx_time src dest txid hops 
0 34355146 34373907 2  1  1  1 
1 44284813 44302557 2  1  2  1 
2 50058251 50075994 2  1  3  1 
3 44596500 44630682 3  1  2  2 
4 51338658 51382629 3  1  3  2


2017-10-10 16:33:44 piRSquared

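Thanks for the solution. In this case speed is not important (it is just rearranging a table so that I can plot it), so pivot_table is a bit easier. But if I do any real-time processing, I will keep this in mind. – mbadd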
