Matching tuples in pandas and data processing

Below is a simplified chunk of my DataFrame:
,No.,Time,Source,Destination,Protocol,Length,Info,src_dst_pair
325778,112.305107,02:e0,Broadcast,ARP,64,Who has 253.244.230.77? Tell 253.244.230.67,"('02:e0', 'Broadcast')"
801130,261.868118,02:e0,Broadcast,ARP,64,Who has 253.244.230.156? Tell 253.244.230.67,"('02:e0', 'Broadcast')"
700094,222.055094,02:e0,Broadcast,ARP,60,Who has 253.244.230.77? Tell 253.244.230.156,"('02:e0', 'Broadcast')"
766542,766543,247.796156,100.118.138.150,41.177.26.176,TCP,66,32222 > http [SYN] Seq=0,"('100.118.138.150', '41.177.26.176')"
767405,248.073313,100.118.138.150,41.177.26.176,TCP,64,32222 > http [ACK] Seq=1,"('100.118.138.150', '41.177.26.176')"
767466,248.083268,100.118.138.150,41.177.26.176,HTTP,380,Continuation [Packet capture],"('100.118.138.150', '41.177.26.176')"
I have all the unique values of src_dst_pair (the last column):
uniq_src_dst_pair = numpy.unique(data.src_dst_pair.ravel())
[('02:e0', 'Broadcast') ('100.118.138.150', '41.177.26.176')]
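How can I do the following in pandas?
For each element in uniq_src_dst_pair, check it against df.src_dst_pair. If it matches, add up df.Length and store the sum in a separate column.
My expected result is: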
('02:e0', 'Broadcast') : 188
('100.118.138.150', '41.177.26.176') : 510
How can I do this?
Here is what I tried:
import pandas
import numpy
data = pandas.read_csv('first.csv')
print(data)
# collect the distinct (source, destination) pairs
uniq_src_dst_pair = numpy.unique(data.src_dst_pair.ravel())
print(uniq_src_dst_pair)
print(len(uniq_src_dst_pair))
# the following is hardcoded, but needs to be more general for the above list
match1 = data[data.src_dst_pair == "('02:e0:ed:0a:fb:5f', 'Broadcast')"] # doesn't work
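The hardcoded comparison can be generalized by looping over the unique pairs and summing Length under a boolean mask. This is only a minimal sketch: the inline `csv_text` is a hypothetical stand-in for first.csv that keeps just the Length and src_dst_pair columns, and `totals` is a name introduced here for illustration. It assumes that src_dst_pair comes back from read_csv as a plain string, so `==` only matches when the text is identical character for character, which may be why the hardcoded line above finds nothing.

```python
import io

import pandas as pd

# Hypothetical inline sample standing in for first.csv (only two of the columns).
csv_text = '''Length,src_dst_pair
64,"('02:e0', 'Broadcast')"
64,"('02:e0', 'Broadcast')"
60,"('02:e0', 'Broadcast')"
66,"('100.118.138.150', '41.177.26.176')"
64,"('100.118.138.150', '41.177.26.176')"
380,"('100.118.138.150', '41.177.26.176')"
'''
data = pd.read_csv(io.StringIO(csv_text))

# Each src_dst_pair value is a string here, so == is an exact string comparison.
totals = {}
for pair in data["src_dst_pair"].unique():
    mask = data["src_dst_pair"] == pair          # rows belonging to this pair
    totals[pair] = int(data.loc[mask, "Length"].sum())

for pair, total in totals.items():
    print(pair, ":", total)
```

On this sample the loop prints 188 for the ARP pair and 510 for the TCP/HTTP pair, matching the expected result in the question.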
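Just to be clear, you are trying to get the total number of bytes transmitted over each connection (a connection being identified by its source and destination), right?
You are right. – user2532296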
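If the goal is the total bytes per connection, as confirmed above, one possible approach is to group on the Source and Destination columns directly instead of matching the tuple strings. The following is a sketch under assumptions: the inline `csv_text` is a hypothetical stand-in for first.csv with only three columns, and `bytes_per_connection` and `pair_total` are names made up for this example.

```python
import io

import pandas as pd

# Hypothetical inline sample standing in for first.csv (only three of the columns).
csv_text = '''Source,Destination,Length
02:e0,Broadcast,64
02:e0,Broadcast,64
02:e0,Broadcast,60
100.118.138.150,41.177.26.176,66
100.118.138.150,41.177.26.176,64
100.118.138.150,41.177.26.176,380
'''
data = pd.read_csv(io.StringIO(csv_text))

# One total per (Source, Destination) connection.
bytes_per_connection = data.groupby(["Source", "Destination"])["Length"].sum()
print(bytes_per_connection)

# The same total broadcast back onto every row, stored in a separate column.
data["pair_total"] = data.groupby(["Source", "Destination"])["Length"].transform("sum")
print(data)
```

`groupby(...).sum()` gives one row per connection, while `transform("sum")` keeps the original row count, which is closer to "store it in a separate column". Which of the two fits better depends on how the result is used afterwards.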