2010-10-10 149 views
4

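python: converting tcpdump output to a text2pcap-readable format

I was recently asked to convert the text output of "tcpdump -i eth0 -neXXs0" into a pcap file. So I wrote a Python script that converts the information into an intermediate format that text2pcap understands. Since this is my first Python program, there is obviously room for improvement. I would like knowledgeable people to point out any flaws and/or improve it.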

Input

The tcpdump output has the following format:

20:11:32.001190 00:16:76:7f:2b:b1 > 00:11:5c:78:ca:c0, ethertype IPv4 (0x0800), length 72: 123.236.188.140.41756 > 94.59.34.210.45931: UDP, length 30

0x0000: 0011 5c78 cac0 0016 767f 2bb1 0800 4500 ..\x....v.+...E. 
0x0010: 003a 0000 4000 4011 812d 7bec bc8c 5e3b .:..@.@..-{...^;
0x0020: 22d2 a31c b36b 0026 b9bd 2033 6890 ad33 "....k.&...3h..3 
0x0030: e845 4b8d 2ba1 0685 0cb3 70dd 9b98 76d8 .EK.+.....p...v. 
0x0040: 8fc6 8293 bf33 325a      .....32Z 
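
A quick consistency check on this sample: the first `length 72` in the header line should equal the number of bytes in the hex dump (four full rows of 16 bytes plus one row of 8). Below is a minimal sketch of that check. The function names are made up for illustration, and it assumes the usual tcpdump layout of a `0xNNNN:` offset, hex groups separated by single spaces, and an ASCII column set off by at least two spaces.

```python
import re

# Hypothetical helpers, not part of the original script.
HEX_LINE = re.compile(r'\s*0x[0-9a-fA-F]{4}:\s+((?:[0-9a-fA-F]{2,4} ?)+)')

def count_dump_bytes(hex_lines):
    """Count the bytes contained in tcpdump -XX style hex dump lines."""
    total = 0
    for line in hex_lines:
        m = HEX_LINE.match(line)
        if m:
            # Two hex digits per byte once the group separators are removed.
            total += len(m.group(1).replace(' ', '')) // 2
    return total

def header_length(header_line):
    """Return the first 'length N:' value of a tcpdump -e header line, or None."""
    m = re.search(r'length (\d+):', header_line)
    return int(m.group(1)) if m else None
```

If the two numbers disagree for a packet, the text dump was probably truncated or mangled before conversion.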

Output

The format understood by text2pcap:

20:11:32.001190

0000: 00 11 5c 78 ca c0 00 16 76 7f 2b b1 08 00 45 00 ..\x....v.+...E. 
0010: 00 3a 00 00 40 00 40 11 81 2d 7b ec bc 8c 5e 3b .:..@.@..-{...^;
0020: 22 d2 a3 1c b3 6b 00 26 b9 bd 20 33 68 90 ad 33 "....k.&...3h..3 
0030: e8 45 4b 8d 2b a1 06 85 0c b3 70 dd 9b 98 76 d8 .EK.+.....p...v. 
0040: 8f c6 82 93 bf 33 32 5a .....32Z 
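
The mapping from the input format to the output format can be sketched compactly. This is only an illustration under stated assumptions, not the script posted below: the name `to_text2pcap` and both patterns are hypothetical, the input is assumed to be real tcpdump output where the ASCII column is separated from the hex by at least two spaces, and the ASCII column is dropped because text2pcap only needs the offset and the bytes.

```python
import re

# Hypothetical names; patterns assume "tcpdump -XX" style text output.
TIME_RE = re.compile(r'^(\d{2}:\d{2}:\d{2}\.\d+)\s')
HEX_RE = re.compile(r'^\s*0x([0-9a-fA-F]{4}):\s+((?:[0-9a-fA-F]{2,4} ?)+)')

def to_text2pcap(lines):
    """Yield text2pcap-style lines for an iterable of tcpdump text lines."""
    for line in lines:
        m = TIME_RE.match(line)
        if m:
            # Packet header line: keep only the timestamp.
            yield m.group(1)
            continue
        m = HEX_RE.match(line)
        if m:
            offset, hexpart = m.groups()
            digits = hexpart.replace(' ', '')
            # Re-split the hex digits into single bytes.
            pairs = [digits[i:i + 2] for i in range(0, len(digits), 2)]
            yield '%s: %s' % (offset, ' '.join(pairs))
```

Lines that match neither pattern (blank lines, for example) are skipped, so the result can be written out with one `'\n'.join(...)`.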

Here is my code.

import re

# Identify time of the current packet.
time = re.compile(r'(..:..:..\.[\w]*) ')
# Get individual elements from the packet, i.e. offset, hexdump, chars.
all = re.compile(r'[ |\t]+0x([\w]+:) +(.+) +(.*)')
# Regex for two or more spaces
twoSpaces = re.compile('  +')
# Regex for single space
singleSpace = re.compile(' ')
# Single byte pattern.
singleBytePattern = re.compile(r'([\w][\w])')

# Open files.
f = open('pcap.txt', 'r')
outfile = open('ashu.txt', 'w')

for line in f:
    result = time.match(line)
    if result:
        # If current line contains time format, dump only the time.
        print(result.group())
        outfile.write(result.group() + '\n')
    else:
        print(line, end='')
        # Split line containing hex dump and tokenize into list elements.
        result = all.split(line)
        if result:
            i = 0
            for values in result:
                if i == 2:
                    # Strip off additional spaces in hex dump.
                    # Useful when hex dump does not end on a 16-byte boundary.
                    val = twoSpaces.sub('', values)

                    # Tokenize individual elements separated by single space.
                    byteResult = singleSpace.split(val)
                    for twoByte in byteResult:
                        # Identify individual bytes.
                        singleByte = singleBytePattern.split(twoByte)
                        byteOffset = 0
                        for oneByte in singleByte:
                            if byteOffset == 1 or byteOffset == 3:
                                # Write out individual byte with a space char appended.
                                print(oneByte, end=' ')
                                outfile.write(oneByte + ' ')
                            byteOffset = byteOffset + 1
                elif i == 3:
                    # Write out the char column of the hex dump.
                    print(" " + values, end=' ')
                    outfile.write(' ' + values + ' ')
                elif i == 4:
                    outfile.write(values)
                else:
                    print(values, end=' ')
                    outfile.write(values + ' ')
                i = i + 1
        else:
            print("could not split")
f.close()
outfile.close()
+1

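For starters, you should indent your comments to the same level as the code they refer to. Seeing the indentation break up the way it does now is very distracting. – 2010-10-10 13:15:28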

+0

Thanks. I have corrected it. – Taroko 2010-10-11 03:57:57

Answers

3

Use tcpdump's -w option to write a pcap-format file:

tcpdump -w filename.pcap 

Wireshark should be able to read it.

+0

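The requirement is that I only have the hexdump in *.txt file format. This *.txt file needs to be converted to *.pcap format so that I can replay it with the 'tcpreplay' command. – Taroko 2010-10-11 04:01:45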
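
For the remaining two steps of that workflow (text to pcap, then replay), a rough Python sketch follows. Everything here is an assumption for illustration: the function names, the file names, and the interface are hypothetical, and the `-t '%H:%M:%S.'` time format is the form given in older text2pcap man pages, so check it against the version you have installed. tcpreplay normally needs root privileges to send packets.

```python
import subprocess

def build_commands(txt_path, pcap_path, iface):
    """Return the argv lists for the text2pcap conversion and the tcpreplay run."""
    # -t tells text2pcap to parse the timestamp that precedes each packet.
    convert = ['text2pcap', '-t', '%H:%M:%S.', txt_path, pcap_path]
    # -i selects the interface tcpreplay sends the packets on.
    replay = ['tcpreplay', '-i', iface, pcap_path]
    return convert, replay

def convert_and_replay(txt_path, pcap_path, iface):
    """Run both commands in order, stopping if either one fails."""
    for argv in build_commands(txt_path, pcap_path, iface):
        subprocess.check_call(argv)
```

A call such as `convert_and_replay('ashu.txt', 'ashu.pcap', 'eth0')` would then run both tools, provided they are installed and on the PATH.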

+0

Since nobody has responded, I should assume my code is perfect. – Taroko 2010-10-15 07:55:14

+0

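There is no 'perfect code' :-) only individual choices about how to code for a purpose. For Python, I am sure there would be a dozen possible variations, some of them quite short. – nik 2010-10-15 08:14:12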