2015-11-11 39 views
0

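Append if not present; if present, increment the count

I am new to Python (though comfortable with programming in general) and could really use your help.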

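I am trying to read through a firewall log file. I am interested in every line that contains Deny. When one is found, the script should extract the source IP, destination IP, destination port, and protocol. I don't want to see all the lines, though, only the unique ones. So far so good. Everything works (although I'm sure it could be done more cleverly), but I would also like to add a counter so that I can see how many times a particular combination of s_ip, d_ip, d_port, and protocol occurred, and I don't know how.

Example of the log file: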

Nov 9 00:36:10 firewall %ASA-4-106023: Deny tcp src outside:1.1.1.1/43882 dst outside:2.2.2.2/23 by access-group "outside-in" [0x0, 0x0] 
Nov 9 00:36:10 firewall %ASA-4-106023: Deny tcp src outside:1.1.1.1/38780 dst outside:2.2.2.2/23 by access-group "outside-in" [0x0, 0x0] 
Nov 9 00:36:11 firewall %ASA-4-106023: Deny tcp src outside:1.1.1.1/8273 dst outside:2.2.2.2/23 by access-group "outside-in" [0x0, 0x0] 
Nov 9 00:36:12 firewall %ASA-4-106023: Deny tcp src outside:1.1.1.1/23433 dst outside:2.2.2.22/23 by access-group "outside-in" [0x0, 0x0] 
Nov 9 00:36:12 firewall %ASA-4-106023: Deny tcp src outside:1.1.1.1/25175 dst outside:2.2.2.24/23 by access-group "outside-in" [0x0, 0x0] 
Nov 9 00:36:12 firewall %ASA-4-106023: Deny tcp src outside:1.1.1.1/15855 dst outside:2.2.2.26/23 by access-group "outside-in" [0x0, 0x0] 
Nov 9 00:36:12 firewall %ASA-4-106023: Deny tcp src outside:1.1.1.1/24574 dst outside:2.2.2.27/23 by access-group "outside-in" [0x0, 0x0] 
Nov 9 00:36:12 firewall %ASA-4-106023: Deny tcp src outside:1.1.1.1/21797 dst outside:2.2.2.29/23 by access-group "outside-in" [0x0, 0x0] 
Nov 9 00:36:12 firewall %ASA-4-106023: Deny udp src outside:3.3.3.3/12112 dst outside:2.2.2.99/53031 by access-group "outside-in" [0x0, 0x0] 
Nov 9 00:36:13 firewall %ASA-4-106023: Deny icmp src outside:4.4.4.4 dst services:2.2.2.211 (type 11, code 1) by access-group "outside-in" [0x0, 0x0] 
Nov 9 00:36:17 firewall %ASA-4-106023: Deny icmp src outside:4.4.4.4 dst services:2.2.2.10 (type 3, code 3) by access-group "outside-in" [0x0, 0x0] 

I am able to get the following result:

'icmp' 
'tcp', '1.1.1.1', '2.2.2.2', '23' 
'tcp', '1.1.1.1', '2.2.2.22', '23' 
'tcp', '1.1.1.1', '2.2.2.24', '23' 
'tcp', '1.1.1.1', '2.2.2.26', '23' 
'tcp', '1.1.1.1', '2.2.2.27', '23' 
'tcp', '1.1.1.1', '2.2.2.29', '23' 
'udp', '3.3.3.3', '2.2.2.99', '53031' 

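I haven't quite managed to get the ICMP output right (ICMP lines have no /port, and my regex relies on that to grab the IP addresses), and I will try to make the output a bit nicer (by stripping the ' and , characters). What I really want, though, is a hit count on each line, e.g. a count of 3 for the first tcp line, and so on.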

import re  #for regular expressions - to match ip's 
import sys  #for parsing command line opts 

# if file is specified on command line, parse, else ask for file 
if sys.argv[1:]: 
    print "File: %s" % (sys.argv[1]) 
    logfile = sys.argv[1] 
else: 
    logfile = raw_input("Please enter a file to parse, e.g /var/log/secure: ") 

match = [] 
seen = [] 

# find all Deny lines and append them in a list 
for lines in open(logfile) : 
    extract = re.findall('Deny.*"' ,lines) 
    for i in extract : 
        match.append(i) 

# extract different keywords from Deny lines 
for lines in match : 
    prot = re.findall('Deny\s(.+?)\ssrc',lines) 
    ip_src = re.findall('src.*?:([0-9a-f].*?)/', lines) 
    ip_dst = re.findall('dst.*?:([0-9a-f].*?)/', lines) 
    #ip_sport = re.findall('src.*?[0-9a-f].*?/([0-9].*?)\s', lines)  # uncomment if you want source port also, and add ip_sport to summarized below 
    ip_dport = re.findall('dst.*?[0-9a-f].*?/([0-9].*?)\s', lines) 

    summarized = prot + ip_src + ip_dst + ip_dport 

    if summarized not in seen :    # only add unique entries 
        seen.append(summarized) 


# sort 
seen.sort() 

for lines in seen : 
    print (", ".join(repr(e) for e in lines)) 

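On top of that, I tried throwing a 3 GB log file at it, and it has now been running for several hours. Any good ideas for optimizing the code?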

I realize I am asking a lot, and I really appreciate any help I can get, but my main question is how to get the counter.

+2

SO is not a code review/tutoring service. You should ask specific programming questions. Please limit yourself to one question per post. – memoselyk

+0

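On the other hand, [codereview.se] _is_ a code review/tutoring service. You don't need to have a specific programming problem, just some working code that you would like advice on. –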

+0

Duly noted :o). Thank you for your answer. – joni

Answers

2

The Python standard library already has a Counter class.

You can change the seen variable to be a Counter:

from collections import Counter 

[...] 

seen = Counter() 

# extract different keywords from Deny lines 
for lines in match : 

    [...] 

    # NOTE: a Counter key must be hashable (a string or tuple, not a list),
    # so the concatenated findall() lists are converted to a tuple.
    summarized = tuple(prot + ip_src + ip_dst + ip_dport) 

    seen.update([summarized]) 

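At the end, the seen Counter (a dict subclass) will have each unique summarized line as a key, and the number of times that line occurred as its value.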
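As a minimal sketch of what that looks like when printing, assuming the keys are tuples of strings like the ones in the question (the `rows` sample and the `format_counts` helper are made up for illustration):

```python
from collections import Counter

# Hypothetical sample of already-extracted (prot, src, dst, dport) tuples.
rows = [
    ("tcp", "1.1.1.1", "2.2.2.2", "23"),
    ("tcp", "1.1.1.1", "2.2.2.2", "23"),
    ("tcp", "1.1.1.1", "2.2.2.2", "23"),
    ("udp", "3.3.3.3", "2.2.2.99", "53031"),
]

seen = Counter(rows)


def format_counts(counter):
    """Return one 'count field, field, ...' string per unique key, most frequent first."""
    return ["%d %s" % (count, ", ".join(key))
            for key, count in counter.most_common()]


for line in format_counts(seen):
    print(line)
```

Joining the tuple fields with ", " also gets rid of the quotes that repr() was adding in the original output.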

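Regarding optimization, it would be better (I think) to process each line as you encounter it, inside the for lines in open(logfile) loop, instead of first collecting every Deny line in the match list.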
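A rough sketch of that single-pass idea, under some assumptions: the `DENY_RE` pattern and the `count_denies` function are my own names, the pattern is only based on the sample lines in the question, and the "/port" parts are made optional so that ICMP lines match too (a missing port is recorded as "-"). Besides dropping the intermediate list, this also avoids the `summarized not in seen` test on a list, which has to scan the whole list for every line.

```python
import re
from collections import Counter

# One precompiled pattern per Deny line (an assumption based on the sample
# log lines). The "/port" suffixes are optional so ICMP lines still match.
DENY_RE = re.compile(
    r"Deny\s+(?P<prot>\S+)\s+src\s+[^:\s]+:(?P<src>[^/\s]+)(?:/\d+)?"
    r"\s+dst\s+[^:\s]+:(?P<dst>[^/\s]+)(?:/(?P<dport>\d+))?"
)


def count_denies(lines):
    """Count (prot, src, dst, dport) combinations in one pass over lines."""
    seen = Counter()
    for line in lines:
        if "Deny" not in line:  # cheap substring test before running the regex
            continue
        m = DENY_RE.search(line)
        if m:
            key = (m.group("prot"), m.group("src"),
                   m.group("dst"), m.group("dport") or "-")
            seen[key] += 1
    return seen


sample = [
    'Nov 9 00:36:10 firewall %ASA-4-106023: Deny tcp src outside:1.1.1.1/43882 dst outside:2.2.2.2/23 by access-group "outside-in" [0x0, 0x0]',
    'Nov 9 00:36:10 firewall %ASA-4-106023: Deny tcp src outside:1.1.1.1/38780 dst outside:2.2.2.2/23 by access-group "outside-in" [0x0, 0x0]',
    'Nov 9 00:36:13 firewall %ASA-4-106023: Deny icmp src outside:4.4.4.4 dst services:2.2.2.211 (type 11, code 1) by access-group "outside-in" [0x0, 0x0]',
]
counts = count_denies(sample)
for key, count in sorted(counts.items()):
    print(count, ", ".join(key))
```

For a real log you would pass the open file object instead of the sample list, e.g. `with open(logfile) as fh: counts = count_denies(fh)`, so the file is read line by line and never held in memory as a whole.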

+0

Thank you very much. I have implemented the suggestions regarding the Counter and they work like a charm :o) – joni

+0

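If this worked for you, please mark it as your accepted answer to show your appreciation. If you need more help, please post a separate question. I have rolled your question back from Rev 2 to Rev 1. –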

0

To avoid duplicate entries you can use a set instead of a list. I would do:

seen = set() 
for lines in open(logfile): 
    extract = re.findall(r'Deny.*"', lines) 
    for i in extract: 
        prot = re.findall(r'Deny\s(.+?)\ssrc', i) 
        ip_src = re.findall(r'src.*?:([0-9a-f].*?)/', i) 
        ip_dst = re.findall(r'dst.*?:([0-9a-f].*?)/', i) 
        #ip_sport = re.findall(r'src.*?[0-9a-f].*?/([0-9].*?)\s', i) 
        ip_dport = re.findall(r'dst.*?[0-9a-f].*?/([0-9].*?)\s', i) 
        # findall() returns lists, which are unhashable, so join them into
        # one tuple before adding to the set (add ip_sport here if you want)
        seen.add(tuple(prot + ip_src + ip_dst + ip_dport)) 

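This should be faster because it uses fewer loops. On the other hand, sets are unordered (although there is a recipe for an ordered set here: http://code.activestate.com/recipes/576694/). If you don't want to build that and you need the entries ordered, you should convert the set to a list before printing.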
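For the last point, a small sketch, assuming the set holds tuples of strings (the sample entries are taken from the question's output). The built-in sorted() returns a new sorted list from any iterable, so it does the conversion and the ordering in one step:

```python
# Hypothetical set of extracted (prot, src, dst, dport) tuples.
seen = {
    ("udp", "3.3.3.3", "2.2.2.99", "53031"),
    ("tcp", "1.1.1.1", "2.2.2.22", "23"),
    ("tcp", "1.1.1.1", "2.2.2.2", "23"),
}

# sorted() accepts the set directly and returns a list in sorted order.
ordered = sorted(seen)

for entry in ordered:
    print(", ".join(entry))
```

Note that the fields are compared as strings, so IP addresses and ports sort lexicographically rather than numerically.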
