這是我前段時間根據日誌格式量身定製的日誌解析器的一種變體。 (一般方法跟Jim Dennis的描述非常接近,儘管我使用了列表的defaultdict來累積任何給定會話的所有條目。)
from pyparsing import Suppress,Word,nums,restOfLine
from datetime import datetime
from collections import defaultdict
def convertToDateTime(tokens):
month,day,year,hh,mm,ss = tokens
return datetime(year+2000, month, day, hh,mm,ss)
# define building blocks for parsing and processing log file entries
SLASH,COLON = map(Suppress,"/:")
integer = Word(nums).setParseAction(lambda t:int(t[0]))
date = integer + (SLASH + integer)*2
time = integer + (COLON + integer)*2
timestamp = date + time
timestamp.setParseAction(convertToDateTime)
# define format of a single line in the log file
logEntry = timestamp("timestamp") + integer("sessionid") + restOfLine("descr")
# summarize calls into single data structure
calls = defaultdict(list)
for logline in log:
entry = logEntry.parseString(logline)
calls[entry.sessionid].append(entry)
# first pass to find start/end time for each call
for sessionid in sorted(calls):
calldata = calls[sessionid]
print sessionid, calldata[-1].timestamp - calldata[0].timestamp
爲您的數據,這樣打印出:
1234 0:00:10
可以處理條目的每個會話的列表,並附有類似的方法來梳理出的子事務。