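Parsing a wget log file in Python
I have a wget log file and want to parse it so that I can extract the relevant information for each log entry, for example the IP address, timestamp, URL, and so on.
A sample log file is printed below. The number of lines and the amount of detail differ from entry to entry, but the notation used on each line is consistent.
I can extract the individual header lines, but I would like to end up with a multidimensional array (or something similar). This is what I have so far: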
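import re

# Read the whole log; the with-block closes the file afterwards.
with open('c:/r1/log.txt', 'r') as f:
    log_text = f.read()

# Each wget entry begins with a line such as "--2014-11-22 10:51:31-- <url>".
split_log = re.findall(r'--[0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2}.*', log_text)
print(split_log)
print(len(split_log))
for element in split_log:
    print(element)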
####### Start log file example
2014-11-22 10:51:31 (96.9 KB/s) - `C:/r1/www.itb.ie/AboutITB/index.html' saved [13302]
--2014-11-22 10:51:31-- http://www.itb.ie/CurrentStudents/index.html
Connecting to www.itb.ie|193.1.36.24|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: ignored [text/html]
Saving to: `C:/r1/www.itb.ie/CurrentStudents/index.html'
0K .......... ....... 109K=0.2s
2014-11-22 10:51:31 (109 KB/s) - `C:/r1/www.itb.ie/CurrentStudents/index.html' saved [17429]
--2014-11-22 10:51:32-- http://www.itb.ie/Vacancies/index.html
Connecting to www.itb.ie|193.1.36.24|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: ignored [text/html]
Saving to: `C:/r1/www.itb.ie/Vacancies/index.html'
0K .......... .......... .. 118K=0.2s
2014-11-22 10:51:32 (118 KB/s) - `C:/r1/www.itb.ie/Vacancies/index.html' saved [23010]
--2014-11-22 10:51:32-- http://www.itb.ie/Location/howtogetthere.html
Connecting to www.itb.ie|193.1.36.24|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: ignored [text/html]
Saving to: `C:/r1/www.itb.ie/Location/howtogetthere.html'
0K .......... ....... 111K=0.2s
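What is your expected output? – 2014-11-22 11:43:03
Ultimately I will be writing the entries to a database, e.g. IP address, URL, data, and so on. From the sample above I would therefore need something like date(1), url(1), http_request(1) for the first log entry, then date(2), url(2), http_request(2) for the second, and so on. – Markus 2014-11-22 11:52:06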
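One possible way to get the per-entry structure described in the comment above is to treat every `--timestamp-- URL` header line as the start of an entry and search the text up to the next header for the remaining fields. The following is only a minimal sketch under some assumptions: the function name `parse_wget_log` and the dictionary keys are made up for illustration, the URL is assumed to be the single whitespace-free token after the timestamp, and the patterns are based solely on the line formats visible in the sample log, so other wget versions or locales may need different expressions.

```python
import re

# Header that opens each entry: --YYYY-MM-DD HH:MM:SS-- URL
ENTRY_START = re.compile(
    r'^--(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})--[ \t]+(\S+)', re.MULTILINE)
# "Connecting to host|ip|:port... connected."
CONNECT = re.compile(r'Connecting to ([^|\s]+)\|([^|]+)\|:(\d+)')
# "HTTP request sent, awaiting response... 200 OK"
RESPONSE = re.compile(r'awaiting response\.\.\. (\d{3})')
# "... saved [17429]"
SAVED = re.compile(r'saved \[(\d+)')


def parse_wget_log(text):
    """Return a list of dicts, one per entry (hypothetical field names)."""
    entries = []
    starts = list(ENTRY_START.finditer(text))
    for i, header in enumerate(starts):
        # An entry's body runs from its header to the next header (or EOF).
        end = starts[i + 1].start() if i + 1 < len(starts) else len(text)
        body = text[header.end():end]

        connect = CONNECT.search(body)
        response = RESPONSE.search(body)
        saved = SAVED.search(body)

        entries.append({
            'date': header.group(1),
            'url': header.group(2),
            'host': connect.group(1) if connect else None,
            'ip': connect.group(2) if connect else None,
            'port': int(connect.group(3)) if connect else None,
            'status': int(response.group(1)) if response else None,
            'saved_bytes': int(saved.group(1)) if saved else None,
        })
    return entries


# Usage with a shortened copy of the sample log from the question.
sample = """2014-11-22 10:51:31 (96.9 KB/s) - `C:/r1/www.itb.ie/AboutITB/index.html' saved [13302]
--2014-11-22 10:51:31-- http://www.itb.ie/CurrentStudents/index.html
Connecting to www.itb.ie|193.1.36.24|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: ignored [text/html]
Saving to: `C:/r1/www.itb.ie/CurrentStudents/index.html'
2014-11-22 10:51:31 (109 KB/s) - `C:/r1/www.itb.ie/CurrentStudents/index.html' saved [17429]
--2014-11-22 10:51:32-- http://www.itb.ie/Vacancies/index.html
Connecting to www.itb.ie|193.1.36.24|:80... connected.
HTTP request sent, awaiting response... 200 OK
"""

entries = parse_wget_log(sample)
for entry in entries:
    print(entry)
```

Lines that appear before the first header (such as a trailing "saved" line from an earlier download) are ignored, and any field that is missing from an entry is left as `None`. Each dictionary could then be mapped to one database row.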