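I've been running Willie for about 8 months now, and it keeps a raw.log of everything that happens in the IRC channels it runs in. The problem is that it logs a lot of unnecessary, well, bloat. How can I trim down a huge text file?
Here's an example: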
<<1419986827.01 :[email protected] NICK Snoo62763
>>1419986827.04 PRIVMSG Snoo62763 :TypeError: not all arguments converted during string formatting (file "C:\Python27\willie\willie\coretasks.py", line 254, in track_nicks)
<<1419986827.12 :[email protected] PRIVMSG Snoo62763 :TypeError: not all arguments converted during string formatting (file "C:\Python27\willie\willie\coretasks.py", line 254, in track_nicks)
<<1419986827.22 :[email protected] NOTICE Snoo62763 :Welcome to Snoonet, Snoo62763! Here on Snoonet, we provide services to enable the registration of nicknames and channels! For details, type /msg NickServ help and /msg ChanServ help.
<<1419986832.84 :[email protected]/venn177 PRIVMSG #RLB :uh, well, this seems to work
<<1419986832.84 :[email protected]/venn177 PRIVMSG #RLB :in any case, let's try this
>>1419986852.92 QUIT :KeyboardInterrupt
>>1419986861.61 CAP LS
>>1419986861.61 NICK BotSelig
>>1419986861.62 USER willie +iw BotSelig :Willie Embosbot, http://willie.dftba.net
<<1419986861.67 :veronica.snoonet.org NOTICE Auth :*** Looking up your hostname...
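Out of all of that, the only thing I want to keep is the text that comes after "#RLB :". I want each line of text to stay on its own line, but with all the unnecessary fluff trimmed away. So how can I read a text file line by line, check whether each line contains "#RLB :", and then save only what comes after it?
The end goal is to build a database for generating Markov chains, which obviously won't work with all that bloat in there. (I don't actually know whether that's useful to know.)
To put it another way, I want to take the sample above and trim it down to just this: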
uh, well, this seems to work
in any case, let's try this
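
For reference, here is a minimal sketch in Python of the filtering described above: read the log line by line, keep only lines containing the marker, and write out whatever follows it. The function names `extract_messages` and `trim_log`, the output file name, and the UTF-8 encoding are assumptions made for illustration; nothing here comes from Willie itself.

```python
import io

# Assumption: the marker is the channel name followed by " :", as in the sample log.
MARKER = "#RLB :"


def extract_messages(lines, marker=MARKER):
    """Yield the text after the first occurrence of marker on each matching line."""
    for line in lines:
        # partition() returns (before, marker, after); the middle item is
        # empty when the marker does not occur in the line.
        _, found, text = line.rstrip("\r\n").partition(marker)
        if found:
            yield text


def trim_log(src_path, dst_path, marker=MARKER):
    """Stream src_path and write only the message text to dst_path, one per line."""
    # Iterating over the file object reads one line at a time, so a huge log
    # never has to fit in memory. errors="replace" is a guess at how to cope
    # with stray bytes in an IRC log.
    with io.open(src_path, encoding="utf-8", errors="replace") as src:
        with io.open(dst_path, "w", encoding="utf-8") as dst:
            for text in extract_messages(src, marker):
                dst.write(text + u"\n")
```

Something like `trim_log("raw.log", "trimmed.log")` would then produce a file with one chat message per line. Matching on `"PRIVMSG #RLB :"` instead would be stricter, since it skips any non-message line that merely mentions the channel.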