2016-03-04 89 views

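Merging log files with different formats

I have two log files with different date/time formats that I want to merge.

The first file is a standard Apache access_log file, like this: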

127.0.0.1 - - [29/Feb/2016:16:57:52 -0600] "GET /application/wcs/api/version?nodeRef=workspace://SpacesStore/ecd62cfa-fd19-4d6b-b45d-14f0e5b92cf0 HTTP/1.1" 200 567
127.0.0.1 - - [29/Feb/2016:16:57:52 -0600] "GET /application/wcs/api/node/workspace/SpacesStore/ecd62cfa-fd19-4d6b-b45d-14f0e5b92cf0/workflow-instances HTTP/1.1" 200 40
127.0.0.1 - - [29/Feb/2016:16:57:52 -0600] "GET /application/wcs/cisco/appId?userId=abcdefg&requestType=get HTTP/1.1" 200 45
173.37.239.93 - abcdefg [29/Feb/2016:16:57:52 -0600] "GET /share/page/site/nextgen-edcs/document-details?nodeRef=workspace://SpacesStore/ecd62cfa-fd19-4d6b-b45d-14f0e5b92cf0 HTTP/1.1" 200 124492
173.37.239.93 - abcdefg [29/Feb/2016:16:57:53 -0600] "GET /share/service/messages_69bcdfdb058bb873ff49cc2a10c958b7.js?locale=en_US HTTP/1.1" 200 81698
173.37.239.93 - abcdefg [29/Feb/2016:16:57:53 -0600] "GET /share/res/yui/history/history_543b42a00a378f4d4b6e70c81d915b0a.js HTTP/1.1" 200 5781

... where 'abcdefg' is the userid.

The second log file is formatted like this:

2016-02-12 08:16:03,630 WARN [cluster.cache.HazelcastSimpleCache] [http-bio-8443-exec-212] Cluster is inactive but put(k,v) was called for cache HazelcastSimpleCache[cacheName=cache.readersSharedCache]
2016-02-12 08:16:03,630 WARN [cluster.cache.HazelcastSimpleCache] [http-bio-8443-exec-212] Cluster is inactive but get(key) was called for cache HazelcastSimpleCache[cacheName=cache.readersSharedCache], key=AclEntity[ ID=1893033, version=55, aclId=16cf5bc3-27d0-4d50-a93d-3bee1ddd112e, isLatest=true, aclVersion=1, inherits=true, inheritsFrom=1889292, type=1, inheritedAcl=1893034, isVersioned=false, requiresVersion=false, aclChangeSet=1451473]
2016-02-12 08:16:03,630 WARN [cluster.cache.HazelcastSimpleCache] [http-bio-8443-exec-212] Cluster is inactive but put(k,v) was called for cache HazelcastSimpleCache[cacheName=cache.readersSharedCache]

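My goals are:

  1. Convert the date/time format in the first log file to the date/time format of the second log file.
  2. Strip the IP address from the first log file, but keep the userid.
  3. Merge the two log files together.
  4. Sort the merged lines by date/time.

Here is what I have so far: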

use strict;
use warnings;

my $logfile1 = "catalina.out";
my $logfile2 = "access_log";

# keep only the catalina.out lines that start with a 2016 timestamp
open(my $fh1, '<', $logfile1) or die("Could not open $logfile1: $!");
while (my $line = <$fh1>) {
    chomp($line);
    if ($line =~ /^2016.+$/) {
        print $line . "\n";
    }
}
close($fh1);

# keep only the access_log lines that carry an Apache timestamp
open(my $fh2, '<', $logfile2) or die("Could not open $logfile2: $!");
while (my $line = <$fh2>) {
    chomp($line);
    if ($line =~ /\d{2}\/\S{3}\/\d{4}:\d{2}:\d{2}:\d{2} -\d{4}/) {
        print $line . "\n";
    }
}
close($fh2);

# format of file 1 (access_log)
# DD/MMM/YYYY:HH:MM:SS -NNNN
# 29/Feb/2016:20:03:07 -0600
# format of file 2 (catalina.out)
# YYYY-MM-DD HH:MM:SS,NNN
# 2016-02-12 08:16:03,631

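Basically, I'm only interested in lines that carry date/time information, which is why the code above drops all other lines.

Where I'm stuck:
1) How do I convert the date/time format in file 1 to the date/time format of file 2?
2) I'm not interested in the IP address, but I do want to keep the userid. Since the lines of file 1 won't start with the date/time information the way the lines of file 2 do, how would I sort by date after converting and merging the two?

Any help would be appreciated!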

+3

*We help those who help themselves*. What have you tried? Please show some effort. –

+0

@anunsh - added the code I have so far. – user9018

+0

This can be done with [DateTime::Format::Strptime](https://metacpan.org/pod/DateTime::Format::Strptime). You need two functions: [parse_datetime and format_datetime](https://metacpan.org/pod/DateTime::Format::Strptime#strptime-parse_datetime-string). –

Answers

0

Here is a solution using Time::Piece. I used Inline::Files to simulate the 2 files. In your case you would open your log files instead, like this:

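my $logfile1 = "catalina.out";   # read this where the program below reads FILE2
my $logfile2 = "access_log";     # read this where the program below reads FILE1

# $! holds the system error message when open fails
open my $log1_fh, '<', $logfile1 or die "Cannot open $logfile1: $!";
open my $log2_fh, '<', $logfile2 or die "Cannot open $logfile2: $!";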

The program looks like this, and it produces what I think is the result you want.

#!/usr/bin/perl 
use strict; 
use warnings; 
use Inline::Files; 
use Time::Piece; 

my %data; 

while (<FILE2>) { 
    # get date_time 
    my ($dt) = /^(\d{4}-\d\d-\d\d \d\d:\d\d:\d\d),/ or next; 
    push @{ $data{$dt} }, $_; 
} 

my $format = '%d/%b/%Y:%H:%M:%S'; 

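while (<FILE1>) { 
    # capture the timestamp inside [...] up to the first space (the UTC offset is dropped)
    my ($stamp) = /\[(\S+)/ or next; 
    my $t = Time::Piece->strptime($stamp, $format) 
     or die "Cannot parse $stamp"; 

    my $dt = $t->strftime('%Y-%m-%d %H:%M:%S'); 

    s/^\S+ (?:- )+//;        # strip the IP address and the "-" placeholders, keep the userid
    s/(?<=\[)[^\]]+/$dt/;    # replace the bracketed timestamp with the converted one
    push @{ $data{$dt} }, $_; 
} 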

for my $dt (sort keys %data) { 
    my $aref = $data{$dt}; 
    print for @$aref; 
} 


__FILE1__ 
127.0.0.1 - - [29/Feb/2016:16:57:52 -0600] "GET /application/wcs/api/version?nodeRef=workspace://SpacesStore/ecd62cfa-fd19-4d6b-b45d-14f0e5b92cf0 HTTP/1.1" 200 567 
127.0.0.1 - - [29/Feb/2016:16:57:52 -0600] "GET /application/wcs/api/node/workspace/SpacesStore/ecd62cfa-fd19-4d6b-b45d-14f0e5b92cf0/workflow-instances HTTP/1.1" 200 40 
127.0.0.1 - - [29/Feb/2016:16:57:52 -0600] "GET /application/wcs/cisco/appId?userId=abcdefg&requestType=get HTTP/1.1" 200 45 
173.37.239.93 - abcdefg [29/Feb/2016:16:57:52 -0600] "GET /share/page/site/nextgen-edcs/document-details?nodeRef=workspace://SpacesStore/ecd62cfa-fd19-4d6b-b45d-14f0e5b92cf0 HTTP/1.1" 200 124492 
173.37.239.93 - abcdefg [29/Feb/2016:16:57:53 -0600] "GET /share/service/messages_69bcdfdb058bb873ff49cc2a10c958b7.js?locale=en_US HTTP/1.1" 200 81698 
173.37.239.93 - abcdefg [29/Feb/2016:16:57:53 -0600] "GET /share/res/yui/history/history_543b42a00a378f4d4b6e70c81d915b0a.js HTTP/1.1" 200 5781 
__FILE2__ 
2016-02-12 08:16:03,630 WARN [cluster.cache.HazelcastSimpleCache] [http-bio-8443-exec-212] Cluster is inactive but put(k,v) was called for cache HazelcastSimpleCache[cacheName=cache.readersSharedCache] 
2016-02-12 08:16:03,630 WARN [cluster.cache.HazelcastSimpleCache] [http-bio-8443-exec-212] Cluster is inactive but get(key) was called for cache HazelcastSimpleCache[cacheName=cache.readersSharedCache], key=AclEntity[ ID=1893033, version=55, aclId=16cf5bc3-27d0-4d50-a93d-3bee1ddd112e, isLatest=true, aclVersion=1, inherits=true, inheritsFrom=1889292, type=1, inheritedAcl=1893034, isVersioned=false, requiresVersion=false, aclChangeSet=1451473] 
2016-02-12 08:16:03,630 WARN [cluster.cache.HazelcastSimpleCache] [http-bio-8443-exec-212] Cluster is inactive but put(k,v) was called for cache HazelcastSimpleCache[cacheName=cache.readersSharedCache] 

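I used the hash %data to store the lines. The key is the converted date, so that later in the program the lines can be printed in sorted order.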

The output from this program is:

2016-02-12 08:16:03,630 WARN [cluster.cache.HazelcastSimpleCache] [http-bio-8443-exec-212] Cluster is inactive but put(k,v) was called for cache HazelcastSimpleCache[cacheName=cache.readersSharedCache] 
2016-02-12 08:16:03,630 WARN [cluster.cache.HazelcastSimpleCache] [http-bio-8443-exec-212] Cluster is inactive but get(key) was called for cache HazelcastSimpleCache[cacheName=cache.readersSharedCache], key=AclEntity[ ID=1893033, version=55, aclId=16cf5bc3-27d0-4d50-a93d-3bee1ddd112e, isLatest=true, aclVersion=1, inherits=true, inheritsFrom=1889292, type=1, inheritedAcl=1893034, isVersioned=false, requiresVersion=false, aclChangeSet=1451473] 
2016-02-12 08:16:03,630 WARN [cluster.cache.HazelcastSimpleCache] [http-bio-8443-exec-212] Cluster is inactive but put(k,v) was called for cache HazelcastSimpleCache[cacheName=cache.readersSharedCache] 
[2016-02-29 16:57:52] "GET /application/wcs/api/version?nodeRef=workspace://SpacesStore/ecd62cfa-fd19-4d6b-b45d-14f0e5b92cf0 HTTP/1.1" 200 567 
[2016-02-29 16:57:52] "GET /application/wcs/api/node/workspace/SpacesStore/ecd62cfa-fd19-4d6b-b45d-14f0e5b92cf0/workflow-instances HTTP/1.1" 200 40 
[2016-02-29 16:57:52] "GET /application/wcs/cisco/appId?userId=abcdefg&requestType=get HTTP/1.1" 200 45 
abcdefg [2016-02-29 16:57:52] "GET /share/page/site/nextgen-edcs/document-details?nodeRef=workspace://SpacesStore/ecd62cfa-fd19-4d6b-b45d-14f0e5b92cf0 HTTP/1.1" 200 124492 
abcdefg [2016-02-29 16:57:53] "GET /share/service/messages_69bcdfdb058bb873ff49cc2a10c958b7.js?locale=en_US HTTP/1.1" 200 81698 
abcdefg [2016-02-29 16:57:53] "GET /share/res/yui/history/history_543b42a00a378f4d4b6e70c81d915b0a.js HTTP/1.1" 200 5781 
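
The %data grouping used in this answer can also be tried in isolation. What follows is a minimal sketch rather than the answer's own code: the helper names `add_line` and `sorted_lines` are made up for illustration. It relies on two things: `YYYY-MM-DD HH:MM:SS` strings sort chronologically under Perl's default string `sort`, and each hash value is an array, so lines that share the same second keep the order in which they were read.

```perl
use strict;
use warnings;

# Hypothetical helper: file a line under its already-converted timestamp key.
sub add_line {
    my ($data, $dt, $line) = @_;
    push @{ $data->{$dt} }, $line;
    return;
}

# Hypothetical helper: flatten the hash into one list, ordered by key.
# String sort is enough because the keys are zero-padded, year-first timestamps.
sub sorted_lines {
    my ($data) = @_;
    return map { @{ $data->{$_} } } sort keys %$data;
}

my %data;
add_line(\%data, '2016-02-29 16:57:52', "abcdefg [2016-02-29 16:57:52] access_log entry\n");
add_line(\%data, '2016-02-12 08:16:03', "2016-02-12 08:16:03,630 catalina entry one\n");
add_line(\%data, '2016-02-12 08:16:03', "2016-02-12 08:16:03,630 catalina entry two\n");

# Prints both catalina entries (in insertion order), then the access_log entry.
print sorted_lines(\%data);
```

Because the key is cut off at whole seconds, entries from both files that fall within the same second are ordered by which file was read first, not by their milliseconds.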
0

While I won't write the script for you, the general script should look something like this:

use strict; 
use warnings; 
use DateTime::Format::Strptime; 

sub firstFileLine { 
    # parse line as needed, and return a hash reference with 2 keys: 
    # 1. `line`: the contents of the line, possibly edited 
    # 2. `ts`: the UTC unix timestamp, via the DateTime::Format::Strptime module 
} 

sub secondFileLine { 
    # similar to `firstFileLine`, return a hash reference 
} 

my @firstLines = map { firstFileLine($_) } <FILE1>; 
my @secondLines = map { secondFileLine($_) } <FILE2>; 

my @sorted = map { $_->{line} } sort {$a->{ts} <=> $b->{ts}} (@firstLines, @secondLines); 

Read the documentation for DateTime::Format::Strptime, map, and sort. You're in luck: Perl is one of the best-documented languages, so take full advantage of that!
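
One possible way to fill in that skeleton is sketched below. It is an illustration, not the answerer's code, and it rests on several assumptions. It swaps the core module Time::Piece in for DateTime::Format::Strptime so that it runs without CPAN installs. It assumes the catalina.out timestamps, which carry no offset, are in the same -0600 zone as the access_log sample. The names `to_utc_epoch`, `mergeLines`, and `$CATALINA_OFFSET` are hypothetical.

```perl
use strict;
use warnings;
use Time::Piece;

# ASSUMPTION: catalina.out has no UTC offset; treat it as -0600 like the access_log sample.
my $CATALINA_OFFSET = '-0600';

# Hypothetical helper: wall-clock Time::Piece object plus "+hhmm"/"-hhmm" offset -> UTC epoch seconds.
sub to_utc_epoch {
    my ($t, $offset) = @_;
    my ($sign, $hh, $mm) = $offset =~ /^([+-])(\d\d)(\d\d)$/
        or die "Bad UTC offset '$offset'";
    my $seconds = ($hh * 3600 + $mm * 60) * ($sign eq '-' ? -1 : 1);
    return $t->epoch - $seconds;    # strptime read the wall-clock time as if it were UTC
}

# access_log line -> { line => ..., ts => ... }, or an empty list if it has no timestamp
sub firstFileLine {
    my ($line) = @_;
    my ($stamp, $offset) = $line =~ m!\[(\d\d/\w{3}/\d{4}:\d\d:\d\d:\d\d) ([+-]\d{4})\]!
        or return;
    my $t  = Time::Piece->strptime($stamp, '%d/%b/%Y:%H:%M:%S');
    my $dt = $t->strftime('%Y-%m-%d %H:%M:%S');
    $line =~ s/^\S+ (?:- )+//;        # drop the IP address, keep the userid
    $line =~ s/\[[^\]]+\]/[$dt]/;     # rewrite the timestamp in the second file's style
    return { line => $line, ts => to_utc_epoch($t, $offset) };
}

# catalina.out line -> { line => ..., ts => ... }, or an empty list if it has no timestamp
sub secondFileLine {
    my ($line) = @_;
    my ($stamp) = $line =~ /^(\d{4}-\d\d-\d\d \d\d:\d\d:\d\d),\d{3}/
        or return;
    my $t = Time::Piece->strptime($stamp, '%Y-%m-%d %H:%M:%S');
    return { line => $line, ts => to_utc_epoch($t, $CATALINA_OFFSET) };
}

# Hypothetical helper: merge two lists of raw lines into one list sorted by timestamp.
sub mergeLines {
    my ($first, $second) = @_;
    my @records = ((map { firstFileLine($_) } @$first), (map { secondFileLine($_) } @$second));
    return map { $_->{line} } sort { $a->{ts} <=> $b->{ts} } @records;
}

print mergeLines(
    ['173.37.239.93 - abcdefg [29/Feb/2016:16:57:53 -0600] "GET /share/res/yui/history/history_543b42a00a378f4d4b6e70c81d915b0a.js HTTP/1.1" 200 5781' . "\n"],
    ['2016-02-12 08:16:03,630 WARN [cluster.cache.HazelcastSimpleCache] [http-bio-8443-exec-212] Cluster is inactive but put(k,v) was called for cache HazelcastSimpleCache[cacheName=cache.readersSharedCache]' . "\n"],
);
```

Sorting on a numeric epoch rather than on the formatted string means the order stays correct even if the two files turn out to use different UTC offsets. With real files, the two array references would be replaced by the lines read from each opened filehandle.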