2017-02-28 73 views
0

Add another column to awk output

I have an HAProxy log file with content similar to the following:

Feb 28 11:16:10 localhost haproxy[20072]: 88.88.88.88:6152 [28/Feb/2017:11:16:01.220] frontend backend_srvs/srv1 9063/0/0/39/9102 200 694 - - --VN 9984/5492/191/44/0 0/0 {Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.87 Safari/537.36|http://subdomain.domain.com/location1} "GET /location1 HTTP/1.1" 
Feb 28 11:16:10 localhost haproxy[20072]: 88.88.88.88:6152 [28/Feb/2017:11:16:10.322] frontend backend_srvs/srv1 513/0/0/124/637 200 14381 - - --VN 9970/5491/223/55/0 0/0 {Mozilla/5.0 AppleWebKit/537.36 Chrome/56.0.2924.87 Safari/537.36|http://subdomain.domain.com/location2} "GET /location2 HTTP/1.1" 
Feb 28 11:16:13 localhost haproxy[20072]: 88.88.88.88:6152 [28/Feb/2017:11:16:10.960] frontend backend_srvs/srv1 2245/0/0/3/2248 200 7448 - - --VN 9998/5522/263/54/0 0/0 {another user agent with fewer columns|http://subdomain.domain.com/location3} "GET /location3 HTTP/1.1" 
Feb 28 11:16:13 localhost haproxy[20072]: 88.88.88.88:6152 [28/Feb/2017:11:16:10.960] frontend backend_srvs/srv1 2245/0/0/3/2248 200 7448 - - --VN 9998/5522/263/54/0 0/0 {Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.87 Safari/537.36|} "GET /another_location HTTP/1.1" 

I want to extract some fields so that I get the following output:

Field 1    Field 2           Field 3      Field 4  Field 5       Field 6
Date/time  HTTP status code  HTTP Method  Request  HTTP version  Referer URL

Basically, in this particular case, the output should be:

Feb 28 11:16:10 200 GET /location1 HTTP/1.1 http://subdomain.domain.com/location1 
Feb 28 11:16:10 200 GET /location2 HTTP/1.1 http://subdomain.domain.com/location2 
Feb 28 11:16:13 200 GET /location3 HTTP/1.1 http://subdomain.domain.com/location3 
Feb 28 11:16:13 200 GET /another_location HTTP/1.1 

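The only problem here is extracting the Referer URL: it sits between curly braces together with the user agent, separated from it by a pipe. In addition, the user agent has a variable number of fields.

The only solution I could think of is to extract the referer URL separately and then paste the columns together: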

requests_temp=`grep -F " 88.88.88.88:" /root/file.log | tr -d '"'` 
requests=`echo "${requests_temp}" | awk '{print $1" "$2" "$3" "$11, $(NF-2), $(NF-1), $NF}' > /tmp/requests_tmp` 
referer_url=`echo "${requests_temp}" | awk 'NR > 1 {print $1}' RS='{' FS='}' | awk -F'|' '{ print $2 }' > /tmp/referer_url_tmp` 

paste /tmp/requests_tmp /tmp/referer_url_tmp 

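But I really don't like this approach. Is there another way to do this with a single awk line? Perhaps by assigning the referer URL column to a variable inside awk and then using it to build the same output?

Answers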

1

Try the following solution:

awk '/88.88.88.88/ {gsub(/"/,"",$0);split($(NF-3),a,"|"); {print $1,$2,$3,$11, $(NF-2), $(NF-1), $NF, substr(a[2],1,(length(a[2])-1))}}' a 
Feb 28 11:16:10 200 GET /location1 HTTP/1.1 http://subdomain.domain.com/location1 
Feb 28 11:16:10 200 GET /location2 HTTP/1.1 http://subdomain.domain.com/location2 
Feb 28 11:16:13 200 GET /location3 HTTP/1.1 http://subdomain.domain.com/location3 
Feb 28 11:16:13 200 GET /another_location HTTP/1.1 
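
A side note on why counting from the end works here: the user agent changes the total number of fields, but the token that carries the referer always sits just before the three request tokens. The sketch below runs on two of the sample lines from the question and prints NF next to $(NF-3) to show this. It assumes the referer URL itself contains no spaces; the variable names log and counts are only for illustration.

```shell
# Two sample lines from the question, with user agents of different lengths.
log=$(cat <<'EOF'
Feb 28 11:16:10 localhost haproxy[20072]: 88.88.88.88:6152 [28/Feb/2017:11:16:10.322] frontend backend_srvs/srv1 513/0/0/124/637 200 14381 - - --VN 9970/5491/223/55/0 0/0 {Mozilla/5.0 AppleWebKit/537.36 Chrome/56.0.2924.87 Safari/537.36|http://subdomain.domain.com/location2} "GET /location2 HTTP/1.1"
Feb 28 11:16:13 localhost haproxy[20072]: 88.88.88.88:6152 [28/Feb/2017:11:16:10.960] frontend backend_srvs/srv1 2245/0/0/3/2248 200 7448 - - --VN 9998/5522/263/54/0 0/0 {another user agent with fewer columns|http://subdomain.domain.com/location3} "GET /location3 HTTP/1.1"
EOF
)

# NF differs per line, but $(NF-3) is always "<last agent token>|<referer>}".
counts=$(printf '%s\n' "$log" | awk '{ print NF, $(NF-3) }')
printf '%s\n' "$counts"
```

The field count differs between the two lines, yet $(NF-3) ends in the referer followed by the closing brace in both, which is what the split on "|" and the final substr rely on.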
1

You can do it all in a single awk invocation:

awk '$6 ~ /88\.88\.88\.88:[0-9]+/{ 
    split($0,a,/[{}]/) 
    $0=a[1] OFS a[3] 
    split(a[2],b,"|") 
    print $1,$2,$3,$11,substr($18,2),$19,substr($20,1,length($20)-1),b[2] 
}' file.log 

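The first split cuts the line around its variable part (the text enclosed in {...}) and stores the pieces in the array a.

The line is then rebuilt with $0=a[1] OFS a[3] so that it has a fixed number of fields.

The second split extracts the URL from the variable part, based on the | character.

Finally, print displays all the needed elements. Note that substr is used here to remove the " characters.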
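
The steps above can be traced on one sample line. This is only a sketch: it prints the intermediate pieces rather than the final output, the variable names line and trace are made up for the illustration, and it assumes the user agent contains neither braces nor a pipe of its own.

```shell
# One sample line from the question (trailing space kept as in the log).
line='Feb 28 11:16:13 localhost haproxy[20072]: 88.88.88.88:6152 [28/Feb/2017:11:16:10.960] frontend backend_srvs/srv1 2245/0/0/3/2248 200 7448 - - --VN 9998/5522/263/54/0 0/0 {another user agent with fewer columns|http://subdomain.domain.com/location3} "GET /location3 HTTP/1.1" '

trace=$(printf '%s\n' "$line" | awk '{
    n = split($0, a, /[{}]/)   # a[1]: before the braces, a[2]: inside, a[3]: after
    split(a[2], b, "|")        # b[1]: user agent, b[2]: referer URL
    $0 = a[1] OFS a[3]         # rebuild the record without the variable part
    print "pieces=" n
    print "agent=" b[1]
    print "referer=" b[2]
    print "fields=" NF         # fixed field count after the rebuild
    print "request=" $18, $19, $20
}')
printf '%s\n' "$trace"
```

After the rebuild the request always lands in $18 to $20, still wrapped in double quotes, which is why the answer trims the first character of $18 and the last character of $20.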

+0

You forgot to add the filter for the IP address (88.88.88.88); if my file contains other IPs, their lines will also be printed by your solution. –

+0

@VIPINKUMAR I have just added that condition, thanks... – oliv
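
On the IP filter discussed in these comments, one possible variation is to pass the address in as an awk variable and compare it literally, so that the dots are not treated as regex wildcards and an address such as 188.88.88.88 is not picked up by accident. This is a sketch, not part of either answer: filter_ip is a hypothetical helper name, and it assumes the client address is always field 6 in the form IP:port.

```shell
# filter_ip IP  (hypothetical helper): reads log lines on stdin and prints
# date/time, status, method, request, HTTP version and referer for that client.
filter_ip() {
    awk -v ip="$1" 'index($6, ip ":") == 1 {   # literal prefix match on "IP:"
        split($0, a, /[{}]/)                   # isolate the {agent|referer} part
        $0 = a[1] OFS a[3]                     # rebuild with a fixed field count
        split(a[2], b, "|")                    # b[2] is the referer URL
        print $1, $2, $3, $11, substr($18, 2), $19, substr($20, 1, length($20) - 1), b[2]
    }'
}

# Usage: filter_ip 88.88.88.88 < file.log
```

Using index() rather than a regex keeps the comparison exact; the trailing ":" prevents a shorter address from matching as a prefix of a longer one.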