2010-05-11

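Parse raw Apache log

I need some PHP code to parse a raw Apache log. Specifically, I want to count how many times mode=search appears and which terms were used in those searches. Here is an example: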

207.46.195.228 - - [30/Apr/2010:03:24:26 -0700] "GET /index.php?mode=search&term=AE1008787E0174 HTTP/1.1" 200 13047 "-" "msnbot/2.0b (+http://search.msn.com/msnbot.htm)" 
212.81.200.167 - - [30/Apr/2010:04:21:43 -0700] "GET /index.php?mode=search&term=WH2002D-YYH HTTP/1.1" 200 12079 "http://www.mysite.com/SearchGBY.php?page=81" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; GTB6.4; .NET CLR 1.1.4322; .NET CLR 2.0.50727; WinuE v6; InfoPath.2; WinuE v6)" 
212.81.200.167 - - [30/Apr/2010:04:21:44 -0700] "GET /file_uploads/banners/banner.swf HTTP/1.1" 200 50487 "-" "contype" 
66.249.68.168 - - [30/Apr/2010:04:21:45 -0700] "GET /index.php?mode=search&term=WH2002D-YYH HTTP/1.1" 200 12079 "-" "Mediapartners-Google" 

Answers


I recently wrote this very rough parser:

 

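<?php
// file extensions of static assets to skip
$ignore = array('css', 'png', 'gif', 'jpg', 'jpeg', 'js', 'ico');

$f = fopen('access_log', 'r');
if (!$f) die("Failed to open log for reading.");

// fgets() returns false at EOF, so loop on its result instead of feof()
while (($buff = fgets($f, 4096)) !== false) {

    $parts = explode(' ', $buff);
    // skip blank or truncated lines that have no request path
    if (count($parts) < 7) continue;

    // end() takes its argument by reference, so store the explode() result first
    $pieces = explode('.', $parts[6]);
    if (in_array(end($pieces), $ignore)) continue;

    // last space-separated field; this is only a domain if your LogFormat
    // appends the virtual host (in the combined format above it is the end
    // of the user-agent string)
    $domain = trim(end($parts));

    // http method (strip the leading quote from "GET)
    $http_method = substr($parts[5], 1);
    if ($http_method != 'GET' && $http_method != 'POST') continue;

    // parse out the date: "[30/Apr/2010:03:24:26" -> "30 Apr 2010"
    list($d, $m, $y) = explode('/', substr($parts[3], 1));
    $y = substr($y, 0, 4);
    $time = strtotime("{$d} {$m} {$y}");

    print "{$time} {$parts[0]} {$http_method} {$parts[6]} $domain\n";
}
fclose($f);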
 

$parts[6] should contain the part you're interested in (the resource being accessed). That should get you on your way...
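Building on that, here is a minimal sketch of the counting the question asks for, assuming the space-separated layout above where $parts[6] is the request path. parse_url() extracts the query string and parse_str() decodes it into an array. The function name count_search_terms is hypothetical, and the type-free signature is just to keep it working on old PHP versions.

```php
<?php
// Hypothetical helper: count mode=search requests and tally their terms,
// assuming $parts[6] is the request path, as in the parser above.
function count_search_terms($lines)
{
    $total = 0;
    $terms = array();

    foreach ($lines as $line) {
        $parts = explode(' ', trim($line));
        if (count($parts) < 7) continue;       // not a complete log line

        // query string of the requested resource, e.g. "mode=search&term=WH2002D-YYH"
        $query = parse_url($parts[6], PHP_URL_QUERY);
        if (!is_string($query)) continue;      // no query string (or unparsable URL)

        parse_str($query, $params);            // also decodes %XX escapes and "+"
        if (!isset($params['mode']) || $params['mode'] !== 'search') continue;

        $total++;
        $term = (isset($params['term']) && is_string($params['term'])) ? $params['term'] : '';
        $terms[$term] = isset($terms[$term]) ? $terms[$term] + 1 : 1;
    }

    arsort($terms);                            // most frequent terms first
    return array('total' => $total, 'terms' => $terms);
}

// The four sample lines from the question.
$sample = array(
    '207.46.195.228 - - [30/Apr/2010:03:24:26 -0700] "GET /index.php?mode=search&term=AE1008787E0174 HTTP/1.1" 200 13047 "-" "msnbot/2.0b (+http://search.msn.com/msnbot.htm)"',
    '212.81.200.167 - - [30/Apr/2010:04:21:43 -0700] "GET /index.php?mode=search&term=WH2002D-YYH HTTP/1.1" 200 12079 "http://www.mysite.com/SearchGBY.php?page=81" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; GTB6.4; .NET CLR 1.1.4322; .NET CLR 2.0.50727; WinuE v6; InfoPath.2; WinuE v6)"',
    '212.81.200.167 - - [30/Apr/2010:04:21:44 -0700] "GET /file_uploads/banners/banner.swf HTTP/1.1" 200 50487 "-" "contype"',
    '66.249.68.168 - - [30/Apr/2010:04:21:45 -0700] "GET /index.php?mode=search&term=WH2002D-YYH HTTP/1.1" 200 12079 "-" "Mediapartners-Google"',
);

$result = count_search_terms($sample);
print "mode=search requests: {$result['total']}\n";
foreach ($result['terms'] as $term => $n) {
    print "{$n}\t{$term}\n";
}
```

On a real log you would pass file('access_log') instead of $sample; file() returns one array element per line. For very large logs, reading line by line with fgets() as in the parser above avoids loading the whole file into memory.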


It would be just as convenient to use regular expressions: http://php.net/manual/en/book.regex.php
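For comparison, here is a minimal sketch of the regex route. It assumes the sample lines use Apache's "combined" log format (host, identity, user, [time], "request", status, bytes, "referer", "user agent"). The pattern and the function name parse_log_line are illustrative, not an official grammar; fields containing escaped quotes would need a stricter pattern. DateTime::createFromFormat() turns the bracketed timestamp into a DateTime that keeps its UTC offset.

```php
<?php
// Hypothetical helper: split one combined-format log line into named fields.
// Returns null when the line does not match the assumed format.
function parse_log_line($line)
{
    $pattern = '/^(?<host>\S+) (?<ident>\S+) (?<user>\S+) \[(?<time>[^\]]+)\] '
             . '"(?<method>\S+) (?<path>\S+) (?<protocol>[^"]*)" '
             . '(?<status>\d{3}) (?<bytes>\d+|-) "(?<referer>[^"]*)" "(?<agent>[^"]*)"/';

    if (!preg_match($pattern, $line, $m)) {
        return null;
    }

    // "30/Apr/2010:04:21:43 -0700" -> DateTime, keeping the -0700 offset
    $time = DateTime::createFromFormat('d/M/Y:H:i:s O', $m['time']);

    return array(
        'host'    => $m['host'],
        'time'    => $time,
        'method'  => $m['method'],
        'path'    => $m['path'],
        'status'  => (int) $m['status'],
        'referer' => $m['referer'],
        'agent'   => $m['agent'],
    );
}

// The second sample line from the question.
$line = '212.81.200.167 - - [30/Apr/2010:04:21:43 -0700] "GET /index.php?mode=search&term=WH2002D-YYH HTTP/1.1" 200 12079 "http://www.mysite.com/SearchGBY.php?page=81" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; GTB6.4; .NET CLR 1.1.4322; .NET CLR 2.0.50727; WinuE v6; InfoPath.2; WinuE v6)"';

$entry = parse_log_line($line);
print $entry['path'] . "\n";                          // the requested resource
print $entry['time']->format('Y-m-d H:i:s') . "\n";   // local time from the log
```

From there, passing $entry['path'] through parse_url() and parse_str() yields the mode and term values. Unlike splitting on spaces, the named groups keep the quoted referer and user-agent fields intact even though they contain spaces.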


Sorry, Casidiablo, regular expressions just confuse me. I've been programming for 20 years and *still* don't understand or like them. – MB34 2010-05-11 20:46:19


LOL... me too. But they are useful. – Cristian 2010-05-11 21:40:07


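In most cases regular expressions are the wrong tool for parsing strings, because the grammar being parsed is usually not regular. Have you checked the syntax of the Apache log first? – erikbwork 2012-08-06 10:30:25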