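Efficiently parsing Apache logs in PHP
OK, here's the scenario: I need to parse my logs to find out how many times image thumbnails have been downloaded without the "big image" page actually being viewed... This is basically a hotlink protection system based on the ratio of "thumb" to "full" image views.
Considering that the server is constantly bombarded with requests for thumbnails, the most efficient solution seems to be using buffered Apache logs that are written to disk once every, say, 1 MB, and then parsing the logs periodically.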
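My question is this: how do I parse an Apache log in PHP and save the data such that the following are true:
- The log will be in use and updated in real time, and I need my PHP script to be able to read it while that is happening
- The PHP script has to "remember" which parts of the log it has already read, so that it doesn't read the same portion twice and skew the data
- Memory consumption should be kept to a minimum, because the logs can easily reach 10 GB of data within a few hours
The PHP logger script would be called once every 60 seconds and process however many log lines it can during that time.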
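I've tried hacking some code together, but I'm having trouble keeping memory use to a minimum and finding a way to track the pointer against a "moving" file size.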
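For reference, the requirements above (read while the log is being written, remember the position, keep memory flat, stop within the minute) could be sketched roughly as follows. This is only a sketch under assumptions: `read_new_lines`, the state file, and the 50-second budget are hypothetical names and values, not part of any existing code. It reads one line at a time with `fgets()`, stores only a byte offset between runs, and treats a saved offset larger than the file as a sign that the log was rotated.

```php
<?php
// Hypothetical helper: feeds every new, complete line of $logFile to $handle,
// starting at the byte offset saved in $stateFile, and saves the new offset.
function read_new_lines(string $logFile, string $stateFile, callable $handle, int $maxSeconds = 50): int
{
    if (!is_readable($logFile)) {
        return 0;
    }
    $offset = is_file($stateFile) ? (int) file_get_contents($stateFile) : 0;

    // A saved offset beyond the current size suggests the log was rotated or truncated.
    clearstatcache(true, $logFile);
    if ($offset > filesize($logFile)) {
        $offset = 0;
    }

    $fp = fopen($logFile, 'r');
    if ($fp === false) {
        return 0;
    }
    fseek($fp, $offset);

    $deadline = time() + $maxSeconds; // assumed budget, leaving margin before the next run
    $count = 0;
    while (($line = fgets($fp)) !== false) {
        // No trailing newline: the writer has not finished this line yet, retry it next run.
        if (substr($line, -1) !== "\n") {
            break;
        }
        $offset = ftell($fp); // position just after the last complete line
        $handle(rtrim($line, "\r\n"));
        $count++;
        if (time() >= $deadline) {
            break;
        }
    }
    fclose($fp);

    file_put_contents($stateFile, (string) $offset, LOCK_EX);
    return $count;
}
```

Only the current line and one integer are held in memory, so the size of the log should not matter here; anything that grows per line would have to come from the callback.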
Here's a portion of the log:
212.180.168.244 - - [18/Jan/2012:20:06:57 +0100] "GET /t/0/11/11441/11441268.jpg HTTP/1.1" 200 3072 "-" "Opera/9.80 (Windows NT 6.1; U; pl) Presto/2.10.229 Version/11.60" "-"
122.53.168.123 - - [18/Jan/2012:20:06:57 +0100] "GET /t/0/11/11441/11441276.jpg HTTP/1.1" 200 3007 "-" "Opera/9.80 (Windows NT 6.1; U; pl) Presto/2.10.229 Version/11.60" "-"
143.22.203.211 - - [18/Jan/2012:20:06:57 +0100] "GET /t/0/11/11441/11441282.jpg HTTP/1.1" 200 4670 "-" "Opera/9.80 (Windows NT 6.1; U; pl) Presto/2.10.229 Version/11.60" "-"
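As an illustration of what needs to be pulled out of lines like these, here is a rough sketch of a per-line parser. `parse_log_line` is a hypothetical name, and the pattern assumes the combined log format shown in the sample. The `/t/` prefix for thumbnails is taken from the sample; the `/img/<id>-...` layout for full views is an assumption, since no such line appears in the sample.

```php
<?php
// Hypothetical helper: extracts the interesting fields from one access-log line.
// Returns null when the line does not look like a combined-format entry.
function parse_log_line(string $line): ?array
{
    $pattern = '/^(\S+) \S+ \S+ \[([^\]]+)\] "(\S+) (\S+)[^"]*" (\d{3}) (\d+|-)/';
    if (preg_match($pattern, $line, $m) !== 1) {
        return null;
    }

    // Drop any query string; fall back to an empty path if the URL is malformed.
    $path = (string) parse_url($m[4], PHP_URL_PATH);

    $type = null;
    $imageId = 0;
    if (strpos($path, '/t/') === 0) {
        $type = 'thumb';
        $imageId = (int) basename($path);     // "11441268.jpg" casts to 11441268
    } elseif (strpos($path, '/img/') === 0) {
        $type = 'raw';
        $segments = explode('/', $path);      // assumed layout: /img/<id>-...
        $imageId = (int) ($segments[2] ?? 0);
    }

    return [
        'ip'      => $m[1],
        'time'    => $m[2],
        'path'    => $path,
        'status'  => (int) $m[5],
        'bytes'   => $m[6] === '-' ? 0 : (int) $m[6],
        'type'    => $type,
        'imageId' => $imageId,
    ];
}
```

Because `type` and `imageId` are computed fresh for every line, a request that is neither a thumbnail nor a full view comes back with `type` set to `null` and can simply be skipped.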
The code is attached here for your comments:
<?php
// Intended to be run from cron every minute.
error_reporting(E_ALL);
ini_set('display_errors', 1);
set_time_limit(0);

include(dirname(__FILE__) . '/../kframework/kcore.class.php');

$aj = new kajaxpage;
$aj->use_db = 1;
$aj->init();
$db = kdbhandler::getInstance();
$d = kdebug::getInstance();
$d->debug = TRUE;
$d->verbose = TRUE;

$log_file = "/var/log/nginx/access.log"; // full path to log file when run by cron
$pid_file = dirname(__FILE__) . "/../kframework/cron/cron_log.pid";
//$images_id = array("8308086", "7485151", "6666231", "8343336");

// The pid file holds "<timestamp> <byte offset>" written by the previous run.
$pointer = 0;
if (file_exists($pid_file)) {
    $temp = explode(" ", trim(file_get_contents($pid_file)));
    $pid_timestamp = (int) $temp[0];
    $now_timestamp = time();
    //if (($now_timestamp - $pid_timestamp) < 90) return;
    $pointer = isset($temp[1]) ? (int) $temp[1] : 0;
    // A saved offset past the end of the file means the log was rotated or truncated.
    clearstatcache();
    if ($pointer > filesize($log_file)) $pointer = 0;
}

$pattern = "/([0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3})[^\[]*\[([^\]]*)\][^\"]*\"([^\"]*)\"\s([0-9]*)\s([0-9]*)(.*)/";
$last_time = 0;
$lines_processed = 0;

// Read-only access is enough; the script never writes to the log.
if ($fp = fopen($log_file, "r")) {
    fseek($fp, $pointer);
    // fgets() returns false at end of file, so no separate feof() check is needed.
    while (($raw_line = fgets($fp)) !== false) {
        //if ($lines_processed>100) exit;
        $lines_processed++;
        $log_line = trim($raw_line);
        // Skip blank lines and lines the pattern cannot parse.
        if ($log_line !== '' && preg_match($pattern, $log_line, $matches)) {
            //print_r($matches);
            $size = $matches[5];
            // Reduce the request string ("GET /path HTTP/1.1") to the path.
            $request = str_replace("GET ", "", $matches[3]);
            $request = str_replace("HTTP/1.1", "", $request);
            $request = trim(str_replace(".jpg/", ".jpg", $request));

            // Reset per line so an unrecognised request is not counted
            // under the previous line's type and image ID.
            $type = null;
            $imgid = 0;
            if (substr($request, 0, 3) == "/t/") {
                // end() takes its argument by reference, so it needs a real variable.
                $parts = explode("/", $request);
                $get = explode("-", end($parts));
                $imgid = $get[0];
                $type = 'thumb';
            }
            elseif (substr($request, 0, 5) == "/img/") {
                $get1 = explode("/", $request);
                $get2 = explode("-", $get1[2]);
                $imgid = $get2[0];
                $type = 'raw';
            }
            echo $request;

            // put here your sql insert or update
            $imgid = (int) $imgid;
            if (isset($type) && $imgid != 1) {
                switch ($type) {
                    case 'thumb':
                        //use the second slave in the registry
                        $sql = $db->slave_query("INSERT INTO hotlink SET thumbviews=1, imageid=".$imgid." ON DUPLICATE KEY UPDATE thumbviews=thumbviews+1 ", 2);
                        echo "INSERT INTO hotlink SET thumbviews=1, imageid=".$imgid." ON DUPLICATE KEY UPDATE thumbviews=thumbviews+1";
                        break;
                    case 'raw':
                        //use the second slave in the registry
                        $sql = $db->slave_query("INSERT INTO hotlink SET rawviews=1, imageid=".$imgid." ON DUPLICATE KEY UPDATE rawviews=rawviews+1", 2);
                        echo "INSERT INTO hotlink SET rawviews=1, imageid=".$imgid." ON DUPLICATE KEY UPDATE rawviews=rawviews+1";
                        break;
                }
            }
            // $imgid - image ID
            // $size - image size

            // Checkpoint the offset every 30 seconds in case the run is interrupted.
            $timestamp = time();
            if (($timestamp - $last_time) > 30) {
                file_put_contents($pid_file, $timestamp . " " . ftell($fp));
                $last_time = $timestamp;
            }
        }
    }
    // Final checkpoint: the offset reached in this run.
    file_put_contents($pid_file, (time() - 95) . " " . ftell($fp));
    fclose($fp);
}
?>
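One variation that might be worth trying: the script above sends one query per log line. A rough sketch of counting hits per image in memory first and then writing one statement per image is below. `add_hit` and `build_upsert` are hypothetical helpers; the table and column names (`hotlink`, `imageid`, `thumbviews`, `rawviews`) are the ones used in the script, and the statement assumes MySQL's `VALUES()` function inside `ON DUPLICATE KEY UPDATE`.

```php
<?php
// Hypothetical helper: folds one hit into per-image counters.
// Unknown types are ignored so they cannot create stray keys.
function add_hit(array &$counts, string $type, int $imageId): void
{
    if ($type !== 'thumb' && $type !== 'raw') {
        return;
    }
    if (!isset($counts[$imageId])) {
        $counts[$imageId] = ['thumb' => 0, 'raw' => 0];
    }
    $counts[$imageId][$type]++;
}

// Hypothetical helper: builds one upsert that adds both counters for an image.
// Every value is formatted with %d, so only integers reach the SQL text.
function build_upsert(int $imageId, array $c): string
{
    return sprintf(
        'INSERT INTO hotlink (imageid, thumbviews, rawviews) VALUES (%d, %d, %d) '
        . 'ON DUPLICATE KEY UPDATE thumbviews = thumbviews + VALUES(thumbviews), '
        . 'rawviews = rawviews + VALUES(rawviews)',
        $imageId,
        $c['thumb'],
        $c['raw']
    );
}
```

The counters array grows with the number of distinct images seen in one run, not with the number of log lines, so it would need flushing (for example every few thousand lines) if a single run has to work through a very large backlog.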
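10 GB of data in a few hours, he says. Until I had summarized what I actually need, MySQL is definitely not what I would want. A full-text index (implying MyISAM) on data like this would be a disaster. – Evert 2012-01-18 19:16:14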
@Evert: But it starts from a 0-byte log file? See my answer. – Bytemain 2012-01-18 19:25:18
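It doesn't start with an empty log... it starts with tens of GB of data :/ The script I posted eventually fails with a memory allocation error, so I figure there must be a leak somewhere that I can't seem to find... I was under the impression that using fgets would keep only the current line in memory. Is the "pid" file idea for keeping track of the pointer any good? – Igor 2012-01-18 19:36:25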