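Getting table data from a page

I want to echo some data from another website, similar to this question: Getting data from another site with php via ID.

There is one row in the table that I want to fetch and echo, but I can't get it to echo anything.

Here is my code, adapted from the code in the question above, but it doesn't work.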

$content = "http://voucher.gov.gr/project/pedy-results/gid/14?search=PDNO-78256-114-20140722-120951"; 

$ch = curl_init(); 

    curl_setopt($ch, CURLOPT_URL, $content);   
    curl_setopt($ch, CURLOPT_NOBODY, false); 
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); 
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); 
    curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false); 
    $body= curl_exec ($ch); 
    curl_close ($ch); 

    preg_match('#<tr class="row0"><td>([0-9\.]*)</td><td>([0-9\.]*)</td><td>([0-9\.]*)</td><td>([0-9\.]*)</td><td>([0-9\.]*)</td><td>([0-9\.]*)</td><td>([0-9\.]*)</td><td>([0-9\.]*)</td><td>([0-9\.]*)</td><td>([0-9\.]*)</td><td>([0-9\.]*)</td>#Uis', $body, $resultmatch); 

    $results = $resultmatch; 

    foreach($results as $word) 
    echo $word;

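The array is created but it contains no data. Any help or suggestions would be greatly appreciated. Thanks!

Edit with solution: thanks everyone for the help, I managed to get it working! Here is the code: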

preg_match('#<td>(.*)</td>(.*)<td>(.*)</td>(.*)<td style="max-width:151px;"><strong>(.*)</strong></td>(.*)<td>(.*)</td>(.*)<td>(.*)</td>(.*)<td>(.*)</td>(.*)<td>(.*)</td>(.*)<td>(.*)</td>(.*)<td>(.*)</td>(.*)<td>(.*)</td>#Uis', $body, $resultmatch);

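This code is not an entirely correct answer: it returns not only the contents of the td elements I want but also the whitespace between them. The pattern would not work without a "(.*)" between the td elements, so I had to live with it. You can work around this by ignoring the whitespace-only entries in the array (in our case resultmatch[2], [4], [6], [8], [10] and so on). I hope my edit helps. The code could certainly be improved further so that the whitespace never ends up in the array.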
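The workaround described above (ignoring the whitespace-only captures) can be sketched as follows. This is a minimal sketch on a hypothetical two-cell sample, not the real page; the `$cells` variable name is an assumption made for illustration.

```php
<?php
// Hypothetical sample standing in for the real page: two cells separated by a line break.
$body = "<td>Alpha</td>\n   <td>30</td>";

// Same shape as the pattern above: the (.*) between the cells captures the whitespace.
preg_match('#<td>(.*)</td>(.*)<td>(.*)</td>#Uis', $body, $resultmatch);

// Drop the full match (index 0), trim every capture, then discard the empty ones.
// 'strlen' is used as the filter so that a cell containing "0" is kept.
$captures = array_slice($resultmatch, 1);
$cells = array_values(array_filter(array_map('trim', $captures), 'strlen'));

print_r($cells);
```

Replacing each in-between `(.*)` with a non-capturing `\s*` would probably avoid the extra entries in the first place, assuming only whitespace sits between the cells.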


2014-09-30 fotis179

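Have you debugged each step of the process? For example, is anything returned in `$body`? – Raad 2014-09-30 09:27:14

@Raad yep, body correctly returns the whole page. I think the problem is in the preg_match, but I'm not familiar with the expressions. – fotis179 2014-09-30 09:36:08

If I'm right (RegExp is not one of my strong points), the match looks for a row with the CSS class "row0" containing 11 table cells that hold only numbers. The page in question has non-numeric content in the first 5 cells, hence no match. – Raad 2014-09-30 09:48:29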

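Having confirmed that cURL returns the page body correctly, your problem is indeed with the preg_match.

The match looks for a row with the CSS class "row0" containing 11 table cells, each of which may contain only digits and dots ([0-9\.]*).

On the page in question the first 5 cells begin with non-numeric content, so there is no match. If you want to match this row, you could change the expression to: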

'#<tr class="row0"><td>(.*)</td><td>(.*)</td><td>(.*)</td><td>(.*)</td><td>(.*)</td><td>([0-9\.]*)</td><td>([0-9\.]*)</td><td>([0-9\.]*)</td><td>([0-9\.]*)</td><td>([0-9\.]*)</td><td>([0-9\.]*)</td>#Uis'

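As I said in my comment, regular expressions are not one of my strong skills (hence my comment being slightly off), so while I think this will work, you may need to tweak it.

I find the RegExp "fiddle" site http://regex101.com/ really useful.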
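To see why the digits-only character class fails on a text cell, here is a small sketch on a hypothetical one-row sample (the real row has more cells, attributes and whitespace, so the full pattern may still need tweaking):

```php
<?php
// Hypothetical one-row sample; only the class name "row0" comes from the question.
$row = '<tr class="row0"><td>Region</td><td>30</td></tr>';

// Digits-only first cell: "Region" cannot satisfy [0-9\.]*, so nothing matches.
$strict = preg_match('#<tr class="row0"><td>([0-9\.]*)</td><td>([0-9\.]*)</td>#Uis', $row, $m1);

// Relaxed first cell: (.*) accepts any text, while the second cell stays numeric.
$relaxed = preg_match('#<tr class="row0"><td>(.*)</td><td>([0-9\.]*)</td>#Uis', $row, $m2);

var_dump($strict, $relaxed, $m2);
```

The first call returns 0 and the second returns 1, with "Region" and "30" in `$m2[1]` and `$m2[2]`.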


2014-09-30 10:17:25 Raad

Your answer really helped me get it done! Thanks! – fotis179 2014-10-01 07:44:12

@fotis179 - if you think my answer did the trick, please click the big white tick mark =) – Raad 2014-10-01 08:14:35

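If you inspect $body, you will find a lot of unnecessary whitespace and line breaks, which prevent your expression from finding a match.

To match alphanumeric strings you need something like "(.*?)" combined with the u modifier.

Note the u at the end of the pattern: it makes the pattern and subject be treated as UTF-8, so Unicode characters can be matched.

So I think this is what you need: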

$content = "http://voucher.gov.gr/project/pedy-results/gid/14?search=PDNO-78256-114-20140722-120951"; 

$ch = curl_init(); 

curl_setopt($ch, CURLOPT_URL, $content);   
curl_setopt($ch, CURLOPT_NOBODY, false); 
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); 
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); 
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false); 
$body= curl_exec ($ch); 
curl_close ($ch); 

//you need to strip whitespace and line breaks first 
$body = preg_replace('~>\s+<~', '><', $body); // \s already covers \r and \n 
$body = str_replace("\n", '', $body); // remove any remaining line breaks 
preg_match('#<tr class=\"row0\"><td>(.*?)</td><td>(.*?)</td><td(.*?)>(.*?)</td><td>(.*?)</td><td>(.*?)</td><td>(.*?)</td><td>(.*?)</td><td>(.*?)</td><td>(.*?)</td><td>(.*?)</td><td>(.*?)</td>#u', $body, $resultmatch); 
var_dump($resultmatch);

The result of the above looks like this:

array (size=13) 
    0 => string '<...>' (length=398) 
    1 => string 'Στερεάς Ελλάδας' (length=29) 
    2 => string 'Φθιώτιδας' (length=18) 
    3 => string ' style="max-width:151px;"' (length=25) 
    4 => string '<strong>PDNO-78256-114-20140722-120951</strong>' (length=47) 
    5 => string ' 
      22/07/2014         12:09:51        ' (length=99) 
    6 => string 'Επιλεχθείς' (length=20) 
    7 => string '30' (length=2) 
    8 => string '30' (length=2) 
    9 => string '30' (length=2) 
    10 => string '10' (length=2) 
    11 => string '100' (length=3) 
    12 => string '1        ' (length=33)
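The dump above still carries markup (`<strong>`) and padding in some entries. One possible clean-up pass, sketched on a hypothetical excerpt of those captures (the `$clean` name and the shortened whitespace are assumptions for illustration):

```php
<?php
// Hypothetical excerpt shaped like entries 4, 5 and 12 of the dump above.
$resultmatch = array(
    4  => '<strong>PDNO-78256-114-20140722-120951</strong>',
    5  => "   22/07/2014         12:09:51        ",
    12 => '1        ',
);

$clean = array();
foreach ($resultmatch as $index => $value) {
    $text = strip_tags($value);                  // remove markup such as <strong>
    $text = preg_replace('~\s+~u', ' ', $text);  // collapse runs of whitespace
    $clean[$index] = trim($text);                // drop leading/trailing padding
}

print_r($clean);
```

The original indexes are preserved, so entry 3 (the captured style attribute) could simply be skipped when reading the result.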


2014-09-30 10:38:07 montexristos

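A really nice answer here about stripping the whitespace. I'll try it when I find the time! Thanks! – fotis179 2014-10-01 07:42:49

I believe you should not use regular expressions to parse HTML elements.

Using the DOM API would be less error-prone.

You can replace the "preg_match" line with: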

libxml_use_internal_errors(true); 
$domDocument = new DOMDocument(); 
$domDocument->loadHTML($body); 
$xpath = new DOMXPath($domDocument); 
$nodes = $xpath->query('//tr[@class="row0"][1]/td'); 

$results = array(); 
foreach($nodes as $node) { 
    $value = trim($node->nodeValue); 
    // keep only cells made up entirely of digits, stored without surrounding whitespace 
    if(ctype_digit($value)) { 
        $results[] = $value; 
    } 
}
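The snippet above keeps only the purely numeric cells. If every cell of the row is wanted, a variant along these lines may work; it is a self-contained sketch on hypothetical sample markup, and the `$cells` name is an assumption:

```php
<?php
// Hypothetical sample markup; only the class name "row0" comes from the question.
$body = '<table>
  <tr class="row0">
    <td>Region</td>
    <td style="max-width:151px;"><strong>ID-1</strong></td>
    <td> 30 </td>
  </tr>
</table>';

libxml_use_internal_errors(true);   // keep warnings about sloppy HTML quiet
$domDocument = new DOMDocument();
$domDocument->loadHTML($body);
$xpath = new DOMXPath($domDocument);

$cells = array();
foreach ($xpath->query('//tr[@class="row0"][1]/td') as $node) {
    // textContent ignores nested tags; trim removes the surrounding whitespace.
    $cells[] = trim($node->textContent);
}

print_r($cells);
```

Because the DOM parser handles attributes, nested tags and whitespace itself, no pre-cleaning of `$body` should be needed here.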


2014-09-30 12:00:08
