Difficulty with the preg_match_all function

I want to get the number between the span HTML tags. This number can change!

<span class="topic-count"> 
    ::before 
    " 
      24 
      " 
    ::after 
</span>

I tried the following code:

preg_match_all("#<span class=\"topic-count\">(.*?)</span>#", $source, $nombre[$i]);

But it doesn't work.

The full code:

$result=array(); 
$page = 201; 
while ($page>=1) { 
    $source = file_get_contents ("http://www.jeuxvideo.com/forums/0-27047-0-1-0-".$page."-0-counter-strike-global-offensive.htm"); 
    preg_match_all("#<span class=\"topic-count\">(.*?)</span>#", $source, $nombre[$i]); 
    $result = array_merge($result, $nombre[$i][1]); 
    print("Page : ".$page ."\n"); 
    $page-=25; 
} 
print_r ($nombre);


2017-02-09 Diamonds

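Don't use regex for HTML parsing! Get the value of your span first, then use a regular expression on it! – Random

Add the s modifier so that the dot also matches newlines. Edit: +1 to what Random said. ;) – Connum

Also, if you only want a number, match \d+ – Gordon

You can do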

preg_match_all(
    '#<span class="topic-count">[^\d]*(\d+)[^\d]*?</span>#s', 
    $html, 
    $matches 
);

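This captures the digits inside the span, skipping any non-digit characters (including whitespace and newlines) around them.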
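
As a rough check of the points above, the sketch below runs three patterns against a made-up string that imitates the span from the question: the original pattern, the same pattern with the s modifier, and the digit-capturing pattern. The sample markup in $html is an assumption, not the real page source.

```php
<?php
// Hypothetical sample markup, modelled on the snippet in the question.
$html = "<span class=\"topic-count\">\n      24\n</span>";

// Original pattern: without the s modifier "." does not match newlines,
// so the pattern cannot get past the line break after the opening tag.
$withoutS = preg_match_all('#<span class="topic-count">(.*?)</span>#', $html, $m1);

// Same pattern with the s modifier: it matches, but the capture still
// contains the surrounding whitespace and newlines.
$withS = preg_match_all('#<span class="topic-count">(.*?)</span>#s', $html, $m2);

// Digit-capturing pattern: only the number ends up in group 1.
$digits = preg_match_all('#<span class="topic-count">[^\d]*(\d+)[^\d]*?</span>#s', $html, $m3);

var_dump($withoutS);        // int(0)
var_dump(trim($m2[1][0]));  // string(2) "24"
var_dump($m3[1][0]);        // string(2) "24"
```

Note that [^\d] matches newlines with or without the s modifier; the modifier only changes what "." matches.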

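However, note that this regex only works for this particular piece of HTML. If the markup varies even slightly, for example another class or another attribute, the pattern will no longer match. Writing reliable regexes for HTML is hard.

Hence the common recommendation to use a DOM parser instead, for example: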
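
To illustrate how brittle the pattern is, here is a small sketch that tries it against a few markup variations. The variants are invented for the example and are not taken from the actual page.

```php
<?php
$pattern = '#<span class="topic-count">[^\d]*(\d+)[^\d]*?</span>#s';

// Hypothetical variations of the markup; only the first one has exactly
// the opening tag that the pattern spells out literally.
$variants = [
    'exact'         => '<span class="topic-count"> 24 </span>',
    'extra class'   => '<span class="topic-count hot"> 24 </span>',
    'extra attr'    => '<span id="c1" class="topic-count"> 24 </span>',
    'single quotes' => "<span class='topic-count'> 24 </span>",
];

$matched = [];
foreach ($variants as $label => $html) {
    $matched[$label] = preg_match($pattern, $html) === 1;
    echo $label, ': ', $matched[$label] ? 'match' : 'no match', PHP_EOL;
}
```

Only the exact variant matches; each of the other three is still perfectly valid HTML for the same element.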

libxml_use_internal_errors(true); 
$dom = new DOMDocument; 
$dom->loadHTMLFile('http://www.jeuxvideo.com/forums/0-27047-0-1-0-1-0-counter-strike-global-offensive.htm'); 
libxml_use_internal_errors(false); 

$xpath = new DOMXPath($dom); 
foreach ($xpath->evaluate('//span[contains(@class, "topic-count")]') as $node) { 
    if (preg_match_all('#\d+#s', $node->nodeValue, $topics)) { 
        echo $topics[0][0], PHP_EOL; 
    } 
}

DOM will parse the entire page into a tree of nodes, which you can conveniently query via XPath. Note the expression

//span[contains(@class, "topic-count")]

It gives you all span elements whose class attribute contains the string topic-count. Then, if any of these nodes contains a number, that number is echoed.

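
The same approach can be tried without a network request. The sketch below feeds a made-up HTML string to loadHTML instead of fetching the forum page; the markup and the $counts variable are assumptions for illustration only.

```php
<?php
// Hypothetical inline markup standing in for the downloaded page.
$html = '<html><body>'
      . '<span class="topic-count"> 24 </span>'
      . '<span class="topic-count hot" id="c2"> 130 </span>'
      . '<span class="other"> 7 </span>'
      . '</body></html>';

libxml_use_internal_errors(true);   // collect parser warnings instead of printing them
$dom = new DOMDocument;
$dom->loadHTML($html);
libxml_clear_errors();
libxml_use_internal_errors(false);

$xpath = new DOMXPath($dom);
$counts = [];
foreach ($xpath->query('//span[contains(@class, "topic-count")]') as $node) {
    // Keep the first run of digits found in the node's text.
    if (preg_match('#\d+#', $node->nodeValue, $m)) {
        $counts[] = (int) $m[0];
    }
}
print_r($counts);
```

Unlike the regex, this still finds the span that carries an extra class and an id. One caveat: contains(@class, "topic-count") is a plain substring test, so it would also match a class such as topic-counter. If that matters, a stricter test is contains(concat(" ", normalize-space(@class), " "), " topic-count ").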

2017-02-09 09:47:44 Gordon

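Thanks, it works perfectly. I'll also try using a DOM parser! And @Gordon, could you tell me what [^\d]* means? – Diamonds

@Diamonds [] denotes a character class, which matches any one character listed inside it. A ^ at the start negates the class, so it matches any character that is not listed. [^\d]* therefore means: match zero or more characters that are not digits. See https://regexper.com/#%5B%5E%5Cd%5D*(%5Cd%2B)%5B%5E%5Cd%5D*%3F. Also consider https://regexone.com – Gordon

Thanks, very useful tool! – Diamonds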
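
The character-class explanation in the comments can be checked with a short sketch. The input string is made up, and the anchors ^ and $ are added here only to make the three parts of the match visible.

```php
<?php
// Hypothetical input: whitespace, a number, then more whitespace.
$text = "\n   24 \n";

// [^\d]* = zero or more characters that are NOT digits (newlines included).
preg_match('#^([^\d]*)(\d+)([^\d]*)$#', $text, $m);

var_dump($m[1]); // the leading non-digits, newline included
var_dump($m[2]); // string(2) "24"
var_dump($m[3]); // the trailing non-digits

// \D is the built-in shorthand for [^\d].
var_dump(preg_match('#^\D*(\d+)\D*$#', $text, $n)); // int(1)
```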
