preg_match problem

I am trying to grab the numeric value (i.e. 105). Please check my HTML code below:

<p> 
       External Backlinks 
      </p> 
      <p style="font-size: 150%;"> 
       <b>105</b> 
      </p>

I have used the following regular expression:

$url = 'http://www.example.com/test.html'; 

preg_match('#<p>External Backlinks</p><p style="font-size: 150%;"><b>([0-9\.]+)#', file_get_contents($url), $matches); 

echo $matches[1];

But it does not return the correct value. Please help me fix the regular expression above. Thanks.

Source

2012-02-21 seoppc

http://stackoverflow.com/a/1732454/1163867 – MarcinJuraszek 2012-02-21 21:32:39

For HTML, don't use *regex*, use *xpath*. XPath is the "regular" expression for HTML/XML, for example `//p[@style="font-size: 150%;"]/b`. – hakre 2012-02-21 22:04:35

I do not recommend using regular expressions to parse HTML. Use a DOM parser instead. Read this rant for more information about why :)

To answer your question, here is a regex that works for your example:
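As a rough illustration of the DOM parser route, the following is a minimal sketch using PHP's built-in DOMDocument and DOMXPath classes. The helper name `extractBacklinkCount` and the XPath query are assumptions made for this example, not part of the original post, and the markup is inlined from the question so the snippet runs without a network request.

```php
<?php
// Sample markup copied from the question. A real script would fetch it,
// e.g. with file_get_contents($url).
$html = <<<'HTML'
<p>
       External Backlinks
      </p>
      <p style="font-size: 150%;">
       <b>105</b>
      </p>
HTML;

// Hypothetical helper (not from the original post): returns the text of the
// <b> inside the first <p> that follows the "External Backlinks" paragraph,
// or null when nothing matches.
function extractBacklinkCount(string $html): ?string
{
    $doc = new DOMDocument();
    // Collect libxml warnings internally instead of printing them;
    // real-world HTML is often not well-formed.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // normalize-space() trims and collapses the whitespace around the label.
    $nodes = $xpath->query(
        '//p[normalize-space(.) = "External Backlinks"]/following-sibling::p[1]/b'
    );

    if ($nodes === false || $nodes->length === 0) {
        return null;
    }
    return trim($nodes->item(0)->textContent);
}

echo extractBacklinkCount($html); // prints 105
```

Anchoring on the label text rather than on the `style` attribute is a design choice for this sketch: it keeps working if the font size changes, but it would need adjusting if the label wording does.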

<p>[^E]*External Backlinks[^<]*<\/p>[^<]*<p style="font-size: ?150%;">[^<]*<b>(\d+)<\/b>[^<]*<\/p>

It is ugly, but it works... Don't use it.

preg_match('#<p>[^E]*External Backlinks[^<]*<\/p>[^<]*<p style="font-size: ?150%;">[^<]*<b>(\d+)<\/b>[^<]*<\/p>#', file_get_contents($url), $matches); 

echo $matches[1];

Output:

105

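The problem with your regex is that it does not account for the whitespace in the HTML source. You also did not escape your slashes, although with `#` as the delimiter that is not required.

If the source looked like this: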
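One way to tolerate the whitespace is to put `\s*` wherever spaces or newlines may appear between tags and text. The pattern below is a sketch along those lines, not the pattern from the answer, and the markup is inlined from the question so it can be run without fetching a page.

```php
<?php
// Markup as it appears in the question, including the line breaks
// and indentation between the tags.
$html = <<<'HTML'
<p>
       External Backlinks
      </p>
      <p style="font-size: 150%;">
       <b>105</b>
      </p>
HTML;

// \s* absorbs any run of spaces and newlines (including none).
// With # as the delimiter, the forward slashes need no escaping.
$pattern = '#<p>\s*External Backlinks\s*</p>\s*<p style="font-size: ?150%;">\s*<b>(\d+)</b>#';

if (preg_match($pattern, $html, $matches) === 1) {
    echo $matches[1]; // prints 105
} else {
    echo 'no match';
}
```

This still depends on the exact tag order and on the `style` attribute being written this way, so it shares the fragility of any regex-based approach to HTML.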

<p>External Backlinks</p><p style="font-size: 150%;"><b>105</b></p>

Yours would work, but it is not very robust. (I guess using regex to parse HTML is never going to be very robust.)

Source
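To make the difference concrete, this small check runs the pattern from the question, unchanged, against both layouts. The variable names and the shortened indented sample are assumptions made for the illustration.

```php
<?php
// The pattern from the question, unchanged.
$pattern = '#<p>External Backlinks</p><p style="font-size: 150%;"><b>([0-9\.]+)#';

// Single-line markup with no whitespace between the tags.
$compact = '<p>External Backlinks</p><p style="font-size: 150%;"><b>105</b></p>';

// The same elements, but with newlines and indentation between them.
$indented = "<p>\n  External Backlinks\n</p>\n<p style=\"font-size: 150%;\">\n  <b>105</b>\n</p>";

var_dump(preg_match($pattern, $compact, $m)); // int(1), and $m[1] is "105"
var_dump(preg_match($pattern, $indented));    // int(0): the literal text no longer lines up
```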

2012-02-21 21:45:56 ohaal
