Regular expression to select part of an HTML document

I need to extract meta properties from crawled HTML source code. After crawling, the HTML contains markup like the following.

Example:

<meta property="og:site_name" content="asasasas"> 
<meta property="og:title" content="asajhskajhsaksp;" /> 
<meta property="og:image" content="images.cxs.com/2014/09/modit1.gif?w=209" />

Here I want only the content of the meta tag with property="og:image", i.e. the result should be just:

images.cxs.com/2014/09/modit1.gif?w=209

Source

2014-10-07 Kiran

[Don't parse HTML with regular expressions](http://stackoverflow.com/a/1732454/418066) – Biffen 2014-10-07 06:22:31

@Biffen: What is wrong with using a regular expression for this kind of task? There is no recursion here, nor anything else a regex cannot handle. – 2014-10-07 06:49:23

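@Rawing - HTML is not a regular language, so it cannot be parsed reliably with a regular expression, although you might use regular expressions to tokenize the input inside an HTML parser. – RobG 2014-10-07 06:52:07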

As @Biffen said, don't use a regular expression to parse HTML.

If you have the HTML as a string in a variable, you can use querySelector() like this:
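To illustrate why a pattern match is fragile here, the following is a minimal sketch, not a recommended approach. The pattern `naive` and the helper `matchNaive` are made up for this illustration. The first input is the tag from the question; the other two are assumed variations that are still the same tag to an HTML parser (attributes in a different order, single quotes instead of double quotes).

```javascript
// Illustration only: a naive pattern that assumes `property` comes before
// `content` and that both attributes use double quotes.
var naive = /<meta\s+property="og:image"\s+content="([^"]*)"/;

// Hypothetical helper: returns the captured content value, or null on no match.
function matchNaive(html) {
  var m = naive.exec(html);
  return m ? m[1] : null;
}

// The tag exactly as posted in the question.
var asPosted = '<meta property="og:image" content="images.cxs.com/2014/09/modit1.gif?w=209" />';
// Same tag with the attributes swapped (assumed variation).
var reordered = '<meta content="images.cxs.com/2014/09/modit1.gif?w=209" property="og:image" />';
// Same tag with single-quoted attribute values (assumed variation).
var singleQuoted = "<meta property='og:image' content='images.cxs.com/2014/09/modit1.gif?w=209' />";

console.log(matchNaive(asPosted));     // the content value
console.log(matchNaive(reordered));    // null
console.log(matchNaive(singleQuoted)); // null
```

The pattern can be extended to cover each variation, but every new case makes it longer, whereas a DOM query handles all of them with one selector.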

var html = '<meta property="og:site_name" content="asasasas" />' +
    '<meta property="og:title" content="asajhskajhsaksp;" />' +
    '<meta property="og:image" content="images.cxs.com/2014/09/modit1.gif?w=209" />';

// Parse the string inside a detached element so the meta tags can be queried
var el = document.createElement('div');
el.innerHTML = html;

// First <meta> whose property attribute is exactly "og:image"
var meta = el.querySelector('meta[property="og:image"]');
console.log(meta.content);

document.getElementById('result').innerHTML = meta.content;

<div id="result"></div>
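The snippet above assumes the tag is present: querySelector returns null when nothing matches, and reading `.content` from null throws. A minimal sketch of a guarded version follows. The name `getMetaContent` is hypothetical, and `root` is assumed to be an Element or Document (anything that exposes querySelector).

```javascript
// Hypothetical helper (not part of the original answer): returns the content
// attribute of the first <meta property="..."> under `root`, or null when
// no such tag exists.
function getMetaContent(root, property) {
  var meta = root.querySelector('meta[property="' + property + '"]');
  return meta ? meta.getAttribute('content') : null;
}

// In a browser, this would be used with a detached element as above.
if (typeof document !== 'undefined') {
  var container = document.createElement('div');
  container.innerHTML = '<meta property="og:image" content="images.cxs.com/2014/09/modit1.gif?w=209" />';
  console.log(getMetaContent(container, 'og:image'));
  console.log(getMetaContent(container, 'og:video')); // null, tag is absent
}
```

Returning null instead of throwing lets the caller decide what to do with pages that do not declare the property.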

If the tags are part of the current page, then:

var meta = document.querySelector('meta[property="og:image"]');
console.log(meta.content);

document.getElementById('result').innerHTML = meta.content;

<meta property="og:site_name" content="asasasas"/>
<meta property="og:title" content="asajhskajhsaksp;" />
<meta property="og:image" content="images.cxs.com/2014/09/modit1.gif?w=209" />

<div id="result"></div>

Source

2014-10-07 06:33:23

Hi @Arun, I am using cURL to fetch the site first and store it in a file. $ch = curl_init($url); $fp = fopen($file, "w") or die("Unable to open " . $file . " for writing.\n"); curl_setopt($ch, CURLOPT_FILE, $fp); curl_close($ch); fclose($fp); Now I have the HTML code in that file, so can I proceed from there the way you suggested above? Or is there any way other than cURL to get the site's content? cURL fetches the whole page, but I only want the HEAD part of the HTML. – Kiran 2014-10-07 07:20:16

Would it be too hard to use jQuery?

$('meta[property="og:image"]').attr('content')

Source

2014-10-07 06:23:15 aelor

There is no jQuery tag on the OP, nor any mention of it. – RobG 2014-10-07 06:50:25

JavaScript was mentioned, so I thought a jQuery solution might also do – aelor 2014-10-07 07:00:20

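You can use the approach Arun suggested, but there may be user agents that do not support the Selectors API, or do not support the required features (e.g. IE8). In that case, you can use getElementsByTagName and a plain old for loop.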

var node, nodes = document.getElementsByTagName('meta');
for (var i = 0, iLen = nodes.length; i < iLen; i++) {
    node = nodes[i];

    if (node.getAttribute('property') == 'og:image') {
        // do something with content
        console.log(node.content);
    }
}

The above will work in any browser in use and does not require any external library.
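If the value is needed in more than one place, the same loop could be wrapped in a function. This is only a sketch: the name `findMetaContent` is hypothetical, and `doc` is assumed to be a Document (or any object that provides getElementsByTagName).

```javascript
// Hypothetical helper (not from the answer): the same loop, wrapped in a
// function that returns the content attribute or null when nothing matches.
function findMetaContent(doc, property) {
  var nodes = doc.getElementsByTagName('meta');
  for (var i = 0, iLen = nodes.length; i < iLen; i++) {
    if (nodes[i].getAttribute('property') === property) {
      return nodes[i].getAttribute('content');
    }
  }
  return null; // no <meta> with that property
}

// In a browser this would be called with the live document.
if (typeof document !== 'undefined') {
  console.log(findMetaContent(document, 'og:image'));
}
```

It stops at the first match, which mirrors what querySelector does in the other answer.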

Source

2014-10-07 07:00:59 RobG
