Preg_match_all <a href

Hello, I want to extract links such as <a href="/portal/clients/show/entityId/2121" > and I need a regex that gives me /portal/clients/show/entityId/2121. The trailing number, 2121, is different in other links. Any ideas?

2009-10-05 streetparade

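Do you want to use a regex to extract '2121' from '/portal/clients/show/entityId/2121'? – halocursed 2009-10-05 12:11:00

No, I want to extract '/portal/clients/show/entityId/2121'. Another link can have a different number instead of 2121. Any ideas? – streetparade 2009-10-05 12:13:19

A regex for parsing links looks something like this: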

'/<a\s+(?:[^"'>]+|"[^"]*"|'[^']*')*href=("[^"]+"|'[^']+'|[^<>\s]+)/i'

Since that is so horrible, I would suggest using Simple HTML Dom to at least get the links. You can then check each link with a very basic regex on its href.
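As a rough illustration of that second step, here is a minimal sketch that assumes the href values have already been collected into an array by whichever parser you use. The array contents are sample data (the third entry is made up), and the pattern is only one possible "basic regex" for this URL shape.

```php
<?php
// Hypothetical sample data: href values already pulled out by an HTML parser.
$hrefs = [
    '/portal/clients/show/entityId/2121',
    '/portal/clients/show/entityId/4636',
    '/portal/help',
];

// Keep only hrefs of the form /portal/clients/show/entityId/<digits>.
$clientLinks = preg_grep('~^/portal/clients/show/entityId/\d+$~', $hrefs);

// preg_grep preserves the original keys, so reindex before printing.
print_r(array_values($clientLinks));
```

Using ~ as the delimiter avoids having to escape every forward slash in the path.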

2009-10-05 12:20:40 Yacoby

This works for me: $patterndocumentLinks = '/<a\s+(?:[^"\'>]+|"[^"]*"|\'[^\']*\')*href=("[^"]+"|\'[^\']+\'|[^<>\s]+)/i'; thanks – streetparade 2009-10-05 12:25:43

@streetparade You may want to avoid including the quotes that delimit the attribute value in the captured value, so adjust the regex capture accordingly: '/<a\s+(?:[^"\'>]+|"[^"]*"|\'[^\']*\')*href="([^"]+)"|\'[^\']+\'|[^<>\s]+/i' – 2014-08-28 16:56:32

Parsing links from HTML can be done using an HTML parser.

Once you have all the links, simply get the index of the last forward slash in each one, and whatever follows it is your number. No regex needed.
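A minimal sketch of the last-slash idea, assuming the href has already been extracted into a string (the value below is the one from the question):

```php
<?php
// Example href taken from the question.
$href = '/portal/clients/show/entityId/2121';

// Position of the last '/' in the string (false if there is none).
$pos = strrpos($href, '/');

// Everything after the last slash is the id; fall back to the whole string.
$id = ($pos === false) ? $href : substr($href, $pos + 1);

echo $id, "\n"; // prints 2121
```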

2009-10-05 12:10:53

Hmm... $html->find('href') or something? – streetparade 2009-10-05 12:11:52

I don't know. Where does this find(...) come from? – 2009-10-05 12:42:36

Simple PHP HTML Dom Parser example:

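// Create DOM from a string of HTML
$html = str_get_html($links);

// or load it from a URL (the scheme is required)
$html = file_get_html('http://www.example.com/');

// Print the href of every <a> element
foreach($html->find('a') as $link) {
    echo $link->href . '<br />';
}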

2009-10-05 12:19:33 karim79

This gives the result " – streetparade 2009-10-05 12:26:21

But I only want to extract /portal/clients/show/entityId/4636, so this works: '/<a\s+(?:[^"'>]+|"[^"]*"|'[^']*')*href=("[^"]+"|'[^']+'|[^<>\s]+)/i' – streetparade 2009-10-05 12:26:57

@streetparade my bad, I forgot to say $link->href; edited – karim79 2009-10-05 12:30:13

When "parsing" HTML I mostly rely on PHPQuery: http://code.google.com/p/phpquery/, rather than on regular expressions.

2009-10-05 12:24:58 Max

Don't use regular expressions for processing XML/HTML. This can be done very easily using the builtin DOM parser:

$doc = new DOMDocument(); 
$doc->loadHTML($htmlAsString); 
$xpath = new DOMXPath($doc); 
$nodeList = $xpath->query('//a/@href'); 
for ($i = 0; $i < $nodeList->length; $i++) { 
    # Xpath query for attributes gives a NodeList containing DOMAttr objects. 
    # http://php.net/manual/en/class.domattr.php 
    echo $nodeList->item($i)->value . "<br/>\n"; 
}
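If only the entity links are wanted, one possible variation on the XPath approach above is to filter inside the query with starts-with(). This is a sketch under assumptions: the sample markup is made up apart from the link from the question, and the path prefix is taken from that link.

```php
<?php
// Hypothetical sample markup; the first link is the one from the question.
$htmlAsString = '<p><a href="/portal/clients/show/entityId/2121" >A</a> '
              . '<a href="/portal/help">Help</a></p>';

$doc = new DOMDocument();
$doc->loadHTML($htmlAsString);
$xpath = new DOMXPath($doc);

// Select only href attributes that begin with the entity path.
$query = '//a[starts-with(@href, "/portal/clients/show/entityId/")]/@href';

$links = [];
foreach ($xpath->query($query) as $attr) {
    // Each result is a DOMAttr; its value is the attribute text.
    $links[] = $attr->value;
}

print_r($links);
```

For real pages with sloppy markup, calling libxml_use_internal_errors(true) before loadHTML() keeps parser warnings out of the output.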

2009-10-05 12:28:57 soulmerge

Here is my solution:

<?php
// get links
$website = file_get_contents("http://www.example.com"); // download contents of www.example.com

// \x22 is a double quote; ~ delimits the pattern so that the leading < is
// matched literally instead of being treated as a regex delimiter
preg_match_all("~<a href=\x22(.+?)\x22~", $website, $matches);

// $matches[1] already holds only the captured href values,
// so no further cleanup with str_replace is needed

// output all matches
print_r($matches[1]);
?>

I would suggest avoiding XML-based parsers, because you won't always know whether the document/website is well-formed.

Good luck

2013-10-29 23:01:34 GotIt
