Count HTML links in a string <em>$html</em> and add a list

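I store the content of a website in a string.

I want to count all HTML links that point to a file of a particular format (.otf in the example below), add a list of these links to the end of $html, and remove the original links.

An example: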

<?php 
$html_input = ' 
<p> 
    Lorem <a href="font-1.otf">ipsum</a> dolor sit amet, 
    consectetur <a href="http://www.cnn.com">adipiscing</a> elit. 
    Quisque <a href="font-2.otf">ultricies</a> placerat massa 
    vel dictum. 
</p>';

// some magic here  

$html_output = ' 
<p> 
    Lorem ipsum dolor sit amet, 
    consectetur <a href="http://www.cnn.com">adipiscing</a> elit. 
    Quisque ultricies placerat massa 
    vel dictum. 
</p> 
<p>.otf-links: 2</p> 
<ul> 
    <li><a href="font-1.otf">ipsum</a></li> 
    <li><a href="font-2.otf">ultricies</a></li> 
</ul>';
?>

How should I do this? Should I use regular expressions, or is there another way?


2010-02-19 snorpey

No, you should not use regular expressions. See: http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454 A real answer is coming shortly. – 2010-02-19 11:06:24

require_once("simple_html_dom.php");

$doc = new simple_html_dom();
$doc->load($html_input);

$fonts = array();
$links = $doc->find("a");

foreach ($links as $l) {
    if (substr($l->href, -4) === ".otf") {
        // Store the anchor's markup as a string before the node is changed.
        $fonts[] = $l->outertext;
        // Replace the anchor with its contents, which removes the tag but keeps the text.
        $l->outertext = $l->innertext;
    }
}

$output = $doc->save() . "\n<p>.otf-links: " . count($fonts) . "</p>\n" .
    "<ul>\n\t<li>" . implode("</li>\n\t<li>", $fonts) . "</li>\n</ul>";

Documentation for Simple HTML DOM: http://simplehtmldom.sourceforge.net/
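If adding a third-party library is not an option, roughly the same steps can be sketched with PHP's bundled DOM extension. This is not the approach used in the answer above, only an alternative under stated assumptions: the function name `move_links_to_list` is made up for this example, the `dom` extension is assumed to be enabled, and the input is assumed to be ASCII or otherwise safe for `DOMDocument::loadHTML`, which can mangle UTF-8 text that carries no charset declaration.

```php
<?php
// Hypothetical helper (not from the thread): collects links whose href ends
// in $ext, unwraps them in the text, and appends a count plus a list.
function move_links_to_list(string $html, string $ext): string
{
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // keep parser warnings quiet
    $doc->loadHTML('<html><body>' . $html . '</body></html>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $items = array();
    // Snapshot the live node list first, because the loop modifies the tree.
    foreach (iterator_to_array($doc->getElementsByTagName('a')) as $a) {
        $href = strtolower($a->getAttribute('href'));
        if (substr($href, -strlen($ext)) !== strtolower($ext)) {
            continue; // not one of the links we are collecting
        }
        // Serialize the anchor before it is taken apart.
        $items[] = '<li>' . $doc->saveHTML($a) . '</li>';
        // Move the anchor's children in front of it, then drop the empty tag.
        while ($a->firstChild !== null) {
            $a->parentNode->insertBefore($a->firstChild, $a);
        }
        $a->parentNode->removeChild($a);
    }

    // Serialize only the children of <body>, not the wrapper we added.
    $body = $doc->getElementsByTagName('body')->item(0);
    $out = '';
    foreach ($body->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }

    return $out . "\n<p>" . $ext . '-links: ' . count($items) . "</p>\n"
        . "<ul>\n\t" . implode("\n\t", $items) . "\n</ul>";
}
```

Calling `move_links_to_list($html_input, '.otf')` with the string from the question should give output close to `$html_output`, although whitespace may differ because the parser normalizes the markup.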


2010-02-19 11:17:41

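+1. Less thrown together than mine. Fixed an issue that could cause the script to fail if the length of the href is less than 4. – Yacoby 2010-02-19 11:32:15

Thanks for your effort. This is almost what I want, except that it also removes the anchor tags in the list. Swapping _$l->outertext = $l->innertext;_ and _$fonts[] = $l;_ did not help, so how can I fix this? – snorpey 2010-02-19 14:00:07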

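@Yacoby Thanks, mate; however, `substr` carries on happily without an error even if the string length is 0, so the check is not necessary. @snorpey I fixed the problem. Remember that objects in PHP are assigned by reference unless you explicitly clone them. The solution is to assign the actual string representation of the anchor object to `$fonts[]` before changing it. – 2010-02-19 18:42:20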
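The point about object assignment in the comment above can be illustrated with a small stand-alone sketch. The `Node` class is a hypothetical stand-in for a Simple HTML DOM element, not the library's real class; it only exists to show why storing the object itself in `$fonts[]` did not preserve the original markup, while storing a string (or a clone) does.

```php
<?php
// Hypothetical stand-in for a DOM node; only here to show assignment semantics.
class Node
{
    public $text;

    public function __construct(string $text)
    {
        $this->text = $text;
    }
}

$anchor = new Node('<a href="font-1.otf">ipsum</a>');

$sameObject = $anchor;        // copies the handle: both names refer to one object
$asString   = $anchor->text;  // copies the string value itself
$cloned     = clone $anchor;  // an independent copy of the object

// Changing the node afterwards, as the loop in the answer does.
$anchor->text = 'ipsum';

echo $sameObject->text, "\n"; // reflects the change
echo $asString, "\n";         // still the original markup
echo $cloned->text, "\n";     // still the original markup
```

Strictly speaking PHP copies an object handle instead of creating a reference, but the practical effect for this case is the same: the stored object changes along with the original.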

Use a DOM parser.

Example:

$h = str_get_html($html);

// total number of links of any kind
$linkCount = count($h->find('a'));

foreach ($h->find('a') as $a) {
    // print every link ending in .otf
    if (ends_with(strtolower($a->href), '.otf')) { // ends_with isn't a built-in function, but it is trivial to write
        echo '<li><a href="' . $a->href . '">' . $a->innertext . '</a></li>';
    }
}
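Since the answer calls an `ends_with` function that it leaves unwritten, here is one possible way to write it. This is only a sketch of a helper with that name; on PHP 8 and later the built-in `str_ends_with` could be used instead.

```php
<?php
// One possible ends_with helper; the name comes from the answer above,
// the implementation is an illustration and not part of any library.
function ends_with(string $haystack, string $needle): bool
{
    if ($needle === '') {
        return true; // every string ends with the empty string
    }
    // Compare the last strlen($needle) bytes of the haystack with the needle.
    return substr($haystack, -strlen($needle)) === $needle;
}
```

The comparison is case-sensitive, which is why the answer lowercases the href before calling it.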


2010-02-19 11:04:13 Yacoby

+1 for recommending a DOM parser – marcgg 2010-02-19 11:15:02

I love Simple HTML DOM! You beat me to it, but you left out the part about removing the anchor tags from the original input. – 2010-02-19 11:23:14

-1

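// Group 1 captures the link text so it can be kept when the tag is removed.
$pattern = '~<a href="[^"]+\.otf">(.*?)</a>~s';
preg_match_all($pattern, $html_input, $matches); // preg_match would stop at the first link
$linksCount = count($matches[0]);
$html_input = preg_replace($pattern, '$1', $html_input); // preg_replace returns the result, so assign it
$html_input .= '<ul><li>' . implode('</li><li>', $matches[0]) . '</li></ul>';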


2010-02-19 11:07:45

We all know what happens if you parse HTML with regular expressions... http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags – marcgg 2010-02-19 11:14:46

I even posted a warning comment on the OP. – 2010-02-19 11:23:36
