Extract specific <a href> URLs out of the document

I think this should be elementary, but I still can't get my head around it. Let's say there is a fair number of HTML documents and I need to extract every image URL from them.

The rest of the content changes, but the base of the URL is always the same, for example: http://images.examplesite.com/images/

So I want to extract every string that contains that part. The problem is that the URLs are always wrapped in <a href=''> or <img src=''> tags, so how can I strip those away? Probably with preg_match?

Source

2010-07-20 Seerumi

Possible duplicate of [PHP Xpath: get all href values that contain needle](http://stackoverflow.com/questions/2392393/php-xpath-get-all-href-values-that-contain-needle) – Gordon 2010-07-20 07:42:52

You can also use DOM as shown in [Preg_Match All A href](http://stackoverflow.com/questions/1519696/preg-match-all-a-href/1519791#1519791). Just change the XPath to the one given in the linked duplicate. – Gordon 2010-07-20 07:48:23

I'll give it a try :) – Seerumi 2010-07-20 07:59:17

Try something like: preg_match_all('/http:\/\/images\.examplesite\.com\/images\/(.*?)"/i', $html_data, $results, PREG_SET_ORDER)

Source

2010-07-20 07:40:20

Wow, that was quick. It left a trailing " behind, but believe it or not, I managed to strip that out myself ;D Thanks again! – Seerumi 2010-07-20 07:57:56
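For anyone hitting the same stray quote: one way around it is to stop the match at the first quote instead of consuming it. This is only a sketch. The sample `$html` string is made up for illustration, and the character class assumes the URLs contain no quotes, whitespace, or angle brackets.

```php
<?php
// Hypothetical sample input; in practice $html would hold one of your documents.
$html = '<a href="http://images.examplesite.com/images/a.jpg">A</a>'
      . "<img src='http://images.examplesite.com/images/b.png'>"
      . '<a href="http://other.example.org/c.gif">C</a>';

// Match the fixed base, then everything up to the next quote,
// whitespace character, or angle bracket (none of which are consumed).
$pattern = '~http://images\.examplesite\.com/images/[^"\'\s<>]*~i';

preg_match_all($pattern, $html, $matches);

// $matches[0] holds the full matches; drop duplicates and reindex.
$urls = array_values(array_unique($matches[0]));

print_r($urls);
```

Using `~` as the delimiter avoids escaping every slash, and the negated character class works for both single- and double-quoted attributes.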

You can either use an HTML DOM parser

or use a regular expression.

preg_match_all("/http:\/\/images\.examplesite\.com\/images\/(.*?)\"/s", $str, $preg);
print_r($preg);
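The DOM parser option could look roughly like the following, using PHP's bundled DOMDocument and DOMXPath classes rather than any particular third-party parser. Treat it as a sketch: the sample `$html` and the variable names are assumptions, and the XPath only looks at `href` on `<a>` and `src` on `<img>`.

```php
<?php
// Hypothetical sample input; in practice load the real document instead.
$html = '<html><body>'
      . '<a href="http://images.examplesite.com/images/a.jpg">A</a>'
      . "<img src='http://images.examplesite.com/images/b.png'>"
      . '<a href="http://other.example.org/c.gif">C</a>'
      . '</body></html>';

$base = 'http://images.examplesite.com/images/';

// Real-world HTML is rarely valid, so collect parser warnings internally
// instead of printing them.
libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);

// Select href/src attribute nodes whose value contains the base URL.
$nodes = $xpath->query(
    "//a/@href[contains(., '$base')] | //img/@src[contains(., '$base')]"
);

$urls = [];
foreach ($nodes as $attr) {
    $urls[] = $attr->nodeValue; // attribute value only, no tag or quotes
}

print_r($urls);
```

Because the parser hands back attribute values directly, there are no surrounding tags or quotes to clean up afterwards.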


Source

2010-07-20 07:43:40 marvin
