php
  • domdocument
  • 2016-11-20 127 views 0 likes 

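    PHP DOMDocument: extract links with their anchor text or alt attribute

    I want to extract every link on a page together with its anchor text, or with the alt attribute of an image contained in the link if the image comes first.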

    $html = '<a href="lien.fr">Anchor</a>'; 
    

    must return "lien.fr;Anchor"

    $html = '<a href="lien.fr"><img alt="Alt Anchor">Anchor</a>'; 
    

    must return "lien.fr;Alt Anchor"

    $html = '<a href="lien.fr">Anchor<img alt="Alt Anchor"></a>'; 
    

    must return "lien.fr;Anchor"

    Here is what I did:

    $doc = new DOMDocument();
    $doc->loadHTML($html);
    
    $out = [];
    $n = 0;
    $links = $doc->getElementsByTagName('a');
    
    foreach ($links as $element) {
        $img_alt = $anchor = "";
        $href = $element->getAttribute('href');
        $n++;
        // Skip links whose href contains "panier?"
        if (strpos($href, "panier?") === false) {
    
            // Use the image's alt text when the link starts with an <img>
            if ($element->firstChild->nodeName == "img") {
    
                $imgs = $element->getElementsByTagName('img');
    
                foreach ($imgs as $img) {
                    if ($anchor = $img->getAttribute('alt')) {
                        break;
                    }
                }
            }
    
            // Otherwise fall back to the link's text content
            if (($anchor == "") && ($element->nodeValue)) {
                $anchor = $element->nodeValue;
            }
    
            $out[$n]['link'] = $href;
            $out[$n]['anchor'] = $anchor;
        }
    }
    

    This seems to work, but it fails when the markup contains whitespace or indentation, as in:

    $html = '<a href="link.fr"> 
            <img src="ceinture-gris" alt="alt anchor"/> 
           </a>'; 
    

    In this case $element->firstChild->nodeName is #text (the whitespace text node), not img.
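
    To make the behaviour concrete, here is a minimal sketch that lists the child node names of the indented link. The variable names $link and $names are chosen for illustration only; the exact list of children may vary with the libxml version, so no output is shown.

```php
<?php
// Illustrative only: inspect the children of the first <a> element.
$html = '<a href="link.fr">
        <img src="ceinture-gris" alt="alt anchor"/>
       </a>';

$doc = new DOMDocument();
$doc->loadHTML($html);

$link = $doc->getElementsByTagName('a')->item(0);

// Collect the nodeName of every direct child of the link.
$names = [];
foreach ($link->childNodes as $child) {
    $names[] = $child->nodeName;
}

// Text nodes report "#text"; the whitespace before <img> is one of them.
echo implode(', ', $names), "\n";
```

    The whitespace between the opening tag and the image is itself a node, which is why a test on firstChild alone does not find the image.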

    Answer

    Something like this:

    $doc = new DOMDocument();
    $doc->loadHTML($html);
    
    // Output texts that will later be joined with ';'
    $out = [];
    // Maximum number of items to add to $out
    $max_out_items = 2;
    // List of img tag attributes that will be parsed by the loop below
    // (in the order specified in this array!)
    $img_attributes = ['alt', 'src', 'title'];
    
    $links = $doc->getElementsByTagName('a');
    foreach ($links as $element) {
        if ($href = trim($element->getAttribute('href'))) {
            $out[] = $href;
            if (count($out) >= $max_out_items) {
                break;
            }
        }
    
        foreach ($element->childNodes as $child) {
            // Whitespace-only text nodes trim to '' and are skipped
            if ($child->nodeType === XML_TEXT_NODE &&
                ($text = trim($child->nodeValue)))
            {
                $out[] = $text;
                if (count($out) >= $max_out_items) {
                    break 2; // leave both loops
                }
            } elseif ($child->nodeName == 'img') {
                foreach ($img_attributes as $attr_name) {
                    if ($attr_value = trim($child->getAttribute($attr_name))) {
                        $out[] = $attr_value;
                        if (count($out) >= $max_out_items) {
                            break 3; // leave all three loops
                        }
                    }
                }
            }
        }
    }
    
    echo $out = implode(';', $out);
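
    The code above stops after the first link. If every link on the page is needed, the same idea can be wrapped in a function that returns one "href;anchor" string per link. This is only a sketch under stated assumptions: extractLinks is a hypothetical helper name, not part of DOMDocument, and the fallback to the link's full text content is an added assumption for links whose text sits inside a nested element.

```php
<?php
/**
 * Hypothetical helper (not part of DOMDocument): returns one
 * "href;anchor" string for each <a> element that has an href.
 */
function extractLinks(string $html): array
{
    $doc = new DOMDocument();
    // Collect libxml parse warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $results = [];
    foreach ($doc->getElementsByTagName('a') as $link) {
        $href = trim($link->getAttribute('href'));
        if ($href === '') {
            continue;
        }

        // Take the first child that yields a non-empty value:
        // a text node's trimmed content, or an <img>'s alt attribute.
        $anchor = '';
        foreach ($link->childNodes as $child) {
            if ($child->nodeType === XML_TEXT_NODE) {
                // Whitespace-only text nodes trim to '' and are skipped.
                $anchor = trim($child->nodeValue);
            } elseif ($child->nodeType === XML_ELEMENT_NODE
                && $child->nodeName === 'img') {
                $anchor = trim($child->getAttribute('alt'));
            }
            if ($anchor !== '') {
                break;
            }
        }

        // Assumed fallback: use the whole text content of the link.
        if ($anchor === '') {
            $anchor = trim($link->textContent);
        }

        $results[] = $href . ';' . $anchor;
    }

    return $results;
}

print_r(extractLinks('<a href="link.fr">
    <img src="ceinture-gris" alt="alt anchor"/>
</a>'));
```

    Trimming each text node before testing it is what makes the indented markup behave like the compact one.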
    