Trouble parsing just the img src from an RSS feed?

I'm trying to create an RSS reader based on this example:

http://www.w3schools.com/php/php_ajax_rss_reader.asp

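Specifically, I'm trying to modify this example so that the reader can access and display all available comic images (and nothing else) from any given webcomic RSS feed. I realize it may be necessary to make the code at least somewhat site-specific, but I'm aiming to keep it as generic as possible. So far, I've modified the original example to produce a reader that shows all the comics from a given list of RSS feeds. However, it also shows other, unwanted text that I'm trying to get rid of. Here is my code so far, with some of the feeds that are giving me trouble in particular: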

index.php:

<html> 
<head> 
    <script> 
     function showRSS() 
     { 
      if (window.XMLHttpRequest) 
      { 
      // code for IE7+, Firefox, Chrome, Opera, Safari 
      xmlhttp=new XMLHttpRequest(); 
      } else 
      { // code for IE6, IE5 
      xmlhttp=new ActiveXObject("Microsoft.XMLHTTP"); 
      } 
      xmlhttp.onreadystatechange=function() 
      { 
      if (xmlhttp.readyState==4 && xmlhttp.status==200) 
      { 
       document.getElementById("rssOutput").innerHTML=xmlhttp.responseText; 
      } 
      } 
      xmlhttp.open("GET","logger.php",true); 
      xmlhttp.send(); 
     } 
    </script> 
</head> 
<body onload="showRSS()"> 
    <div id="rssOutput"></div> 
</body> 
</html>

(I'm fairly sure there's nothing wrong with this file; I think the problem is in the next one, but I'm including it for completeness.)

logger.php:

<?php 

//function to get all comics from an rss feed 
function getComics($xml) 
{ 
    $xmlDoc = new DOMDocument(); 
    $xmlDoc->load($xml); 

    $x=$xmlDoc->getElementsByTagName('item'); 
    foreach ($x as $x) 
    { 
     $comic_image=$x->getElementsByTagName('description')->item(0)->childNodes->item(0)->nodeValue; 
     //output the comic 
     echo ($comic_image . "</p>"); 
     echo ("<br>"); 
    } 

} 

//create array of all RSS feed URLs 
$URLs = 
[ 
    "SMBC" => "http://www.smbc-comics.com/rss.php", 
    "garfieldMinusGarfield" => "http://garfieldminusgarfield.net/rss", 
    "babyBlues" => "http://www.comicsyndicate.org/Feed/Baby%20Blues", 
]; 

//Loop through all RSS feeds 
foreach ($URLs as $xml) 
{ 
    getComics($xml); 
} 

?>

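Since this approach includes extra text between the comic images (random stuff for SMBC, a few ad links for gMg, and a copyright link for Baby Blues), I looked at the RSS feeds and concluded that the problem is that the description tag contains the image source but also other things. Next, I tried modifying the getComics function to scan for image tags directly, instead of looking for the description tag first. I replaced the part between the DOM document creation/loading and the list of URLs with: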

$images=$xmlDoc->getElementsByTagName('img'); 
print_r($images); 

foreach ($images as $image) 
{ 
    //each $image is already a DOMElement, so read its attribute directly 
    echo $image->getAttribute('src'); 
    echo ("<br>"); 
}

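But apparently getElementsByTagName doesn't pick up image tags embedded inside the description tags, because I get no comic images output at all, only the following output from the print_r statement: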

DOMNodeList Object ([length] => 0) DOMNodeList Object ([length] => 0)

Finally, I tried combining the two approaches, using getElementsByTagName('img') to parse out the code inside the contents of the description tag. I replaced the line:

$comic_image=$x->getElementsByTagName('description')->item(0)->childNodes->item(0)->nodeValue;

with:

$comic_image=$x->getElementsByTagName('description')->item(0)->getElementsByTagName('img'); 
     print_r($comic_image);

But this also finds nothing, producing the output:

DOMNodeList Object ([length] => 0)

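Sorry for the long background, but I'd like to know: is there a way to parse just the img src from a given RSS feed, without the other text and links I don't want?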

Any help would be appreciated.


2016-07-30 user2472083

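Inside the description element, the HTML is escaped (stored as text rather than as child elements), so the following code should work: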

foreach ($x as $y) { 
    $description = $y->getElementsByTagName('description')->item(0); 
    $decoded_description = htmlspecialchars_decode($description->nodeValue); 
    $description_xml = new DOMDocument(); 
    $description_xml->loadHTML($decoded_description); 
    $comic_image = $description_xml->getElementsByTagName('img')->item(0)->getAttribute('src'); 

    //output the comic 
    echo ($comic_image); 
    echo ("<br>"); 
}
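To see why the escaping matters, here is a small self-contained sketch, not taken from the thread, built on a made-up one-item feed; the feed text, the example.com URL and the function names are hypothetical. It shows that the feed's own DOM tree contains no img elements at all, because the escaped markup is a single text node inside description, and that reparsing the description text with loadHTML() does find the image. It passes nodeValue straight to loadHTML(), since DOMDocument has already decoded the XML entities when you read that property.

```php
<?php
// Hypothetical one-item feed: the <img> inside <description> is
// entity-escaped, the way many real feeds publish HTML.
function sampleFeed(): string
{
    return <<<'XML'
<?xml version="1.0"?>
<rss version="2.0"><channel><item>
<title>Comic #1</title>
<description>&lt;p&gt;&lt;img src="http://example.com/comic1.png"&gt;&lt;/p&gt;Ad text</description>
</item></channel></rss>
XML;
}

// Count <img> elements visible in the feed's own DOM tree.
function countImgElementsInFeed(string $feedXml): int
{
    $doc = new DOMDocument();
    $doc->loadXML($feedXml);
    return $doc->getElementsByTagName('img')->length;
}

// Reparse each <description> as HTML and collect its first img src.
function imgSrcsFromFeed(string $feedXml): array
{
    $doc = new DOMDocument();
    $doc->loadXML($feedXml);
    $srcs = [];
    foreach ($doc->getElementsByTagName('item') as $item) {
        $description = $item->getElementsByTagName('description')->item(0);
        if ($description === null || trim($description->nodeValue) === '') {
            continue; // nothing to parse for this item
        }
        // nodeValue already has the XML entities decoded, so it is plain HTML
        $html = new DOMDocument();
        @$html->loadHTML($description->nodeValue);
        $img = $html->getElementsByTagName('img')->item(0);
        if ($img !== null) {
            $srcs[] = $img->getAttribute('src');
        }
    }
    return $srcs;
}

echo countImgElementsInFeed(sampleFeed()), "\n"; // the feed tree itself has no <img> elements
print_r(imgSrcsFromFeed(sampleFeed()));
```

The same idea carries over to the real feeds by passing the downloaded feed text (or using load() on the URL) instead of sampleFeed().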


2016-07-30 22:59:39 CountZero

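Thanks, I think I generally understand what you're saying, and I tried your exact code. It works for some feeds but produces a strange error for others. For example, for SMBC it outputs 5 valid image URLs but repeatedly gives the following error: Warning: DOMDocument::loadHTML(): htmlParseEntityRef: expecting ';' in Entity, line: 30 in C:\xampp\htdocs\comic_database_logger\logger.php on line 30, which confuses me. I don't understand why a semicolon would be expected in some of the description text. – user2472083

For Baby Blues it works completely (although it outputs the image URLs rather than the images themselves, which I think I can sort out later), while for Garfield Minus Garfield it gives the error listed above. Very confused – user2472083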
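A plausible cause of that warning is decoding twice rather than anything wrong in the feeds. DOMDocument already turns the feed's entities back into HTML when you read nodeValue, so the extra htmlspecialchars_decode() call also decodes the HTML's own &amp; into a bare &, and libxml's HTML parser then reports a bare & followed by a name (for example in a URL query string) with "htmlParseEntityRef: expecting ';'". The sketch below illustrates this with a made-up description and URL (not the actual SMBC or Garfield Minus Garfield data); the function names are hypothetical, and it only compares strings rather than checking which warnings a given libxml2 version emits.

```php
<?php
// Hypothetical raw XML for one <description>: the HTML inside it contains
// "&amp;" in a URL, which the feed escapes once more as "&amp;amp;".
function sampleDescriptionXml(): string
{
    return '<description>&lt;img src="http://example.com/c.png?id=7&amp;amp;size=big"&gt;</description>';
}

// Text of the element as DOMDocument returns it (one level of escaping removed).
function descriptionText(string $xml): string
{
    $doc = new DOMDocument();
    $doc->loadXML($xml);
    return $doc->documentElement->nodeValue;
}

// First img src found when the given string is parsed as HTML.
function firstImgSrc(string $html): string
{
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    $img = $doc->getElementsByTagName('img')->item(0);
    return $img === null ? '' : $img->getAttribute('src');
}

$once  = descriptionText(sampleDescriptionXml()); // valid HTML, "&" still written as "&amp;"
$twice = htmlspecialchars_decode($once);          // bare "&size=big", the form loadHTML() complains about

echo $once, "\n", $twice, "\n";
echo firstImgSrc($once), "\n"; // the HTML parser itself decodes "&amp;" in the attribute
```

If this is what is happening, passing nodeValue to loadHTML() without htmlspecialchars_decode() should give the same image URLs without the entity warnings, though feeds that double-escape their HTML would still need the extra decode.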

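Actually, I tried putting @ in front of the lines causing the problem, since they're only warnings, and now everything works perfectly, except that I need to figure out how to display the images rather than the image source links – user2472083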
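One way to display the images, sketched here as an assumption rather than taken from the thread, is to have logger.php wrap each URL in an img tag before echoing it, since index.php already inserts the whole response into the page with innerHTML. comicImgTag() is a hypothetical helper name; escaping the URL with htmlspecialchars() keeps an & or a quote in it from breaking the generated markup.

```php
<?php
// Hypothetical helper: turn an image URL into an <img> tag for the page
// that index.php fills via innerHTML.
function comicImgTag(string $src, string $alt = ''): string
{
    if ($src === '') {
        return ''; // no image was found for this item
    }
    // Escape &, <, > and both quote styles so the attribute stays well-formed
    return '<img src="' . htmlspecialchars($src, ENT_QUOTES) . '" alt="'
        . htmlspecialchars($alt, ENT_QUOTES) . '">';
}

echo comicImgTag('http://example.com/c.png?id=7&size=big', 'Comic #7'), "<br>\n";
```

In the question's getComics() loop, that would amount to echoing comicImgTag($comic_image) instead of $comic_image.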

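For reference, for anyone else who reads this thread later, here is the code I ended up with. I replaced everything inside the loop with a single call to a getImageSrc function, which in turn calls a function getImageTag: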

//function to find an image tag within a specific section if there is one 
function getImageTag ($item,$tagName) 
{ 
    //pull desired section from given item 
    $section = $item->getElementsByTagName($tagName)->item(0); 
    if (is_null($section)) //this item has no such section at all 
    { 
     return null; 
    } 
    //reparse the section as HTML, because its markup is stored as escaped text rather than as child elements, so getElementsByTagName can't reach the image directly 
    $decoded_section = htmlspecialchars_decode($section->nodeValue); 
    if (trim($decoded_section) === '') //loadHTML() rejects empty input, and newer PHP versions throw an error that @ can't suppress 
    { 
     return null; 
    } 
    $section_xml = new DOMDocument(); 
    @$section_xml->loadHTML($decoded_section); //the @ is to suppress a bunch of warnings about characters this parser doesn't like 
    //pull image tag from section if there 
    $image_tag = $section_xml->getElementsByTagName('img')->item(0); 
    return $image_tag; 
} 

//function to get the image source URL from a given item 
function getImageSrc ($item) 
{ 
    $image_tag = getImageTag($item,'description'); 
    if (is_null($image_tag)) //if there was nothing with the tag name of image in the description section 
    { 
     //check in content:encoded section, because that's the next most likely place 
     $image_tag = getImageTag($item,'encoded'); 
     if (is_null($image_tag)) //if there was nothing with the tag name of image in the encoded content section 
     { 
      //if the program gets here, it's probably because the feed is crap and doesn't include images, 
      //or it's because this particular item doesn't have a comic image in it 
      $image_src = ''; 
      //THIS EXCEPTION WILL PROBABLY NEED TO BE HANDLED LATER TO AVOID POTENTIAL ERRORS 
     } else 
     { 
      $image_src = $image_tag->getAttribute('src'); 
     } 
    } else 
    { 
     $image_src = $image_tag->getAttribute('src'); 
    } 
    return $image_src; 
}
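As an alternative to hiding warnings with @, PHP's libxml_use_internal_errors() can buffer the parser's messages so they remain available for logging while nothing is printed. The sketch below applies that idea to the parsing step inside getImageTag; firstImgSrcQuietly() and the sample HTML are hypothetical, and for brevity it takes an already extracted HTML string rather than an RSS item.

```php
<?php
// Hypothetical helper: parse an HTML fragment and return
// [first img src or '', list of libxml messages], without using @.
function firstImgSrcQuietly(string $html): array
{
    if (trim($html) === '') {
        return ['', []]; // nothing to parse; loadHTML() rejects empty input
    }
    $previous = libxml_use_internal_errors(true); // buffer warnings instead of printing them
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    $messages = [];
    foreach (libxml_get_errors() as $error) {
        $messages[] = trim($error->message);
    }
    libxml_clear_errors();
    libxml_use_internal_errors($previous); // restore the caller's setting
    $img = $doc->getElementsByTagName('img')->item(0);
    return [$img === null ? '' : $img->getAttribute('src'), $messages];
}

[$src, $messages] = firstImgSrcQuietly('<p>Sponsored link</p><img src="http://example.com/c.png?id=7&size=big">');
echo $src, "\n";
foreach ($messages as $message) {
    echo "libxml: ", $message, "\n"; // could be logged instead of discarded
}
```

Restoring the previous setting matters if other code in the same request relies on libxml errors being reported normally.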


2016-08-04 17:34:54 user2472083
