2011-04-21 75 views

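Problem parsing XML?

I'm building a simple RSS reader application in C#. This is my first time working with XML, and I need some help parsing the different styles used in various RSS feeds.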

This is the kind of feed I expect, and for which I get correct results:

<item> 
     <title>Cometh the hour, cometh the man</title> 
     <link>http://www.espnstar.com/rss-feed/detail/item612327</link> 
     <description>Real Madrid have finally won their first trophy since 2008. Unsurprisingly, it has coincided with the arrival of one man.</description> 
     <pubDate>Thu, 21 Apr 2011 04:11:42 GMT</pubDate> 
    </item> 

But for feeds like this one:

<item><title>NBA: San Antonio Spurs rally to level up with Memphis Grizzlies</title> 
<link>http://timesofindia.feedsportal.com/c/33039/f/533921/s/1455e7bf/l/0Ltimesofindia0Bindiatimes0N0Csports0Cnba0Ctop0Estories0CNBA0ESan0EAntonio0ESpurs0Erally0Eto0Elevel0Eup0Ewith0EMemphis0EGrizzlies0Carticleshow0C80ABcms/story01.htm</link> 
<description>The San Antonio Spurs, trailed by three points at half-time, rallied to level their first round playoff series with the Memphis Grizzlies at 1-1 with a 93-87 victory.&lt;img width='1' height='1' src='http://timesofindia.feedsportal.com/c/33039/f/533921/s/1455e7bf/mf.gif' border='0'/&gt;&lt;div class='mf-viral'&gt;&lt;table border='0'&gt;&lt;tr&gt;&lt;td valign='middle'&gt;&lt;a href="http://res.feedsportal.com/viral/sendemail2.html?title=NBA%3A+San+Antonio+Spurs+rally+to+level+up+with+Memphis+Grizzlies&amp;link=http%3A%2F%2Ftimesofindia.indiatimes.com%2Fsports%2Fnba%2Ftop-stories%2FNBA-San-Antonio-Spurs-rally-to-level-up-with-Memphis-Grizzlies%2Farticleshow%2F8044321.cms" target="_blank"&gt;&lt;img src="http://res3.feedsportal.com/images/emailthis2.gif" border="0" /&gt;&lt;/a&gt;&lt;/td&gt;&lt;td valign='middle'&gt;&lt;a href="http://res.feedsportal.com/viral/bookmark.cfm?title=NBA%3A+San+Antonio+Spurs+rally+to+level+up+with+Memphis+Grizzlies&amp;link=http%3A%2F%2Ftimesofindia.indiatimes.com%2Fsports%2Fnba%2Ftop-stories%2FNBA-San-Antonio-Spurs-rally-to-level-up-with-Memphis-Grizzlies%2Farticleshow%2F8044321.cms" target="_blank"&gt;&lt;img src="http://res3.feedsportal.com/images/bookmark.gif" border="0" /&gt;&lt;/a&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;&lt;/div&gt;&lt;br/&gt;&lt;br/&gt;&lt;a href="http://da.feedsportal.com/r/100752217265/u/242/f/533921/c/33039/s/1455e7bf/a2.htm"&gt;&lt;img src="http://da.feedsportal.com/r/100752217265/u/242/f/533921/c/33039/s/1455e7bf/a2.img" border="0"/&gt;&lt;/a&gt;</description> 
<pubDate>Thu, 21 Apr 2011 04:33:44 GMT</pubDate> 
</item> 

How can I extract only the main description text from the description node, without the other content such as the hrefs?

And how should I handle CDATA in feeds like this one:

<item> 
<title><![CDATA[Japan declares no-go zone around nuclear plant ]]></title> 

<author><![CDATA[AP]]></author> 
<category><![CDATA[International]]></category> 
<link>http://www.thehindu.com/news/international/article1714401.ece</link> 
<description><![CDATA[ 
Japan declared a 20 km evacuation zone around its tsunami-crippled nuclear power plant a no-go zone on Thursday, urging residents to abide by the order for the sake of their own safety. Chi... 
]]> 
</description> 
<pubDate><![CDATA[Thu, 21 Apr 2011 08:15:48 +0530]]></pubDate> 
</item> 
<item> 

Answers


Well, you can use a regular expression to get rid of everything you don't need.

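using System.Text.RegularExpressions;
using System.Xml.Linq;

XElement d = XElement.Parse(feed); // feed holds the XML of one <item>; load it in an XElement

// get the title, link and description; the (string) cast returns null
// instead of throwing when an element is missing
string title = (string)d.Element("title");
string link = (string)d.Element("link");
string description = (string)d.Element("description") ?? "";

// remove each tag on its own; a greedy pattern such as <.*> would also
// delete any text between the first '<' and the last '>'
Regex r = new Regex(@"<[^>]+>");
description = r.Replace(description, "").Trim();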

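For the second feed, the result will be: The San Antonio Spurs, trailed by three points at half-time, rallied to level their first round playoff series with the Memphis Grizzlies at 1-1 with a 93-87 victory. The first feed stays unchanged, and for the third feed the CDATA wrappers are ignored automatically, because the element's Value returns only the text inside them.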
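To make the three cases concrete, here is a minimal sketch that runs the same idea against shortened stand-ins for the three item styles from the question. The class name `FeedItemDemo`, the helper `PlainDescription`, and the sample strings are made up for illustration; only `XElement`, `Regex`, and the `<item>`/`<description>` element names come from the thread.

```csharp
using System;
using System.Text.RegularExpressions;
using System.Xml.Linq;

static class FeedItemDemo
{
    // Matches one tag at a time, so text between two tags is preserved.
    static readonly Regex Tag = new Regex(@"<[^>]+>");

    // Hypothetical helper: returns the description of one <item> as plain text.
    public static string PlainDescription(string itemXml)
    {
        XElement item = XElement.Parse(itemXml);
        // The cast returns null for a missing element. The parser has already
        // unescaped &lt;/&gt; and unwrapped CDATA sections at this point.
        string raw = (string)item.Element("description") ?? "";
        return Tag.Replace(raw, "").Trim();
    }

    static void Main()
    {
        // Shortened stand-ins for the three feed styles (assumed sample data).
        string plain = "<item><description>Real Madrid won.</description></item>";
        string escaped = "<item><description>Spurs rally.&lt;img src='x.gif'/&gt;&lt;br/&gt;</description></item>";
        string cdata = "<item><description><![CDATA[\nJapan declared a no-go zone.\n]]>\n</description></item>";

        Console.WriteLine(PlainDescription(plain));   // Real Madrid won.
        Console.WriteLine(PlainDescription(escaped)); // Spurs rally.
        Console.WriteLine(PlainDescription(cdata));   // Japan declared a no-go zone.
    }
}
```

A regex is good enough for tracking pixels and share links like the ones in the second feed, but it is not an HTML parser; descriptions with scripts, comments, or a literal '>' inside an attribute value would probably need a real HTML library.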


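Not quite sure, but if I'm not mistaken, the regex above will not work when encoded HTML arrives as part of a CDATA description. You may need to call `HtmlDecode` on the description first and then apply the regex. – Ahmad 2011-04-21 06:24:08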
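A rough sketch of the decode-then-strip order suggested in this comment. It assumes `System.Net.WebUtility.HtmlDecode` is an acceptable stand-in for the `HtmlDecode` the comment mentions (`System.Web.HttpUtility.HtmlDecode` would be the other candidate); the class name, helper name, and sample item are invented for the example.

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;
using System.Xml.Linq;

static class CDataDecodeDemo
{
    // Matches one tag at a time.
    static readonly Regex Tag = new Regex(@"<[^>]+>");

    // Hypothetical helper: decode entities first, then strip the tags.
    public static string DecodeThenStrip(string itemXml)
    {
        XElement item = XElement.Parse(itemXml);
        string raw = (string)item.Element("description") ?? "";
        // Inside CDATA the XML parser leaves &lt; and &gt; untouched,
        // so the tags only become visible to the regex after decoding.
        string decoded = WebUtility.HtmlDecode(raw);
        return Tag.Replace(decoded, "").Trim();
    }

    static void Main()
    {
        // Assumed sample: entity-encoded HTML wrapped in a CDATA section.
        string item = "<item><description><![CDATA[Spurs rally.&lt;br/&gt;&lt;img src='x.gif'/&gt;]]></description></item>";
        Console.WriteLine(DecodeThenStrip(item)); // Spurs rally.
    }
}
```

One trade-off: decoding also turns a legitimately escaped `&lt;` in the text into a real `<`, which the regex may then remove along with whatever follows up to the next `>`, so this order fits feeds that embed markup better than ones that discuss it.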