2009-12-28

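How to extract Atom/RSS items

Given a URL, I want to add it to the database if its feed contains any items.

For example:

For this URL, rssDoc.SelectNodes("rss/channel/item").Count is greater than zero.

But for the Atom URL, rssDoc.SelectNodes("rss/channel/item").Count is zero.

How can I check whether an Atom or RSS URL has any items? I tried rssDoc.SelectNodes("feed/entry").Count, but that also gives me a count of zero.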

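' Requires: Imports System.IO, System.Net, System.Xml
Public Shared Function HasRssItems(ByVal url As String) As Boolean
    Dim myRequest As WebRequest
    Dim myResponse As WebResponse = Nothing
    Try
        myRequest = WebRequest.Create(url)
        myRequest.Timeout = 5000 ' milliseconds
        myResponse = myRequest.GetResponse()

        Dim rssStream As Stream = myResponse.GetResponseStream()
        Dim rssDoc As New XmlDocument()
        rssDoc.Load(rssStream)

        ' Counts RSS <item> elements only.
        Return rssDoc.SelectNodes("rss/channel/item").Count > 0
    Catch ex As Exception
        ' Any network or parse failure is treated as "no items".
        Return False
    Finally
        ' GetResponse may have thrown, leaving myResponse unset.
        If myResponse IsNot Nothing Then myResponse.Close()
    End Try
End Function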

Answer

Your main problem here is that the XML "node path" on this line:

Return rssDoc.SelectNodes("rss/channel/item").Count > 0

only works for RSS feeds, not Atom feeds.
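As for why "feed/entry" also returns zero: Atom elements live in an XML namespace, so an XPath without a prefix bound to that namespace will not match them. Here is a minimal sketch of a namespace-aware count. The module and function names are hypothetical (they are not part of the code in this thread), and it only looks at RSS 2.0 style feeds plus the two Atom namespaces used in the converter further down.

```vbnet
Imports System.Xml

Public Module FeedItemCounter
    ' Hypothetical helper: counts RSS items or Atom entries in a loaded feed.
    Public Function CountFeedItems(ByVal doc As XmlDocument) As Integer
        ' RSS 2.0 elements are not namespaced, so a plain path works.
        Dim rssItems As XmlNodeList = doc.SelectNodes("/rss/channel/item")
        If rssItems.Count > 0 Then Return rssItems.Count

        ' Atom elements are namespaced: try Atom 1.0 first, then Atom 0.3.
        Dim atomNamespaces() As String = {"http://www.w3.org/2005/Atom", "http://purl.org/atom/ns#"}
        For Each ns As String In atomNamespaces
            Dim mgr As New XmlNamespaceManager(doc.NameTable)
            mgr.AddNamespace("atom", ns)
            Dim entries As XmlNodeList = doc.SelectNodes("/atom:feed/atom:entry", mgr)
            If entries.Count > 0 Then Return entries.Count
        Next

        ' Neither format matched, or the feed is empty.
        Return 0
    End Function
End Module
```

With something like this, HasRssItems could return CountFeedItems(rssDoc) > 0 instead of the RSS-only path.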

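One approach I have used in the past is a simple function that converts an Atom feed to an RSS feed. You could of course go the other way, or not convert at all. However, converting to a single format lets you write one "generic" block of code that extracts the various elements of the feed items you may be interested in (date, title, and so on).

There is an ATOM to RSS Converter article on CodeProject that provides such a conversion, but it is in C#. I converted it to VB.NET by hand a while ago, so here is the VB.NET version: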

' Requires: Imports System.IO, System.Text, System.Xml
Private Function AtomToRssConverter(ByVal atomDoc As XmlDocument) As XmlDocument 
    Dim xmlDoc As XmlDocument = atomDoc 
    Dim xmlNode As XmlNode = Nothing 
    Dim mgr As New XmlNamespaceManager(xmlDoc.NameTable) 
    mgr.AddNamespace("atom", "http://purl.org/atom/ns#") 
    Const rssVersion As String = "2.0" 
    Const rssLanguage As String = "en-US" 
    Dim rssGenerator As String = "RDFFeedConverter" 
    Dim memoryStream As New MemoryStream() 
    Dim xmlWriter As New XmlTextWriter(memoryStream, Nothing) 
    xmlWriter.Formatting = Formatting.Indented 
    Dim feedTitle As String = "" 
    Dim feedLink As String = "" 
    Dim rssDescription As String = "" 

    xmlNode = xmlDoc.SelectSingleNode("//atom:title", mgr) 
    If xmlNode Is Nothing Then 
     ' This looks like an ATOM v1.0 format, rather than ATOM v0.3. 
     mgr.RemoveNamespace("atom", "http://purl.org/atom/ns#") 
     mgr.AddNamespace("atom", "http://www.w3.org/2005/Atom") 
    End If 

    xmlNode = xmlDoc.SelectSingleNode("//atom:title", mgr) 
    If Not xmlNode Is Nothing Then 
     feedTitle = xmlNode.InnerText 
    End If 
    xmlNode = xmlDoc.SelectNodes("//atom:link/@href", mgr)(2) 
    If Not xmlNode Is Nothing Then 
     feedLink = xmlNode.InnerText 
    End If 
    xmlNode = xmlDoc.SelectSingleNode("//atom:tagline", mgr) 
    If Not xmlNode Is Nothing Then 
     rssDescription = xmlNode.InnerText 
    End If 
    xmlNode = xmlDoc.SelectSingleNode("//atom:subtitle", mgr) 
    If Not xmlNode Is Nothing Then 
     rssDescription = xmlNode.InnerText 
    End If 

    xmlWriter.WriteStartElement("rss") 
    xmlWriter.WriteAttributeString("version", rssVersion) 
    xmlWriter.WriteStartElement("channel") 
    xmlWriter.WriteElementString("title", feedTitle) 
    xmlWriter.WriteElementString("link", feedLink) 
    xmlWriter.WriteElementString("description", rssDescription) 
    xmlWriter.WriteElementString("language", rssLanguage) 
    xmlWriter.WriteElementString("generator", rssGenerator) 
    Dim items As XmlNodeList = xmlDoc.SelectNodes("//atom:entry", mgr) 
    If items Is Nothing Then 
     Throw New FormatException("Atom feed is not in expected format. ") 
    Else 
     Dim title As String = [String].Empty 
     Dim link As String = [String].Empty 
     Dim description As String = [String].Empty 
     Dim author As String = [String].Empty 
     Dim pubDate As String = [String].Empty 
     For i As Integer = 0 To items.Count - 1 
      Dim nodTitle As XmlNode = items(i) 
      xmlNode = nodTitle.SelectSingleNode("atom:title", mgr) 
      If Not xmlNode Is Nothing Then 
       title = xmlNode.InnerText 
      End If 
      Try 
       link = items(i).SelectSingleNode("atom:link[@rel='alternate']", mgr).Attributes("href").InnerText 
      Catch ex As Exception 
       link = items(i).SelectSingleNode("atom:link", mgr).Attributes("href").InnerText 
      End Try 
      xmlNode = items(i).SelectSingleNode("atom:content", mgr) 
      If Not xmlNode Is Nothing Then 
       description = xmlNode.InnerText 
      End If 
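      ' Prefer the entry's own author; ".//" keeps the search inside this entry, 
      ' whereas a leading "//" searches the whole document. 
      xmlNode = items(i).SelectSingleNode(".//atom:name", mgr) 
      If xmlNode Is Nothing Then 
       ' Fall back to the first author name anywhere in the feed. 
       xmlNode = xmlDoc.SelectSingleNode("//atom:name", mgr) 
      End If 
      If Not xmlNode Is Nothing Then 
       author = xmlNode.InnerText 
      End If 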
      xmlNode = items(i).SelectSingleNode("atom:issued", mgr) 
      If Not xmlNode Is Nothing Then 
       pubDate = xmlNode.InnerText 
      End If 
      xmlNode = items(i).SelectSingleNode("atom:updated", mgr) 
      If Not xmlNode Is Nothing Then 
       pubDate = xmlNode.InnerText 
      End If 
      xmlWriter.WriteStartElement("item") 
      xmlWriter.WriteElementString("title", title) 
      xmlWriter.WriteElementString("link", link) 
      If pubDate.Length < 1 Then 
       pubDate = Date.MinValue.ToString() 
      End If 
      xmlWriter.WriteElementString("pubDate", Convert.ToDateTime(pubDate).ToUniversalTime().ToString("ddd, dd MMM yyyy HH:mm:ss G\MT")) 
      xmlWriter.WriteElementString("author", author) 
      xmlWriter.WriteElementString("description", description) 
      xmlWriter.WriteEndElement() 
     Next 
     xmlWriter.WriteEndElement() 
     xmlWriter.Flush() 
     xmlWriter.Close() 
    End If 
    Dim retDoc As New XmlDocument() 
    Dim outStr As String = Encoding.UTF8.GetString(memoryStream.ToArray()) 
    retDoc.LoadXml(outStr) 
    Return retDoc 
End Function 

Usage is fairly straightforward. Just load your Atom feed into an XmlDocument object and pass it to this function, and it returns an XmlDocument object in RSS format.
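To tie that back to the question, here is a rough sketch of how the conversion could sit in front of the existing RSS check. The module, IsAtomFeed and CountItems are hypothetical names of my own, and the convert parameter simply stands in for the AtomToRssConverter function above so that the snippet compiles on its own.

```vbnet
Imports System
Imports System.Xml

Public Module FeedNormalizer
    ' Hypothetical helper: True when the root element is an Atom <feed>.
    Public Function IsAtomFeed(ByVal doc As XmlDocument) As Boolean
        Return doc.DocumentElement IsNot Nothing AndAlso doc.DocumentElement.LocalName = "feed"
    End Function

    ' Hypothetical helper: normalizes Atom to RSS, then counts RSS items.
    ' "convert" is assumed to behave like AtomToRssConverter.
    Public Function CountItems(ByVal doc As XmlDocument, _
                               ByVal convert As Func(Of XmlDocument, XmlDocument)) As Integer
        Dim rssDoc As XmlDocument = If(IsAtomFeed(doc), convert(doc), doc)
        Return rssDoc.SelectNodes("rss/channel/item").Count
    End Function
End Module
```

From inside the class that holds the converter, the call would look something like CountItems(rssDoc, AddressOf AtomToRssConverter) > 0.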

If you are interested, I have put the whole RSSReader class up on pastebin.com.