Getting an HTML tag attribute with XPath using HTML Agility Pack

 
META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=iso-8859-1" /> 
TITLE>Microsoft Corporation 
META http-equiv="PICS-Label" content="(PICS-1.1 "http://www.rsac.org/ratingsv01.html" l gen true r (n 0 s 0 v 0 l 0))" /> 
META NAME="KEYWORDS" CONTENT="products; headlines; downloads; news; Web site; what's new; solutions; services; software; contests; corporate news;" /> 
META NAME="DESCRIPTION" CONTENT="The entry page to Microsoft's Web site. Find software, solutions, answers, support, and Microsoft news." /> 
META NAME="MS.LOCALE" CONTENT="EN-US" /> 
META NAME="CATEGORY" CONTENT="home page" />

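I would like to know what XPath I need to use with HTML Agility Pack to get the value of the Content attribute of the Category meta tag. (I removed the first < from each line of the HTML code so that it would post.)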

2010-07-12 Eugene

For a long time HtmlAgilityPack did not have the ability to query an attribute value directly. You had to loop over the list of meta nodes. Here is one way:

var doc = new HtmlDocument(); 
doc.LoadHtml(htmlString); 

var list = doc.DocumentNode.SelectNodes("//meta"); 
foreach (var node in list) 
{ 
    string content = node.GetAttributeValue("content", ""); 
}

However, it looks like there is an experimental XPath release that lets you do this.

doc.DocumentNode.SelectNodes("//meta/@content")

That would return a list of HtmlAttribute objects.
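Applied to the markup in the question, the loop approach above might look like the following sketch. It assumes that HtmlAgilityPack exposes element and attribute names in lowercase (so `//meta` matches `META`) while attribute values keep their original case, which is why the name comparison ignores case. The class name, the `GetCategory` helper, and the shortened `html` string are hypothetical and only there for illustration.

```csharp
using System;
using HtmlAgilityPack;

public static class CategoryMetaExample
{
    // Hypothetical helper: returns the content of the "category" meta tag, or null if absent.
    public static string GetCategory(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectNodes returns null (not an empty list) when nothing matches.
        var metaNodes = doc.DocumentNode.SelectNodes("//meta");
        if (metaNodes == null)
        {
            return null;
        }

        foreach (var node in metaNodes)
        {
            // Assumption: the value "CATEGORY" keeps its case, so compare case-insensitively.
            string name = node.GetAttributeValue("name", "");
            if (string.Equals(name, "category", StringComparison.OrdinalIgnoreCase))
            {
                return node.GetAttributeValue("content", "");
            }
        }
        return null;
    }

    public static void Main()
    {
        // Shortened stand-in for the page in the question.
        string html = "<html><head><META NAME=\"MS.LOCALE\" CONTENT=\"EN-US\" />" +
                      "<META NAME=\"CATEGORY\" CONTENT=\"home page\" /></head></html>";
        Console.WriteLine(GetCategory(html));
    }
}
```

Comparing the name in code rather than in the XPath predicate sidesteps the case question, at the cost of visiting every meta node.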

2010-07-12 21:41:34

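Thanks for the quick response, Rohit Agarwal (I see it was answered only a few hours after I asked, but I was not able to test it until today).

I originally implemented your suggestion as follows (this is in VB.NET):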

Dim result As String = webClient.DownloadString(url)
Dim doc As New HtmlDocument()
doc.LoadHtml(result)

Dim list = doc.DocumentNode.SelectNodes("//meta")
Dim node As Object

For Each node In list
    Dim metaname As String = node.GetAttributeValue("name", String.Empty)
    If metaname <> String.Empty Then
        If (metaname = "title") Then
            title = node.GetAttributeValue("content", String.Empty)
        ' more ElseIf ... Then branches
        End If
    End If
Next node

However, I found that //meta[@name='title'] gives me the same result:

Dim result As String = webClient.DownloadString(url)

Dim doc As New HtmlDocument()
doc.LoadHtml(result)

title = doc.DocumentNode.SelectNodes("//meta[@name='title']")(0).GetAttributeValue("content", String.Empty)

Thanks for putting me on the right track =D

2010-07-14 20:54:38 Eugene

Actually, a slightly better way is to use title = doc.DocumentNode.SelectSingleNode("//meta[@name='title']").GetAttributeValue("content", String.Empty) – Eugene 2010-07-14 21:09:23

Or better still, title = doc.DocumentNode.SelectSingleNode("//meta[@name='title']/@content") – Eugene 2010-07-14 21:15:30

The one above, title = doc.DocumentNode.SelectSingleNode("//meta[@name='title']/@content").ToString, does not work... – Eugene 2010-07-14 21:21:38
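One possible reason the last attempt fails is that `SelectSingleNode` on an `HtmlNode` returns an `HtmlNode`, so calling `ToString` on the result would not produce the attribute text. The sketch below shows an alternative, written in C#, that goes through the `XPathNavigator` obtained from `HtmlDocument.CreateNavigator()`. It assumes that this navigator can resolve an attribute path such as `/@content`. The class name, the `GetTitle` helper, and the sample markup are hypothetical.

```csharp
using System;
using System.Xml.XPath;
using HtmlAgilityPack;

public static class AttributeXPathExample
{
    // Hypothetical helper: reads the content attribute via an XPath that ends in /@content.
    public static string GetTitle(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // Assumption: the navigator exposes attributes as nodes, unlike HtmlNode.SelectSingleNode.
        XPathNavigator navigator = doc.CreateNavigator();
        XPathNavigator attribute =
            navigator.SelectSingleNode("//meta[@name='title']/@content");

        // SelectSingleNode returns null when the path matches nothing.
        return attribute == null ? string.Empty : attribute.Value;
    }

    public static void Main()
    {
        // Invented sample markup, not taken from the thread.
        string html = "<html><head><meta name=\"title\" content=\"Example page\" /></head></html>";
        Console.WriteLine(GetTitle(html));
    }
}
```

If this assumption does not hold for the installed version, the `GetAttributeValue` form from the first comment remains the safer choice.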

If you only want to display the title, description, and keywords meta tags, then use:

if (metaTags != null)
{
    foreach (var tag in metaTags)
    {
        if ((tag.Attributes["name"] != null) && (tag.Attributes["content"] != null))
        {
            Panel divPage = new Panel();
            divPage.InnerHtml = divPage.InnerHtml + "<br /> " +
                "<b> Page " + tag.Attributes["name"].Value + " </b>: " +
                tag.Attributes["content"].Value + "<br />";
        }
    }
}

If you also want the og: tags from that link, add this afterwards:

if ((tag.Attributes["property"] != null) && (tag.Attributes["content"] != null))
{
    if (tag.Attributes["property"].Value == "og:image")
    {
        img.ImageUrl = tag.Attributes["content"].Value;
    }
}

This was a good experience. I like adding this code :)

2015-07-23 08:49:16

With no error checking:

doc.DocumentNode.SelectSingleNode("//meta[@name='description']").Attributes["content"].Value;

Of course, this will cause problems if the node is null or if the content attribute does not exist.
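A guarded version of that lookup could be sketched as follows. It relies on `SelectSingleNode` returning null when nothing matches and on `GetAttributeValue` returning the supplied default when the attribute is missing. The class name, the `GetMetaContent` helper, and the sample markup are hypothetical, and the helper assumes `metaName` contains no single quotes.

```csharp
using System;
using HtmlAgilityPack;

public static class SafeMetaLookup
{
    // Hypothetical helper: returns the content of the named meta tag,
    // or an empty string if the tag or its content attribute is missing.
    public static string GetMetaContent(string html, string metaName)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // Assumption: metaName contains no single quotes, so it can be embedded directly.
        var node = doc.DocumentNode.SelectSingleNode("//meta[@name='" + metaName + "']");

        // node is null when no tag matches; the ?. operator skips the call in that case.
        return node?.GetAttributeValue("content", string.Empty) ?? string.Empty;
    }

    public static void Main()
    {
        // Invented sample markup, not taken from the thread.
        string html = "<html><head><meta name=\"description\" content=\"An example\" />" +
                      "<meta name=\"keywords\" /></head></html>";
        Console.WriteLine(GetMetaContent(html, "description"));
        Console.WriteLine(GetMetaContent(html, "keywords").Length);
        Console.WriteLine(GetMetaContent(html, "missing").Length);
    }
}
```

Returning an empty string for both failure cases keeps the caller simple. A caller that needs to tell "tag missing" from "attribute missing" would have to return something richer.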

2017-10-24 06:47:39
