2014-11-06 47 views

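How to use XDocument and extension methods to get the inner text from an XML document

I'm trying to get the "NAME" and "EMAIL" text from the following HTML file: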

<!DOCTYPE html> 

<html lang="en" xmlns="http://www.w3.org/1999/xhtml"> 
<head> 
    <meta charset="utf-8" /> 
    <title></title> 
</head> 
<body> 
    <ol> 
     <li> 
      <font class="normal"> 
       <b>NAME</b> <a href="/member/mail_compose.aspx?id=name"><img src="/images/mailbox.gif" border="0" alt="Send Mail" /></a> <a href="/photos/member_viewphoto.aspx?id=name"><img src="/images/icons/member_photos.gif" border="0" alt="View Photos" /></a> <br /> 
       ADDRESS<br /> 
       PHONE<br /> 
       <a href="mailto:[email protected]" class="redlink">EMAIL</a><br /> 
       <br /> 
      </font> 
     </li> 
</body> 
</html> 

Here is the code I'm using:

// Load the xml document 
XDocument xDoc = XDocument.Load(@"..\..\Directory.html"); 

// Parse document 
var names = xDoc.Root.DescendantsAndSelf() 
     .Where(x => x.Name.LocalName == "ol").DescendantsAndSelf() 
     .Where(x => x.Name.LocalName == "li").DescendantsAndSelf() 
     .Select(x => new 
         { 
          name = x.Elements().Where(y => y.Name.LocalName == "b").Select(y => y.Value), 
          email = x.DescendantsAndSelf().Where(y => y.Name.LocalName == "a" && x.FirstAttribute.Name == "href" && x.Attribute("href").Value.Contains("mailto")).Select(y => y.Value ?? "No Email") 
         } 
     ); 

// Print text to console 
for (int i = 0; i < names.Count(); i++) 
{ 
    Console.WriteLine("{0}: {1}", names.ElementAt(i).name, names.ElementAt(i).email); 
} 

For some reason, the code above prints this:

System.Linq.Enumerable+WhereSelectEnumerableIterator`2[System.Xml.Linq.XElement,System.String]: System.Linq.Enumerable+WhereSelectEnumerableIterator`2[System.Xml.Linq.XElement,System.String]

Could someone please tell me why this happens? Also, if there is a better way to do this, suggestions would be very welcome.

Answers


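To answer your first question (which is probably more important to you than the code, since I had to make that work for this sample HTML): you are doing a Select for your name and email fields, so each of them is a collection (an IEnumerable<string>), not a string. That's why you get a collection back when you loop through names, and Console.WriteLine prints the collection's type name instead of its contents. If a collection is actually what you want, do a SelectMany instead of a Select when you create your anonymous object.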
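A minimal sketch of why the type name gets printed, written as a .NET 5+ top-level program. The one-entry markup and the someone@example.com address are hypothetical stand-ins trimmed from the question's HTML; Single() and string.Join are just two ways to collapse the sequence into a string.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

// Hypothetical one-entry sample (assumption: well-formed XHTML, trimmed from the question).
var doc = XDocument.Parse(
    "<ol xmlns='http://www.w3.org/1999/xhtml'><li><font class='normal'>" +
    "<b>NAME</b><a href='mailto:someone@example.com'>EMAIL</a></font></li></ol>");

// Match on LocalName so the XHTML default namespace does not get in the way.
var font = doc.Descendants().First(e => e.Name.LocalName == "font");

// Like the question's projection, this is a lazy sequence of strings, not a string.
var names = font.Elements().Where(e => e.Name.LocalName == "b").Select(e => e.Value);

// ToString() on a LINQ iterator falls back to object.ToString(), i.e. the type name.
string printed = names.ToString();

// Collapsing the sequence to a single value gives the text itself.
string single = names.Single();            // throws unless there is exactly one <b>
string joined = string.Join(", ", names);  // tolerates zero or many matches

Console.WriteLine(printed);
Console.WriteLine(single);
Console.WriteLine(joined);
```

The exact iterator type name differs between .NET versions, but it is always a nested type of System.Linq.Enumerable.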

Without a schema, I'm not sure what you meant to "select".

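The other problem is that for the href attribute you need to compare FirstAttribute.Name.LocalName rather than just FirstAttribute.Name. (Your lambda also tests x, the outer element, where it should test y.) Your XML traversal up to that point was fine.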

var names = xDoc.Root.DescendantsAndSelf() 
       .Where(x => x.Name.LocalName == "ol").DescendantsAndSelf() 
       .Where(x => x.Name.LocalName == "li").DescendantsAndSelf() 
       .Where(x => x.Name.LocalName == "font") 
       .Select(x => new 
       { 
        name = x.Descendants().Where(y => y.Name.LocalName == "b").Select(y => y.Value).Single(), 
        email = x.Descendants().Where(y => y.Name.LocalName == "a" && y.FirstAttribute.Name.LocalName == "href" && y.Attribute("href").Value.Contains("mailto")).Select(y => y.Value).Single() 
       }); 

A few notes:

y.Value ?? "No Email" 

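needs to be reworked, because y.Value is never null (XElement.Value returns an empty string for an element with no text).
You're also missing the closing </ol> tag in your HTML :)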
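One way to get the intended "No Email" fallback, sketched as a .NET 5+ top-level program: FirstOrDefault() returns null for an empty sequence, so the ?? can actually take effect. EmailOrDefault is a hypothetical helper, and the markup and address are made-up sample data.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper: text of the first mailto link under an element, or "No Email".
static string EmailOrDefault(XElement container) =>
    container.Descendants()
        .Where(a => a.Name.LocalName == "a"
                 // (string) on a missing XAttribute yields null, so fall back to "".
                 && ((string)a.Attribute("href") ?? "").StartsWith("mailto:", StringComparison.Ordinal))
        .Select(a => a.Value)
        .FirstOrDefault()   // null when nothing matched, unlike XElement.Value
        ?? "No Email";

var withEmail = XElement.Parse("<font><b>NAME</b><a href='mailto:someone@example.com'>EMAIL</a></font>");
var withoutEmail = XElement.Parse("<font><b>NAME</b><a href='/photos/x'>View</a></font>");

Console.WriteLine(EmailOrDefault(withEmail));    // EMAIL
Console.WriteLine(EmailOrDefault(withoutEmail)); // No Email
```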


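This doesn't check for empty OL tags. (Note: most of the places where I use FirstOrDefault could throw a NullReferenceException, because I don't check for null everywhere in this solution.)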

var htmlToProcess = 
@"<!DOCTYPE html> 
           <html lang='en' xmlns='http://www.w3.org/1999/xhtml'> 
           <head> 
            <meta charset='utf-8' /> 
            <title></title> 
           </head> 
           <body> 
            <ol> 
             <li> 
              <font class='normal'> 
               <b>NAME</b> <a href='/member/mail_compose.aspx?id=name'><img src='/images/mailbox.gif' border='0' alt='Send Mail' /></a> <a href='/photos/member_viewphoto.aspx?id=name'><img src='/images/icons/member_photos.gif' border='0' alt='View Photos' /></a> <br /> 
               ADDRESS<br /> 
               PHONE<br /> 
               <a href='mailto:[email protected]' class='redlink'>EMAIL</a><br /> 
               <br /> 
              </font> 
             </li> 
            </ol> 
           </body> 
           </html>"; 
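     // Parse the markup. The original snippet used an undeclared dataSet1Tree; it is assumed
     // here to be the root <html> element, so that <body> is one of its child nodes.
     var dataSet1Tree = XDocument.Parse(htmlToProcess).Root;
     var body = dataSet1Tree.Nodes()
           .OfType<XElement>()
           .FirstOrDefault(x=> x.Name.LocalName.ToLower() =="body");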

     if (body != null) 
     { 
      var oi = body.Descendants() 
         .FirstOrDefault(x => x.Name.LocalName.ToLower() == "ol"); 
      if (oi != null) 
      { 
       var lis = oi.Elements() 
          .Where(x=> x.Name.LocalName.ToLower()=="li"); 
       // Search inside each <li>; searching body here would return the same first <font> for every item.
       var listContainingInfo =from font in lis.Select(li => li.Descendants()
                      .FirstOrDefault(x => x.Name.LocalName.ToLower() == "font"))
                 .Where(font => font != null) 
             select font.Nodes().OfType<XElement>(); 

       var listOfUsers = listContainingInfo.Select(nodes => new 
       { 
        Name = nodes.FirstOrDefault(innerNode => innerNode.Name.LocalName.ToLower() == "b").Value, 
        Email = nodes.FirstOrDefault(innerNode => innerNode.Value == "EMAIL") 
           .Attributes("href") 
           .FirstOrDefault() 
           .Value 
       }); 

       foreach (var user in listOfUsers) 
        Console.WriteLine(user.Name +" "+ user.Email); 


      } 
     } 
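A null-safe variant of the answer above, sketched assuming C# 6+ null-conditional operators (?.) in a .NET 5+ top-level program. It searches each <li> for its own <font>, and it picks the mail link by its mailto: href rather than by the link text "EMAIL". That choice, the Is helper, the second sample entry and the placeholder address are all assumptions.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

// Trimmed sample (assumption: one <li> per person; the second entry has no mail link).
var root = XDocument.Parse(
    "<html xmlns='http://www.w3.org/1999/xhtml'><body><ol>" +
    "<li><font class='normal'><b>NAME</b><a href='mailto:someone@example.com'>EMAIL</a></font></li>" +
    "<li><font class='normal'><b>NO MAIL</b></font></li>" +
    "</ol></body></html>").Root;

// Hypothetical helper: case-insensitive local-name match, ignoring the XHTML namespace.
static bool Is(XElement e, string localName) =>
    string.Equals(e.Name.LocalName, localName, StringComparison.OrdinalIgnoreCase);

var users = root.Descendants()
    .Where(e => Is(e, "li"))
    .Select(li => li.Descendants().FirstOrDefault(e => Is(e, "font")))
    .Where(font => font != null)
    .Select(font => new
    {
        // ?. and ?? turn a missing element into a default value instead of a NullReferenceException.
        Name = font.Elements().FirstOrDefault(e => Is(e, "b"))?.Value ?? "(no name)",
        Email = (string)font.Elements()
                    .FirstOrDefault(e => Is(e, "a")
                        && ((string)e.Attribute("href") ?? "").StartsWith("mailto:", StringComparison.Ordinal))
                    ?.Attribute("href") ?? "(no email)"
    })
    .ToList();

foreach (var user in users)
    Console.WriteLine(user.Name + " " + user.Email);
```

Like the original answer, Email holds the full href value (including the mailto: prefix), not the link text.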

This answer also works, but I marked the other one as the accepted answer because it was posted first. Thanks for your answer. – Tom 2014-11-10 16:33:28