2017-04-17

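C#: HtmlAgilityPack Descendants

Good day. I have a task in which I need to convert a Word document to HTML.

This can be done with Interop by saving the document as HTML. But I need to clean up the HTML that Interop produces.

However, I have a problem with HtmlAgilityPack. I assumed it works much like XmlDocument in C#.

Here is my C# code: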

HtmlDocument doc = new HtmlDocument();
doc.Load(htmlLocation);
foreach (var item in doc.DocumentNode.Descendants("p"))
{
    if (item.HasChildNodes)
    {
        foreach (var itm in item.Descendants("span").ToList())
        {
            Console.WriteLine(itm.InnerText);
        }
    }
}

Here is the HTML:

<html> 

<head> 
<meta http-equiv=Content-Type content="text/html; charset=windows-1252"> 
<meta name=Generator content="Microsoft Word 12 (filtered)"> 

</head> 

<body lang=EN-US link="#0066CC" vlink=purple style='text-justify-trim:punctuation'> 

<div class=WordSection1> 

<p class=Heading61 style='margin-bottom:0in;margin-bottom:.0001pt;text-indent: 
.5in;line-height:normal;page-break-after:avoid;background:transparent'><span 
class=Heading6><span style='font-size:12.0pt;color:black;background:yellow'>Epilogue</span></span></p> 

<p class=MsoBodyText style='line-height:normal;background:transparent'><span 
class=BodytextItalic2><span style='font-size:12.0pt;color:black;font-style: 
normal'>&nbsp;</span></span></p> 

<p class=MsoBodyText style='line-height:normal;background:transparent'><span 
class=BodytextItalic2><span style='font-size:12.0pt;color:black;font-style: 
normal'>Rebecca sat outside her lodge cradling her infant son in her arms. How 
handsome he was, her little warrior, with his dusky skin and thick black hair. 
For the first few days after his birth, she had been afraid to let him out of 
her sight, out of her arms, for fear she would lose him, but he was a strong 
healthy child.</span></span></p> 

<p class=MsoBodyText style='text-indent:.5in;line-height:normal;background: 
transparent'><span class=BodytextItalic2><span style='font-size:12.0pt; 
color:black;font-style:normal'>Looking at him made her heart swell with love 
for him and for his father. She had married Wolf Dreamer the day after they 
returned to his people. Summer Moon Rising had left the village the following 
day.</span></span></p> 

</div> 

</body> 

</html> 

This is the output of the code above:

Epilogue 
Epilogue 
&nbsp; 
&nbsp; 
Rebecca sat outside her lodge cradling her infant son in her arms. How 
handsome he was, her little warrior, with his dusky skin and thick black hair. 
For the first few days after his birth, she had been afraid to let him out of 
her sight, out of her arms, for fear she would lose him, but he was a strong 
healthy child. 
Rebecca sat outside her lodge cradling her infant son in her arms. How 
handsome he was, her little warrior, with his dusky skin and thick black hair. 
For the first few days after his birth, she had been afraid to let him out of 
her sight, out of her arms, for fear she would lose him, but he was a strong 
healthy child. 
Looking at him made her heart swell with love 
for him and for his father. She had married Wolf Dreamer the day after they 
returned to his people. Summer Moon Rising had left the village the following 
day. 
Looking at him made her heart swell with love 
for him and for his father. She had married Wolf Dreamer the day after they 
returned to his people. Summer Moon Rising had left the village the following 
day. 

What I expected was only the second (inner) span descendant of each item element. Why is the text repeated?

Answers

1

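You have four p tags, each containing two nested spans. Descendants returns every matching node at any depth, not just the direct children, so it yields both the outer and the inner span. Because the outer span's InnerText includes the text of the inner span, your inner foreach prints each text twice.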
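
To make the duplication visible, here is a minimal sketch that runs Descendants on a trimmed-down copy of the first paragraph from the question. The class name DescendantsDemo and the shortened inline HTML are assumptions made for illustration; the calls used (LoadHtml, SelectSingleNode, Descendants, XPath, InnerText) are regular HtmlAgilityPack members.

```csharp
using System;
using HtmlAgilityPack;

class DescendantsDemo
{
    static void Main()
    {
        // Trimmed-down copy of the first paragraph from the question (assumed for the demo).
        var html = "<p class=\"Heading61\"><span class=\"Heading6\">" +
                   "<span style=\"font-size:12.0pt\">Epilogue</span></span></p>";

        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var p = doc.DocumentNode.SelectSingleNode("//p");

        // Descendants walks the whole subtree, so the outer and the inner span are both returned.
        foreach (var span in p.Descendants("span"))
        {
            // XPath shows where each match sits in the tree; InnerText is the same for both.
            Console.WriteLine(span.XPath + " -> " + span.InnerText);
        }
    }
}
```

This should print two lines, one per span, and both end in "Epilogue", which is the same doubling seen in the question's output.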

Your inner foreach could instead be:

foreach (var itm in item.ChildNodes)
{
    Console.WriteLine(itm.InnerText);
}
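
Note that ChildNodes returns every direct child node, including text nodes. In the HTML above each p has the outer span as its only child, so the loop prints each text once. If the goal is specifically the innermost span, as the question suggests, one possible variation is to keep only the spans that contain no further span. The sketch below is an assumption about what the asker wants, not part of the original answer; the class name InnerSpanDemo and the shortened inline HTML are made up for illustration. It also decodes entities with HtmlEntity.DeEntitize and skips paragraphs that hold only whitespace such as &nbsp;.

```csharp
using System;
using System.Linq;
using HtmlAgilityPack;

class InnerSpanDemo
{
    static void Main()
    {
        // Shortened stand-in for the Word-generated HTML (assumed for the demo).
        var html = "<div><p><span class=\"Heading6\"><span>Epilogue</span></span></p>" +
                   "<p><span class=\"BodytextItalic2\"><span>&nbsp;</span></span></p></div>";

        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        foreach (var p in doc.DocumentNode.Descendants("p"))
        {
            // Keep only spans that have no span below them, i.e. the innermost ones.
            var innermost = p.Descendants("span")
                             .Where(s => !s.Descendants("span").Any());

            foreach (var span in innermost)
            {
                // Turn entities such as &nbsp; into real characters before checking for content.
                var text = HtmlEntity.DeEntitize(span.InnerText);
                if (!string.IsNullOrWhiteSpace(text))
                {
                    Console.WriteLine(text);
                }
            }
        }
    }
}
```

With this input only "Epilogue" should be printed, once. If the direct children are enough, item.Elements("span") is a narrower alternative to ChildNodes, since it returns only child elements named span.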