c#
  • xpath
  • html-agility-pack
  • 2014-08-31 72 views 0 likes 

    HtmlAgilityPack - Getting rid of ads between HTML comment tags

    I need to get rid of the part of the code snippet between <!-- custom ads --> and <!-- /custom ads -->.

    <!-- custom ads --> 
    <div style="float:left"> 
        <!-- custom_Forum_Postbit_336x280 --> 
        <div id='div-gpt-ad-1526374586789-2' style='width:336px; height:280px;'> 
        <script type='text/javascript'> 
         googletag.display('div-gpt-ad-1526374586789-2'); 
        </script> 
        </div> 
    </div> 
    <div style="float:left; padding-left:20px"> 
        <!-- custom_Forum_Postbit_336x280_r --> 
        <div id='div-gpt-ad-1526374586789-3' style='width:336px; height:280px;'> 
        <script type='text/javascript'> 
         googletag.display('div-gpt-ad-1526374586789-3'); 
        </script> 
        </div> 
    </div> 
    <div class="clear"></div> 
    
    <br> 
    <!-- /custom ads --> 
    
    
    <!-- google_ad_section_start -->Some Text,<br> 
    Some More Text...<br> 
    <!-- google_ad_section_end --> 
    

    I can already find the two comments with the XPath //comment()[contains(., 'custom')], but now I'm stuck on how to remove everything between those "tags".

    foreach (var comment in htmlDoc.DocumentNode.SelectNodes("//comment()[contains(., 'custom')]"))
    {
        MessageBox.Show(comment.OuterHtml);
    }
    

    Any suggestions?

    +0

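    Get all the nodes in the parent node of the two comment markers, then iterate over the child nodes and remove every node from the first comment to the second. – 2014-08-31 21:13:27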

    +0

    'var newhtml = Regex.Replace(html, Regex.Escape(start) + ".+?" + Regex.Escape(end), "", RegexOptions.Singleline);' – 2014-08-31 21:29:07
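
    The one-liner in the comment above can be tried out as a small console program. This is only a sketch: the names AdStripper and RemoveBetween and the sample HTML are made up for illustration and do not come from the thread. It also strips the marker comments themselves, and a regex treats the page as plain text rather than as a parsed document.

```csharp
using System;
using System.Text.RegularExpressions;

public static class AdStripper
{
    // Hypothetical helper (not from the thread): removes every span that
    // begins with `start` and ends at the nearest following `end`,
    // including the two markers.
    public static string RemoveBetween(string html, string start, string end)
    {
        // Regex.Escape makes the markers match literally. ".+?" is lazy, so
        // each match stops at the first closing marker. Singleline lets "."
        // match newline characters as well.
        string pattern = Regex.Escape(start) + ".+?" + Regex.Escape(end);
        return Regex.Replace(html, pattern, "", RegexOptions.Singleline);
    }

    public static void Main()
    {
        // Made-up sample input with two ad blocks.
        string html =
            "<p>before</p>\n" +
            "<!-- custom ads -->\n<div>ad 1</div>\n<!-- /custom ads -->\n" +
            "<p>middle</p>\n" +
            "<!-- custom ads -->\n<div>ad 2</div>\n<!-- /custom ads -->\n" +
            "<p>after</p>";

        Console.WriteLine(
            RemoveBetween(html, "<!-- custom ads -->", "<!-- /custom ads -->"));
    }
}
```

    Because the quantifier is lazy, several ad blocks on one page are removed one by one instead of being swallowed as a single match. Note that ".+?" needs at least one character between the markers; ".*?" would also cover an empty block.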

    Answers

    3
    //find all comment nodes that contain "custom ads" 
    var nodes = doc.DocumentNode 
           .Descendants() 
           .OfType<HtmlCommentNode>() 
           .Where(c => c.Comment.Contains("custom ads")) 
           .ToList(); 
    //create a sequence of pairs of nodes 
    var nodePairs = nodes 
        .Select((node, index) => new {node, index}) 
        .GroupBy(x => x.index/2) 
        .Select(g => g.ToArray()) 
        .Select(a => new { startComment = a[0].node, endComment = a[1].node}); 
    
    foreach (var pair in nodePairs) 
    { 
        var startNode = pair.startComment; 
        var endNode = pair.endComment; 
        //check they share the same parent or the wheels will fall off 
        if (startNode.ParentNode != endNode.ParentNode) throw new InvalidOperationException("Start and end comments do not share a parent node.");
        //iterate all nodes inbetween 
        var currentNode = startNode.NextSibling; 
        while(currentNode != endNode) 
        { 
            //currentNode won't have siblings when we trim it from the doc 
            //so grab the nextSibling while it's still attached 
            var n = currentNode.NextSibling; 
            //and cut out currentNode 
            currentNode.Remove(); 
            currentNode = n; 
        } 
    } 
    
    +0

    Thanks, looks great. 'if(nodes.Count != 2) throw new Exception()' won't work for me, though; there may be multiple ads on a web page. But there will always be at least one. – MrMAG 2014-08-31 21:34:30

    +0

    Thanks a lot. I just wrapped your first code in a for loop. But this one is very solid! – MrMAG 2014-08-31 21:50:57
