2017-04-05 77 views

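Regex to match a UL nested inside a UL in C#

I have a nested UL/LI list in my HTML. How can I write a regular expression that matches from each top-level <ul> to its closing </ul>, nested lists included? In this example I need to get 2 matches.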

The first should be:

<ul> 
    <li>This is First List</li> 
    <li>This is Second List</li> 
    <ul> 
     <li>This is Second UL First List </li> 
     <li>This is Second UL Second List </li> 
    </ul> 
    <li>This is Third List</li> 
</ul> 

and the second should be:

<ul> 
     <li>This is Next List</li> 
     <ul> 
      <li>This is Test </li> 
     </ul> 
     <li>This is Third List</li> 
     <ul> 
      <li>This is Test </li> 
     </ul> 
</ul> 

My HTML code:

<html> 
<p> This is First Paragraph </p> 
<ul> 
    <li>This is First List</li> 
    <li>This is Second List</li> 
    <ul> 
     <li>This is Second UL First List </li> 
     <li>This is Second UL Second List </li> 
    </ul> 
    <li>This is Third List</li> 
</ul> 
<p> This is Second Paragraph </p> 

<ul> 
    <li>This is Next List</li> 
    <ul> 
     <li>This is Test </li> 
    </ul> 
    <li>This is Third List</li> 
    <ul> 
     <li>This is Test </li> 
    </ul> 
</ul> 
</html> 
Don't use regular expressions to parse HTML. See: http://stackoverflow.com/a/1732454/4664094

[Obligatory link](http://stackoverflow.com/a/1732454/2307070)

You could try the HTML Agility Pack (https://htmlagilitypack.codeplex.com). As the previous commenters pointed out, don't use regex for this.

Answers

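Nested structures can be matched with .NET balancing groups. The feature essentially adds a stack to the regex engine: (?<NestedUL>...) pushes a capture onto the stack named NestedUL, and (?<-NestedUL>...) pops one off. A conditional at the end of the pattern, (?(NestedUL)(?!)), then tests the stack: if anything is still on it, the empty negative lookahead (?!), which can never match, forces that attempt to fail: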

using System.Text.RegularExpressions;

var input = 
    @"<html> 
    <p> This is First Paragraph </p> 
    <ul> 
     <li>This is First List</li> 
     <li>This is Second List</li> 
     <ul> 
      <li>nested list #1 inside first parent UL</li> 
      <li>This is Second UL Second List </li> 
     </ul> 
     <li>This is Third List</li> 
    </ul> 
    <p> This is Second Paragraph </p> 

    <ul> 
     <li>This is Next List</li> 
     <ul> 
      <li>nested list #1 inside second parent UL</li> 
     </ul> 
     <li>This is Third List</li> 
     <ul> 
      <li>nested list #2 inside second parent UL</li> 
     </ul> 
    </ul> 
    </html>"; 
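// Balancing group: push on each nested <ul>, pop on each nested </ul>,
// and accept the closing </ul> only once the NestedUL stack is empty.
var pattern = "<ul>(?:(?<NestedUL><ul>)|(?<-NestedUL></ul>)|.)+?(?(NestedUL)(?!))</ul>";
// Singleline makes '.' match newlines, so one match can span several lines.
var matches = Regex.Matches(input, pattern, RegexOptions.Singleline);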

*Note the non-greedy quantifier +? on the repeated alternation. If it were greedy, the pattern would happily consume both ULs in a single match.
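
As a hedged variation on the pattern above (not the answerer's exact code), the catch-all `.` branch can be guarded with a negative lookahead so that a `<ul>` or `</ul>` tag is never consumed as ordinary characters. With that guard the result should no longer depend on the quantifier being lazy. The class name `TopLevelLists`, the method `Find`, the group name `Open`, and the sample string are made up for this sketch, and it assumes bare lowercase `<ul>` tags without attributes, as in the question.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class TopLevelLists
{
    // Hypothetical helper: returns the text of each outermost <ul>...</ul> block.
    public static string[] Find(string html)
    {
        const string pattern =
            "<ul>" +
            "(?:" +
                "(?<Open><ul>)" +      // nested opener: push onto the Open stack
                "|(?<-Open></ul>)" +   // nested closer: pop (fails when the stack is empty)
                "|(?!</?ul>)." +       // any other character, but never the start of a ul tag
            ")*" +
            "(?(Open)(?!))" +          // fail if a nested opener is still unclosed
            "</ul>";

        // Singleline lets '.' match newlines so a list may span several lines.
        return Regex.Matches(html, pattern, RegexOptions.Singleline)
                    .Cast<Match>()
                    .Select(m => m.Value)
                    .ToArray();
    }

    public static void Main()
    {
        // Made-up sample: two top-level lists, each containing nested lists.
        var html =
            "<p>a</p><ul><li>1</li><ul><li>1.1</li></ul></ul>" +
            "<p>b</p><ul><li>2</li><ul><li>2.1</li></ul><ul><li>2.2</li></ul></ul>";

        foreach (var list in Find(html))
            Console.WriteLine(list);
    }
}
```

For the sample string this is expected to yield one match per outermost list. Tags with attributes, uppercase tags, or malformed markup are outside what this sketch handles, which is the main reason the comments recommend an HTML parser instead.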