2013-05-29

I want to parse text that spans two lines of HTML. Regex \n is not working

Dim PattStats As New Regex("class=""head"">(.+?)</td>"+ 
          "\n<td>(.+?)</td>") 
Dim makor As MatchCollection = PattStats.Matches(page) 

For Each MatchMak As Match In makor 
    ListView3.Items.Add(MatchMak.Groups(1).Value) 
Next 

I added \n to match the next line, but for some reason it does not work. Here is the source I am running the regex against.

<table class="table table-striped table-bordered table-condensed"> 
    <tbody> 
    <tr> 
     <td class="head">Health Points:</td> 
     <td>445 (+85/per level)</td> 
     <td class="head">Health Regen:</td> 
     <td>7.25</td> 
    </tr> 
    <tr> 
     <td class="head">Energy:</td> 
     <td>200</td> 
     <td class="head">Energy Regen:</td> 
     <td>50</td> 
    </tr> 
    <tr> 
     <td class="head">Damage:</td> 
     <td>53 (+3.2/per level)</td> 
     <td class="head">Attack Speed:</td> 
     <td>0.694 (+3.1/per level)</td> 
    </tr>   
    <tr> 
     <td class="head">Attack Range:</td> 
     <td>125</td> 
     <td class="head">Movement Speed:</td> 
     <td>325</td> 
    </tr> 
    <tr> 
     <td class="head">Armor:</td> 
     <td>16.5 (+3.5/per level)</td> 
     <td class="head">Magic Resistance:</td> 
     <td>30 (+1.25/per level)</td> 
    </tr>  
    <tr> 
     <td class="head">Influence Points (IP):</td> 
     <td>3150</td> 
     <td class="head">Riot Points (RP):</td> 
     <td>975</td> 
    </tr> 
    </tbody> 
</table> 

I want to match the first <td class...> and the line that follows it with a single regex :/

Try using '\r\n' instead of '\n'.

You can really use XPath to do this.

Daniel: tried it, but it didn't work :( Casimir: I've never used XPath, so I don't know what it is :/

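Answer

Description

This regular expression finds each pair of adjacent td tags and returns their contents in two capture groups.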

<td\b[^>]*>([^<]*)<\/td>[^<]*<td\b[^>]*>([^<]*)<\/td>


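Summary

  • <td\b[^>]*> finds the first td tag and consumes any attributes
  • ([^<]*) captures the first inner text; this can be greedy because we assume the cell contains no nested tags
  • <\/td> finds the closing tag
  • [^<]* moves past everything up to the next tag, including spaces and line breaks; this assumes there are no additional tags between the first and second td tags
  • <td\b[^>]*> finds the second td tag and consumes any attributes
  • ([^<]*) captures the second inner text; this can be greedy because we assume the cell contains no nested tags
  • <\/td> finds the closing tag

Group 0 gets the entire matched string

  1. holds the text of the first td
  2. holds the text of the second td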

VB.NET code example:

Imports System
Imports System.Text.RegularExpressions

Module Module1
    Sub Main()
        Dim sourcestring As String = "replace with your source string"
        ' Two adjacent td cells: group 1 is the first cell's text, group 2 the second's.
        Dim re As Regex = New Regex("<td\b[^>]*>([^<]*)<\/td>[^<]*<td\b[^>]*>([^<]*)<\/td>", RegexOptions.IgnoreCase Or RegexOptions.Singleline)
        Dim mc As MatchCollection = re.Matches(sourcestring)
        Dim mIdx As Integer = 0
        For Each m As Match In mc
            ' Print every group of this match, labelled by match index and group name.
            For groupIdx As Integer = 0 To m.Groups.Count - 1
                Console.WriteLine("[{0}][{1}] = {2}", mIdx, re.GetGroupNames()(groupIdx), m.Groups(groupIdx).Value)
            Next
            mIdx += 1
        Next
    End Sub
End Module

$matches Array: 
(
    [0] => Array 
     (
      [0] => <td class="head">Health Points:</td> 
      <td>445 (+85/per level)</td> 
      [1] => <td class="head">Health Regen:</td> 
      <td>7.25</td> 
      [2] => <td class="head">Energy:</td> 
      <td>200</td> 
      [3] => <td class="head">Energy Regen:</td> 
      <td>50</td> 
      [4] => <td class="head">Damage:</td> 
      <td>53 (+3.2/per level)</td> 
      [5] => <td class="head">Attack Speed:</td> 
      <td>0.694 (+3.1/per level)</td> 
      [6] => <td class="head">Attack Range:</td> 
      <td>125</td> 
      [7] => <td class="head">Movement Speed:</td> 
      <td>325</td> 
      [8] => <td class="head">Armor:</td> 
      <td>16.5 (+3.5/per level)</td> 
      [9] => <td class="head">Magic Resistance:</td> 
      <td>30 (+1.25/per level)</td> 
      [10] => <td class="head">Influence Points (IP):</td> 
      <td>3150</td> 
      [11] => <td class="head">Riot Points (RP):</td> 
      <td>975</td> 
     ) 

    [1] => Array 
     (
      [0] => Health Points: 
      [1] => Health Regen: 
      [2] => Energy: 
      [3] => Energy Regen: 
      [4] => Damage: 
      [5] => Attack Speed: 
      [6] => Attack Range: 
      [7] => Movement Speed: 
      [8] => Armor: 
      [9] => Magic Resistance: 
      [10] => Influence Points (IP): 
      [11] => Riot Points (RP): 
     ) 

    [2] => Array 
     (
      [0] => 445 (+85/per level) 
      [1] => 7.25 
      [2] => 200 
      [3] => 50 
      [4] => 53 (+3.2/per level) 
      [5] => 0.694 (+3.1/per level) 
      [6] => 125 
      [7] => 325 
      [8] => 16.5 (+3.5/per level) 
      [9] => 30 (+1.25/per level) 
      [10] => 3150 
      [11] => 975 
     ) 

) 
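
As for why the pattern in the question fails: the two cells in the posted source are most likely separated by more than a bare line feed. Each line ends with a trailing space, the line break may be \r\n, and the next cell is indented. The following is only a minimal sketch under that assumption. It keeps the asker's own pattern and swaps \n for \s*; the module name GapDemo and the sample string are made up for illustration.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module GapDemo
    Sub Main()
        ' Hypothetical two-cell sample: trailing space, CRLF and indentation between the cells.
        Dim sample As String = "<td class=""head"">Energy:</td> " & vbCrLf & "     <td>200</td>"

        ' Pattern from the question: </td> must be followed immediately by LF and <td>.
        Dim strict As New Regex("class=""head"">(.+?)</td>\n<td>(.+?)</td>")
        ' Same pattern with \s*, so any run of whitespace may separate the cells.
        Dim tolerant As New Regex("class=""head"">(.+?)</td>\s*<td>(.+?)</td>")

        Console.WriteLine(strict.IsMatch(sample))   ' prints False
        Dim m As Match = tolerant.Match(sample)
        Console.WriteLine("{0} {1}", m.Groups(1).Value, m.Groups(2).Value)   ' prints Energy: 200
    End Sub
End Module
```

The [^<]* gap in the answer's pattern serves the same purpose as \s* here, since it also consumes spaces, \r and \n between the closing and opening tags.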

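Disclaimer

Parsing HTML with a regular expression is really not the best solution, because there are many edge cases we cannot predict. However, in this case, if the input string is always this basic and you are willing to accept the risk that the regex will not work 100% of the time, then this solution may work for you.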
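
One of the comments on the question suggests XPath as an alternative. A rough sketch of that idea follows, assuming the fragment is well-formed XML, as the table in the question happens to be. Real-world HTML often is not, and would then need a dedicated HTML parser. The module name XPathDemo and the shortened fragment are illustrative only.

```vbnet
Imports System
Imports System.Xml

Module XPathDemo
    Sub Main()
        ' Hypothetical well-formed fragment; real pages are often not valid XML.
        Dim fragment As String =
            "<table><tbody><tr>" &
            "<td class=""head"">Energy:</td><td>200</td>" &
            "<td class=""head"">Energy Regen:</td><td>50</td>" &
            "</tr></tbody></table>"

        Dim doc As New XmlDocument()
        doc.LoadXml(fragment)

        ' Select every header cell, then step to the td that immediately follows it.
        For Each head As XmlNode In doc.SelectNodes("//td[@class='head']")
            Dim value As XmlNode = head.SelectSingleNode("following-sibling::td[1]")
            If value IsNot Nothing Then
                Console.WriteLine("{0} {1}", head.InnerText, value.InnerText)
            End If
        Next
    End Sub
End Module
```

Because the query works on the parsed tree, whitespace and line breaks between the cells no longer matter.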