2011-12-13

HTML Agility Pack: parsing an href attribute

How would I efficiently parse the href attribute value from this:

<tr> 
<td rowspan="1" colspan="1">7</td> 
<td rowspan="1" colspan="1"> 
<a class="undMe" href="/ice/player.htm?id=8475179" rel="skaterLinkData" shape="rect">D. Kulikov</a> 
</td> 
<td rowspan="1" colspan="1">D</td> 
<td rowspan="1" colspan="1">0</td> 
<td rowspan="1" colspan="1">0</td> 
<td rowspan="1" colspan="1">0</td> 
[...] 

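I'm interested in getting the player ID (the `id` value in the href's query string). Here is my code so far, based on your example: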

// Iterate all rows (players)
for (int i = 1; i < rows.Count; ++i)
{
    HtmlNodeCollection cols = rows[i].SelectNodes(".//td");

    // new player
    Dim_Player player = new Dim_Player();

    // Iterate all columns in this row
    for (int j = 1; j < 6; ++j)
    {
        switch (j)
        {
            case 1:
                player.Name = cols[j].InnerText;
                player.Player_id = Int32.Parse(/* this is where I want to parse the href value */);
                break;
            case 2: player.Position = cols[j].InnerText; break;
            case 3: stats.Goals = Int32.Parse(cols[j].InnerText); break;
            case 4: stats.Assists = Int32.Parse(cols[j].InnerText); break;
            case 5: stats.Points = Int32.Parse(cols[j].InnerText); break;
        }
    }
}

If you're already hard-coding the indices in the `switch`, why use a `for` loop at all? Why not just `player.Position = cols[2].InnerText;`? – 2011-12-13 23:34:43

Good point. I was recycling some old code I'd written, so I didn't think of that. – 2011-12-13 23:41:27

Answers

This works for me:

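using System;
using System.Linq;
using HtmlAgilityPack;

HtmlDocument htmlDoc = new HtmlDocument();
htmlDoc.Load("test.html");

// First <a> whose class attribute is exactly "undMe"
var link = htmlDoc.DocumentNode
    .Descendants("a")
    .First(x => x.Attributes["class"] != null
             && x.Attributes["class"].Value == "undMe");

// href is "/ice/player.htm?id=8475179"; take the text after '='
string hrefValue = link.Attributes["href"].Value;
long playerId = Convert.ToInt64(hrefValue.Split('=')[1]);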

For real-world use you'd need to add error checking, etc.
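One way that error checking could look, sketched under the assumption that the ID always travels in an `id` query parameter as in the sample markup. `Split('=')[1]` breaks as soon as the query string has a second parameter, so the hypothetical helper `TryParsePlayerId` (not part of HTML Agility Pack) scans the `name=value` pairs and reports failure instead of throwing. It uses only the base class library:

```csharp
using System;

// Hypothetical helper (not part of HTML Agility Pack): pulls the numeric
// "id" query parameter out of an href such as "/ice/player.htm?id=8475179".
// Returns false instead of throwing when the href has no usable id.
static bool TryParsePlayerId(string href, out long playerId)
{
    playerId = 0;
    if (string.IsNullOrEmpty(href))
        return false;

    // Keep only the query string: the part after '?' and before any '#'.
    int queryStart = href.IndexOf('?');
    if (queryStart < 0)
        return false;
    string query = href.Substring(queryStart + 1);
    int hash = query.IndexOf('#');
    if (hash >= 0)
        query = query.Substring(0, hash);

    // Check each "name=value" pair; the id need not be the first parameter.
    foreach (string pair in query.Split('&'))
    {
        int eq = pair.IndexOf('=');
        if (eq > 0 && pair.Substring(0, eq) == "id")
            return long.TryParse(pair.Substring(eq + 1), out playerId);
    }
    return false;
}

Console.WriteLine(TryParsePlayerId("/ice/player.htm?id=8475179", out long id) ? id.ToString() : "no id");
```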

Use an XPath expression to find it:

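using System;
using System.Text.RegularExpressions;
using HtmlAgilityPack;

foreach (HtmlNode link in doc.DocumentNode.SelectNodes("//a[@class='undMe']"))
{
    HtmlAttribute att = link.Attributes["href"];
    // digits preceded by "?id=" or "&id=" and followed by '&', '#', or end of string
    Console.WriteLine(new Regex(@"(?<=[\?&]id=)\d+(?=\&|\#|$)").Match(att.Value).Value);
}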
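If you go the regex route, one option is to build the pattern once instead of once per link, and to turn a missing match into "no ID" rather than an empty string. A minimal sketch, assuming the same `id` query parameter; `playerIdRegex` and `ExtractPlayerId` are hypothetical names, and the HTML Agility Pack lookup is left out so it runs with the base class library alone:

```csharp
using System;
using System.Text.RegularExpressions;

// Same lookbehind/lookahead pattern as in this answer: digits preceded by
// "?id=" or "&id=" and followed by '&', '#', or the end of the string.
// Built once (RegexOptions.Compiled) rather than once per link.
var playerIdRegex = new Regex(@"(?<=[\?&]id=)\d+(?=\&|\#|$)", RegexOptions.Compiled);

// Hypothetical helper: returns the player ID, or null if the href has none.
long? ExtractPlayerId(string href)
{
    Match m = playerIdRegex.Match(href ?? string.Empty);
    // \d also matches non-ASCII digits and long values can overflow,
    // so TryParse guards the conversion.
    if (m.Success && long.TryParse(m.Value, out long value))
        return value;
    return null;
}

Console.WriteLine(ExtractPlayerId("/ice/player.htm?id=8475179"));
```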