2013-12-16 50 views
0

C# regex to parse a span

I have a requirement where I need to get only the link from this HTML:

"<span class=""name""><a href=Details.aspx?entityID=1&hash=20&searchFunctionID=53b&type=Advanced&nameSet=Entities&q=a&textSearchType=ExactPhrase&orgTypes=01%2c02%2c03%2c04%2c05%2c06%2c07%2c08%2c09%2c10%2c11%2c12%2c13%2c14%2c15%2c16%2c90%2c96%2c98%2c99> GOOGLE CORPORATION </a> </span> <br /> <span class=typeDescription> 09 - Analytics Company </span>" 

The output I need is:

Details.aspx?entityID=1&hash=20&searchFunctionID=53b&type=Advanced&nameSet=Entities&q=a&textSearchType=ExactPhrase&orgTypes=01%2c02%2c03%2c04%2c05%2c06%2c07%2c08%2c09%2c10%2c11%2c12%2c13%2c14%2c15%2c16%2c90%2c96%2c98%2c99 

I am using:

string sPattern ="[<a href=](.*?(99))"; 
MatchCollection mcMatches = Regex.Matches(input,sPattern); 
foreach (Match m in mcMatches) 
{ 
    Console.WriteLine(m.Value); 
} 

This does not give me the correct output. Can anyone point me in the right direction?

+0

Parsing HTML with a regular expression? [Bad idea!](http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454) Why not use a proper HTML parser instead, such as [Html Agility Pack](http://htmlagilitypack.codeplex.com/)? – Shaamaan

Answers

0

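As Shaamaan said, a regular expression is not the right way to parse HTML. For the given example, though, here is a better regular expression, with no guarantee that it will always work: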

(?:<a href=)([^">]*) 
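A minimal sketch of how this pattern could be applied in C#. The link ends up in the first capturing group, so the code reads `m.Groups[1].Value` rather than `m.Value` (which would also include the `<a href=` prefix). The class name `HrefRegexDemo`, the method name `ExtractHref`, and the shortened sample input are made up for illustration. Note that the pattern assumes an unquoted `href` that directly follows `<a `, as in the question's HTML; for a quoted attribute the group would come back empty.

```csharp
using System;
using System.Text.RegularExpressions;

public static class HrefRegexDemo
{
    // Returns the unquoted href value of the first "<a href=" occurrence,
    // or null when the input contains no such tag.
    public static string ExtractHref(string input)
    {
        // (?:<a href=) matches the tag prefix without capturing it;
        // ([^">]*) captures everything up to the next quote or '>'.
        Match m = Regex.Match(input, "(?:<a href=)([^\">]*)");
        return m.Success ? m.Groups[1].Value : null;
    }

    public static void Main()
    {
        // Shortened, hypothetical version of the HTML from the question.
        string input = "<span class=\"name\"><a href=Details.aspx?entityID=1&hash=20> GOOGLE CORPORATION </a> </span>";
        Console.WriteLine(ExtractHref(input));
    }
}
```

If several links can appear in the input, `Regex.Matches` with the same pattern and `Groups[1]` on each match should work the same way.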
6

As mentioned above, parsing HTML with a regular expression is not a very good idea. I suggest you use HtmlAgilityPack (you can get it from NuGet):

using HtmlAgilityPack; // from the HtmlAgilityPack NuGet package
HtmlDocument hdoc = new HtmlDocument();
hdoc.LoadHtml(@"<span class=""name""><a href=Details.aspx?entityID=1&hash=20&searchFunctionID=53b&type=Advanced&nameSet=Entities&q=a&textSearchType=ExactPhrase&orgTypes=01%2c02%2c03%2c04%2c05%2c06%2c07%2c08%2c09%2c10%2c11%2c12%2c13%2c14%2c15%2c16%2c90%2c96%2c98%2c99> GOOGLE CORPORATION </a> </span> <br /> <span class=typeDescription> 09 - Analytics Company </span>");
// Select the first <a> element in the document and read its href attribute.
var href = hdoc.DocumentNode.SelectSingleNode("//a").Attributes["href"].Value;

This gives you the value of the href attribute.