2012-07-20 43 views
0

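Hi, I'm very new to regular expressions and need some help writing this one, or at least getting started: finding divs that have an ID.

I want to get all the divs on the page and put them into a collection of strings.

There may be a space between `<` and `div`, and inside the closing `</div>` tag. Thanks.

I have already tried HtmlAgilityPack, but it ran into problems, which is why I'm going this route.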

Dim reg As Regex = New Regex("<div(.*?)> </div") 

Dim matches As string() = reg.Matches(htmlCode) 




<div id="out"> 

    <div id="one"> 
     < div id="b"></div> 
     < div id="d"></div> 
    </div> 

    <div  id="two"> 
     <h1>fsdfsdf</h1> 
     < div id="a"><div id="a"></div></div> 
    </div > 

</div> 
+7

Don't use regular expressions to parse (X/HT/XHT)ML. What problem are you having with HtmlAgilityPack? – Tharwen 2012-07-20 09:00:24

+1

[Obligatory link](http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags) – dreamlax 2012-07-20 09:02:37

+0

I can't get the ID. Do you have any other solution besides the HTML Agility Pack? – 2012-07-20 09:05:52
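As a rough illustration of the objection raised in the comments, the sketch below runs a lazy pattern over a small nested snippet. The names `NestedDivDemo` and `LazyDivMatches`, and the pattern itself, are assumptions made for this example and are not code from the question; the pattern allows optional whitespace after `<` and inside the closing tag, as the question asks.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class NestedDivDemo
{
    // Hypothetical helper: returns every substring matched by a lazy
    // "<div ...> ... </div>" pattern that tolerates stray whitespace.
    public static List<string> LazyDivMatches(string html)
    {
        var pattern = new Regex(
            @"<\s*div\b[^>]*>.*?<\s*/\s*div\s*>",
            RegexOptions.Singleline | RegexOptions.IgnoreCase);

        var results = new List<string>();
        foreach (Match m in pattern.Matches(html))
        {
            results.Add(m.Value);
        }
        return results;
    }

    public static void Main()
    {
        // An outer div containing one inner div, as in the question's sample.
        string html = "<div id=\"one\">< div id=\"b\"></div></div>";

        foreach (string match in LazyDivMatches(html))
        {
            Console.WriteLine(match);
        }
    }
}
```

On this input the pattern yields a single match that pairs the outer opening tag with the inner closing tag and leaves the final `</div>` unmatched. A regular expression of this kind has no way to count nesting depth, which is why the comments point towards an HTML parser.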

Answers

2
It matches all of the following:

<div id='d'> 
    dsfdsfs 

    dsfdfd 

</div> 
<div>dave </div> 
<div>home </ div> 
<p></p> 

However,

if you want to return a collection of divs selected by their ID value, you can use the following with the HTML Agility Pack:

protected void Page_Load(object sender, EventArgs e)
{
    List<HtmlAgilityPack.HtmlNode> divs = GetDivsInner();

    foreach (var node in divs)
    {
        // InnerHtml is already a string, so no ToString() call is needed.
        Response.Write("Result: " + node.InnerHtml);
    }
}

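// Requires: using System.IO; using System.Linq; using System.Collections.Generic;
public List<HtmlAgilityPack.HtmlNode> GetDivsInner()
{
    HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();

    doc.OptionFixNestedTags = true;

    // Dispose the reader (and its underlying response stream) once the document is loaded.
    using (StreamReader reader = requestData("YOUR URL HERE"))
    {
        doc.Load(reader);
    }

    // Keep only <div> elements whose id attribute contains the requested value.
    var divList = doc.DocumentNode.Descendants("div")
        .Where(d => d.Attributes.Contains("id")
                 && d.Attributes["id"].Value.Contains("YOUR ID VALUE"))
        .ToList();

    return divList;
}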

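// Requires: using System.IO; using System.Net;
// Opens the URL and returns a reader over the response body.
// The caller is responsible for disposing the returned reader.
public StreamReader requestData(string url)
{
    HttpWebRequest req = (HttpWebRequest)WebRequest.Create(url);
    HttpWebResponse resp = (HttpWebResponse)req.GetResponse();

    StreamReader sr = new StreamReader(resp.GetResponseStream());

    return sr;
}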
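
If the HTML is already in memory as a string, and the goal is a collection of strings as the question asks, the same idea can be sketched as below. This is a minimal sketch rather than a definitive implementation: the names `DivCollector` and `DivsWithId` are made up for the example, and it assumes the HtmlAgilityPack NuGet package is referenced.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack; // assumption: the HtmlAgilityPack NuGet package is installed

public static class DivCollector
{
    // Hypothetical helper: returns the outer HTML of every <div>
    // that carries an id attribute, in document order.
    public static List<string> DivsWithId(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html); // parse from a string instead of a stream

        return doc.DocumentNode
                  .Descendants("div")
                  .Where(d => d.Attributes.Contains("id"))
                  .Select(d => d.OuterHtml)
                  .ToList();
    }

    public static void Main()
    {
        string html = "<div id=\"out\"><div id=\"one\"></div><p>x</p></div>";

        foreach (string div in DivsWithId(html))
        {
            Console.WriteLine(div);
        }
    }
}
```

Note that markup such as `< div id="b">`, with a space after `<`, is not a valid start tag, so a parser may treat it as plain text rather than as an element. If the input really contains such tags, it would probably need to be normalised before parsing.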