Reading HTML from C# WinForms

I need to read a website's title from a C# WinForms application. What is the best way to do it? I searched on Google but did not find anything useful.

Thanks in advance.

Source

2011-03-05 Prabodha Eranga

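If you need to parse the whole page, you can try the HTML Agility Pack. If all you need is the title, a regular expression will do.

Since the title almost always sits inside the <title> tag, you can extract that tag directly.

To download the HTML, you can use WebClient or the HttpWebRequest/HttpWebResponse objects.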
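As a rough illustration of the regular-expression route mentioned above, here is a minimal sketch. The class name `TitleExtractor`, the method `ExtractTitle`, and the sample markup are made up for this example, and the pattern assumes the page contains a single, well-formed <title> element. The HTML is hard-coded here; in practice it would be the string you downloaded.

```csharp
using System;
using System.Text.RegularExpressions;

static class TitleExtractor
{
    // Matches <title ...>text</title>, ignoring case and allowing line breaks in the text.
    private static readonly Regex TitlePattern = new Regex(
        @"<title\b[^>]*>(.*?)</title>",
        RegexOptions.IgnoreCase | RegexOptions.Singleline);

    // Hypothetical helper: returns the trimmed title text, or null when no <title> element is found.
    public static string ExtractTitle(string html)
    {
        Match match = TitlePattern.Match(html);
        return match.Success ? match.Groups[1].Value.Trim() : null;
    }

    static void Main()
    {
        // Hard-coded sample markup (an assumption for the demo, not a real download).
        string html = "<html><head><TITLE> Stack Overflow </TITLE></head><body></body></html>";
        Console.WriteLine(ExtractTitle(html));
    }
}
```

A regex like this is fine for grabbing one tag, but it is not an HTML parser; for anything beyond the title, a real parser such as the HTML Agility Pack is the safer choice.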

Source

2011-03-05 15:17:05

Thanks for your quick reply. – 2011-03-05 15:18:53

Personally, I like to use SgmlReader to parse HTML:

using System;
using System.IO;
using System.Net;
using System.Xml;
using Sgml;

class Program
{
    static void Main()
    {
        var url = "http://www.stackoverflow.com";
        using (var reader = new SgmlReader())
        using (var client = new WebClient())
        using (var streamReader = new StreamReader(client.OpenRead(url)))
        {
            // Treat the input as HTML and normalize tag names to lower case
            reader.DocType = "HTML";
            reader.WhitespaceHandling = WhitespaceHandling.All;
            reader.CaseFolding = Sgml.CaseFolding.ToLower;
            reader.InputStream = streamReader;

            // Load the converted markup into an XmlDocument so XPath can be used
            var doc = new XmlDocument();
            doc.PreserveWhitespace = true;
            doc.XmlResolver = null;
            doc.Load(reader);

            // Tag names were folded to lower case, so "//title" matches <TITLE> too
            var title = doc.SelectSingleNode("//title");
            if (title != null)
            {
                Console.WriteLine(title.InnerText);
            }
        }
    }
}

Source

2011-03-05 15:23:44

You want the WebClient class, which lives in the System.Net namespace.

using System.Net;

With WebClient, you can download the entire page as a string and then do whatever you want with that string. :)

Example:

WebClient client = new WebClient();
string content = client.DownloadString("http://www.google.com");

Then just parse the string however you like. :) In this example, you probably want to find the title element and extract the title like this:

int start = content.IndexOf("<title>") + "<title>".Length;
int end = content.IndexOf("</title>", start);
string title = content.Substring(start, end - start).Trim();
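Note that the Substring call above assumes the page contains a lowercase <title> tag with no attributes; if either tag is missing, it throws an ArgumentOutOfRangeException. A slightly more defensive sketch might look like the following. The class name `TitleSubstring`, the method `GetTitle`, and the sample strings are made up for illustration.

```csharp
using System;

static class TitleSubstring
{
    // Hypothetical helper: returns the text between <title> and </title>,
    // or null if either tag is missing. Tag matching ignores case.
    public static string GetTitle(string content)
    {
        const string openTag = "<title>";
        const string closeTag = "</title>";

        int start = content.IndexOf(openTag, StringComparison.OrdinalIgnoreCase);
        if (start < 0) return null;
        start += openTag.Length; // move past the opening tag

        int end = content.IndexOf(closeTag, start, StringComparison.OrdinalIgnoreCase);
        if (end < 0) return null;

        return content.Substring(start, end - start).Trim();
    }

    static void Main()
    {
        // Sample strings stand in for downloaded pages.
        Console.WriteLine(GetTitle("<html><head><title> Google </title></head></html>"));
        Console.WriteLine(GetTitle("<html><body>no title here</body></html>") ?? "(no title found)");
    }
}
```

This still does not handle a <title> tag that carries attributes; for that case a regular expression or an HTML parser is a better fit.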

Hope it helps. :)

Source

2011-03-05 15:27:32 peterthegreat

Thank you, @peterthegreat. This snippet really helped me. Thanks in advance – 2011-03-05 15:36:07

Glad I could help. :) – peterthegreat 2011-03-05 17:02:33
