2015-10-05 108 views

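When viewing the page source, I used CTRL-F to find every occurrence of "id=" and got 82 results. What I want is to extract only the number that follows "id=". For example, if the attribute is id=344, I want just 344 as a string, added to a list. How can I extract part of specific HTML text using HtmlAgilityPack?

The way I am doing it now does not work. I thought I would collect all the links like this and filter them afterwards, but I get empty strings and some unrelated text. I suspect that using InnerText is the mistake.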


idsnumbers = new List<string>(); 
HtmlWeb hw = new HtmlWeb(); 
HtmlAgilityPack.HtmlDocument doc = hw.Load("http://www.tapuz.co.il/forums2008/"); 
foreach (HtmlNode link in doc.DocumentNode.SelectNodes("//a[@href]")) 
{ 
    idsnumbers.Add(link.InnerText); 
} 

Update: I am now getting a null reference exception:

System.NullReferenceException was unhandled 
    _HResult=-2147467261 
    _message=Object reference not set to an instance of an object. 
    HResult=-2147467261 
    IsTransient=false 
    Message=Object reference not set to an instance of an object. 
    Source=WindowsFormsApplication1 
    StackTrace: 
     at WindowsFormsApplication1.Form1..ctor() in d:\C-Sharp\Tapuz Images\WindowsFormsApplication1\WindowsFormsApplication1\Form1.cs:line 50 
     at WindowsFormsApplication1.Program.Main() in d:\C-Sharp\Tapuz Images\WindowsFormsApplication1\WindowsFormsApplication1\Program.cs:line 19 
     at System.AppDomain._nExecuteAssembly(RuntimeAssembly assembly, String[] args) 
     at System.AppDomain.ExecuteAssembly(String assemblyFile, Evidence assemblySecurity, String[] args) 
     at Microsoft.VisualStudio.HostingProcess.HostProc.RunUsersAssembly() 
     at System.Threading.ThreadHelper.ThreadStart_Context(Object state) 
     at System.Threading.ExecutionContext.RunInternal(ExecutionContext executionContext, ContextCallback callback, Object state, Boolean preserveSyncCtx) 
     at System.Threading.ExecutionContext.Run(ExecutionContext executionContext, ContextCallback callback, Object state, Boolean preserveSyncCtx) 
     at System.Threading.ExecutionContext.Run(ExecutionContext executionContext, ContextCallback callback, Object state) 
     at System.Threading.ThreadHelper.ThreadStart() 
    InnerException: 

Answers

1

You should read the id from the attribute. InnerText only gives you the text inside the element, between its opening and closing tags. So:

foreach (HtmlNode link in doc.DocumentNode.SelectNodes("//a[@href]")) 
{ 
    idsnumbers.Add(link.Attributes["id"].Value); 
} 

If you want to go further and extract only the digits from the ids, you can use Regex or int.TryParse.
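As a rough illustration of that last step, here is a minimal sketch that pulls the first run of digits out of each id value with Regex, and uses int.TryParse to check whether a value is a whole number on its own. The class name `IdDigits`, the helper `FirstDigits`, and the sample id values are made up for this example; they are not taken from the page in the question.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class IdDigits
{
    // Returns the first run of ASCII digits in the value, or null if there is none.
    public static string FirstDigits(string value)
    {
        if (string.IsNullOrEmpty(value)) return null;
        Match m = Regex.Match(value, "[0-9]+");
        return m.Success ? m.Value : null;
    }

    public static void Main()
    {
        // Hypothetical id values, only for illustration.
        string[] rawIds = { "344", "forum_1072", "header", "" };
        List<string> idsnumbers = new List<string>();

        foreach (string raw in rawIds)
        {
            string digits = FirstDigits(raw);
            if (digits != null) idsnumbers.Add(digits);
        }

        // Prints: 344,1072
        Console.WriteLine(string.Join(",", idsnumbers));

        // int.TryParse is enough when the whole value is expected to be a number.
        int parsed;
        Console.WriteLine(int.TryParse("344", out parsed));        // True
        Console.WriteLine(int.TryParse("forum_1072", out parsed)); // False
    }
}
```

Regex is the more forgiving choice when the id carries a prefix or suffix; int.TryParse only succeeds when the entire string is numeric.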


torvin, I get a null exception right on this line: idsnumbers.Add(link.Attributes["id"].Value); I have added the full exception message to my question.


If `link.Attributes["id"]` is null, then your `<a>` element does not have an id attribute. Just add a null check. – torvin
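The null check suggested above could look something like the following sketch. It assumes the HtmlAgilityPack package is referenced; the class name `IdCollector`, the method `CollectIds`, and the inline sample HTML are hypothetical and stand in for the real page. It also guards against `SelectNodes` returning null, which HtmlAgilityPack does when the XPath matches nothing.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack;

public static class IdCollector
{
    // Collects the id attribute of every <a href=...> that actually has one.
    public static List<string> CollectIds(string html)
    {
        var ids = new List<string>();
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectNodes returns null, not an empty collection, when nothing matches.
        HtmlNodeCollection links = doc.DocumentNode.SelectNodes("//a[@href]");
        if (links == null) return ids;

        foreach (HtmlNode link in links)
        {
            // GetAttributeValue returns the supplied default when the attribute is missing,
            // so no NullReferenceException is thrown for links without an id.
            string id = link.GetAttributeValue("id", "");
            if (id.Length > 0) ids.Add(id);
        }
        return ids;
    }

    public static void Main()
    {
        // Made-up markup: one link with an id, one without.
        string html = "<a href='/a' id='344'>first</a><a href='/b'>no id</a>";
        Console.WriteLine(string.Join(",", CollectIds(html)));
    }
}
```

An alternative, under the same assumptions, is to select only the links that carry the attribute with the XPath `//a[@href and @id]`, which makes the per-node check unnecessary.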
