How to extract certain data from HTML using RegEx?

I have the following HTML:

<tr class="even"> 
      <td> 
       Title1 
      </td> 
      <td> 
       Name1 
      </td> 
      <td> 
       Email1 
      </td> 
      <td> 
       Postcode1 
      </td>

I want to use a regular expression to output the data between the tags, like this:

Title1 Name1 Email1 Postcode1 Title2 Name2 Email2 Postcode2 ...


2014-09-03 Hoyesic

(http://stackoverflow.com/a/1732454/102937) – 2014-09-03 15:21:16

You shouldn't use regular expressions to parse HTML; use an HTML parser instead.

Anyway, if you really want a regex, you can use this one:

>\s+<|>\s*(.*?)\s*<


Match information:

MATCH 1 
1. [51-57] `Title1` 
MATCH 2 
1. [109-114] `Name1` 
MATCH 3 
1. [166-172] `Email1` 
MATCH 4 
1. [224-233] `Postcode1`
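A minimal sketch of applying that pattern from PowerShell, the language the other answers use. `$html`, `$pattern` and `$values` are hypothetical names, and the input is the question's row with a closing `</tr>` assumed. The `>\s+<` branch consumes whitespace-only gaps between tags without capturing anything, so only the matches in which group 1 took part carry data:

```powershell
# Hypothetical input: the row from the question, with a closing </tr> assumed.
$html = @'
<tr class="even">
      <td>
       Title1
      </td>
      <td>
       Name1
      </td>
      <td>
       Email1
      </td>
      <td>
       Postcode1
      </td>
</tr>
'@

# Branch 1 (>\s+<) matches whitespace-only gaps between tags and captures nothing;
# branch 2 captures the text between '>' and the next '<' (minus surrounding whitespace) in group 1.
$pattern = '>\s+<|>\s*(.*?)\s*<'

# Keep only the matches in which group 1 participated, then take its value.
$values = [regex]::Matches($html, $pattern) |
    Where-Object { $_.Groups[1].Success } |
    ForEach-Object { $_.Groups[1].Value }

$values -join ' '   # Title1 Name1 Email1 Postcode1
```

Because `.` does not match a newline, each captured value has to sit on a single line, which holds for the question's markup.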


2014-09-03 15:23:53

This should strip out the tags (and the whitespace around them) and output the remaining text, space-separated:

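$text = @'
<tr class="even">
      <td>
       Title1
      </td>
      <td>
       Name1
      </td>
      <td>
       Email1
      </td>
      <td>
       Postcode1
      </td>
'@

# Split on every tag (and the whitespace around it), keep only the
# pieces that contain non-whitespace, and cast the array to a string,
# which joins the elements with spaces.
$text -split '\s*<.+?>\s*' -match '\S' -as [string]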

Title1 Name1 Email1 Postcode1
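The one-liner also copes with several rows, which is what the `Title1 ... Title2 ...` output in the question implies. A minimal sketch, assuming a made-up second row and hypothetical property names (`Title`, `Name`, `Email`, `Postcode`) for the optional regrouping step; that step also assumes every row has exactly four cells:

```powershell
# Hypothetical input: the question's row plus an invented second row.
$text = @'
<tr class="even">
      <td>
       Title1
      </td>
      <td>
       Name1
      </td>
      <td>
       Email1
      </td>
      <td>
       Postcode1
      </td>
</tr>
<tr class="odd">
      <td>Title2</td>
      <td>Name2</td>
      <td>Email2</td>
      <td>Postcode2</td>
</tr>
'@

# Split on tags and drop the blank pieces, exactly as in the answer.
$cells = $text -split '\s*<.+?>\s*' -match '\S'

# Flat, space-separated output in the layout the question asks for.
$cells -as [string]

# Optional: regroup every four values into one record per row
# (property names are assumptions, not taken from the question).
$rows = for ($i = 0; $i -lt $cells.Count; $i += 4) {
    [pscustomobject]@{
        Title    = $cells[$i]
        Name     = $cells[$i + 1]
        Email    = $cells[$i + 2]
        Postcode = $cells[$i + 3]
    }
}
$rows | Format-Table
```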


2014-09-03 15:53:08 mjolinor

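Don't use a regex. HTML is not a regular language, so it cannot be parsed correctly with regular expressions. A regex will work most of the time, but the rest of the time it will fail spectacularly.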
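A small illustration of that failure mode, using a hypothetical cell whose attribute value contains `>` (legal in HTML). Both the split pattern and the capture pattern from the regex answers stop at the first `>`, so part of the tag leaks into the result:

```powershell
# Hypothetical input: valid HTML with a '>' inside an attribute value.
$html = '<td title="a > b">Name1</td>'

# Split approach: '<.+?>' ends at the first '>', i.e. inside the attribute.
$split = $html -split '\s*<.+?>\s*' -match '\S'

# Capture approach: '>\s*(.*?)\s*<' starts at that same '>'.
$captured = [regex]::Matches($html, '>\s+<|>\s*(.*?)\s*<') |
    Where-Object { $_.Groups[1].Success } |
    ForEach-Object { $_.Groups[1].Value }

$split      # b">Name1
$captured   # b">Name1
```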

Use the Internet Explorer COM object (dare I say it?) to read your HTML from the file:

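$ie = New-Object -ComObject "InternetExplorer.Application"
$ie.Visible = $false
$ie.Navigate("F:\BuildOutput\rt.html")
# Wait until the page has finished loading (ReadyState 4 = complete)
# before touching the DOM; otherwise Document may be empty
while ($ie.Busy -or $ie.ReadyState -ne 4) { Start-Sleep -Milliseconds 100 }
$document = $ie.Document
# This will return all the tables
$document.getElementsByTagName('table')

# This will return a table with a specific ID
$document.getElementById('employees')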

Here's the MSDN reference for the document class.
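The COM object only exists on Windows with Internet Explorer installed. When the markup is well-formed XML, which the question's snippet becomes once the missing `</tr>` is added, .NET's `XmlDocument` (through PowerShell's `[xml]` cast) is a cross-platform alternative. This swaps in an XML parser, not an HTML parser: ordinary HTML with unclosed tags or entities such as `&nbsp;` makes the cast throw, so treat it as a sketch for tidy fragments only:

```powershell
# Assumption: the fragment is well-formed XML (one root element, every tag closed).
# The question's snippet lacks </tr>, so it is added here.
$text = @'
<tr class="even">
      <td>
       Title1
      </td>
      <td>
       Name1
      </td>
      <td>
       Email1
      </td>
      <td>
       Postcode1
      </td>
</tr>
'@

# The [xml] cast parses the string into a System.Xml.XmlDocument;
# it throws if the markup is not well-formed.
$doc = [xml]$text

# Select every <td> element and trim the whitespace around its text.
$cells = $doc.SelectNodes('//td') | ForEach-Object { $_.InnerText.Trim() }

$cells -join ' '   # Title1 Name1 Email1 Postcode1
```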


2014-09-04 01:38:55
