2016-08-02 43 views

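Parsing HTML from multiple tables in PowerShell

I need help extracting the HTML tables from this page. I am querying different types of devices, and they have different fields. I want a way to extract the column headers as property names and the cell data as values, whichever tables the page happens to contain. Conveniently, every table has only two rows.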

I tried using ConvertFrom-Html from http://poshcode.org/4849, but it did not help.

One big problem is that when I run Invoke-WebRequest, the response has no ParsedHtml property, so I cannot search by ID.

$url = 'http://ipaddressofdevice/status.htm' 
$r = Invoke-WebRequest $url 

This is $r:

StatusCode  : 200 
StatusDescription : OK 
Content   : {60, 104, 116, 109...} 
RawContent  : HTTP/1.0 200 OK 
        Tue, 02 Aug 2016 01: 35:59 GMT 
        Context-Type: text/html 

        <html> 
        <head> 
        <title><center><b>Controller Status</b></title> 
        <meta http-equiv="Content-Type" content="text/html; charset... 
Headers   : {[Tue, 02 Aug 2016 01, 35:59 GMT], [Context-Type, text/html]} 
RawContentLength : 4443 
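
Note that Content above is displayed as a byte array (60, 104, 116, 109 are the bytes of "<htm") rather than as a string, likely because the device sends "Context-Type" instead of "Content-Type". A minimal sketch of decoding such bytes, assuming the page really is ISO-8859-1 as its meta tag claims; the sample bytes below stand in for $r.Content:

```powershell
# Sample bytes standing in for $r.Content (assumption: real code would pass $r.Content)
[byte[]]$bytes = 60, 104, 116, 109, 108, 62

# Decode with the charset named in the page's <meta> tag
$encoding = [System.Text.Encoding]::GetEncoding('iso-8859-1')
$html = $encoding.GetString($bytes)
$html
```

The resulting string can then be fed to whatever text-based parsing is used on RawContent.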

This is $r.RawContent:

HTTP/1.0 200 OK 
Tue, 02 Aug 2016 01: 35:59 GMT 
Context-Type: text/html 

<html> 
<head> 
<title><center><b>Controller Status</b></title> 
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1"> 
<link rel="stylesheet" href="font_styles.css"> 
</head> 
<body bgcolor="#FFFFFF"><center> 
<table CELLSPACING=4 CELLPADDING=4 WIDTH="90%" bordercolorlight="#CCFFFF" bordercolordark="#003366"><tr> 
<td colspan="2" class="intro_18"> 
<br><center>Controller Status<br><center><table BORDER=2 CELLSPACING=0 CELLPADDING=3 COLS=3 WIDTH="90%" bordercolorlight="#CCFFFF" bordercolordark="#003366"> 
<br><tr class="listhead_0" BGCOLOR="#009999"><th>Controller Type</th><th>Controller Name</th><th>Online</th><tr><tr><td class="listdata_1">Master Controller</td><td class="listdata_1">somename</td> 
<td class="listdata_1">Yes</td></tr></table></td></tr><tr><td><center><table BORDER=2 CELLSPACING=0 CELLPADDING=3 COLS=3 WIDTH="90%" bordercolorlight="#CCFFFF" bordercolordark="#003366"> 
<tr class="listhead_0" BGCOLOR="#009999"><th>Main Image</th><th>Boot Image</th><th>Bootloader</th><th>Processor</th><th>Board</th><tr><td class="listdata_1">5.2.A.19813.i2</td> 
<td class="listdata_1">5.0.4.17504.BOOT.i2</td><td class="listdata_1">2.0.35</td><td class="listdata_1">MPC860 D4</td><td class="listdata_1">II</td></tr></table></td></tr><tr><td> 
<center><table BORDER=2 CELLSPACING=0 CELLPADDING=3 COLS=4 WIDTH="90%" bordercolorlight="#CCFFFF" bordercolordark="#003366"> 
<tr class="listhead_0" BGCOLOR="#009999"><th>MAC Address</th><th>IP Address</th><th>Host IP Address</th> 
</tr><tr class="listdata_1"><td class="listdata_1">010bdc</td> 
</td><td class="listdata_1">172.0.0.1</td><td class="listdata_1">hostname.com</td></tr></table></td></tr><tr><td> 
<center><table BORDER=2 CELLSPACING=0 CELLPADDING=3 COLS=4 WIDTH="90%" bordercolorlight="#CCFFFF" bordercolordark="#003366"> 
<tr class="listhead_0" BGCOLOR="#009999"><th>Local Date/Time</th> 
<th>GMT Date/Time</th> 
<th>DST</th> 
<th>Boot Date/Time</th> 
<th>Elapsed Time Since Boot</th> 
</tr><td class="listdata_1">Tue Aug 2 7: 5:59 2016 
India Standard Time</td> 
<td class="listdata_1">Tue Aug 2 1:35:59 2016 
</td> 
<td class="listdata_1">No</td><td class="listdata_1">Fri Jul 15 23:30:13 2016 
</td> 
<td class="listdata_1">17 days 2 hours 5 minutes 46 seconds</td> 
</table></td></tr><tr><td> 
<center><table BORDER=2 CELLSPACING=0 CELLPADDING=3 COLS=3 WIDTH="90%" bordercolorlight="#CCFFFF" bordercolordark="#003366"> 
<tr class="listhead_0" BGCOLOR="#009999"><th>Total Program Memory</th><th>Free Program Memory</th><th>Percent Free</th></tr><tr><td class="listdata_1">15425536</td> 
<td class="listdata_1">6197248</td> 
<td class="listdata_1">40.18 %</td> 
</tr> 
</table> 
</td></tr><tr><td> 
<center><table BORDER=2 CELLSPACING=0 CELLPADDING=3 COLS=3 WIDTH="90%" bordercolorlight="#CCFFFF" bordercolordark="#003366"> 
<tr class="listhead_0" BGCOLOR="#009999"><th>Total Storage Memory</th><th>Free Storage Memory</th><th>Total Physical Memory</th></tr><tr><td class="listdata_1">50819072</td> 
<td class="listdata_1">45514064</td> 
<td class="listdata_1">64 Meg</td> 
</tr> 
</table> 
</td></tr><tr><td> 
<center><table BORDER=2 CELLSPACING=0 CELLPADDING=3 COLS=2 WIDTH="90%" bordercolorlight="#CCFFFF" bordercolordark="#003366"> 
<tr class="listhead_0" BGCOLOR="#009999"><th>Host Connection Status</th> 
<th>Path To Host</th></tr><tr><td class="listdata_1">Host Connection Established</td> 
<td class="listdata_1">Yes</td></tr> 
</table> 
</td></tr><tr><td> 
<center><table BORDER=2 CELLSPACING=0 CELLPADDING=3 COLS=2 WIDTH="90%" bordercolorlight="#CCFFFF" bordercolordark="#003366"> 
<tr class="listhead_0" BGCOLOR="#009999"><th>Active Communication Type</th><th>Secondary Communication Type</th> 
</tr><td class="listdata_1">Ethernet</td> 
<td class="listdata_1">N/A</td> 
</tr> 
</table> 
</td></tr><tr><td> 
<center><table BORDER=2 CELLSPACING=0 CELLPADDING=3 COLS=3 WIDTH="90%" bordercolorlight="#CCFFFF" bordercolordark="#003366"> 
<tr class="listhead_0" BGCOLOR="#009999"><th>PCMCIA Ethernet Card Address</th><th>Modem</th><th>USB Security Key</th></tr><td class="listdata_1">N/A</td><td class="listdata_1">N/A</td> 
<td class="listdata_1">N/A</td></tr> 
</table> 
</td></tr> 
</table> 
<p><font face="Verdana, Trebuchet MS, Tahoma, Arial, sans-serif" size="1" color="#003366"> 
Copyright ? 2008 Tyco International Ltd. and its Respective Companies. All Rights Reserved</font></p> 
</body> 
</html> 

Answers

If your HTML page always looks like this, you can do the following:

# $raw is your $r.RawContent from Invoke-WebRequest
$raw = @" 
HTTP/1.0 200 OK 
Tue, 02 Aug 2016 01: 35:59 GMT 
Context-Type: text/html 

<html> 
<head> 
<title><center><b>Controller Status</b></title> 
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1"> 
<link rel="stylesheet" href="font_styles.css"> 
</head> 
<body bgcolor="#FFFFFF"><center> 
<table CELLSPACING=4 CELLPADDING=4 WIDTH="90%" bordercolorlight="#CCFFFF" bordercolordark="#003366"><tr> 
<td colspan="2" class="intro_18"> 
<br><center>Controller Status<br><center><table BORDER=2 CELLSPACING=0 CELLPADDING=3 COLS=3 WIDTH="90%" bordercolorlight="#CCFFFF" bordercolordark="#003366"> 
<br><tr class="listhead_0" BGCOLOR="#009999"><th>Controller Type</th><th>Controller Name</th><th>Online</th><tr><tr><td class="listdata_1">Master Controller</td><td class="listdata_1">somename</td> 
<td class="listdata_1">Yes</td></tr></table></td></tr><tr><td><center><table BORDER=2 CELLSPACING=0 CELLPADDING=3 COLS=3 WIDTH="90%" bordercolorlight="#CCFFFF" bordercolordark="#003366"> 
<tr class="listhead_0" BGCOLOR="#009999"><th>Main Image</th><th>Boot Image</th><th>Bootloader</th><th>Processor</th><th>Board</th><tr><td class="listdata_1">5.2.A.19813.i2</td> 
<td class="listdata_1">5.0.4.17504.BOOT.i2</td><td class="listdata_1">2.0.35</td><td class="listdata_1">MPC860 D4</td><td class="listdata_1">II</td></tr></table></td></tr><tr><td> 
<center><table BORDER=2 CELLSPACING=0 CELLPADDING=3 COLS=4 WIDTH="90%" bordercolorlight="#CCFFFF" bordercolordark="#003366"> 
<tr class="listhead_0" BGCOLOR="#009999"><th>MAC Address</th><th>IP Address</th><th>Host IP Address</th> 
</tr><tr class="listdata_1"><td class="listdata_1">010bdc</td> 
</td><td class="listdata_1">172.0.0.1</td><td class="listdata_1">hostname.com</td></tr></table></td></tr><tr><td> 
<center><table BORDER=2 CELLSPACING=0 CELLPADDING=3 COLS=4 WIDTH="90%" bordercolorlight="#CCFFFF" bordercolordark="#003366"> 
<tr class="listhead_0" BGCOLOR="#009999"><th>Local Date/Time</th> 
<th>GMT Date/Time</th> 
<th>DST</th> 
<th>Boot Date/Time</th> 
<th>Elapsed Time Since Boot</th> 
</tr><td class="listdata_1">Tue Aug 2 7: 5:59 2016 
India Standard Time</td> 
<td class="listdata_1">Tue Aug 2 1:35:59 2016 
</td> 
<td class="listdata_1">No</td><td class="listdata_1">Fri Jul 15 23:30:13 2016 
</td> 
<td class="listdata_1">17 days 2 hours 5 minutes 46 seconds</td> 
</table></td></tr><tr><td> 
<center><table BORDER=2 CELLSPACING=0 CELLPADDING=3 COLS=3 WIDTH="90%" bordercolorlight="#CCFFFF" bordercolordark="#003366"> 
<tr class="listhead_0" BGCOLOR="#009999"><th>Total Program Memory</th><th>Free Program Memory</th><th>Percent Free</th></tr><tr><td class="listdata_1">15425536</td> 
<td class="listdata_1">6197248</td> 
<td class="listdata_1">40.18 %</td> 
</tr> 
</table> 
</td></tr><tr><td> 
<center><table BORDER=2 CELLSPACING=0 CELLPADDING=3 COLS=3 WIDTH="90%" bordercolorlight="#CCFFFF" bordercolordark="#003366"> 
<tr class="listhead_0" BGCOLOR="#009999"><th>Total Storage Memory</th><th>Free Storage Memory</th><th>Total Physical Memory</th></tr><tr><td class="listdata_1">50819072</td> 
<td class="listdata_1">45514064</td> 
<td class="listdata_1">64 Meg</td> 
</tr> 
</table> 
</td></tr><tr><td> 
<center><table BORDER=2 CELLSPACING=0 CELLPADDING=3 COLS=2 WIDTH="90%" bordercolorlight="#CCFFFF" bordercolordark="#003366"> 
<tr class="listhead_0" BGCOLOR="#009999"><th>Host Connection Status</th> 
<th>Path To Host</th></tr><tr><td class="listdata_1">Host Connection Established</td> 
<td class="listdata_1">Yes</td></tr> 
</table> 
</td></tr><tr><td> 
<center><table BORDER=2 CELLSPACING=0 CELLPADDING=3 COLS=2 WIDTH="90%" bordercolorlight="#CCFFFF" bordercolordark="#003366"> 
<tr class="listhead_0" BGCOLOR="#009999"><th>Active Communication Type</th><th>Secondary Communication Type</th> 
</tr><td class="listdata_1">Ethernet</td> 
<td class="listdata_1">N/A</td> 
</tr> 
</table> 
</td></tr><tr><td> 
<center><table BORDER=2 CELLSPACING=0 CELLPADDING=3 COLS=3 WIDTH="90%" bordercolorlight="#CCFFFF" bordercolordark="#003366"> 
<tr class="listhead_0" BGCOLOR="#009999"><th>PCMCIA Ethernet Card Address</th><th>Modem</th><th>USB Security Key</th></tr><td class="listdata_1">N/A</td><td class="listdata_1">N/A</td> 
<td class="listdata_1">N/A</td></tr> 
</table> 
</td></tr> 
</table> 
<p><font face="Verdana, Trebuchet MS, Tahoma, Arial, sans-serif" size="1" color="#003366"> 
Copyright ? 2008 Tyco International Ltd. and its Respective Companies. All Rights Reserved</font></p> 
</body> 
</html> 
"@ 

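# Collect every header/value pair into a single object
$obj = New-Object PSObject
$close = '</table>'
for ($i = 0; $i -lt $raw.Length - $close.Length;) {

    # Cut out the text from the current offset through the next </table>
    $table = $raw.Remove(0, $i)
    $end = $table.IndexOf($close)
    if ($end -lt 0) { break }    # no more tables
    $i += $end + $close.Length
    $table = $table.Remove($end + $close.Length)

    # Header names: the text between each <th> and </th>
    $columns = @($table -split "`n" -join '' -split '<th>' |
        ? {$_ -like '*</th>*'} |
        % {$_.Remove($_.IndexOf('</th>'))} |
        ? {$_ -and $_ -ne [Char]13})

    # Cell values: the text between each <td class="..."> and </td>
    $values = @($table -split "`n" -join '' -split '<td class' |
        ? {$_ -like '*</td>*'} |
        % {$_.Remove($_.IndexOf('</td>')) -replace '.*">'} |
        ? {$_ -and $_ -ne [Char]13})

    # Pair headers with values by position
    $o = 0
    $columns | % {
        $column = $_
        $obj | Add-Member -MemberType NoteProperty -Name $column -Value $values[$o]
        $o++
    }
}
$obj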

Output

Controller Type    : Master Controller 
Controller Name    : somename 
Online      : Yes 
Main Image     : 5.2.A.19813.i2 
Boot Image     : 5.0.4.17504.BOOT.i2 
Bootloader     : 2.0.35 
Processor     : MPC860 D4 
Board      : II 
MAC Address     : 010bdc 
IP Address     : 172.0.0.1 
Host IP Address    : hostname.com 
Local Date/Time   : Tue Aug 2 7: 5:59 2016 India Standard Time 
GMT Date/Time    : Tue Aug 2 1:35:59 2016 
DST       : No 
Boot Date/Time    : Fri Jul 15 23:30:13 2016 
Elapsed Time Since Boot  : 17 days 2 hours 5 minutes 46 seconds 
Total Program Memory   : 15425536 
Free Program Memory   : 6197248 
Percent Free     : 40.18 % 
Total Storage Memory   : 50819072 
Free Storage Memory   : 45514064 
Total Physical Memory  : 64 Meg 
Host Connection Status  : Host Connection Established 
Path To Host     : Yes 
Active Communication Type : Ethernet 
Secondary Communication Type : N/A 
PCMCIA Ethernet Card Address : N/A 
Modem      : N/A 
USB Security Key    : N/A 

Wow, thanks! This will work for me! – user3839452

Why are you using the 8? – user3839452

8 is the length of '</table>' –
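
To illustrate the comment above: the 8 is the character count of the closing tag, so adding it to the IndexOf result moves the offset just past that tag. A small sketch with a made-up sample string (not taken from the device page):

```powershell
$close = '</table>'
$close.Length                      # 8

# Hypothetical sample: two tiny tables back to back
$sample = '<table>a</table><table>b</table>'
$end    = $sample.IndexOf($close) + $close.Length

$first = $sample.Remove($end)      # everything up to and including the first </table>
$rest  = $sample.Remove(0, $end)   # what the next loop iteration would scan
$first
$rest
```

Using $close.Length instead of a literal 8 keeps the offset correct if the search string ever changes.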

Have you considered using HtmlAgilityPack? It lets you find elements by ID and by other criteria.

# Note: You'll need a valid path to the assembly 
Add-Type -Path HtmlAgilityPack.dll 

$htmlDocument = New-Object HtmlAgilityPack.HtmlDocument 
$htmlDocument.LoadHtml($r.RawContent) 

# This assumes the element has an ID tag. There are other ways to traverse 
# content if not. 
$htmlDocument.GetElementById('...') 

The elements do not have ids. – user3839452
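
Since the cells carry no id, one option is to pair headers and values by position instead. The following is a minimal sketch that uses .NET regular expressions rather than HtmlAgilityPack. The function name ConvertTo-TableObject is made up for illustration, and it assumes that every <th> has a matching <td class="listdata_1"> in document order, as on the page in the question.

```powershell
# Hypothetical helper, not part of HtmlAgilityPack or PowerShell itself
function ConvertTo-TableObject {
    param([string]$Html)

    $options = [System.Text.RegularExpressions.RegexOptions]'IgnoreCase, Singleline'
    # Header cells and data cells, each in document order
    $headers = [regex]::Matches($Html, '<th\b[^>]*>(.*?)</th>', $options)
    $cells   = [regex]::Matches($Html, '<td\s+class="listdata_1"[^>]*>(.*?)</td>', $options)

    $props = [ordered]@{}
    for ($n = 0; $n -lt $headers.Count -and $n -lt $cells.Count; $n++) {
        # Collapse line breaks and repeated whitespace inside a cell
        $name  = ($headers[$n].Groups[1].Value -replace '\s+', ' ').Trim()
        $value = ($cells[$n].Groups[1].Value -replace '\s+', ' ').Trim()
        $props[$name] = $value    # a repeated header name overwrites the earlier value
    }
    [pscustomobject]$props
}

# Usage with a made-up two-column table
$sample = '<table><tr><th>Modem</th><th>Online</th></tr>' +
          '<tr><td class="listdata_1">N/A</td><td class="listdata_1">Yes</td></tr></table>'
$result = ConvertTo-TableObject -Html $sample
$result
```

Matching on the listdata_1 class skips the outer layout cells, which have a different class or none. Regular expressions are fragile against markup changes, so this is only reasonable while the device pages keep the shape shown in the question.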