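Regex - how to correctly capture a nested value
I understand that parsing HTML with a regex is not ideal, but I have a use case for it.
I have a coverage report / HTML page like this: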
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html lang="en">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<title>LCOV - .info.cleaned</title>
<link rel="stylesheet" type="text/css" href="gcov.css">
</head>
<body>
<table width="100%" border=0 cellspacing=0 cellpadding=0>
<tr><td class="title">LCOV - code coverage report</td></tr>
<tr><td class="ruler"><img src="glass.png" width=3 height=3 alt=""></td></tr>
<tr>
<td width="100%">
<table cellpadding=1 border=0 width="100%">
<tr>
<td width="10%" class="headerItem">Current view:</td>
<td width="35%" class="headerValue">top level</td>
<td width="5%"></td>
<td width="15%"></td>
<td width="10%" class="headerCovTableHead">Hit</td>
<td width="10%" class="headerCovTableHead">Total</td>
<td width="15%" class="headerCovTableHead">Coverage</td>
</tr>
<tr>
<td class="headerItem">Test:</td>
<td class="headerValue">.info.cleaned</td>
<td></td>
<td class="headerItem">Lines:</td>
<td class="headerCovTableEntry">399</td>
<td class="headerCovTableEntry">1019</td>
<td class="headerCovTableEntryLo">39.2 %</td>
</tr>
<tr>
<td class="headerItem">Date:</td>
<td class="headerValue">2016-11-07</td>
<td></td>
<td class="headerItem">Functions:</td>
<td class="headerCovTableEntry">22</td>
<td class="headerCovTableEntry">67</td>
<td class="headerCovTableEntryLo">32.8 %</td>
</tr>
<tr><td><img src="glass.png" width=3 height=3 alt=""></td></tr>
</table>
</td>
</tr>
<tr><td class="ruler"><img src="glass.png" width=3 height=3 alt=""></td></tr>
</table>
<center>
<table width="80%" cellpadding=1 cellspacing=1 border=0>
<tr>
<td width="50%"><br></td>
<td width="10%"></td>
<td width="10%"></td>
<td width="10%"></td>
<td width="10%"></td>
<td width="10%"></td>
</tr>
<tr>
<td class="tableHead">Directory <span class="tableHeadSort"><img src="glass.png" width=10 height=14 alt="Sort by name" title="Sort by name" border=0></span></td>
<td class="tableHead" colspan=3>Line Coverage <span class="tableHeadSort"><a href="index-sort-l.html"><img src="updown.png" width=10 height=14 alt="Sort by line coverage" title="Sort by line coverage" border=0></a></span></td>
<td class="tableHead" colspan=2>Functions <span class="tableHeadSort"><a href="index-sort-f.html"><img src="updown.png" width=10 height=14 alt="Sort by function coverage" title="Sort by function coverage" border=0></a></span></td>
</tr>
<tr>
<td class="coverFile"><a href="src/index.html">src</a></td>
<td class="coverBar" align="center">
<table border=0 cellspacing=0 cellpadding=1><tr><td class="coverBarOutline"><img src="ruby.png" width=39 height=10 alt="39.2%"><img src="snow.png" width=61 height=10 alt="39.2%"></td></tr></table>
</td>
<td class="coverPerLo">39.2 %</td>
<td class="coverNumLo">399/1019</td>
<td class="coverPerLo">32.8 %</td>
<td class="coverNumLo">22/67</td>
</tr>
</table>
</center>
<br>
<table width="100%" border=0 cellspacing=0 cellpadding=0>
<tr><td class="ruler"><img src="glass.png" width=3 height=3 alt=""></td></tr>
<tr><td class="versionInfo">Generated by: <a href="http://ltp.sourceforge.net/coverage/lcov.php">LCOV version 1.10</a></td></tr>
</table>
<br>
</body>
</html>
I'm trying to parse the data out of this line:
<td class="headerCovTableEntryLo">39.2 %</td>
as 39.2 (a float value).
I currently use this regex to find the two matching TDs:
<td class="headerCovTableEntryLo">[0-9.].*?.%<\/td>
I am misunderstanding how groups work. I thought that:
(<td class="headerCovTableEntryLo">[0-9.].*?.%<\/td>)[0-9.].*?\1
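would take what was found in the first group and grab just the numeric value, but I get zero matches. Can anyone shed some light on what I'm doing wrong?
Which language/tool are you using? –
'Regex - how to correctly capture a nested value?'... Don't use a regex; use an HTML parser. –
Thank you both... I know an HTML parser would be preferred, and I'm in Rails. Unfortunately, that isn't easy in the system/environment I'm working in. – isuPatches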
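On the group question itself: a backreference such as \1 does not search inside the first group. It asks for the same text that group 1 matched to appear a second time at that position, and since the full <td> cell is never repeated back to back, the second pattern finds nothing. To extract only the number, the parentheses need to wrap just the numeric part. Below is a minimal Ruby sketch, assuming the report is already available as a string; the variable names html, coverage_cell, and percentages are illustrative and not from the original post.

```ruby
# Trimmed copy of the two header rows from the report above.
html = <<~HTML
  <td class="headerCovTableEntry">399</td>
  <td class="headerCovTableEntry">1019</td>
  <td class="headerCovTableEntryLo">39.2 %</td>
  <td class="headerCovTableEntry">22</td>
  <td class="headerCovTableEntry">67</td>
  <td class="headerCovTableEntryLo">32.8 %</td>
HTML

# The parentheses wrap only the number, so group 1 holds "39.2"
# rather than the whole <td> cell.
coverage_cell = %r{<td class="headerCovTableEntryLo">\s*([0-9]+(?:\.[0-9]+)?)\s*%</td>}

# String#scan returns one array of captures per match;
# flatten and convert each captured string to a Float.
percentages = html.scan(coverage_cell).flatten.map(&:to_f)

puts percentages.inspect  # prints [39.2, 32.8]
```

The first value is the line coverage and the second is the function coverage, in the order they appear in the page. Only the Lo class suffix appears in the sample; if the report uses a different suffix at other coverage levels (an assumption), the pattern would need to allow for it.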