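Given the HTML block below, what would be the best regex pattern to produce the following list? (The URL links should be kept out of the match set; I only want the text pulled out of the HTML items.)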
Abdominal Aortic Aneurysm see Aortic Aneurysm
Abdominal Pain
Abdominal Pregnancy see Ectopic Pregnancy
Abnormalities see Birth Defects
ABO Blood Groups see Blood and Blood Disorders
Abortion
About Your Medicines see Medicines; Over-the-Counter Medicines
ABPA see Aspergillosis
Abscess
Abuse see Child Abuse; Domestic Violence; Elder Abuse
Here is the raw input:
<li><span class="formod5"> </span></li>
<li class="item">Abdominal Aortic Aneurysm see <a href="http://www.nlm.nih.gov/medlineplus/aorticaneurysm.html">Aortic Aneurysm</a></li>
<li class="item"><a href="http://www.nlm.nih.gov/medlineplus/abdominalpain.html">Abdominal Pain</a></li>
<li class="item">Abdominal Pregnancy see <a href="http://www.nlm.nih.gov/medlineplus/ectopicpregnancy.html">Ectopic Pregnancy</a></li>
<li class="item">Abnormalities see <a href="http://www.nlm.nih.gov/medlineplus/birthdefects.html">Birth Defects</a></li>
<li class="item">ABO Blood Groups see <a href="http://www.nlm.nih.gov/medlineplus/bloodandblooddisorders.html">Blood and Blood Disorders</a></li>
<li><span class="formod5"> </span></li>
<li class="item"><a href="http://www.nlm.nih.gov/medlineplus/abortion.html">Abortion</a></li>
<li class="item">About Your Medicines see <a href="http://www.nlm.nih.gov/medlineplus/medicines.html">Medicines</a>; <a href="http://www.nlm.nih.gov/medlineplus/overthecountermedicines.html">Over-the-Counter Medicines</a></li>
<li class="item">ABPA see <a href="http://www.nlm.nih.gov/medlineplus/aspergillosis.html">Aspergillosis</a></li>
<li class="item"><a href="http://www.nlm.nih.gov/medlineplus/abscess.html">Abscess</a></li>
<li class="item">Abuse see <a href="http://www.nlm.nih.gov/medlineplus/childabuse.html">Child Abuse</a>; <a href="http://www.nlm.nih.gov/medlineplus/domesticviolence.html">Domestic Violence</a>; <a href="http://www.nlm.nih.gov/medlineplus/elderabuse.html">Elder Abuse</a></li>
<li><span class="formod5"> </span></li>
TIA
The best regex is `/(.*)/m`, and then use an HTML parser to do the rest. –
[You shouldn't try to parse HTML with regex](http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454) – Bohemian
See RegEx abuse. ;) – TrueWill
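
For what it's worth, the parser route the comments point to could look roughly like the sketch below. The question doesn't name a language, so Python and its standard-library `html.parser` module are an assumption here, and `ItemTextParser` and `extract_items` are made-up names for illustration. It collects the visible text of every `<li class="item">`, drops the `href` URLs along with the tags, and skips the `formod5` spacer items.

```python
from html.parser import HTMLParser


class ItemTextParser(HTMLParser):
    """Collect the visible text of each <li class="item">, ignoring link URLs."""

    def __init__(self):
        super().__init__()
        self.items = []     # finished entries, one string per list item
        self._parts = None  # text fragments of the item being read, or None

    def handle_starttag(self, tag, attrs):
        # Start collecting only for <li class="item">; spacer <li> are skipped.
        if tag == "li" and dict(attrs).get("class") == "item":
            self._parts = []

    def handle_data(self, data):
        # Text inside nested <a> tags arrives here too, without the href.
        if self._parts is not None:
            self._parts.append(data)

    def handle_endtag(self, tag):
        if tag == "li" and self._parts is not None:
            self.items.append("".join(self._parts).strip())
            self._parts = None


def extract_items(html):
    """Return the text of every <li class="item"> in the given HTML string."""
    parser = ItemTextParser()
    parser.feed(html)
    parser.close()
    return parser.items


# A few lines copied from the raw input in the question.
sample = """
<li><span class="formod5"> </span></li>
<li class="item">Abdominal Aortic Aneurysm see <a href="http://www.nlm.nih.gov/medlineplus/aorticaneurysm.html">Aortic Aneurysm</a></li>
<li class="item"><a href="http://www.nlm.nih.gov/medlineplus/abdominalpain.html">Abdominal Pain</a></li>
<li class="item">About Your Medicines see <a href="http://www.nlm.nih.gov/medlineplus/medicines.html">Medicines</a>; <a href="http://www.nlm.nih.gov/medlineplus/overthecountermedicines.html">Over-the-Counter Medicines</a></li>
"""

for line in extract_items(sample):
    print(line)
```

On that sample this should print the three entries exactly as they appear in the wanted list, e.g. `About Your Medicines see Medicines; Over-the-Counter Medicines`. It assumes the item `<li>` elements are not nested inside each other, which holds for the input shown.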