Find multiple occurrences using a regular expression

Is it possible to capture all the information in href using a single regular expression?

For example:

<div id="w1"> 
    <ul id="u1"> 
     <li><a id='1' href='book'>book<sup>1</sup></a></li> 
     <li><a id='2' href='book-2'>book<sup>2</sup></a></li> 
     <li><a id='3' href='book-3'>book<sup>3</sup></a></li> 
    </ul> 
</div>

I want to get book, book-2 and book-3.


2014-04-24 Hello_World

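Did you manage to find one? Post your code so that we can figure out what is going wrong. – devnull

Parsing HTML with a regexp is not the right approach. Use an HTML parser such as lxml or BeautifulSoup. – Daniel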

Short and simple:

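import re
# Triple quotes are needed because the markup itself contains single quotes.
html = '''<div id="w1"><ul id="u1"><li><a id='1' href='book'>book<sup>1</sup></a></li><li><a id='2' href='book-2'>book<sup>2</sup></a></li><li><a id='3' href='book-3'>book<sup>3</sup></a></li></ul></div>'''
result = re.findall(r"href='(.*?)'", html)  # ['book', 'book-2', 'book-3']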

Explanation:

Match the character string "href='" literally (case sensitive) «href='»
Match the regex below and capture its match into backreference number 1 «(.*?)»
    Match any single character that is NOT a line break character (line feed) «.*?»
        Between zero and unlimited times, as few times as possible, expanding as needed (lazy) «*?»
Match the character "'" literally «'»
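To see why the lazy quantifier matters here, the following minimal sketch compares `.*?` with the greedy `.*` on a shortened, single-line version of the markup from the question. The variable names `lazy` and `greedy` are only illustrative.

```python
import re

# Sample markup based on the question, shortened to two links on one line.
html = "<li><a id='1' href='book'>book</a></li><li><a id='2' href='book-2'>book</a></li>"

# Lazy quantifier: each match stops at the first closing quote after href='.
lazy = re.findall(r"href='(.*?)'", html)

# Greedy quantifier: the single match runs on to the last single quote in the line.
greedy = re.findall(r"href='(.*)'", html)

print(lazy)    # ['book', 'book-2']
print(greedy)  # one long string that starts at the first href and ends at book-2
```

With the greedy form the result is a single element that swallows everything between the first and the last quote, which is why the answer uses `.*?`.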


2014-04-24 08:54:47

You can do this with the following regex:

<a id='\d+' href='([\w-]+)' 

import re 

s = '''<div id="w1"><ul id="u1"><li><a id='1' href='book'>book<sup>1</sup></a></li><li><a id='2' href='book-2'>book<sup>2</sup></a></li><li><a id='3' href='book-3'>book<sup>3</sup></a></li></ul></div>''' 

>>> print(re.findall(r"<a id='\d+' href='([\w-]+)'", s))
['book', 'book-2', 'book-3']
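Note that the pattern above only matches when `id` comes directly before `href`, both use single quotes, and the id is numeric. As a rough sketch, assuming you still want to stay with regular expressions, a more tolerant pattern could look like the one below. The name `HREF_IN_ANCHOR` and the varied sample markup are made up for illustration, and this is not a substitute for a real HTML parser.

```python
import re

# Hypothetical sample: attribute order and quote style differ between links.
s = '''<ul><li><a id='1' href='book'>book</a></li><li><a href="book-2" id='2'>book</a></li><li><a class="x" id='3' href='book-3'>book</a></li></ul>'''

# <a, then any attributes, then href= with either quote style.
# Group 1 remembers the opening quote so \1 can require the same closing quote.
HREF_IN_ANCHOR = re.compile(r"""<a\b[^>]*?\bhref=(["'])(.*?)\1""")

# finditer is used because findall would return (quote, value) tuples here.
links = [m.group(2) for m in HREF_IN_ANCHOR.finditer(s)]
print(links)  # ['book', 'book-2', 'book-3']
```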


2014-04-24 08:55:28 sshashank124

Extend HTMLParser with a custom class:

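from html.parser import HTMLParser

class MyHTMLParser(HTMLParser):
    def __init__(self, *args, **kw):
        super().__init__(*args, **kw)
        self.anchorlist = []  # collected href values

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the start tag
        if tag == 'a':
            for attribute in attrs:
                if attribute[0] == 'href':
                    self.anchorlist.append(attribute[1])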

This puts all the URLs in anchorlist.

By the way, this is written for Python 3.x.
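The answer does not show how the class is used, so here is a minimal usage sketch. It restates the class in compact form so that the snippet runs on its own, and feeds it the markup from the question. The variable names `html` and `parser` are assumptions, not part of the original answer.

```python
from html.parser import HTMLParser


class MyHTMLParser(HTMLParser):
    """Collects the href value of every <a> start tag."""

    def __init__(self, *args, **kw):
        super().__init__(*args, **kw)
        self.anchorlist = []

    def handle_starttag(self, tag, attrs):
        if tag == 'a':
            # attrs is a list of (name, value) pairs
            self.anchorlist.extend(value for name, value in attrs if name == 'href')


# Markup from the question.
html = """<div id="w1">
    <ul id="u1">
     <li><a id='1' href='book'>book<sup>1</sup></a></li>
     <li><a id='2' href='book-2'>book<sup>2</sup></a></li>
     <li><a id='3' href='book-3'>book<sup>3</sup></a></li>
    </ul>
</div>"""

parser = MyHTMLParser()
parser.feed(html)   # parse the markup; handle_starttag runs for each start tag
parser.close()      # flush any buffered data
print(parser.anchorlist)  # ['book', 'book-2', 'book-3']
```

Unlike the regex answers, this does not depend on attribute order or quote style, since the parser hands over the attributes already split into name and value.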


2014-04-24 11:42:43 HFX
