2015-02-06 94 views

Answers

1

Try using a regular expression:

import re 
re.findall(r'(?i)href=["\']([^\s"\'<>]+)', content) 
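To see what that pattern returns, here is a small sketch run against a made-up HTML string. The sample `content` and the follow-up filter on `word` are assumptions added for illustration; they are not part of the answer above.

```python
import re

# Hypothetical sample input; in the answer, `content` is the page's HTML string.
content = '''
<a href="http://example.com/test/page">one</a>
<A HREF='/other'>two</A>
<a href="/test.html" class="x">three</a>
'''

# (?i) makes the match case-insensitive; the group captures the URL up to
# the first whitespace, quote, or angle bracket.
links = re.findall(r'(?i)href=["\']([^\s"\'<>]+)', content)

# Assumed extra step: keep only the links that contain a given word.
word = "test"
matching = [link for link in links if word in link]

print(links)
print(matching)
```

Note that the pattern picks up every `href` attribute in the text, not only those on `a` tags, and it does not handle unquoted attribute values.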
2

Use an HTML parser.

With BeautifulSoup, you can pass a function as the value of a keyword argument:

from bs4 import BeautifulSoup 

word = "test" 
data = "your HTML here" 
soup = BeautifulSoup(data, "html.parser")  # name the parser explicitly to avoid the "no parser specified" warning

for a in soup.find_all('a', href=lambda x: x and word in x): 
    print(a['href']) 

Or a regular expression:

import re 

for a in soup.find_all('a', href=re.compile(word)): 
    print(a['href']) 
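One caveat with the regular-expression variant: `word` is treated as a pattern, so characters such as `.` or `?` carry their regex meaning. If a plain substring match is intended, wrapping it in `re.escape` is probably safer. The sketch below shows the difference using only the `re` module; the search term and hrefs are made up.

```python
import re

word = "a.b"  # hypothetical search term containing a regex metacharacter
hrefs = ["/a.b/index.html", "/axb/index.html"]  # made-up href values

loose = re.compile(word)               # "." matches any character
strict = re.compile(re.escape(word))   # "." matches only a literal dot

# BeautifulSoup filters attribute values with the pattern's search() method,
# so search() is used here as well.
loose_hits = [h for h in hrefs if loose.search(h)]
strict_hits = [h for h in hrefs if strict.search(h)]

print(loose_hits)
print(strict_hits)
```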

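Or a CSS selector (note that `^=` matches only hrefs that start with word; `*=` would match it anywhere in the value):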

for a in soup.select('a[href^="{word}"]'.format(word=word)): 
    print(a['href'])
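If installing BeautifulSoup is not an option, roughly the same filtering can be sketched with the standard library's `html.parser`. This swaps in a different parser from the one used above; the `LinkCollector` class name and the sample `data` are hypothetical.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect href values of <a> tags that contain a given word."""

    def __init__(self, word):
        super().__init__()
        self.word = word
        self.links = []

    def handle_starttag(self, tag, attrs):
        # Tag and attribute names arrive lowercased; attrs is a list of
        # (name, value) pairs, and value may be None.
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value and self.word in value:
                self.links.append(value)


# Made-up sample markup for illustration.
data = '<a href="/test/1">x</a><a href="/other">y</a><a>no href</a>'
collector = LinkCollector("test")
collector.feed(data)
print(collector.links)
```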