Web scraping a directory w/ BeautifulSoup when there is no opening tag

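I want to scrape names from a directory using BeautifulSoup, but the way the HTML is formatted makes this difficult for me, and I am not very experienced with HTML. Here is an example of one entry in the directory: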

<li><span class="image-wrapper-outer"><span class="image-wrapper-inner"><img src="/directory/images/1234.jpg" alt="student photo"/></span></span><strong>Name:</strong> Alex Example<br/> 
    <strong>Email:</strong> <a href="mailto:[email protected]">[email protected]</a><br/> 
    <strong>Year:</strong> 2017<br/> 
    <strong>Box #:</strong> 123<br/> 
    <strong>Local phone:</strong> 1234<br/> 
    <strong>Home Info:</strong> 7033 Fake St.<br/>Chicago NY 90210 <br/> 
    <strong>Advisors:</strong> Advisor1, Advisor2<br/><br/></li>

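However, I cannot find an opening tag around the name (nothing like a "name" element), and the name is the information I want to scrape.

Here is my existing code: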

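import requests
from bs4 import BeautifulSoup

def makeSoup(url):
    """Download a page and parse it into a BeautifulSoup tree."""
    r = requests.get(url)
    data = r.text
    # Name the parser explicitly so results do not depend on what is installed
    soup = BeautifulSoup(data, 'html.parser')
    return soup

# url_list is assumed to be defined elsewhere as a list of directory page URLs
for i in range(0, 1):
    souptemp = makeSoup(url_list[i])
    for link in souptemp.find_all('need help here'):
        print(link)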

Thanks for your help today.

Source

2014-12-13 CJ B

You can remove the strong tags and retrieve the name by splitting the remaining text on line breaks:

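soup = BeautifulSoup(data, 'html.parser')

# Remove every <strong> label so only the field values remain
for s in soup.find_all('strong'):
    s.extract()
# The first line of the remaining text is the name
print(soup.text.split('\n')[0].strip())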

Source

2014-12-13 23:18:03 kkanellis
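
A different way to handle a value that has no tag of its own is to locate its label and read the text node that follows it. The sketch below is only an illustration under stated assumptions: it uses a shortened copy of the directory entry from the question, and `SAMPLE` and `extract_name` are hypothetical names introduced here. It also assumes the name always sits directly after the `<strong>Name:</strong>` label, as it does in the example entry.

```python
from bs4 import BeautifulSoup

# Shortened copy of the directory entry shown in the question (assumption:
# real pages contain one <li> like this per person).
SAMPLE = """<li><span class="image-wrapper-outer"><span class="image-wrapper-inner"><img src="/directory/images/1234.jpg" alt="student photo"/></span></span><strong>Name:</strong> Alex Example<br/>
    <strong>Year:</strong> 2017<br/>
    <strong>Box #:</strong> 123<br/><br/></li>"""


def extract_name(li):
    """Return the text that follows the 'Name:' label, or None if absent."""
    # Find the <strong> element whose text is exactly 'Name:'
    label = li.find('strong', string='Name:')
    if label is None:
        return None
    # The name is a bare text node placed right after the label
    value = label.next_sibling
    return value.strip() if isinstance(value, str) else None


soup = BeautifulSoup(SAMPLE, 'html.parser')
names = [extract_name(li) for li in soup.find_all('li')]
print(names)  # ['Alex Example']
```

Unlike splitting the whole page text on line breaks, this works per `<li>`, so it should keep returning one name per entry when a page lists many people. Entries without a `Name:` label yield `None` rather than raising an error.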
