Web scraping with BeautifulSoup in Python

from urllib.request import urlopen  # Python 3; on Python 2 this was urllib2.urlopen
import bs4

url = 'http://international.o2.co.uk/internationaltariffs/getintlcallcosts?countryId=IND'
resp = urlopen(url).read()
crawler = bs4.BeautifulSoup(resp, 'html.parser')
div = crawler.find('div', {"id": "standardRates"})
div

The code above lists all the tags/elements inside the div, as the screenshot shows. I want to get the "£2.00". However, when I call .find('td') again, like this:

div = crawler.find('div', {"id": "standardRates"}).find('td')

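it only returns Landline, not the row below it, even though that row has the same tag. I have very little experience with web scraping. How do I target this tag (the row with £2.00)?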

Source

2017-02-24 Paulos

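Try `findAll()` instead of just `find()` – MooingRawr

`.findAll('td')[1]` to be exact –

That worked great. The result is a list, though, so when I use .contents the string is wrapped in square brackets. Can I get it as just a string? – Paulos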
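On the follow-up question in the comments: `.contents` always returns a list of child nodes, which is where the square brackets come from. A minimal sketch of one way to get a plain string instead, assuming markup shaped like the hypothetical `SAMPLE_HTML` below (the real page may differ; `second_cell_text` is just an illustrative helper name):

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the rates markup; the real page may differ.
SAMPLE_HTML = """
<div id="standardRates">
  <table id="standardRatesTable">
    <tr><td>Landline</td><td>£2.00</td></tr>
    <tr><td>Mobiles</td><td>£2.50</td></tr>
  </table>
</div>
"""


def second_cell_text(html):
    """Return the text of the second <td> inside the standardRates div."""
    soup = BeautifulSoup(html, "html.parser")
    div = soup.find("div", {"id": "standardRates"})
    cells = div.find_all("td")  # list-like ResultSet of every <td> in the div
    # Index one tag out of the list, then ask that tag for its text.
    return cells[1].get_text(strip=True)


print(second_cell_text(SAMPLE_HTML))
```

Indexing by position is fragile if the table layout changes, so treat this as a quick fix rather than a robust selector.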

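You can use this approach to get to the £2.00 fairly directly, starting from the sibling cell before it.

First find the table you want, then find the string 'Landline' inside it. Get the parent of that string (the td that contains it), then get that td's next sibling, which is the cell holding the price.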

>>> import requests
>>> from bs4 import BeautifulSoup
>>> response = requests.get('http://international.o2.co.uk/internationaltariffs/getintlcallcosts?countryId=IND')
>>> soup = BeautifulSoup(response.text, 'lxml')
>>> # find_all(string=...) returns the matching text node, not the <td> itself
>>> Landline_td = soup.find('table', {'id': 'standardRatesTable'}).find_all(string='Landline')[0]
>>> Landline_td
'Landline'
>>> # the parent is the <td> holding the label; its next sibling is the price cell
>>> Landline_td.find_parent().find_next_sibling()
<td>£2.00</td>
>>> Landline_td.find_parent().find_next_sibling().text
'£2.00'
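The same label-to-sibling navigation can be tried offline. This is a rough sketch rather than the answerer's code: `SAMPLE_HTML` is a hypothetical stand-in for the page's table, `rate_for` is an illustrative helper name, and it uses `html.parser` so that lxml is not required.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the rates table; the real page may differ.
SAMPLE_HTML = """
<table id="standardRatesTable">
  <tr><td>Landline</td><td>£2.00</td></tr>
  <tr><td>Mobiles</td><td>£2.50</td></tr>
</table>
"""


def rate_for(html, label):
    """Return the price text in the cell next to `label`, or None if absent."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", {"id": "standardRatesTable"})
    if table is None:
        return None
    label_string = table.find(string=label)  # the text node, not the <td>
    if label_string is None:
        return None
    label_cell = label_string.find_parent("td")  # the <td> wrapping the label
    if label_cell is None:
        return None
    # Naming the tag skips the whitespace text nodes between cells.
    price_cell = label_cell.find_next_sibling("td")
    return price_cell.get_text(strip=True) if price_cell else None


print(rate_for(SAMPLE_HTML, "Landline"))
```

Looking the price up by its label instead of by position means the lookup keeps working if rows are reordered, and the `None` checks avoid an AttributeError when the label or table is missing.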

Source

2017-02-24 17:13:01
