2017-03-16 70 views
1

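Stripping HTML tags when scraping in Python

I'm trying to scrape the box score of an NBA game from ESPN. I want to get the player names first, but I'm having trouble getting rid of the HTML tags.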

I have tried

get_text(), .text(), .string_strip() 

but each of them keeps raising an error.

Here is the code I'm using.

from bs4 import BeautifulSoup 
import requests 

url= "http://scores.espn.com/nba/boxscore?gameId=400900407" 
r = requests.get(url) 
soup = BeautifulSoup(r.text,"html.parser") 

name = [] 
for row in soup.find_all('tr')[1:]: 
    player_name = row.find('td', attrs={'class': 'name'})
    name.append(player_name)
print(name) 

You say you get errors. What errors exactly?

Answers

3

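Using player_name.text should work, but the problem is that row.find('td', attrs={'class': 'name'}) sometimes finds nothing and returns None. Note that .text is an attribute, not a method, so it is written without parentheses. Try this: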

if player_name: 
    name.append(player_name.text) 
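
To show the guard in context, here is a minimal sketch that runs against a small inline HTML snippet instead of the live ESPN page. The markup and the player names are made up for illustration and only assume that name cells are `td` elements with class `name`, as in the question; the real page may be laid out differently.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the box score table (not the real ESPN page).
html = """
<table>
  <tr><th>STARTERS</th></tr>
  <tr><td class="name"><a href="#"><span>A. Player</span></a></td><td>30</td></tr>
  <tr><td class="totals">TEAM</td><td>98</td></tr>
  <tr><td class="name"><a href="#"><span>B. Player</span></a></td><td>12</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

names = []
for row in soup.find_all("tr")[1:]:
    cell = row.find("td", attrs={"class": "name"})
    # find() returns None when the row has no matching cell, so skip those rows
    if cell is not None:
        # get_text() is a method; strip=True trims surrounding whitespace
        names.append(cell.get_text(strip=True))

print(names)
```

Rows without a name cell, such as the totals row in this sample, are skipped instead of raising an AttributeError on None.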

This worked! Thanks – jhaywoo8

2

I solved the problem like this:

from bs4 import BeautifulSoup 
import requests 

url= "http://scores.espn.com/nba/boxscore?gameId=400900407" 
r = requests.get(url) 
soup = BeautifulSoup(r.text,"html.parser") 

name = [] 
for row in soup.find_all('tr')[1:]: 
    try:
        # first <span> inside the name cell; raises IndexError if the row has none
        player_name = row.select('td.name span')[0].text
        name.append(player_name)
    except IndexError:
        pass
print(name) 
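
A possible variation on the same idea, sketched under the assumption that each name cell holds the name in a `span`: `select_one()` returns None when nothing matches, so the try/except can be replaced by a plain check. The sample markup and names below are hypothetical, not taken from the ESPN page.

```python
from bs4 import BeautifulSoup

# Hypothetical markup; the real ESPN table may differ.
html = """
<table>
  <tr><th>STARTERS</th></tr>
  <tr><td class="name"><span>A. Player</span></td><td>30</td></tr>
  <tr><td class="totals">TEAM</td><td>98</td></tr>
  <tr><td class="name"><span>B. Player</span></td><td>12</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

names = []
for row in soup.find_all("tr")[1:]:
    # select_one() gives the first match for the CSS selector, or None
    span = row.select_one("td.name span")
    if span is not None:
        names.append(span.text)

print(names)
```

This keeps the CSS selector from the answer above but avoids indexing into a possibly empty list.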
1

My code, for your reference:

import requests 

from pyquery import PyQuery as pyq 

url= "http://scores.espn.com/nba/boxscore?gameId=400900407" 
r = requests.get(url) 
doc = pyq(r.content) 
print([h.text() for h in doc('.abbr').items()])