Accessing an HTML table with rvest
So I want to scrape some NBA data. Here is what I have so far, and it works perfectly:
install.packages('rvest')
library(rvest)
url = "https://www.basketball-reference.com/boxscores/201710180BOS.html"
webpage = read_html(url)
table = html_nodes(webpage, 'table')
data = html_table(table)
away = data[[1]]
home = data[[3]]
colnames(away) = away[1,] #set appropriate column names
colnames(home) = home[1,]
away = away[away$MP != "MP",] #remove rows that are just column names
home = home[home$MP != "MP",]
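The problem is that these tables don't include the team names, which are important. To get that information, I thought I would scrape the Four Factors table on the page; however, rvest doesn't seem to recognize it as a table. The div that contains the Four Factors table is: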
<div class="overthrow table_container" id="div_four_factors">
and the table is:
<table class="suppress_all sortable stats_table now_sortable" id="four_factors" data-cols-to-freeze="1"><thead><tr class="over_header thead">
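This made me think I could access the table with something along the lines of
table = html_nodes(webpage,'#div_four_factors')
but this doesn't seem to work, since I just get an empty list. How can I access the Four Factors table?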
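
One thing that may be worth checking, as an assumption rather than something confirmed above: some sites ship secondary tables inside an HTML comment and only uncomment them with JavaScript, in which case a CSS selector on the raw page finds nothing. The sketch below uses a small made-up HTML string (the `all_four_factors` wrapper, the team codes `AAA`/`BBB`, and the values are placeholders, not real data) to show how comment contents could be re-parsed with rvest.

```r
library(rvest)

# Hypothetical stand-in for the page: the table sits inside an HTML comment.
html <- '<html><body>
<div id="all_four_factors">
<!--
<div class="table_container" id="div_four_factors">
<table id="four_factors">
<tr><th>Team</th><th>Pace</th></tr>
<tr><td>AAA</td><td>1</td></tr>
<tr><td>BBB</td><td>2</td></tr>
</table>
</div>
-->
</div>
</body></html>'

page <- read_html(html)

# The selector matches nothing, because commented-out markup is not parsed into nodes.
n_direct <- length(html_nodes(page, "#four_factors"))

# Pull the text of every comment node, glue it together, and parse it as HTML.
comments <- html_nodes(page, xpath = "//comment()")
hidden <- read_html(paste(html_text(comments), collapse = ""))

# The table is now reachable with the usual selector.
four_factors <- html_table(html_nodes(hidden, "#four_factors"))[[1]]
```

If the real page behaves the same way, replacing `html` with the `url` from the code above should be the only change needed; if the table is instead built entirely by JavaScript, this approach would not help.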