Missing elements when using `read_html` with `rvest` in R

I am trying to use the read_html function in the rvest package, but I have run into a problem that I am struggling to solve.

For example, to read in the table that appears at the bottom of this page, I would use the following code:

library(rvest) 
html_content <- read_html("https://projects.fivethirtyeight.com/2016-election-forecast/washington/#now")

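By inspecting the HTML in the browser, I can see that the content I want sits inside a <table> tag (specifically, all of it is inside <table class="t-calc">). But when I try to extract it using: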

tables <- html_nodes(html_content, xpath = '//table')

I get the following:

> tables 
{xml_nodeset (4)} 
[1] <table class="tippingpointroi unexpanded">\n <tbody>\n <tr data-state="FL" class=" "> ... 
[2] <table class="tippingpointroi unexpanded">\n <tbody>\n <tr data-state="NV" class=" "> ... 
[3] <table class="scenarios">\n <tbody/>\n <tr data-id="1">\n <td class="description">El ... 
[4] <table class="t-desktop t-polls">\n <thead>\n <tr class="th-row">\n  <th class="t ...

This includes some of the table elements on the page, but not the one I am interested in.
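The symptom can be reproduced offline. The sketch below assumes a small made-up page (the `page_source` string is hypothetical, not the real FiveThirtyEight markup) in which one table is part of the served HTML and a second one is only created by a script. Since `read_html` parses the markup without running any JavaScript, only the first table should be found.

```r
library(rvest)

# Hypothetical page: one table is in the markup, the other would only
# exist after a browser runs the script.
page_source <- '
<html><body>
  <table class="static"><tr><td>present in the markup</td></tr></table>
  <div id="target"></div>
  <script>
    var t = document.createElement("table");
    t.className = "t-calc";
    document.getElementById("target").appendChild(t);
  </script>
</body></html>'

# read_html parses the string as-is; the script is never executed.
pg <- read_html(page_source)

static_tables  <- html_nodes(pg, "table")         # tables present in the markup
dynamic_tables <- html_nodes(pg, "table.t-calc")  # table the script would create

length(static_tables)
length(dynamic_tables)
```

If the same pattern holds on a real page, the table will be visible in the browser's element inspector but absent from the page source that `read_html` receives.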

Any suggestions as to where I am going wrong would be much appreciated!

Source

2016-08-31 user2728808

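The table is built dynamically from data held in JavaScript variables on the page itself. Either use RSelenium to grab the source of the page after it has rendered and pass that into rvest, or use V8 to grab the whole treasure trove of data: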

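library(rvest)
library(V8)

URL <- "http://projects.fivethirtyeight.com/2016-election-forecast/washington/#now"

pg <- read_html(URL)

# Pull the text of the <script> element that mentions race.model
js <- html_nodes(pg, xpath=".//script[contains(., 'race.model')]") %>% html_text()

# Evaluate that script in an embedded V8 JavaScript context
ctx <- v8()
ctx$eval(JS(js))

# Copy the JavaScript `race` object back into R as a nested list
race <- ctx$get("race", simplifyVector=FALSE)

str(race) ## output too large to paste here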
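To see the same technique without hitting the network, here is a minimal sketch on a made-up page. The `page_source` string and the field names `model`, `forecast`, `party` and `winprob` are assumptions for illustration only; the structure of the real `race` object is not shown in this thread.

```r
library(rvest)
library(V8)

# Hypothetical stand-in for the real page: the data lives in a script variable.
page_source <- '
<html><body>
  <script>
    var race = {model: {forecast: [
      {party: "D", winprob: 90.5},
      {party: "R", winprob: 9.5}
    ]}};
  </script>
</body></html>'

pg <- read_html(page_source)

# Grab the text of the script element that defines `race`
js <- html_text(html_nodes(pg, xpath = ".//script[contains(., 'race')]"))

# Run the script in a V8 context, then copy the object into R
ctx <- v8()
ctx$eval(js)
race <- ctx$get("race")

# With the default simplification, an array of objects becomes a data frame
forecast <- race$model$forecast
forecast
```

Leaving `simplifyVector` at its default is convenient for small, regular data like this. For a large, irregular object, `simplifyVector=FALSE` as in the answer keeps everything as nested lists, which is more predictable to walk through.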

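If they change the format of the JavaScript (it is an automated process, so that is unlikely, but you never know), then the RSelenium approach will be the better one, provided they do not change the structure of the table (again, unlikely, but you never know).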
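The RSelenium route might look roughly like the sketch below. It is not from the original answer: the helper names `fetch_rendered_source` and `extract_table`, the choice of Firefox, and the fixed five-second pause are all assumptions, and running it requires a local browser and driver that RSelenium can start.

```r
library(rvest)

# Hypothetical helper: parse already-rendered HTML and pull out one table.
# Returns NULL when no element matches the CSS selector.
extract_table <- function(page_source, css = "table.t-calc") {
  node <- html_node(read_html(page_source), css)
  if (inherits(node, "xml_missing")) return(NULL)
  html_table(node)
}

# Hypothetical helper: load the page in a real browser via RSelenium and
# return the HTML as it stands after the page's JavaScript has run.
fetch_rendered_source <- function(url, browser = "firefox", wait = 5) {
  rD <- RSelenium::rsDriver(browser = browser, verbose = FALSE)
  on.exit({
    rD$client$close()   # close the browser window
    rD$server$stop()    # shut down the Selenium server
  }, add = TRUE)
  rD$client$navigate(url)
  Sys.sleep(wait)       # crude pause (an assumption) so the table can be built
  rD$client$getPageSource()[[1]]
}

# Usage (not run here, since it needs a browser and driver):
# src <- fetch_rendered_source("https://projects.fivethirtyeight.com/2016-election-forecast/washington/#now")
# tbl <- extract_table(src)
```

Keeping the parsing step separate from the browser step means the table extraction can be checked against saved HTML without launching Selenium each time.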

Source

2016-08-31 14:16:13 hrbrmstr

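Really good answer, especially for working out the solution to a problem that I could not even figure out how to ask about yesterday. Top work. – user2728808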
