Scraping with XPath in R

I'm new to scraping, and for my first task I decided to scrape this webpage: https://finstat.sk/databaza-financnych-udajov?EmployeeExact=False&RpvsInsert=False&Sort=assets&PerPage=20

Further down the page there is a table of numeric data, and I would like to scrape that list. Can you help me? This is the code I tried:

library('rvest') 


url <- 'https://finstat.sk/databaza-financnych-udajov?EmployeeExact=False&RpvsInsert=False&Sort=assets&PerPage=20' 

webpage <- read_html(url) 

tabulka <- html_nodes(webpage, xpath='/html/body/div[5]/div/div[3]/div[4]/div[2]/div/div/div[3]/table/tbody/tr[1]') %>% 
    html_table() %>% 

head(tabulka)

When I run it, I get this error: length(n) == 1L is not TRUE

Output needed
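Two things in the code above likely contribute to that error. First, the line `html_table() %>%` ends with a pipe, so R continues the expression onto `head(tabulka)` and evaluates `head(<result>, tabulka)`, passing `tabulka` as the `n` argument; `head()` requires `n` to have length 1, which is what the message `length(n) == 1L is not TRUE` reports. Second, the XPath ends in `tr[1]`, which selects a single table row rather than the `<table>` element that `html_table()` converts. Absolute paths copied from browser developer tools also often contain a `tbody` that the browser added but the raw HTML may lack. The sketch below is a minimal, offline illustration on a hypothetical HTML string (not the real finstat.sk markup); on the real page you would pass `url` to `read_html()` and probably need a more specific XPath than `//table`.

```r
library(rvest)

# Hypothetical stand-in for the page: a small table with a header row.
# The real finstat.sk markup is assumed, not verified, to be similar.
html <- '<html><body><div>
  <table class="data-table">
    <tr><th>Name</th><th>Assets</th></tr>
    <tr><td>Company A</td><td>100</td></tr>
    <tr><td>Company B</td><td>50</td></tr>
  </table>
</div></body></html>'

page <- read_html(html)

# Select the <table> element itself, not a single row such as tr[1];
# html_table() turns a <table> node into a data frame.
tabulka <- page %>%
  html_node(xpath = "//table") %>%
  html_table()

# No trailing %>% above, so head() is a separate call on the result.
head(tabulka)
```

`html_node()`/`html_nodes()` still work in rvest 1.0, where they are superseded by `html_element()`/`html_elements()`.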


2017-10-20 Tomas

Great, you have a code example. Could you show what output you expect, and what you are getting instead? – Bobby

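Sorry for not posting the error. This is what I get: length(n) == 1L is not TRUE. I have edited the post and attached a screenshot of what I want: at least 50 pages of this information from the website mentioned. – Tomas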

Please edit the question to include additional information, such as details about the error. – QHarr

Maybe this:

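library(rvest)
library(tidyverse)  # provides map_df() and set_names() via purrr

scrape_data <- function(page_no) {
    # Read one results page; the Page query parameter holds the page number
    page <- read_html(sprintf("https://finstat.sk/databaza-financnych-udajov?EmployeeExact=False&RpvsInsert=False&Sort=assets&Page=%s", page_no))
    # Columns 1-2: cells selected by their CSS classes
    first_two_cols <- lapply(c("td.data-table-column-pinned", "td.hidden-xs"),
                             function(sel) page %>% html_nodes(sel) %>% html_text(trim = TRUE)) %>%
        data.frame(stringsAsFactors = FALSE)
    # Columns 3-7: ".nowrap" cells selected by their position within the row
    remaining_cols <- lapply(3:7,
                             function(i) page %>% html_nodes(sprintf(".nowrap:nth-child(%s)", i)) %>% html_text(trim = TRUE)) %>%
        data.frame(stringsAsFactors = FALSE)
    # Combine and give the seven columns generic names
    cbind(first_two_cols, remaining_cols) %>% set_names(paste0("var", 1:7))
}

# Scrape pages 1 to 5; widen the range (e.g. 1:50) to get more pages
df <- map_df(1:5, scrape_data)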
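To see what the selectors in `scrape_data()` match, here is a minimal sketch on a small hypothetical HTML snippet that reuses the answer's class names. The real finstat.sk table is assumed, not verified, to look like this. A class selector such as `td.data-table-column-pinned` returns one cell per row, i.e. one column, while `.nowrap:nth-child(3)` matches elements that have class `nowrap` and are the third child of their parent row.

```r
library(rvest)

# Hypothetical markup reusing the class names from scrape_data();
# the real finstat.sk table may be structured differently.
html <- '<table>
  <tr>
    <td class="data-table-column-pinned">Company A</td>
    <td class="hidden-xs">City A</td>
    <td class="nowrap">100</td>
    <td class="nowrap">20</td>
  </tr>
  <tr>
    <td class="data-table-column-pinned">Company B</td>
    <td class="hidden-xs">City B</td>
    <td class="nowrap">50</td>
    <td class="nowrap">10</td>
  </tr>
</table>'

page <- read_html(html)

# Class selector: every matching cell in document order (one per row here),
# i.e. the values of one column
names_col <- page %>%
  html_nodes("td.data-table-column-pinned") %>%
  html_text(trim = TRUE)

# ".nowrap:nth-child(3)": elements with class "nowrap" that are the
# 3rd element child of their parent <tr>, i.e. the third column
third_col <- page %>%
  html_nodes(".nowrap:nth-child(3)") %>%
  html_text(trim = TRUE)

df_demo <- data.frame(name = names_col, value = third_col,
                      stringsAsFactors = FALSE)
df_demo
```

Because each column is scraped separately, `data.frame()` fails if one selector matches a different number of cells than another (for example, when a row lacks one of the classes). Selecting rows first with `html_nodes("tr")` and then extracting cells within each row is a more defensive alternative.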


2017-10-20 19:05:25 udden2903

Thank you. I'll try to adjust it further. I appreciate your help. – Tomas
