2016-04-14 130 views
1

R readHTMLTable function not working

I have the following code written in R, with which I want to get some names from this particular webpage.

library(RCurl) 
library(XML) 
x <- getURL("http://www.encyclopedia-titanica.org/titanic-passengers-crew-lived/country-17/england.html") 
x_2 <- htmlParse(x) 
x_3 <- readHTMLTable(x_2) 

However, whenever I look at the contents of x_3, I get the following...

x_3 
named list() 

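It seems that the readHTMLTable function cannot find the table. Can anyone help me get the passengers' names from this webpage without copying and pasting? Much appreciated.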

+0

You need to extract the table element first, before you can use readHTMLTable(). Use XPath - something like `tableVar <- xpathApply(x_2, "//table[@id='manifest']")`. Then you should be able to do `x_3 <- readHTMLTable(tableVar)` – WillardSolutions

+0

(I'm having firewall issues at the moment, so I can't test this, by the way...) – WillardSolutions
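
The XPath suggestion in the comment above can be sketched as follows. This is a minimal, offline sketch: the inline HTML string, the placeholder passenger rows, and the assumption that the real table carries `id="manifest"` are all hypothetical stand-ins and were not checked against the live page. Because an XPath query returns a list of matching nodes, the sketch passes the first node, not the whole list, to readHTMLTable().

```r
library(XML)

# Hypothetical stand-in for the downloaded page (placeholder rows, not real data).
page <- '<html><body>
<table id="manifest">
  <tr><th>Name</th><th>Age</th></tr>
  <tr><td>Passenger A</td><td>30</td></tr>
  <tr><td>Passenger B</td><td>41</td></tr>
</table>
</body></html>'

# Parse the string as HTML rather than treating it as a file name.
doc <- htmlParse(page, asText = TRUE)

# The XPath query returns a list of matching nodes; keep the first match.
tables <- getNodeSet(doc, "//table[@id='manifest']")
manifest <- readHTMLTable(tables[[1]], header = TRUE, stringsAsFactors = FALSE)

# The passenger names are one column of the resulting data frame.
passenger_names <- manifest$Name
```

For the real page, `doc` would be the `x_2` object from the question. If the query matches nothing, `tables` is empty and the `id` assumption needs revisiting.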

Answer

0
library(rvest) 
library(dplyr) 

base <- "http://www.encyclopedia-titanica.org/titanic-passengers-crew-lived/country-17/england.html" 

# I use the older rvest package... `html` might be `read_html` now. Link to git repo below:
# https://github.com/hadley/rvest/blob/7d65d84e013b1bb3827ae0a2e05ddaed4875c112/R/parse.R 
data_df <- (html(base) %>% html_table)[[1]] 

knitr::kable(summary(data_df)) 

    |   |Name             |Age              |Class/Dept       |Ticket           |Joined           |Job              |Boat [Body]      |             |
    |:--|:----------------|:----------------|:----------------|:----------------|:----------------|:----------------|:----------------|:------------|
    |   |Length:1190      |Length:1190      |Length:1190      |Length:1190      |Length:1190      |Length:1190      |Length:1190      |Mode:logical |
    |   |Class :character |Class :character |Class :character |Class :character |Class :character |Class :character |Class :character |NA's:1190    |
    |   |Mode  :character |Mode  :character |Mode  :character |Mode  :character |Mode  :character |Mode  :character |Mode  :character |NA           |
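
For newer rvest releases, where `html()` has been replaced by `read_html()`, the same approach might look like the sketch below. It assumes a recent rvest version and uses a hypothetical inline page with placeholder rows instead of the live URL, so it runs offline. The `Name` column is taken from the summary output above.

```r
library(rvest)

# Hypothetical stand-in for the page at `base` (placeholder rows, not real data).
page <- read_html('<html><body>
<table>
  <tr><th>Name</th><th>Age</th></tr>
  <tr><td>Passenger A</td><td>30</td></tr>
  <tr><td>Passenger B</td><td>41</td></tr>
</table>
</body></html>')

# html_table() on a whole document returns one data frame per <table>;
# as in the answer, keep the first one.
data_df <- (page %>% html_table())[[1]]

# Pull out only the passenger names.
passenger_names <- data_df$Name
```

Against the real site, `read_html(base)` would take the place of the inline string.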
+0

Thank you very much for this solution. It works nicely! – ACE

+0

Glad to hear it, @ACE –