Scraping data from an HTML table
2016-04-15

I am trying to use the rvest package to extract location data for invasive plant species from the CABI Invasive Species Compendium.

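After going through a few tutorials, I expected it to be easy to scrape the data from the table. However, I keep running into trouble.

Say I want the location data for the species Brassica tournefortii. I should be able to use the code below, which follows a technique I found outlined elsewhere, to get the details of the locations where the species has been recorded.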

library(rvest)
isc <- read_html("http://www.cabi.org/isc/datasheet/50069")
isc %>%
  html_node("#toDistributionTable td:nth-child(1)") %>%
  html_text()

However, running this code produces an error:

Error: No matches 

I am completely new to web scraping. Am I doing something horribly wrong?

Answer

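First, I wish I could offer you more help. Finally, a question that is not $SPORTSBALL or $MONEY related! :-)

That site is evil. It uses embedded namespaces that have to be dealt with, which also means using the xml2 package: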

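library(rvest)
library(xml2)

# Download and parse the datasheet page
isc <- read_html("http://www.cabi.org/isc/datasheet/50069")

# Collect the namespaces declared in the document
ns <- xml_ns(isc)

# Take the text of the first cell in every row of the distribution table
xml_text(xml_find_all(isc, xpath = "//div[@id='toDistributionTable']/table/tbody/tr/td[1]", ns))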

##  [1] "ASIA"             "Azerbaijan"
##  [3] "Bhutan"           "China"
##  [5] "-Tibet"           "India"
##  [7] "-Delhi"           "-Indian Punjab"
##  [9] "-Rajasthan"       "-Uttar Pradesh"
## [11] "Iran"             "Iraq"
## [13] "Israel"           "Jordan"
## [15] "Kuwait"           "Lebanon"
## [17] "Oman"             "Pakistan"
## [19] "Qatar"            "Saudi Arabia"
## [21] "Syria"            "Turkey"
## [23] "Turkmenistan"     "United Arab Emirates"
## [25] "Uzbekistan"       "Yemen"
## [27] "AFRICA"           "Algeria"
## [29] "Egypt"            "Libya"
## [31] "Morocco"          "South Africa"
## [33] "Tunisia"          "NORTH AMERICA"
## [35] "Mexico"           "USA"
## [37] "-Arizona"         "-California"
## [39] "-Nevada"          "-New Mexico"
## [41] "-Texas"           "-Utah"
## [43] "SOUTH AMERICA"    "Chile"
## [45] "EUROPE"           "Belgium"
## [47] "Cyprus"           "Denmark"
## [49] "France"           "Greece"
## [51] "Ireland"          "Italy"
## [53] "Spain"            "Sweden"
## [55] "UK"               "-England and Wales"
## [57] "-Scotland"        "OCEANIA"
## [59] "Australia"        "-Australian Northern Territory"
## [61] "-New South Wales" "-Queensland"
## [63] "-South Australia" "-Tasmania"
## [65] "-Victoria"        "-Western Australia"
## [67] "New Zealand"
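
For anyone unfamiliar with xml_ns(), here is a minimal, self-contained sketch of what it returns and how the result is handed to xml_find_all(). The XML string, the "f" prefix and the http://example.org/foo URI are made up for illustration and have nothing to do with the CABI page.

```r
library(xml2)

# Hypothetical document with one prefixed namespace (not the CABI page)
doc <- read_xml(
  '<root xmlns:f="http://example.org/foo"><f:item>a</f:item><f:item>b</f:item></root>'
)

# Named character vector: names are prefixes, values are namespace URIs
ns <- xml_ns(doc)
print(ns)

# The prefix used in the XPath is resolved through the ns argument
items <- xml_text(xml_find_all(doc, "//f:item", ns))
print(items)
```

As far as I know, xml_find_all() already defaults its ns argument to xml_ns(x), so passing it explicitly mostly documents the intent.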

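Awesome, thank you! This should give me a good start on getting data from that site. How did you work out what goes into the xpath part of the xml_find_all function?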

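After right-clicking the table and choosing Inspect Element, I mapped it from the path shown in Developer Tools. I could probably redo it with CSS, but knowing a little XPath can help in some situations. – hrbrmstr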
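
A rough sketch of the CSS route mentioned in that last comment, compared with the XPath from the answer. The inline HTML is a hypothetical stand-in that only mimics the div/table/tbody nesting implied by the XPath, and the second column is placeholder text, so the real page may well behave differently. One thing worth knowing either way: html_node() returns only the first match, while html_nodes() returns all of them.

```r
library(rvest)
library(xml2)

# Hypothetical stand-in for the datasheet page (assumed structure, placeholder data)
html <- '<html><body><div id="toDistributionTable"><table><tbody>
<tr><td>ASIA</td><td>col2</td></tr>
<tr><td>Azerbaijan</td><td>col2</td></tr>
<tr><td>-Tibet</td><td>col2</td></tr>
</tbody></table></div></body></html>'
doc <- read_html(html)

# XPath: first cell of every row in the table inside the div
via_xpath <- xml_text(
  xml_find_all(doc, "//div[@id='toDistributionTable']/table/tbody/tr/td[1]")
)

# CSS: every td that is the first child of its parent, anywhere under the div
via_css <- html_text(html_nodes(doc, "#toDistributionTable td:nth-child(1)"))

# Singular html_node() keeps only the first matching cell
first_only <- html_text(html_node(doc, "#toDistributionTable td:nth-child(1)"))

print(via_css)
print(identical(via_xpath, via_css))
```

On this toy input both selectors pick the same three cells; the CSS version is looser, since it does not insist on the exact table/tbody/tr nesting.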