2017-09-13

Trying to scrape some data, but I keep getting timeout errors. My network is working fine and I have updated to the latest R version, so I'm not sure how to fix this at this point. It happens with any URL I try. Error: port 443 timed out while scraping data.

library(RCurl) 
library(XML) 

url = "https://inciweb.nwcg.gov/" 
content <- getURLContent(url) 
Error in function (type, msg, asError = TRUE) : 
  Failed to connect to inciweb.nwcg.gov port 443: Timed out 

Answer


You may need to set an explicit timeout for slower connections:

library(httr) 
library(rvest) 

pg <- GET("https://inciweb.nwcg.gov/", timeout(60)) 

incidents <- html_table(content(pg))[[1]] 

str(incidents) 
## 'data.frame': 10 obs. of 7 variables: 
## $ Incident: chr "Highline Fire" "Cottonwood Fire" "Rattlesnake Point Fire" "Coolwater Complex" ... 
## $ Type : chr "Wildfire" "Wildfire" "Wildfire" "Wildfire" ... 
## $ Unit : chr "Payette National Forest" "Elko District Office" "Nez Perce - Clearwater National Forests" "Nez Perce - Clearwater National Forests" ... 
## $ State : chr "Idaho, USA" "Nevada, USA" "Idaho, USA" "Idaho, USA" ... 
## $ Status : chr "Active" "Active" "Active" "Active" ... 
## $ Acres : chr "83,630" "1,500" "4,843" "2,969" ... 
## $ Updated : chr "1 min. ago" "1 min. ago" "3 min. ago" "5 min. ago" ... 
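If the server is reachable but intermittently slow, wrapping the request in a retry loop with backoff can also help. The sketch below is an illustration, not part of the original answer; the `retry()` helper is a hypothetical name, written in base R so it works regardless of which HTTP package is installed.

```r
# Hypothetical helper (not from httr): call `fun` up to `times` times,
# returning the first successful result, pausing a growing amount of
# time between attempts, and re-raising the last error if all fail.
retry <- function(fun, times = 3, pause = 2) {
  last_err <- NULL
  for (i in seq_len(times)) {
    res <- tryCatch(fun(), error = function(e) { last_err <<- e; NULL })
    if (!is.null(res)) return(res)
    Sys.sleep(pause * i)  # back off: pause, 2*pause, ... seconds
  }
  stop(conditionMessage(last_err))
}

# Example use with the httr call from the answer above:
# pg <- retry(function() GET("https://inciweb.nwcg.gov/", timeout(60)))
```

Note that httr also ships a built-in `RETRY()` (e.g. `RETRY("GET", "https://inciweb.nwcg.gov/", times = 3, timeout(60))`), which additionally retries on error status codes.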

Temporary workaround:

# Fetch the page with base R's readLines() instead of curl,
# then parse the raw bytes with read_html():
l <- charToRaw(paste0(readLines("https://inciweb.nwcg.gov/"), collapse="\n")) 

pg <- read_html(l) 

html_table(pg)[[1]] 

Hmm, tried with different timeout values (#) but keep getting this: `pg <- GET("https://inciweb.nwcg.gov/", timeout(60))` Error in `curl::curl_fetch_memory(url, handle = handle)`: Timeout was reached: Connection timed out after 10000 milliseconds # – S31


Yes. I've also tried other sites in R and run into the same problem. Visiting those sites through a browser works fine – S31


Yes. Using Windows 7 – S31