I've started using R, and I wrote a small piece of R code to play with the idea of testing web-page load times: using RCurl to measure the load time of page elements? (R)
page.load.time <- function(theURL, N = 10, wait_time = 0.05)
{
  require(RCurl)
  require(XML)
  TIME <- numeric(N)
  for (i in seq_len(N))
  {
    Sys.sleep(wait_time)
    TIME[i] <- system.time(webpage <- getURL(theURL, header = FALSE,
                                             verbose = TRUE))[3]
  }
  return(TIME)
}
I'd welcome your help on a few points:
- Is it possible to do this while also finding out which parts of the page took how long to load? (somewhat like Yahoo's YSlow)
- I sometimes run into the following error:
Error in curlPerform(curl = curl, .opts = opts, .encoding = .encoding): Failure when receiving data from the peer
Timing stopped at: 0.03 0 43.72
What causes this, and are there any suggestions on how to catch such errors and discard them?
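One way to catch and discard such failures is to wrap the `getURL()` call in `tryCatch()` and return `NA` for that run. A minimal sketch (the helper name `timed.getURL` and the NA-on-failure convention are my own, not from the code above):

```r
library(RCurl)

# Time a single fetch, returning NA instead of aborting when curl fails.
timed.getURL <- function(theURL) {
  tryCatch(
    system.time(getURL(theURL, header = FALSE))[3],
    error = function(e) {
      message("fetch failed: ", conditionMessage(e))
      NA_real_  # discard this run; the caller can filter NAs out later
    }
  )
}
```

`page.load.time()` could then summarize only the successful runs with `mean(TIME, na.rm = TRUE)`.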
Can you think of ways to improve the function above?
Update: I redid the function. It is now painfully slow...
one.page.load.time <- function(theURL, HTML = TRUE, JavaScript = TRUE, Images = TRUE, CSS = TRUE)
{
  require(RCurl)
  require(XML)
  require(stringr)  # needed for str_detect() below
  TIME <- NULL
  if (HTML) TIME["HTML"] <- system.time(doc <- htmlParse(theURL))[3]
  if (JavaScript) {
    theJS <- xpathSApply(doc, "//script/@src")  # find all JavaScript files
    TIME["JavaScript"] <- system.time(sapply(theJS, getBinaryURL))[3]  # one URL per call
  } else {
    TIME["JavaScript"] <- NA
  }
  if (Images) {
    theIMG <- xpathSApply(doc, "//img/@src")  # find all image files
    TIME["Images"] <- system.time(sapply(theIMG, getBinaryURL))[3]
  } else {
    TIME["Images"] <- NA
  }
  if (CSS) {
    theCSS <- xpathSApply(doc, "//link/@href")  # find all "link" types
    ss_CSS <- str_detect(tolower(theCSS), fixed(".css"))  # keep only the CSS files
    theCSS <- theCSS[ss_CSS]
    TIME["CSS"] <- system.time(sapply(theCSS, getBinaryURL))[3]
  } else {
    TIME["CSS"] <- NA
  }
  return(TIME)
}
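To get closer to a YSlow-style breakdown, each file can be timed individually, resolving relative `src`/`href` paths against the page URL first. A minimal sketch (the helper name `time.each.file` is mine; RCurl's `getRelativeURL()` does the URL resolution):

```r
library(RCurl)

# Time each resource separately so slow files can be identified,
# resolving relative paths against the page URL first.
time.each.file <- function(urls, baseURL) {
  urls <- vapply(urls, getRelativeURL, character(1), baseURL = baseURL)
  vapply(urls, function(u) {
    tryCatch(system.time(getBinaryURL(u))[3],
             error = function(e) NA_real_)  # skip files that fail to download
  }, numeric(1))
}
```

Inside `one.page.load.time()` this would yield a named vector per category, so e.g. `sort(time.each.file(theIMG, theURL), decreasing = TRUE)` shows the slowest images first.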
page.load.time <- function(theURL, N = 3, wait_time = 0.05, ...)
{
  require(RCurl)
  require(XML)
  require(plyr)
  TIME <- vector("list", length = N)
  for (i in seq_len(N))
  {
    Sys.sleep(wait_time)
    TIME[[i]] <- one.page.load.time(theURL, ...)
  }
  TIME <- data.frame(URL = theURL, ldply(TIME, function(x) x))
  return(TIME)
}
a <- page.load.time("http://www.r-bloggers.com/", 2)
a
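Averaging over the N runs then gives one mean load time per component. A small follow-on sketch using base R (the helper name `summarize.load.times` is mine; column names follow the data frame returned by `page.load.time()`):

```r
# Average the elapsed times over the repeated runs, ignoring failed fetches.
summarize.load.times <- function(times_df) {
  colMeans(times_df[, setdiff(names(times_df), "URL"), drop = FALSE],
           na.rm = TRUE)
}
```

For example, `summarize.load.times(a)` would return one mean per column (HTML, JavaScript, Images, CSS).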
Thanks, any advice on how to do this would help :) – 2011-04-25 09:28:28
Depends how sophisticated you want it to be. Some pages will fetch things with Ajax, which means running Javascript... – Spacedman 2011-04-25 09:38:30
mmm... do you know if it is possible to run Javascript using R? – 2011-04-25 09:53:48