2016-09-28

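Web scraping multiple pages of a series using R

How can I scrape HTML data from 70 pages? I have been looking at this question, but I am stuck on the function in the general approach section.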

#attempt

library(purrr)

url_base <- "https://secure.capitalbikeshare.com/profile/trips/QNURCMF2Q6"

map_df(1:70, function(i) {
  cat(".")
  pg <- read_html(sprintf(url_base, i))
  data.frame(startd = html_text(html_nodes(pg, ".ed-table__col_trip-start-date")),
             endd = html_text(html_nodes(pg, ".ed-table__col_trip-end-date")),
             duration = html_text(html_nodes(pg, ".ed-table__col_trip-duration")))
}) -> table



#attempt 2 (with just one data column)

url_base <- "https://secure.capitalbikeshare.com/profile/trips/QNURCMF2Q6"

map_df(1:70, function(i) {
  page %>% html_nodes(".ed-table__item_odd") %>% html_text()
}) -> table

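Your URL should contain a parameter that represents the current page number, and you should paste it together with `url_base` to generate the actual URL. As written, it looks like you are requesting the same URL 70 times.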
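A minimal sketch of what the comment describes, in base R. The `?page=` query parameter is an assumption made purely for illustration; the real parameter name (or path scheme) has to be read off the site's own pagination links.

```r
# Base URL taken from the question.
url_base <- "https://secure.capitalbikeshare.com/profile/trips/QNURCMF2Q6"

# ASSUMPTION: the site paginates with a "page" query parameter.
# Replace "?page=%d" with whatever the site's "next page" links actually use.
url_template <- paste0(url_base, "?page=%d")

# sprintf() is vectorised, so this produces one distinct URL per page number.
urls <- sprintf(url_template, 1:3)
print(urls)
```

Without a `%d` placeholder in the format string, `sprintf(url_base, i)` has nowhere to put `i`, which is why every iteration in the question requests the same address.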

Answer


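@jso1226, I am not sure what is going on in the answer you referenced, so I have provided an example of a task very similar to what you want to do.

That is: go to a web page, collect the information, add it to a data frame, and then move on to the next page.

This is code I wrote to keep track of the answers I have posted here on Stack Overflow: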

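library(rvest)

login <- "https://stackoverflow.com/users/login?ssrc=head&returnurl=http%3a%2f%2fstackoverflow.com%2f"

# Open a session on the login page, fill in the login form and submit it.
# (rvest >= 1.0 renames these to session(), html_form_set() and session_submit().)
pgsession <- html_session(login)
pgform <- html_form(pgsession)[[2]]   # login form (second form on the page when this was written)
filled_form <- set_values(pgform, email = "*****", password = "*****")
submit_form(pgsession, filled_form)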

# Start with an empty data frame and append each page's rows to it.
results <- data.frame()

for (i in 1:5) {
    url <- "http://stackoverflow.com/users/**********?tab=answers&sort=activity&page="
    url <- paste0(url, i)
    page <- jump_to(pgsession, url)   # session_jump_to() in rvest >= 1.0

    # collect question votes and question title
    summary <- html_nodes(page, "div .answer-summary")
    question <- matrix(html_text(html_nodes(summary, "div"), trim = TRUE), ncol = 2, byrow = TRUE)

    # find date answered, hyperlink and whether it was accepted
    dateans <- html_node(summary, "span") %>% html_attr("title")
    hyperlink <- html_node(summary, "div a") %>% html_attr("href")
    accepted <- html_node(summary, "div") %>% html_attr("class")

    # create temp results then bind to final results
    rtemp <- cbind(question, dateans, accepted, hyperlink)
    results <- rbind(results, rtemp)
}

# Data frame clean-up
names(results) <- c("Votes", "Answer", "Date", "Accepted", "HyperLink")
results$Votes <- as.integer(as.character(results$Votes))
# 0 when the class is the plain "answer-votes default", 1 otherwise (accepted)
results$Accepted <- ifelse(results$Accepted == "answer-votes default", 0, 1)

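In this case the loop is limited to 5 pages; you will need to change that to suit your application. I replaced the user-specific values with ******. Hopefully this gives you some guidance for your problem.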
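If the number of pages is not known in advance, one possible adaptation is to keep requesting pages until one comes back empty, collecting each page's rows in a list and binding them once at the end. The sketch below uses base R only. `scrape_page()` is a hypothetical stand-in that fakes three pages of data; in a real script its body would be the rvest calls that build one page's data frame, and the cap of 70 is simply the page count mentioned in the question.

```r
# HYPOTHETICAL stand-in for the per-page scraping step: returns a small data
# frame for pages 1 to 3 and an empty data frame for any later page.
# Replace the body with the rvest calls that read and parse one real page.
scrape_page <- function(i) {
  if (i > 3) {
    return(data.frame(page = integer(0), item = character(0),
                      stringsAsFactors = FALSE))
  }
  data.frame(page = rep(i, 2),
             item = paste0("row", 1:2, "_p", i),
             stringsAsFactors = FALSE)
}

max_pages <- 70   # safety cap (assumed upper bound, taken from the question)
pages <- list()

for (i in seq_len(max_pages)) {
  page_df <- scrape_page(i)
  if (nrow(page_df) == 0) break          # stop at the first empty page
  pages[[length(pages) + 1]] <- page_df
  # Sys.sleep(1)                         # optional pause between real requests
}

# Bind all pages once instead of growing a data frame inside the loop.
results <- do.call(rbind, pages)
print(dim(results))
```

Collecting the pages in a list and calling `rbind` once avoids repeatedly copying a growing data frame, which starts to matter as the page count goes up.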