2016-12-24

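Accepting terms and conditions on a JS web page when scraping with R (rvest)

I have another newbie question about web scraping. I am using rvest to try to get some data from a police report website. I have been looking around, but I can't seem to find a way past the "Accept Terms and Conditions" page with its "I Agree" button. How can I submit "I Agree" so that I can access the site?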

Website: http://www.wspdp2c.org/Summary_Disclaimer.aspx

require(httr) 
require(XML) 
library(RCurl) 
library(rvest) 

wspd.url<- "http://www.wspdp2c.org/Summary_Disclaimer.aspx" 

wspd.session<-html_session(wspd.url) 
wspd.form<-html_form(read_html(wspd.session)) 
wspd.form 

R output:

> wspd.form 
[[1]] 
<form> 'Form1' (POST ./Summary_Disclaimer.aspx) 
    <input hidden> '_popupBlockerExists': true 
    <input hidden> '__VIEWSTATE': /wEPDwUKLTUwMDM5Nzk4OA9.... 
    <input hidden> '__VIEWSTATEGENERATOR': 27903AD3 
    <input hidden> '__EVENTVALIDATION': /wEdAAky7XCY2Cjbe0DHcJ.... 
    <select> 'ctl00$MasterPage$DDLSiteMap1$ddlQuickLinks' [1/7] 
    <input submit> 'ctl00$MasterPage$mainContent$CenterColumnContent$btnContinue': I Agree 

This is a SharePoint-driven site. Just use RSelenium or seleniumPipes. – hrbrmstr

Answers


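You will need to work out how to get Selenium running on your system and how to get the remoteDr(...) call working. After that, this should help you get started: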

library(seleniumPipes) 
library(rvest) 
library(dplyr) 
library(stringi) 
library(purrr) 

remDr <- remoteDr(...) 
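
The `remoteDr(...)` call above leaves the connection details elided, since they depend on the local setup. As a rough sketch only, assuming a Selenium server is already listening at http://localhost on port 4444 with Firefox available (the address, port, browser and the helper name `connect_selenium` are all illustrative assumptions, not part of the original answer), the connection could be wrapped like this:

```r
# Hypothetical helper: opens a seleniumPipes session against a Selenium
# server that is assumed to be running already. Adjust the defaults to
# match your own setup.
connect_selenium <- function(server = "http://localhost",
                             port = 4444L,
                             browser = "firefox") {
  seleniumPipes::remoteDr(
    remoteServerAddr = server,
    port = port,
    browserName = browser
  )
}

# Example use (needs a running Selenium server, so it is left commented out):
# remDr <- connect_selenium()
```

Keeping the connection in one small function means the rest of the script does not change if the server address or browser does.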

remDr %>% go("http://www.wspdp2c.org/Summary_Disclaimer.aspx") 

submit <- remDr %>% findElement("xpath", ".//input[@type='submit']") 
submit %>% elementClick() 

from_date <- remDr %>% findElement("xpath", ".//input[@name='MasterPage$mainContent$txtDateFrom2']") 
from_date %>% elementClear() 
from_date %>% elementSendKeys("12/22/2016", selKeys$escape) # Esc dismisses the popup calendar

to_date <- remDr %>% findElement("xpath", ".//input[@name='MasterPage$mainContent$txtDateTo2']") 
to_date %>% elementClear() 
to_date %>% elementSendKeys("12/23/2016", selKeys$escape) 

search <- remDr %>% findElement("class name", "ui-icon-search") 
search %>% elementClick() 

remDr %>% getPageSource() -> pg 
html_nodes(pg, "table.DataGridText") -> tab 

html_nodes(tab, xpath=".//td[2]")[1:9] %>% 
    html_text() %>% 
    as.POSIXct(format="%m/%d/%Y %H:%M") -> occurred 

html_nodes(tab, xpath=".//td[3]")[1:9] %>% 
    html_text() -> incident_or_arrest 

html_nodes(tab, xpath=".//td[4]")[1:9] %>% 
    html_text() %>% 
    stri_trim_both() -> case_or_arrestee 

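# Each cell holds either a case number and offense, or an arrestee and charge;
# capture whichever fields are present.
stri_match_all_regex(case_or_arrestee,
                     paste0(c("Case #: ([[:digit:]]+)",
                              "Primary Offense: ([[:print:]]+)",
                              "Arrestee: ([[:print:]]+)",
                              "Charge: ([[:print:]]+)"), collapse="|")) %>%
    # Keep the one non-NA capture group from each match row
    map(~apply(.[,2:5], 1, discard, is.na)) %>%
    map_df(function(x) {
        x <- as.list(x)
        # A first value containing letters is a name, so the row is an arrest
        if (stri_detect_regex(x[[1]], "[[:alpha:]]")) {
            setNames(x, c("arrestee", "charge"))
        } else {
            setNames(x, c("case_number", "primary_offense"))
        }
    }) -> case_or_arrestee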

html_nodes(tab, xpath=".//td[5]")[1:9] %>% 
    html_text() -> location 

tibble(occurred, incident_or_arrest, location) %>%
    bind_cols(case_or_arrestee) %>% 
    glimpse() 
## Observations: 9 
## Variables: 7 
## $ occurred   <dttm> 2016-12-22 00:00:00, 2016-12-22 00:00:00, 2016-12-22 00:0... 
## $ incident_or_arrest <chr> "Incident", "Incident", "Arrest", "Incident", "Incident", ... 
## $ location   <chr> "2600-BLK TODDLER PLACE DR", "300-BLK ALSPAUGH DR", ... 
## $ case_number  <chr> "1667276", "1667273", NA, "1667249", "1667248", NA, NA, "1... 
## $ primary_offense <chr> "BREAKING & ENTERING WITH FORCE", "MALICIOUS INJURY TO PRO... 
## $ arrestee   <chr> NA, NA, "THOMAS, KERRY MARTIN", NA, NA, "LOZANO, MIGUEL AR... 
## $ charge    <chr> NA, NA, "PANHANDLING W/ NO PRIVLEDGE LICENSE", NA, NA, "AN... 
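
To see what the regex step does without a live browser session, here is a small offline sketch. The two sample cells are made up to mirror the glimpse() output above, and the newline between the two fields in each cell is an assumption about how the site lays out its text. The sketch uses base R's `lapply` in place of purrr to keep dependencies down, so it is a variant of the answer's code rather than a copy of it.

```r
library(stringi)

# Hypothetical cell text; the "\n" separator is an assumption.
cells <- c(
  "Case #: 1667276\nPrimary Offense: BREAKING & ENTERING WITH FORCE",
  "Arrestee: THOMAS, KERRY MARTIN\nCharge: PANHANDLING W/ NO PRIVLEDGE LICENSE"
)

# Same alternation as in the answer: one capture group per field.
field_pattern <- paste0(c("Case #: ([[:digit:]]+)",
                          "Primary Offense: ([[:print:]]+)",
                          "Arrestee: ([[:print:]]+)",
                          "Charge: ([[:print:]]+)"), collapse = "|")

parsed <- lapply(stri_match_all_regex(cells, field_pattern), function(m) {
  # Columns 2 to 5 are the capture groups; each match row fills exactly one.
  groups <- m[, 2:5, drop = FALSE]
  values <- apply(groups, 1, function(row) row[!is.na(row)])
  # Letters in the first value mean it is a name, so the cell is an arrest.
  field_names <- if (stri_detect_regex(values[1], "[[:alpha:]]")) {
    c("arrestee", "charge")
  } else {
    c("case_number", "primary_offense")
  }
  setNames(as.list(values), field_names)
})

str(parsed)
```

Because `[[:print:]]` does not match a newline, each field's capture stops at the end of its own line, which is what keeps the offense text out of the case number and the charge out of the arrestee name.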

This is perfect. Thank you! – MDEWITT


Once you have fully tested it, ticking the answer's checkmark helps show others that it is a working solution. – hrbrmstr