2017-07-18

httr POST with hidden fields

To scrape some financial statements, I am trying to obtain a list of document delivery protocol numbers.

The URL below has links to all document categories for a given company.

u1 <- "http://siteempresas.bovespa.com.br/consbov/ExibeTodosDocumentosCVM.asp?CCVM=22446&CNPJ=09.414.761/0001-64&TipoDoc=C"

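Clicking DFP redirects me to a different page that contains the protocol numbers. The problem is that I cannot get the same result in R.

I tried httr::POST without success.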

library(httr) 
page <- GET(u1, encoding = "ISO-8859-1") 
key <- cookies(page) 

pgpost <- POST(u1,
  body = list(
    hdnCategoria = "IDI2",
    action = "ExibeTodosDocumentosCVM.asp?CNPJ=09.414.761/0001-64&CCVM=22446&TipoDoc=C&QtLinks=10"),
  set_cookies(
    ASPSESSIONIDQATQCCSC = key$value[1],
    TS01871345 = key$value[2],
    ASPSESSIONIDSQQTABSC = key$value[3],
    ASPSESSIONIDSCDSBADC = key$value[4]))

pgcont <- content(pgpost, "text", encoding = "ISO-8859-1") 
pgcont <- strsplit(pgcont, "\r")[[1]] 
pgcont <- gsub('[\n\t]', "", pgcont); pgcont 

pgcont shows the same content as u1.

I also tried following the link with rvest:

library(rvest) 
s <- html_session(u1) 
s %>% follow_link("DFP") 

but that ended with this error message:

[1] Navigating to javascript:fVisualizaDocumentos('C','IDI2') 
    Error in curl::curl_fetch_memory(url, handle = handle) : 
     Couldn't resolve host name 

Any ideas on how to solve this? Thanks in advance!
Here is a picture of the information I'm looking for

Answer

I don't think you need the session cookies:

library(httr) 
library(rvest) 
library(tidyverse) 

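# Post the form directly: the URL parameters go in `query`,
# the form's hidden fields go in the form-encoded `body`.
res <- httr::POST(
  url = "http://siteempresas.bovespa.com.br/consbov/ExibeTodosDocumentosCVM.asp",
  encode = "form",
  query = list(
    CNPJ = "09.414.761/0001-64",
    CCVM = "22446",
    TipoDoc = "C",
    QtLinks = "10"
  ),
  body = list(
    hdnCategoria = "IDI2",  # category code taken from the DFP link
    hdnPagina = "",
    FechaI = "",
    FechaV = ""
  ))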

content(res, encoding = "ISO-8859-1") %>% 
    html_nodes("table") 
## {xml_nodeset (21)} 
## [1] <table width="640" border="0" cellspacing="0" cellpadding="0" align ... 
## [2] <table width="95%" border="0" cellspacing="1" align="center" cellpa ... 
## [3] <table width="95%" border="0" cellspacing="1" align="center" cellpa ... 
## [4] <table width="95%" border="0" cellspacing="1" align="center" cellpa ... 
## [5] <table width="95%" border="0" cellspacing="1" align="center" cellpa ... 
## [6] <table width="95%" border="0" cellspacing="1" align="center" cellpa ... 
## [7] <table width="95%" border="0" cellspacing="1" align="center" cellpa ... 
## [8] <table width="95%" border="0" cellspacing="1" align="center" cellpa ... 
## [9] <table width="95%" border="0" cellspacing="1" align="center" cellpa ... 
## [10] <table width="95%" border="0" cellspacing="1" align="center" cellpa ... 
## [11] <table width="95%" border="0" cellspacing="1" align="center" cellpa ... 
## [12] <table width="95%" border="0" cellspacing="1" align="center" cellpa ... 
## [13] <table width="95%" border="0" cellspacing="1" align="center" cellpa ... 
## [14] <table width="95%" border="0" cellspacing="1" align="center" cellpa ... 
## [15] <table width="95%" border="0" cellspacing="1" align="center" cellpa ... 
## [16] <table width="95%" border="0" cellspacing="1" align="center" cellpa ... 
## [17] <table width="95%" border="0" cellspacing="1" align="center" cellpa ... 
## [18] <table width="95%" border="0" cellspacing="1" align="center" cellpa ... 
## [19] <table width="95%" border="0" cellspacing="1" align="center" cellpa ... 
## [20] <table width="95%" border="0" cellspacing="1" align="center" cellpa ... 
## ...
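
The body fields above mirror the form's hidden inputs, and "IDI2" is the second argument of the javascript: link that rvest could not follow. As a rough sketch of how such fields might be discovered, the snippet below parses a hypothetical stand-in for the form (the markup and the form name `frmDocs` are assumptions modeled on the field names used above, not the site's actual HTML) and builds a body list from it. It also assumes the page's script copies the link's second argument into hdnCategoria before submitting.

```r
library(rvest)

# Hypothetical stand-in for the form on the listing page; the real markup may differ.
html <- '
<form name="frmDocs" method="post" action="ExibeTodosDocumentosCVM.asp">
  <input type="hidden" name="hdnCategoria" value="">
  <input type="hidden" name="hdnPagina" value="">
  <input type="hidden" name="FechaI" value="">
  <input type="hidden" name="FechaV" value="">
  <a href="javascript:fVisualizaDocumentos(\'C\',\'IDI2\')">DFP</a>
</form>'

page <- read_html(html)

# Collect the hidden inputs as a named list suitable for POST(body = ...)
hidden <- html_nodes(page, "input[type='hidden']")
fields <- as.list(setNames(html_attr(hidden, "value"), html_attr(hidden, "name")))

# Pull the quoted arguments out of the javascript: href; the second one is
# assumed to be the category code the script would write into hdnCategoria
href <- html_attr(html_node(page, "a"), "href")
args <- regmatches(href, gregexpr("'[^']*'", href))[[1]]
fields$hdnCategoria <- gsub("'", "", args[2])

str(fields)
```

The resulting `fields` list could then be passed as `body = fields` in the POST call above; on the live site the page would come from `read_html()` on the listing URL rather than an inline string.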