2014-09-30

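POST URL Encoded vs. Line-based text data via Python Requests

I'm trying to scrape some data from a website, but I can't get the POST to work. The server behaves as if I never sent the input data ("appnote").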

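When I inspect the POST data, the two requests look the same, except that the real web form's POST is labelled "URL Encoded" and lists each form input, while mine is labelled "Line-based text data".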
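One way to compare the two requests by content rather than by Wireshark's label is to decode each captured body into its fields. Below is a minimal sketch using only the standard library. It assumes both bodies are application/x-www-form-urlencoded, which is what requests produces when data= is a dict. The field subset and the helper names decode_form_body and diff_forms are hypothetical.

```python
from urllib.parse import parse_qsl, urlencode

# Hypothetical subset of the form fields from the question
fields = {
    '__VIEWSTATE': '',
    'ctl00$ContentPlaceHolder1$Content': 'appnote',
    'ctl00$ContentPlaceHolder1$Search': 'Search',
}

# Form encoding: key=value pairs joined by '&', with reserved characters
# such as '$' percent-encoded (this is the encoding requests uses for a dict).
body = urlencode(fields)
print(body)


def decode_form_body(raw):
    """Turn a captured form body back into a dict, keeping empty values
    so that blank hidden fields stay visible."""
    return dict(parse_qsl(raw, keep_blank_values=True))


def diff_forms(browser_body, script_body):
    """Return {field: (browser_value, script_value)} for every field that differs."""
    a = decode_form_body(browser_body)
    b = decode_form_body(script_body)
    return {k: (a.get(k), b.get(k))
            for k in sorted(a.keys() | b.keys())
            if a.get(k) != b.get(k)}
```

If you pass the body captured from the browser and the body captured from the script to diff_forms, it lists the fields whose values disagree. An empty __VIEWSTATE would show up there.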

Here is my code. The appnote query (the Content field) and the Search button are the parts that matter most:

import requests 
import cookielib 


jar = cookielib.CookieJar() 
url = 'http://www.vivotek.com/faq/' 
headers = {'content-type': 'application/x-www-form-urlencoded'} 

post_data = {#'__EVENTTARGET':'', 
      #'__EVENTARGUMENT':'', 
      '__LASTFOCUS':'', 
      '__VIEWSTATE':'', 
      '__VIEWSTATEGENERATOR':'', 
      '__VIEWSTATEENCRYPTED':'', 
      '__PREVIOUSPAGE':'', 
      '__EVENTVALIDATION':'', 
      'ctl00$HeaderUc1$LanguageDDLUc1$ddlLanguage':'en', 
      'ctl00$ContentPlaceHolder1$CategoryDDLUc1$DropDownList1':'-1', 
      'ctl00$ContentPlaceHolder1$ProductDDLUc1$DropDownList1':'-1', 
      'ctl00$ContentPlaceHolder1$Content':'appnote', 
      'ctl00$ContentPlaceHolder1$Search':'Search' 
      } 
response = requests.get(url, cookies=jar) 

response = requests.post(url, cookies=jar, data=post_data, headers=headers) 

print(response.text) 

Link to a Wireshark screenshot of what I'm talking about:

I also tried wget and got the same result.

Answer


The main problem is that you aren't setting the values of important hidden fields such as __VIEWSTATE.

To make this work with requests, you need to parse the page's HTML and extract the corresponding input values.
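The extraction step can be sketched with only the standard library's html.parser, as an alternative to BeautifulSoup. This sketch collects every hidden input instead of naming each field by hand. It assumes the hidden fields are rendered as plain <input type="hidden"> elements with a name attribute, which is typical of ASP.NET pages. The class HiddenInputCollector and the function hidden_fields are hypothetical helpers.

```python
from html.parser import HTMLParser


class HiddenInputCollector(HTMLParser):
    """Collect name -> value for every <input type="hidden"> in a page."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names; self-closing
        # <input ... /> tags are routed here as well.
        if tag != 'input':
            return
        attributes = dict(attrs)
        if (attributes.get('type') or '').lower() == 'hidden' and attributes.get('name'):
            # A missing value attribute is submitted as an empty string.
            self.fields[attributes['name']] = attributes.get('value') or ''


def hidden_fields(html):
    """Return the hidden form fields of an HTML document as a dict."""
    collector = HiddenInputCollector()
    collector.feed(html)
    collector.close()
    return collector.fields
```

You can then merge the result with the visible fields, for example post_data = {**hidden_fields(response.text), 'ctl00$ContentPlaceHolder1$Content': 'appnote'}. The keys come from each input's name attribute, because that is what the browser submits.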

Here is a solution using the BeautifulSoup HTML parser and requests:

from bs4 import BeautifulSoup 
import requests 

url = 'http://www.vivotek.com/faq/' 
query = 'appnote' 

headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/37.0.2062.124 Safari/537.36'} 

session = requests.Session() 
response = session.get(url, headers=headers) 

soup = BeautifulSoup(response.content, 'html.parser') 

post_data = {'__EVENTTARGET':'', 
      '__EVENTARGUMENT':'', 
      '__LASTFOCUS':'', 
      '__VIEWSTATE': soup.find('input', id='__VIEWSTATE')['value'], 
      '__VIEWSTATEGENERATOR': soup.find('input', id='__VIEWSTATEGENERATOR')['value'], 
      '__VIEWSTATEENCRYPTED': '', 
      '__PREVIOUSPAGE': soup.find('input', id='__PREVIOUSPAGE')['value'], 
      '__EVENTVALIDATION': soup.find('input', id='__EVENTVALIDATION')['value'], 

      'ctl00$HeaderUc1$LanguageDDLUc1$ddlLanguage': 'en', 
      'ctl00$ContentPlaceHolder1$CategoryDDLUc1$DropDownList1': '-1', 
      'ctl00$ContentPlaceHolder1$ProductDDLUc1$DropDownList1': '-1', 
      'ctl00$ContentPlaceHolder1$Content': query, 
      'ctl00$ContentPlaceHolder1$Search': 'Search' 
      } 

response = session.post(url, data=post_data, headers=headers) 

soup = BeautifulSoup(response.content, 'html.parser') 
for item in soup.select('a#ArticleShowLink'): 
    print(item.text.strip()) 

This prints the results for the appnote query:

How to troubleshoot when you can't watch video streaming? 
Recording performance benchmarking tool 
... 

That worked great too, thanks! So I guess the main thing I was missing was the session data. That makes sense! – Ganeshvara 2014-10-01 14:06:38
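The session part can be illustrated offline. In the question's code, requests.get(url, cookies=jar) sends whatever is already in jar. As far as I can tell, it does not write the response's cookies back into it. So the later POST goes out without any cookie the GET received, whereas a requests.Session keeps them between calls. The sketch below swaps requests for the standard library's http.cookiejar and urllib.request so it runs without network access. The local echo server, the SessionId cookie, and fetch_twice are all hypothetical.

```python
import http.cookiejar
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class CookieEchoHandler(BaseHTTPRequestHandler):
    """Hypothetical stand-in site: hands out a session cookie when none is
    sent and echoes back whatever Cookie header it receives."""

    def do_GET(self):
        received = self.headers.get("Cookie", "")
        body = received.encode()
        self.send_response(200)
        if not received:
            self.send_header("Set-Cookie", "SessionId=abc123; Path=/")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # silence the default per-request logging


def fetch_twice(keep_cookies):
    """GET the local server twice and return the Cookie header it saw each time."""
    server = HTTPServer(("127.0.0.1", 0), CookieEchoHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    url = "http://127.0.0.1:%d/" % server.server_port

    handlers = [urllib.request.ProxyHandler({})]  # never route localhost via a proxy
    if keep_cookies:
        # Like requests.Session: one jar that stores Set-Cookie and resends it.
        handlers.append(urllib.request.HTTPCookieProcessor(http.cookiejar.CookieJar()))
    opener = urllib.request.build_opener(*handlers)

    seen = []
    try:
        for _ in range(2):
            with opener.open(url) as response:
                seen.append(response.read().decode())
    finally:
        server.shutdown()
        server.server_close()
    return tuple(seen)


print(fetch_twice(keep_cookies=False))  # no jar: the cookie is never sent back
print(fetch_twice(keep_cookies=True))   # jar: the second request carries SessionId
```

With requests, the equivalent is the pattern from the answer: create one requests.Session() and send both the GET and the POST through it.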