2014-03-04 75 views
3

Why doesn't requests download the response for this web page? Simulating an AJAX request in Python with the requests library

#!/usr/bin/python 

import requests 

headers={ 'content-type':'application/x-www-form-urlencoded; charset=UTF-8', 
    'Accept-Encoding': 'gzip, deflate', 
    'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:27.0) Gecko/20100101 Firefox/27.0', 
    'Referer' : 'http://sportsbeta.ladbrokes.com/football', 
    } 

payload={'N': '4294966750', 
    'facetCount_156%23327': '12', 
    'facetCount_157%23325': '8', 
    'form-trigger':'moreId', 
    'moreId':'156%23327', 
    'pageId':'p_football_home_page', 
    'pageType':'EventClass', 
    'type':'ajaxrequest' 
    } 

url='http://sportsbeta.ladbrokes.com/view/EventDetailPageComponentController' 

r = requests.post(url, data=payload, headers=headers) 

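These are the POST parameters I see in Firebug, and the response received there contains a list (of football leagues), but when I run my Python script like this I get nothing back.

(You can see the request in Firefox by clicking the See All link in the competitions section of the left navigation bar and watching the XHR in Firebug; the Firebug response shows the HTML body as expected.)

Any ideas? Could my handling of the % symbols in the payload be causing trouble?

Edit: using a session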

from requests import Request, Session 

#turn post string into dict: 
def parsePOSTstring(POSTstr): 
    paramList = POSTstr.split('&') 
    paramDict = dict([param.split('=') for param in paramList]) 
    return paramDict 

headers={'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:27.0) Gecko/20100101 Firefox/27.0', 
    'Referer' : 'http://sportsbeta.ladbrokes.com/football' 
    } 

#prep the data (POSTstr copied from Firebug raw source) 
POSTstr = ("moreId=156%23327&facetCount_156%23327=12&event=&N=4294966750&pageType=EventClass&"
           "pageId=p_football_home_page&type=ajaxrequest&eventIDNav=&removedSelectionNav=&"
           "currentSelectedId=&form-trigger=moreId")
payload = parsePOSTstring(POSTstr) 

#end url 
url='http://sportsbeta.ladbrokes.com/view/EventDetailPageComponentController' 

#start a session to manage cookies, and visit football page first so referer agrees 
s = Session() 
s.get('http://sportsbeta.ladbrokes.com/football') 
#now visit desired url with headers/data
r = s.post(url, data=payload, headers=headers) 

#print output 
print(r.text)  # this is empty

A working attempt with cURL

curl 'http://sportsbeta.ladbrokes.com/view/EventDetailPageComponentController' \
  -H 'Cookie: JSESSIONID=DE93158F07E02DD3CC1CC32B1AA24A9E.ecomprodsw015; geoCode=FRA; FLAGS=en|en|uk|default|ODDS|0|GBP; ECOM_BETA_SPORTS=1; PLAYED=4%7C0%7C0%7C0%7C0%7C0%7C0' \
  -H 'Referer: http://sportsbeta.ladbrokes.com/football' \
  -H 'User-Agent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:27.0) Gecko/20100101 Firefox/27.0' \
  --data 'facetCount_157%23325=8&moreId=156%23327&facetCount_156%23327=12&event=&N=4294966750&pageType=EventClass&pageId=p_football_home_page&type=ajaxrequest&eventIDNav=&removedSelectionNav=&currentSelectedId=&form-trigger=moreId' \
  --compressed

This cURL command, however, works.

+0

You need to visit 'http://sportsbeta.ladbrokes.com/football' first (rather than the home page). *Then* it seems to work. You don't need any headers other than 'Referer' and 'User-Agent'. – Blender

+0

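@Blender I've updated my post with the minimal headers you suggested. I'm also using a requests session to manage cookies and visiting the 'football' page first, since that is where the AJAX request is issued from, but I still get an empty 'r.text'. Does this code work for you? – fpghost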

+1

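It works if you decode the percent-encoded characters correctly (i.e. replace '%23' with '#' by hand, or fix 'parsePOSTstring'). I didn't see the problem because I've been using a dictionary all along. – Blender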
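One way to fix 'parsePOSTstring', as the comment above suggests, is to let the standard library decode the percent-escapes instead of splitting the string by hand. The following is only a minimal sketch, assuming Python 3 and `urllib.parse.parse_qsl`; the name `parse_post_string` and the shortened sample string are made up for illustration and are not from the original post.

```python
from urllib.parse import parse_qsl


def parse_post_string(post_str):
    """Turn a raw urlencoded POST body into a dict of decoded names and values."""
    # parse_qsl decodes percent-escapes ('%23' becomes '#');
    # keep_blank_values=True preserves empty fields such as 'event='.
    return dict(parse_qsl(post_str, keep_blank_values=True))


# Shortened version of the string copied from Firebug (illustrative only).
raw = "moreId=156%23327&facetCount_156%23327=12&event=&N=4294966750"
payload = parse_post_string(raw)
print(payload)
```

The resulting dict holds the decoded values, so it can be passed as `data=` without the escapes being encoded a second time.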

Answer

9

Here's the minimal working example I could come up with:

from requests import Session 

session = Session() 

# HEAD requests ask for *just* the headers, which is all you need to grab the 
# session cookie 
session.head('http://sportsbeta.ladbrokes.com/football') 

response = session.post(
    url='http://sportsbeta.ladbrokes.com/view/EventDetailPageComponentController', 
    data={ 
     'N': '4294966750', 
     'form-trigger': 'moreId', 
     'moreId': '156#327', 
     'pageType': 'EventClass' 
    }, 
    headers={ 
     'Referer': 'http://sportsbeta.ladbrokes.com/football' 
    } 
) 

print(response.text)

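You just weren't decoding the percent-encoded POST data correctly, so the # ended up in the actual POST data as a literal %23 (for example, 156%23327 should be 156#327).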
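To see why the already-encoded value fails, it can help to look at what form-encoding does to it. Requests form-encodes a dict in much the same way as the standard library's `urllib.parse.urlencode`, so the sketch below uses that function as a stand-in (Python 3 assumed; the variable names are illustrative only).

```python
from urllib.parse import urlencode

# Value copied verbatim from the raw POST body: '%23' is already an escape.
already_encoded = {'moreId': '156%23327'}
# Value with the escape decoded back to the real character.
decoded = {'moreId': '156#327'}

# Encoding the already-encoded value escapes the '%' itself as '%25'.
wrong_body = urlencode(already_encoded)
# Encoding the decoded value reproduces what the browser actually sent.
right_body = urlencode(decoded)

print(wrong_body)
print(right_body)
```

The first body is double-encoded, so the server sees the literal text 156%23327 rather than 156#327.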

+0

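Ah, thanks a lot. So it was the percent-encoding after all, how frustrating. So in cURL you can leave the percent-encoding as '156%23327', but in Python requests you need the actual symbol rather than the encoded one. – fpghost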

+1

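@fpghost: In cURL you are sending the raw POST *data*. Requests serializes the dictionary into that same format. I think you can also pass a string instead of a dictionary, and requests will send it as-is. – Blender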
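The point that a decoded dictionary serializes back into the same raw string cURL sends can be checked with a round trip. This is a rough sketch, assuming Python 3 and using `urllib.parse` as a stand-in for the form-encoding that requests performs; the shortened `curl_data` string is illustrative only.

```python
from urllib.parse import parse_qsl, urlencode

# A shortened version of the string passed to cURL's --data option.
curl_data = "moreId=156%23327&N=4294966750&pageType=EventClass&form-trigger=moreId"

# Decode into (name, value) pairs, as a correctly built payload would hold them.
pairs = parse_qsl(curl_data, keep_blank_values=True)

# Re-encode the decoded pairs the way a form POST body is serialized.
reserialized = urlencode(pairs)

print(pairs)
print(reserialized == curl_data)
```

If the raw string is passed to requests directly instead of a dict, the Content-Type header probably has to be set by hand, since requests only adds the form content type automatically for dict-like data.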