2017-07-17

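I need to pass a nested dictionary as a parameter to a GET request, and I want to prevent Scrapy from removing the square and curly brackets from the URL.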

Here is how the query has to look for the request to work:

query = {%22channel%22:%22rent%22,%22page%22:2,%22pageSize%22:12,%22filters%22:{%22agencyIds%22:[%22CBPHMG%22]}} 

Here is what I get in the Scrapy log:

%7B%22pageSize%22:%20300,%20%22page%22:%208,%20%22channel%22:%20%22rent%22,%20%22filters%22:%20%7B%22agencyIds%22:%20%22VDTUED%22%7D%7D 

The problem is with the square and curly brackets.

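All I do at the moment is call json.dumps(dict) and append the result to the URL. I also tried escaping with backslashes to stop the characters from being changed, to no avail.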

q = {"channel":"sold","page":1,"pageSize":300,"filters":{"agencyIds":["PRDNEW"]}} 
query = json.dumps(q) 
query = query.replace('"', '\\"') 
url = url + query 

Also, the following code works fine with python-requests on Python 3.

import requests 

url = "https://services.realestate.com.au/services/listings/search" 

querystring = {"query":"{\"channel\":\"buy\",\"page\":2,\"pageSize\":12,\"filters\":{\"agencyIds\":[\"CBPHMG\"]}}"} 

headers = {'cache-control': 'no-cache'} 

response = requests.request("GET", url, headers=headers, params=querystring) 

print(response.text) 

Answer

You can use w3lib.url.add_or_replace_parameter to append the query parameter to the URL. It URL-encodes the value the same way python-requests does:

$ scrapy shell 
2017-07-18 11:03:28 [scrapy.utils.log] INFO: Scrapy 1.4.0 started (bot: scrapybot) 
(...) 
>>> url = "https://services.realestate.com.au/services/listings/search" 
>>> querystring = {"query":"{\"channel\":\"buy\",\"page\":2,\"pageSize\":12,\"filters\":{\"agencyIds\":[\"CBPHMG\"]}}"} 

This is the same input data as in the python-requests example.

Call add_or_replace_parameter with the parameter's name and its value (note: Scrapy already depends on w3lib):

>>> from w3lib.url import add_or_replace_parameter 
>>> add_or_replace_parameter(url, 'query', querystring['query']) 
'https://services.realestate.com.au/services/listings/search?query=%7B%22channel%22%3A%22buy%22%2C%22page%22%3A2%2C%22pageSize%22%3A12%2C%22filters%22%3A%7B%22agencyIds%22%3A%5B%22CBPHMG%22%5D%7D%7D' 
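If you start from a nested dict rather than a hand-written JSON string, something along these lines may help. It is only a sketch: it uses the standard library's urlencode as a stand-in for add_or_replace_parameter, and build_search_url is a hypothetical helper name, not part of Scrapy or w3lib. The compact separators matter because json.dumps inserts a space after every colon and comma by default, which is where the %20 sequences in the question's log come from.

```python
import json
from urllib.parse import urlencode

BASE_URL = "https://services.realestate.com.au/services/listings/search"


def build_search_url(base_url, query):
    """Serialize a nested dict to compact JSON and append it as ?query=...

    Hypothetical helper for illustration; not part of Scrapy or w3lib.
    """
    # separators=(",", ":") drops the spaces json.dumps adds by default,
    # which would otherwise end up percent-encoded in the URL.
    compact = json.dumps(query, separators=(",", ":"))
    # urlencode percent-encodes braces, brackets, quotes, colons and commas.
    return base_url + "?" + urlencode({"query": compact})


q = {"channel": "buy", "page": 2, "pageSize": 12,
     "filters": {"agencyIds": ["CBPHMG"]}}
search_url = build_search_url(BASE_URL, q)
print(search_url)
```

For this input the result should be the same URL as the add_or_replace_parameter output above, so either approach can feed a Scrapy request.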

Here, in the Scrapy shell, fetching the new URL returns a JSON response, as expected:

>>> new_url = add_or_replace_parameter(url, 'query', querystring['query']) 
>>> fetch(new_url) 
2017-07-18 11:04:45 [scrapy.core.engine] INFO: Spider opened 
2017-07-18 11:04:46 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://services.realestate.com.au/services/listings/search?query=%7B%22channel%22%3A%22buy%22%2C%22page%22%3A2%2C%22pageSize%22%3A12%2C%22filters%22%3A%7B%22agencyIds%22%3A%5B%22CBPHMG%22%5D%7D%7D> (referer: None) 
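The percent-encoded brackets in the crawled URL are not lost. As a quick sanity check (a standard-library sketch, separate from what Scrapy does internally), you can decode the URL from the log line above and confirm it round-trips to the original nested structure:

```python
import json
from urllib.parse import urlsplit, parse_qs

# URL copied from the "Crawled (200)" log line above.
crawled = ("https://services.realestate.com.au/services/listings/search"
           "?query=%7B%22channel%22%3A%22buy%22%2C%22page%22%3A2%2C%22pageSize%22%3A12"
           "%2C%22filters%22%3A%7B%22agencyIds%22%3A%5B%22CBPHMG%22%5D%7D%7D")

# parse_qs percent-decodes the values and returns a list per parameter name.
raw_query = parse_qs(urlsplit(crawled).query)["query"][0]
decoded = json.loads(raw_query)

print(raw_query)
print(decoded["filters"]["agencyIds"])
```

The decoded string contains the literal braces and brackets again, which is what the server receives after it decodes the query string.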


>>> import json 
>>> data = json.loads(response.text) 
>>> data.keys() 
dict_keys(['prettyUrl', 'totalResultsCount', 'resolvedQuery', '_links', 'tieredResults', 'channel']) 

>>> from pprint import pprint 
>>> pprint(data) 
{'_links': {'adCall': {'href': 'https://sasinator.realestate.com.au/rea/hserver/site=rea/area=buy.resultslist/proptype=villa/constructionStatus=established/sub=marsden/state=qld/pcode=4132/region=logan/price=200k_300k/platform={platform}/version={version}/pos={position}/size={size}/viewid={viewId}/random={random}', 
         'templated': True}, 
      'canonical': {'href': 'http://www.realestate.com.au/buy/by-cbphmg/list-2'}, 
      'exclusiveShowcaseUrl': {'href': 'https://services.realestate.com.au/services/listings/exclusiveShowcase?query=%7B%22propertyTypes%22:[],%22atlasIds%22:[],%22channel%22:%22buy%22%7D'}, 
      'neighbourhoodsUrl': {'href': 'http://www.realestate.com.au/neighbourhoods?state=qld'}, 
      'next': {}, 
      'ofi': {'href': 'https://services.realestate.com.au/services/listings/ofi/{date}/daytotals?query=%7B%22channel%22:%22buy%22,%22pageSize%22:%2212%22,%22page%22:%222%22,%22filters%22:%7B%22agencyIds%22:%5B%22CBPHMG%22%5D%7D%7D', 
        'templated': True}, 
      'prettyUrl': {'href': '/buy/by-cbphmg/list-2'}, 
      'saveSearchUrl': {'href': 'https://www.realestate.com.au/saved-searches/#/save?search=%7B%22channel%22:%22buy%22,%22pageSize%22:%2212%22,%22page%22:%222%22,%22filters%22:%7B%22agencyIds%22:%5B%22CBPHMG%22%5D%7D%7D'}, 
      'self': {'href': 'https://services.realestate.com.au/services/listings/search?query=%7B%22channel%22:%22buy%22,%22pageSize%22:%2212%22,%22page%22:%222%22,%22filters%22:%7B%22agencyIds%22:%5B%22CBPHMG%22%5D%7D%7D'}}, 
'channel': 'buy', 
'prettyUrl': '/buy/by-cbphmg/list-2', 
'resolvedQuery': {'channel': 'buy', 
        'filters': {'agencyIds': ['CBPHMG']}, 
        'page': '2', 
        'pageSize': '12'}, 
'tieredResults': [{'count': 11, 
        'results': [{...}], 
        'tier': 1}], 
'totalResultsCount': 23}
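In the response above, '_links' has an empty 'next' entry, which looks consistent with page 2 being the last page (23 results at 12 per page). If you want to paginate, a small helper like the one below could decide whether to request another page. This is a sketch under an assumption the output does not confirm: that non-final pages carry an 'href' under 'next', like the other link entries. The name next_page_href and both sample dicts are made up for illustration.

```python
def next_page_href(data):
    """Return the href of the 'next' link, or None if it is missing or empty.

    Assumes (unverified) that non-final pages expose _links.next.href.
    """
    links = data.get("_links", {})
    return links.get("next", {}).get("href")


# Trimmed from the response shown above: 'next' is an empty dict.
last_page = {"_links": {"next": {}}, "totalResultsCount": 23}

# Hypothetical shape of a non-final page, for illustration only.
middle_page = {"_links": {"next": {"href": "https://example.invalid/next"}}}

print(next_page_href(last_page))
print(next_page_href(middle_page))
```

In a spider callback you would only yield a follow-up request when the helper returns a non-empty value.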