2016-11-05

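Python requests fetches data different from what I see in the browser

I am trying to access a Zillow URL. When I open it in the browser, it returns something different from what I get through code. Details below.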

curl

curl 'http://www.zillow.com/homes/KY_rb/' -H 'Host: www.zillow.com' -H 'User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.11; rv:49.0) Gecko/20100101 Firefox/49.0' -H 'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8' -H 'Accept-Language: en-US,en;q=0.5' --compressed -H 'Referer: http://www.zillow.com/homes/fsbo/featured_sort/47.368594,-68.686523,28.110749,-124.936523_rect/3_zm/' -H 'Cookie: JSESSIONID=D9BF4E280B16431893C3A11A8FC3F825; abtest=3|DO8RElLJuj2felZqqw; zguid=23|%24b42a26dc-8387-4086-b000-cc49ddfbc450; search=6|1480915840720%7Crect%3D47.368594%252C-68.686523%252C28.110749%252C-124.936523%26zm%3D3%26disp%3Dmap%26mdm%3Dauto%26p%3D1%26sort%3Dfeatured%26z%3D1%26lt%3Dfsbo%26fs%3D1%26fr%3D0%26mmm%3D1%26rs%3D0%26ah%3D0%26singlestory%3D0%09%01%09%09%09%092%090%09US_%09; F5P=3005270026.0.0000; _ga=GA1.2.1136269898.1478324471; _gat=1; __gads=ID=3f2f3e2d6e19b149:T=1478323799:S=ALNI_Mava6ZGjT_MrRhAVG7ndewcDCN60A; ipe_s=fbc57b01-3937-f803-5da1-5c4887cc949d; _bizo_bzid=aa621351-3627-408d-8838-440c1bd3f163; _bizo_cksm=EE838E07FF3AF15E; ipe.29115.pageViewedCount=1; _bizo_np_stats=14%3D1028%2C' -H 'Connection: keep-alive' -H 'Upgrade-Insecure-Requests: 1' 

curl gives the correct result.

Fetch.py

import requests 
from bs4 import BeautifulSoup 
from time import sleep 
import xmltodict 

state = 'KY' 
url = 'http://www.zillow.com/homes/' + state + '_rb/' 
property_urls = [] 
headers = { 
    'user-agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/53.0.2785.143 Safari/537.36', 
    'upgrade-insecure-requests': '1',  # header values must be strings, not ints
    'accept-language': 'en-US,en;q=0.8', 
    'Connection': 'keep-alive', 
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8' 
} 

try: 
    session = requests.session() 
    r = session.get(url, headers=headers, timeout=5) 
    sleep(2) 
    html = r.text
    soup = BeautifulSoup(html, 'lxml') 
    print(html) 
except requests.ConnectionError as e:
    print("OOPS!! Connection Error. Make sure you are connected to Internet. Technical Details given below.\n")
    print(str(e))
except requests.Timeout as e:
    print("OOPS!! Timeout Error")
    print(str(e))
except requests.RequestException as e:
    print("OOPS!! General Error")
    print(str(e))
except KeyboardInterrupt:
    print("Someone closed the program")

finally: 
    print("Total Properties = " + str(len(property_urls))) 
    try:
        # file to store state-based URLs
        with open(state + '_file.txt', 'a+') as state_file:
            state_file.write("\n".join(property_urls))
    except Exception as ex:
        print("Unable to store records in the text file. Technical details below.\n")
        # report the exception caught here (ex), not e from the outer handlers
        print(str(ex))

Answer


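I am not sure what you mean by "different data" (it could mean anything: slightly different, completely different, and so on). Your curl command uses --compressed, which in practice means it sends the request header Accept-Encoding: deflate, gzip. Try adding this header in your Python code.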
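One way to check this suggestion without hitting the site is to build the request and look at the headers it would send. The following is only a sketch, assuming the requests library is installed; `prepare` is a hypothetical helper name, not part of the question's code. As far as I know, requests already sends an Accept-Encoding header by default, so setting it explicitly may not change the response.

```python
import requests

URL = 'http://www.zillow.com/homes/KY_rb/'


def prepare(url, extra_headers=None):
    """Build the request without sending it, so its headers can be inspected.

    Hypothetical helper for illustration only.
    """
    session = requests.Session()
    request = requests.Request('GET', url, headers=extra_headers or {})
    # prepare_request merges the session's default headers with ours
    return session.prepare_request(request)


# Headers requests would send with no extra configuration
default_prepared = prepare(URL)

# Headers after explicitly mirroring what curl --compressed asks for
explicit_prepared = prepare(URL, {'Accept-Encoding': 'deflate, gzip'})

print('default :', default_prepared.headers.get('Accept-Encoding'))
print('explicit:', explicit_prepared.headers.get('Accept-Encoding'))
```

If both lines already list gzip, the encoding header is unlikely to be the cause, and the remaining differences between the curl command and the script (for example the Cookie and Referer headers) are worth comparing next.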


By "different" I mean that things are missing and Zillow does not return the actual results. I also tried the encoding header. – Volatil3
