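Python requests module returns the same results even though the page number increases

The only thing that changes in the URL is the page number, which is incremented after each request. Even so, the Python requests module returns the same results every time.

Short of Selenium or related tools, I'm not sure what approach I can use to iterate through the pages. My hunch is that there may be some header/query combination that fetches the data directly, but I don't know where to find it.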

import requests
from bs4 import BeautifulSoup

url = 'http://therunningbug.co.uk/events/find-races.aspx?EventName=&AddressRegion=&AddressCounty=&Date=&Surface=#Sort=Date&page='
page = 1

while True:
    pageData = BeautifulSoup(requests.get(url + str(page)).content, 'html.parser')
    articles = pageData.find('div', {'class': "items-content"})

    for a in articles.find_all('article'):
        name = a.find('span', {'itemprop': "name"}).text
        # The datetime attribute is split into its date and time halves
        d, t = a.find('time').get('datetime').split('T')
        timeData = t[:-3]
        dateData = d.split('-')
        date = (dateData[1] + '/' + dateData[2] + '/' + dateData[0][2:]).strip()
        description = a.find('p', {'itemprop': "description"}).text.strip()
        webLink = 'http://therunningbug.co.uk' + a.find('a', {'itemprop': "url"}).get('href')
        category = a.find('span', {'class': "surface"}).text
        location = a.find('span', {'class': "region"}).text + ', ' + a.find('span', {'class': "county"}).text

        print(name, ' -- name')
        print(date, ', ', timeData, ' -- date, time')
        print(description, ' -- description')
        print(webLink, ' -- website link')
        print(category, ' -- category')
        print(location, ' -- location\n')

    page += 1

2016-11-05 Phillip

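Are you certain the web server actually uses the 'page' query parameter? If only that value changes and the content does not, then you have to assume it does not.

Do you know how to find the query used for the listing on each page? – Phillip

Also, it seems to use the page parameter at least indirectly, since you can enter the URL in a browser to get that page's results – Phillip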

The problem might be the URL encoding. You can have requests URL-encode the parameters for you:

url = 'http://therunningbug.co.uk/events/find-races.aspx'
payload = {'page': page}
pageData = BeautifulSoup(requests.get(url, params=payload).content, 'html.parser')
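A quick way to see why the original URL kept returning the first page: everything after the # is a URL fragment, and HTTP clients do not send fragments to the server. The sketch below uses only the standard library to split the question's URL. The helper name request_parts and the page number 3 are made up for illustration.

```python
from urllib.parse import urlsplit, parse_qs


def request_parts(url):
    """Return (query_params, fragment) for a URL.

    Only the query parameters are sent to the server; the fragment
    (everything after '#') stays on the client.
    """
    parts = urlsplit(url)
    return parse_qs(parts.query, keep_blank_values=True), parts.fragment


# The question's URL with a page number appended, as the loop does.
original = ('http://therunningbug.co.uk/events/find-races.aspx'
            '?EventName=&AddressRegion=&AddressCounty=&Date=&Surface='
            '#Sort=Date&page=3')

params, fragment = request_parts(original)
print('page' in params)   # False
print(fragment)           # Sort=Date&page=3
```

In a browser, the page's own JavaScript presumably reads that fragment and loads the matching results, which would explain why the URL works there but not with requests.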

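Plain string concatenation also works here, because the URI contains no special characters that actually need URL encoding:

url = 'http://therunningbug.co.uk/events/find-races.aspx'
pageData = BeautifulSoup(requests.get(url + '?page=' + str(page)).content, 'html.parser')

See the requests documentation on URL encoding.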
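If you prefer to build the URL yourself, the standard library's urlencode does the encoding explicitly. This is a minimal sketch: build_page_url is a hypothetical helper, and the AddressCounty value is an invented example (the parameter name comes from the question's URL, but whether the server accepts that value is an assumption).

```python
from urllib.parse import urlencode

BASE_URL = 'http://therunningbug.co.uk/events/find-races.aspx'


def build_page_url(page, **filters):
    """Build the listing URL with `page` as a real query parameter."""
    # Filters come first, then the page number, all URL-encoded.
    params = dict(filters, page=page)
    return BASE_URL + '?' + urlencode(params)


print(build_page_url(2))
# http://therunningbug.co.uk/events/find-races.aspx?page=2
print(build_page_url(2, AddressCounty='West Yorkshire'))
# http://therunningbug.co.uk/events/find-races.aspx?AddressCounty=West+Yorkshire&page=2
```

The resulting string can be passed straight to requests.get, which is roughly what the params argument produces for you.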

Full code:

#!/usr/bin/env python3

import requests
from bs4 import BeautifulSoup

url = 'http://therunningbug.co.uk/events/find-races.aspx'
page = 1
while True:
    # Let requests build and encode the query string (?page=N)
    payload = {'page': page}
    pageData = BeautifulSoup(requests.get(url, params=payload).content, 'html.parser')

    articles = pageData.find('div', {'class': "items-content"})
    if articles is None:
        # Listing container missing (assumed to mean there are no more pages)
        break

    for a in articles.find_all('article'):
        name = a.find('span', {'itemprop': "name"}).text
        # The datetime attribute is split into its date and time halves
        d, t = a.find('time').get('datetime').split('T')
        timeData = t[:-3]
        dateData = d.split('-')
        date = (dateData[1] + '/' + dateData[2] + '/' + dateData[0][2:]).strip()
        description = a.find('p', {'itemprop': "description"}).text.strip()
        webLink = 'http://therunningbug.co.uk' + a.find('a', {'itemprop': "url"}).get('href')
        category = a.find('span', {'class': "surface"}).text
        location = a.find('span', {'class': "region"}).text + ', ' + a.find('span', {'class': "county"}).text

        print(name, ' -- name')
        print(date, ', ', timeData, ' -- date, time')
        print(description, ' -- description')
        print(webLink, ' -- website link')
        print(category, ' -- category')
        print(location, ' -- location\n')

    page += 1
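Since the original symptom was the same results coming back for every page, it may be worth guarding the loop against exactly that. The following is only a sketch of one possible stopping rule, not something the site documents: iter_new_pages, fetch_page and max_pages are hypothetical names, and the fake data stands in for a real scraping function.

```python
def iter_new_pages(fetch_page, max_pages=50):
    """Yield (page_number, items) until a page is empty or repeats.

    fetch_page is any callable that takes a page number and returns a
    list of items (for example, the race names scraped from that page).
    """
    previous = None
    for page in range(1, max_pages + 1):
        items = fetch_page(page)
        if not items or items == previous:
            # Empty page, or the same results again: either the page
            # parameter is being ignored or there are no more pages.
            break
        yield page, items
        previous = items


# Stand-in for a real scraper: three pages of data, then repeats.
fake_site = {1: ['Race A', 'Race B'], 2: ['Race C'], 3: ['Race D']}


def fake_fetch(page):
    return fake_site.get(page, fake_site[3])


pages = list(iter_new_pages(fake_fetch))
print([number for number, _ in pages])  # [1, 2, 3]
```

With a real fetch function, a loop that stops after page 1 is a hint that the server is not seeing the page number at all.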

2016-11-05 21:24:45
