Web scraping with Python - problem looping through multiple web pages

I am trying to loop through several real estate agency websites and scrape each agent's name and mobile number.

My code:

locations = ['woollahra', 'chinatown', 'bondibeach','doublebay'] 
for location in locations: 
    my_url = 'https://' + location + '.ljhooker.com.au/our-team' 

uClient = uReq(my_url) 
page_html = uClient.read() 
uClient.close() 

page_soup = soup(page_html, "html.parser") 

containers = page_soup.findAll("div", {"class":"team-details"}) 

for container in containers: 
    agent_name = container.findAll("div", {"class":"team-name"}) 
    name = agent_name[0].text 

    phone = container.findAll("span", {"class":"phone"}) 
    mobile = phone[0].text 

    print("name: " + name) 
    print("mobile: " + mobile)

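However, when I run my script, it skips the first three pages (Woollahra, Chinatown, Bondi Beach) and only scrapes information from the last site in the list (Double Bay). I'm not sure why it does this, or how to make it loop through all the pages.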

2017-08-07 Oren

Make sure to include the modules you are using; please add your 'import' statements. –

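I think you're missing a mental model of what your program is doing. Walk through each line in your head. What does the first for loop do? What is the value of 'my_url' once it finishes? How do you expect the code to be repeated for every value of 'my_url'? –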
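To make the comment's question concrete: a Python `for` loop repeats only its indented body, and any variable assigned in that body keeps its last value after the loop ends. This minimal check reuses the question's own list and URL pattern and makes no network requests.

```python
locations = ['woollahra', 'chinatown', 'bondibeach', 'doublebay']
for location in locations:
    # Only this indented line is repeated; it overwrites my_url on every pass
    my_url = 'https://' + location + '.ljhooker.com.au/our-team'

# The loop has finished, so my_url holds the value from the final iteration only
print(my_url)  # https://doublebay.ljhooker.com.au/our-team
```

The unindented scraping code in the question runs exactly once, at the point of that `print`, which is why only Double Bay gets scraped.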

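You need to put all of the code inside the first loop; otherwise the loop only reassigns the variable my_url. So all you have to do is indent the rest of your code: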

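# Assumed imports: the question never shows them; these aliases match the names uReq and soup used below
from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup

locations = ['woollahra', 'chinatown', 'bondibeach', 'doublebay']
for location in locations:
    my_url = 'https://' + location + '.ljhooker.com.au/our-team'

    # Everything from here down is indented, so it runs once per location
    uClient = uReq(my_url)
    page_html = uClient.read()
    uClient.close()

    page_soup = soup(page_html, "html.parser")

    containers = page_soup.findAll("div", {"class": "team-details"})

    # One container per agent; take the first name and phone element in each
    for container in containers:
        agent_name = container.findAll("div", {"class": "team-name"})
        name = agent_name[0].text

        phone = container.findAll("span", {"class": "phone"})
        mobile = phone[0].text

        print("name: " + name)
        print("mobile: " + mobile)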
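One way to make this kind of indentation slip harder to miss is to move the per-page work into a function, so the outer loop body is a single call and results are collected instead of printed. This is a sketch under stated assumptions, not the answer's code: it swaps BeautifulSoup for the standard library's `html.parser.HTMLParser` so it runs without third-party packages; the names `AgentParser`, `parse_agents`, and `scrape_all` are hypothetical; it assumes pages are UTF-8 and that each `team-name` div and `phone` span holds plain text with no nested tag of the same type; and pairing with `zip` assumes every agent has both a name and a phone, as the original `[0]` indexing already does.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class AgentParser(HTMLParser):
    """Hypothetical helper: collects text from <div class="team-name"> and
    <span class="phone"> in document order (assumes no same-type nested tags)."""

    def __init__(self):
        super().__init__()
        self.names = []
        self.mobiles = []
        self._target = None  # (closing tag, list to append to) while capturing
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "div" and "team-name" in classes:
            self._target = ("div", self.names)
        elif tag == "span" and "phone" in classes:
            self._target = ("span", self.mobiles)
        else:
            return
        self._buffer = []

    def handle_data(self, data):
        if self._target is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if self._target is not None and tag == self._target[0]:
            self._target[1].append("".join(self._buffer).strip())
            self._target = None


def parse_agents(page_html):
    """Return (name, mobile) pairs found in one page's HTML string."""
    parser = AgentParser()
    parser.feed(page_html)
    parser.close()
    return list(zip(parser.names, parser.mobiles))


def scrape_all(locations, fetch=None):
    """Fetch and parse every location; all per-page work happens inside the loop."""
    if fetch is None:
        def fetch(url):
            # Assumption: the sites serve UTF-8; undecodable bytes are replaced
            with urlopen(url) as response:
                return response.read().decode("utf-8", errors="replace")
    results = []
    for location in locations:
        url = 'https://' + location + '.ljhooker.com.au/our-team'
        results.extend(parse_agents(fetch(url)))
    return results
```

Passing a custom `fetch` function lets you test the loop against saved HTML without hitting the live sites; calling `scrape_all(['woollahra', 'chinatown', 'bondibeach', 'doublebay'])` would fetch all four pages for real.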

2017-08-07 01:16:25
