2017-08-07 88 views

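Web scraping with Python: problem looping through multiple web pages

I am trying to loop through several real estate agency websites and scrape each agent's name and mobile number.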

My code:

locations = ['woollahra', 'chinatown', 'bondibeach','doublebay'] 
for location in locations: 
    my_url = 'https://' + location + '.ljhooker.com.au/our-team' 

uClient = uReq(my_url) 
page_html = uClient.read() 
uClient.close() 

page_soup = soup(page_html, "html.parser") 

containers = page_soup.findAll("div", {"class":"team-details"}) 

for container in containers: 
    agent_name = container.findAll("div", {"class":"team-name"}) 
    name = agent_name[0].text 

    phone = container.findAll("span", {"class":"phone"}) 
    mobile = phone[0].text 

    print("name: " + name) 
    print("mobile: " + mobile) 

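However, when I run my script, it skips the first three pages (woollahra, chinatown, bondibeach) and only scrapes the information from the last site in the list (doublebay). I'm not sure why it does this, or how to make it loop through all of the pages.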


Please add the `import` statements for the modules you are using.


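I think you are missing a mental model of what your program is doing. Walk through each line in your head. What does the first for loop do? What is the value of `my_url` once it finishes? How do you expect the rest of the code to be repeated for every value of `my_url`?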
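To make that walk-through concrete, here is a minimal sketch that swaps the network request for a list append, so it runs offline. The names `fetched_outside` and `fetched_inside` are made up for this illustration and are not part of the original script.

```python
locations = ['woollahra', 'chinatown', 'bondibeach', 'doublebay']

# Same shape as the question: only the assignment is inside the loop.
fetched_outside = []
for location in locations:
    my_url = 'https://' + location + '.ljhooker.com.au/our-team'
# This line is not indented, so it runs once, after the loop has finished.
fetched_outside.append(my_url)  # stand-in for uReq(my_url)

# Same code with the "fetch" step indented into the loop body.
fetched_inside = []
for location in locations:
    my_url = 'https://' + location + '.ljhooker.com.au/our-team'
    fetched_inside.append(my_url)  # runs once per location

print(fetched_outside)      # ['https://doublebay.ljhooker.com.au/our-team']
print(len(fetched_inside))  # 4
```

In the first version `my_url` is overwritten on every pass, so by the time the unindented line runs only the doublebay URL is left.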

Answer

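You need to put all of the code inside the first loop. As written, the loop only reassigns the variable my_url on each pass, and the fetching and parsing run once, after the loop has ended, with the last value. So all you need to do is indent the rest of your code: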

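# Assumed imports: the question does not show them, but uReq and soup
# are commonly aliased this way.
from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup

locations = ['woollahra', 'chinatown', 'bondibeach', 'doublebay']
for location in locations:
    my_url = 'https://' + location + '.ljhooker.com.au/our-team'

    # Fetch and parse the page for this location.
    uClient = uReq(my_url)
    page_html = uClient.read()
    uClient.close()

    page_soup = soup(page_html, "html.parser")

    containers = page_soup.findAll("div", {"class": "team-details"})

    # Print the name and mobile number of every agent on this page.
    for container in containers:
        agent_name = container.findAll("div", {"class": "team-name"})
        name = agent_name[0].text

        phone = container.findAll("span", {"class": "phone"})
        mobile = phone[0].text

        print("name: " + name)
        print("mobile: " + mobile)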
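Once everything runs inside the loop, a single unreachable site would stop the whole run with an exception. One possible way to guard against that is sketched below. The helper names `fetch_html` and `fetch_all` and the injectable `fetch` parameter are assumptions made for this illustration, not part of the original script. Only the standard library is used, so parsing with BeautifulSoup would still happen afterwards on the returned bytes.

```python
from urllib.request import urlopen


def fetch_html(url):
    """Download one page and return its raw bytes."""
    with urlopen(url) as response:
        return response.read()


def fetch_all(locations, fetch=fetch_html):
    """Fetch the team page for every location, skipping sites that fail.

    `fetch` is a hypothetical hook so the loop can be tried without
    network access; by default it performs a real request.
    """
    pages = {}
    for location in locations:
        my_url = 'https://' + location + '.ljhooker.com.au/our-team'
        try:
            pages[location] = fetch(my_url)
        except OSError as exc:
            # urllib's URLError and HTTPError are subclasses of OSError.
            print("skipping " + location + ": " + str(exc))
    return pages
```

Calling `fetch_all(locations)` would return a dict mapping each location that loaded to its HTML, and each value can then be passed to `soup(page_html, "html.parser")` exactly as in the answer.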