2016-12-15

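I want to search Airbnb listings by city (the five cities listed in the code) and collect information such as the price, the listing link, the room type, the number of guests, and so on. How can I retrieve Airbnb prices when scraping with Beautiful Soup?

I am able to get the links, but I cannot get the prices.

Any help would be appreciated.

Thanks!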

from bs4 import BeautifulSoup 
import requests 
import csv 
from urllib.parse import urljoin # For joining next page url with base url 
from datetime import datetime # For inserting the current date and time 

start_url_nyc = "https://www.airbnb.com/s/New-York--NY--United-States" 
start_url_mia = "https://www.airbnb.com/s/Miami--FL--United-States" 
start_url_la = "https://www.airbnb.com/s/Los_Angeles--CA--United-States" 
start_url_sf = "https://www.airbnb.com/s/San_Francisco--CA--United-States" 
start_url_orl = "https://www.airbnb.com/s/Orlando--FL--United-States" 


def scrape_airbnb(url): 
    # Set up the URL Request 
    headers = {'User-Agent': 'Mozilla/5.0'} 
    response = requests.get(url, headers=headers) 
    soup = BeautifulSoup(response.text, "html.parser") 

    # Iterate over search results 
    for search_result in soup.find_all('div', 'infoContainer_tfq3vd'):
        # Parse the link and price
        link_end = search_result.find('a').get('href')
        link = "https://www.airbnb.com" + link_end
        price = search_result.find('span', 'data-pricerate').find('data-reactid').get(int)
    return (price)

print(scrape_airbnb(start_url_orl)) 

Answer

This is the HTML:

<span data-pricerate="true" data-reactid=".91165im9kw.0.2.0.3.2.1.0.$0.$grid_0.$0/=1$=01$16085565.$=1$16085565.0.2.0.1.0.0.0.1:1">552</span> 

This is your code:

price = search_result.find('span', 'data-pricerate').find('data-reactid').get(int) 

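First, from the Beautiful Soup documentation:

Some attributes, like the data-* attributes in HTML 5, have names that can't be used as the names of keyword arguments: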

data_soup = BeautifulSoup('<div data-foo="value">foo!</div>', 'html.parser')
data_soup.find_all(data-foo="value") 
# SyntaxError: keyword can't be an expression 

You can use these attributes in searches by putting them into a dictionary and passing the dictionary into find_all() as the attrs argument:

data_soup.find_all(attrs={"data-foo": "value"}) 
# [<div data-foo="value">foo!</div>] 
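
To see why the original lookup finds nothing, here is a minimal sketch. When the second positional argument to find() is a string, Beautiful Soup matches it against the CSS class, not against an attribute name. The HTML snippet and the `price-label` class are made up for illustration; only the `data-pricerate` attribute comes from the question.

```python
from bs4 import BeautifulSoup

# Hypothetical snippet: the class name "price-label" is invented for this demo.
html = '<div><span class="price-label" data-pricerate="true">552</span></div>'
soup = BeautifulSoup(html, "html.parser")

# A string in the second position is treated as a CSS class,
# so this searches for class="data-pricerate" and matches nothing.
by_class = soup.find("span", "data-pricerate")

# A dictionary passed as attrs matches on attribute name and value.
by_attr = soup.find("span", attrs={"data-pricerate": "true"})

print(by_class)      # None
print(by_attr.text)  # 552
```

Because the first find() returns None, chaining another .find() onto it raises an AttributeError, which is what happens in the question's code.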

So, in your case:

price = search_result.find('span', attrs={"data-pricerate":"true"}) 

This returns the span tag, which contains the price as a string. To get the price itself, just use .text:

price = search_result.find('span', attrs={"data-pricerate":"true"}).text
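
Putting the pieces together, the loop could look roughly like the sketch below. It runs against an inline sample instead of a live request, so `SAMPLE_HTML`, the `/rooms/...` paths, and the `extract_listings` helper are hypothetical; the `infoContainer_tfq3vd` class and the `data-pricerate` attribute are taken from the question and may not match what the site serves today.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

# Hypothetical stand-in for the search page; the real markup may differ.
SAMPLE_HTML = """
<div class="infoContainer_tfq3vd">
  <a href="/rooms/16085565">Listing</a>
  <span data-pricerate="true">552</span>
</div>
<div class="infoContainer_tfq3vd">
  <a href="/rooms/123">Listing without a price</a>
</div>
"""


def extract_listings(html):
    """Return a list of (link, price) tuples for results that have both."""
    soup = BeautifulSoup(html, "html.parser")
    listings = []
    for result in soup.find_all("div", "infoContainer_tfq3vd"):
        anchor = result.find("a")
        price_tag = result.find("span", attrs={"data-pricerate": "true"})
        # Skip results where either element is missing instead of crashing.
        if anchor is None or price_tag is None:
            continue
        link = urljoin("https://www.airbnb.com", anchor.get("href"))
        price = int(price_tag.text.strip())
        listings.append((link, price))
    return listings


listings = extract_listings(SAMPLE_HTML)
print(listings)
```

Unlike the function in the question, which returns only the price from the last loop iteration, this version collects one entry per search result. In the real scraper you would pass response.text to the helper.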