Making an XPath more selective? [web scraping]

I'm trying to print some house prices and am having trouble with XPath. Here is my code:

from selenium import webdriver
driver = webdriver.Chrome("my/path/here")

driver.get("https://www.realtor.com/realestateandhomes-search/?pgsz=10")
for house_number in range(1, 11):
    try:
        price = driver.find_element_by_xpath("""//*[@id="{}"]/div[2]/div[1]""".format(house_number))
        print(price.text)
    except:
        print('couldnt find')

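I'm on this website, trying to print the prices of the first ten houses listed.

In my output, every house marked "NEW" shows that label in place of the actual price. For the bottom two, which have no "NEW" sticker, the actual price is printed.

How do I write my XPath selector so that it selects the number rather than NEW?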

Source

2017-10-05 thewhitetie

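You're on the right track; you've just made an XPath that is too brittle. I'll try to make it more verbose, so that it doesn't rely on indexes and wildcards.

Here is your XPath (I've used id="1" for example purposes):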

//*[@id="1"]/div[2]/div[1]

And here is the HTML (some attributes/elements removed for brevity):

<li id="1">
    <div></div>
    <div class="srp-item-body">
        <div>New</div><!-- this is optional! -->
        <div class="srp-item-price">$100,000</div>
    </div>
</li>

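First, replace the * wildcard with the element you expect to carry the id, here <li>. This simply helps make the XPath a little more "self-documenting":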

//li[@id="1"]/div[2]/div[1]

Next, you want to target the second <div>, but rather than searching by index, try using the element's attributes where applicable, such as class:

//li[@id="1"]/div[@class="srp-item-body"]/div[1]

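Finally, you want to target the <div> with the price. Since the "New" text sits in its own <div>, your XPath was targeting the first <div> ("New") rather than the <div> with the price. Your XPath did work whenever the "New" <div> was absent.

We can use the same approach as in the previous step and target by attribute. This forces the XPath to always target the <div> with the price: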

//li[@id="1"]/div[@class="srp-item-body"]/div[@class="srp-item-price"]
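To check the difference offline, here is a minimal sketch that runs the positional and the class-based XPath against the sample HTML above, using Python's standard-library xml.etree.ElementTree rather than Selenium. ElementTree supports only a subset of XPath and needs a leading "." on the path. The second listing (id="2", no "New" badge, made-up price) is an assumption added for illustration.

```python
import xml.etree.ElementTree as ET

# Sample markup modelled on the snippet above. The second <li> is a
# hypothetical listing without the optional "New" badge.
LISTINGS = """
<ul>
  <li id="1">
    <div></div>
    <div class="srp-item-body">
      <div>New</div>
      <div class="srp-item-price">$100,000</div>
    </div>
  </li>
  <li id="2">
    <div></div>
    <div class="srp-item-body">
      <div class="srp-item-price">$250,000</div>
    </div>
  </li>
</ul>
""".strip()

root = ET.fromstring(LISTINGS)

# Index-based path (brittle) versus attribute-based path (robust).
POSITIONAL = './/li[@id="{}"]/div[2]/div[1]'
BY_CLASS = './/li[@id="{}"]/div[@class="srp-item-body"]/div[@class="srp-item-price"]'


def text_at(template, house_number):
    """Return the text of the first node matching the formatted path, or None."""
    node = root.find(template.format(house_number))
    return node.text if node is not None else None


positional = [text_at(POSITIONAL, n) for n in (1, 2)]
by_class = [text_at(BY_CLASS, n) for n in (1, 2)]

print(positional)  # the badge shadows the price for listing 1
print(by_class)    # the price is selected for both listings
```

The positional path returns the badge text whenever the badge is present, while the attribute-based path returns the price either way.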

Hope this helps!

So... having said all that, if you are only interested in the prices and nothing else, this would probably work too :)

for price in driver.find_elements_by_class_name('srp-item-price'): 
    print(price.text)
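One difference worth keeping in mind: a class-name lookup matches an element that carries the class among several, while the XPath test @class="srp-item-price" requires the whole attribute to equal that string. The sketch below illustrates this on static markup with the standard-library ElementTree instead of a live browser. The extra class name "featured" and both prices are invented for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: the second price element carries an extra class.
SAMPLE = """
<ul>
  <li id="1"><div class="srp-item-price">$100,000</div></li>
  <li id="2"><div class="srp-item-price featured">$250,000</div></li>
</ul>
""".strip()

root = ET.fromstring(SAMPLE)

# Exact attribute comparison: misses the element that has two classes.
exact = [div.text for div in root.findall('.//div[@class="srp-item-price"]')]


def has_class(element, name):
    """Treat the class attribute as a space-separated list of tokens."""
    return name in element.get("class", "").split()


# Token comparison: finds every element that carries the class.
by_token = [div.text for div in root.iter("div") if has_class(div, "srp-item-price")]

print(exact)
print(by_token)
```

If the page ever adds a second class to the price element, the exact-match XPath stops finding it, whereas the class-name lookup keeps working.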

Source

2017-10-05 23:31:22

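Hi, thanks for the effort; I appreciate the comments and the thoughtful explanation. However, when I try to run that code, I now get an error where Selenium can't find the element at all (i.e., for any of the houses)! I changed my code to: 'price = driver.get_element_by_xclass("""//li[@id="{}"]/div[@class="srp-item-body"]/div[@class="srp-item-price"]""".format(house_number))' and this throws an exception every time, saying the element cannot be found. – thewhitetie

It works in my Chrome console; did you try using 'driver.find_element_by_xpath'? –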
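A possible explanation, assuming the call was typed exactly as quoted in the comment above: get_element_by_xclass is not a WebDriver method, so Python raises AttributeError before any lookup happens, and the bare except in the loop reports it as 'couldnt find'. The sketch below reproduces this with a hypothetical FakeDriver stand-in, so it runs without Selenium or a browser.

```python
class FakeDriver:
    """Hypothetical stand-in for a WebDriver; it defines one lookup method only."""

    def find_element_by_xpath(self, xpath):
        return "element for " + xpath


driver = FakeDriver()
messages = []

# A bare except swallows the AttributeError and hides the real cause.
try:
    driver.get_element_by_xclass('//li[@id="1"]')  # no such method
except:
    messages.append("couldnt find")

# Catching the exception explicitly and recording its type exposes the typo.
try:
    driver.get_element_by_xclass('//li[@id="1"]')
except AttributeError as exc:
    messages.append(type(exc).__name__)

print(messages)
```

Printing the exception, or catching only the lookup failure you expect, makes this kind of mistake visible right away.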

You can write it like this, without loading images, which can speed up your scraping:

from selenium import webdriver
# Disable image loading
chrome_opt = webdriver.ChromeOptions()
prefs = {"profile.managed_default_content_settings.images": 2}
chrome_opt.add_experimental_option("prefs", prefs)
driver = webdriver.Chrome(chrome_options=chrome_opt, executable_path="my/path/here")
driver.get("https://www.realtor.com/realestateandhomes-search/Bladen-County_NC/sby-6/pg-1?pgsz=10")
for house_number in range(1, 11):
    try:
        price = driver.find_element_by_xpath('//*[@id="{}"]/div[2]/div[@class="srp-item-price"]'.format(house_number))
        print(price.text)
    except:
        print('couldnt find')

Source

2017-10-06 02:45:00 kerberos

I found the same solution as above. – Sagar007

You can try this code:

from selenium import webdriver
driver = webdriver.Chrome()
driver.maximize_window()
driver.get("https://www.realtor.com/realestateandhomes-search/Bladen-County_NC/sby-6/pg-1?pgsz=10")

prices = driver.find_elements_by_xpath('//*[@class="data-price-display"]')

for price in prices:
    print(price.text)

It will print:

$39,900 
$86,500 
$39,500 
$40,000 
$179,000 
$31,000 
$104,900 
$94,900 
$54,900 
$19,900

Do let me know if any other details are needed.

Source

2017-10-06 06:49:09 thebadguy
