2015-10-06

I keep running into walls crawling different HTML structures with Selenium and Beautiful Soup.

The corresponding XPaths I'm harvesting are:

/html/body/div[8]/div/div[1]/div/div[3]/div[2]/div[2]/h2/a 
/html/body/div[8]/div/div[1]/div/div[3]/div[17]/div[2]/div[2]/h2/a 

I want to parse the individual items at the XPaths above out of the webpage.

Here is my code:

for j in range(2, innerElements):
    headline = driver.find_element_by_xpath("/html/body/div[8]/div/div[1]/div/div[3]/div["+str(j)+"]/div[2]/h2/a").text
    if headline:
        print(headline)
    elif headline:
        headline = driver.find_element_by_xpath("/html/body/div[8]/div/div[1]/div/div[3]/div[17]/div["+str(j)+"]/div[2]/h2/a").text
        print(headline)

Result:

New York Dinner Cruise 
Big Apple Helicopter Tour of New York 
Empire State Building Tickets - Observatory and Optional Skip the Line Tickets 
Washington DC Day Trip from New York 
New York City Explorer Pass 
Circle Line: Complete Manhattan Island Cruise 
2-Day Niagara Falls Tour from New York by Bus 
Viator VIP: Empire State Building, Statue of Liberty and 9/11 Memorial 
Big Bus New York Hop-on Hop-off Tour 
New York CityPass 
9/11 Memorial and Ground Zero Walking Tour with Optional 9/11 Museum Upgrade 
New York in One Day Guided Sightseeing Tour 
Viator Exclusive: Niagara Falls Day Trip from New York by Private Plane 
Viator Exclusive: Statue of Liberty Monument Access and 9/11 Memorial 
New York City Guided Sightseeing Tour by Luxury Coach 

E 
====================================================================== 
ERROR: test_sel (__main__.Crawling) 
---------------------------------------------------------------------- 
Traceback (most recent call last): 
File "C:/Users/PycharmProjects/unti/US.py", line 53, in test_sel 
headline = driver.find_element_by_xpath("/html/body/div[8]/div/div[1]/div/div[3]/div["+str(j)+"]/div[2]/h2/a").text 
File "C:\Python34\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 250, in find_element_by_xpath 
return self.find_element(by=By.XPATH, value=xpath) 
File "C:\Python34\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 692, in find_element 
{'using': by, 'value': value})['value'] 
File "C:\Python34\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 193, in execute 
self.error_handler.check_response(response) 
File "C:\Python34\lib\site-packages\selenium\webdriver\remote\errorhandler.py", line 181, in check_response 
raise exception_class(message, screen, stacktrace) 
selenium.common.exceptions.NoSuchElementException: Message: Unable to locate element: {"method":"xpath","selector":"/html/body/div[8]/div/div[1]/div/div[3]/div[17]/div[2]/h2/a"} 
Stacktrace: 
at FirefoxDriver.prototype.findElementInternal_ (file:///C:/Users/hmattu/AppData/Local/Temp/tmp7kbz_wz2/extensions/fxdriver@googlecode.com/components/driver-component.js:10667) 
at FirefoxDriver.prototype.findElement (file:///C:/Users/hmattu/AppData/Local/Temp/tmp7kbz_wz2/extensions/fxdriver@googlecode.com/components/driver-component.js:10676) 
at DelayedCommand.prototype.executeInternal_/h (file:///C:/Users/hmattu/AppData/Local/Temp/tmp7kbz_wz2/extensions/fxdriver@googlecode.com/components/command-processor.js:12643) 
at DelayedCommand.prototype.executeInternal_ (file:///C:/Users/hmattu/AppData/Local/Temp/tmp7kbz_wz2/extensions/fxdriver@googlecode.com/components/command-processor.js:12648) 
at DelayedCommand.prototype.execute/< (file:///C:/Users/hmattu/AppData/Local/Temp/tmp7kbz_wz2/extensions/fxdriver@googlecode.com/components/command-processor.js:12590) 

---------------------------------------------------------------------- 
Ran 1 test in 27.367s 

FAILED (errors=1) 

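I'm getting the expected results from the first XPath, but I can't figure out why it doesn't switch to the second XPath when it finds nothing with the first one.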

Unable to locate element: {"method":"xpath","selector":"/html/body/div[8]/div/div[1]/div/div[3]/div[17]/div[2]/h2/a"} 

Can anyone offer feedback? Any feedback is appreciated.

EDIT

Link to the webpage: http://www.viator.com/New-York-City/d687-allthingstodo

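From what I can see, the XPath is incorrect. Could you paste the link to your webpage, or else the page's HTML? That would help. I prefer using CSS selectors. Given the HTML, we can suggest a better way to achieve what you want. – LINGS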
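To illustrate why the hard-coded path breaks, the difference between a fixed path and the descendant axis (`//`) can be shown with the standard library alone. This sketch uses `xml.etree.ElementTree`, which supports only a subset of XPath and only well-formed XML, so the markup below is a hypothetical, simplified stand-in for the two layouts in the question (it is not the Viator page). Selenium's `find_element_by_xpath` itself accepts full XPath 1.0 expressions such as `//h2/a`.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup mimicking the two layouts in the question:
# most titles sit at .../div[N]/div[2]/h2/a, but some are wrapped one level
# deeper (.../div[17]/div[N]/div[2]/h2/a).
SAMPLE = """
<body>
  <div>
    <div><div>info</div><div><h2><a>First tour</a></h2></div></div>
    <div>
      <div><div>info</div><div><h2><a>Nested tour</a></h2></div></div>
    </div>
  </div>
</body>
"""

root = ET.fromstring(SAMPLE)

# A fixed step-by-step path only matches titles at exactly this depth...
fixed = [a.text for a in root.findall("./div/div/div[2]/h2/a")]

# ...while the descendant axis matches every h2/a however deeply it is nested.
anywhere = [a.text for a in root.findall(".//h2/a")]

print(fixed)     # ['First tour']
print(anywhere)  # ['First tour', 'Nested tour']
```

A descendant expression trades precision for robustness: on a real page it may also match `h2/a` links outside the result list, which is why anchoring it on a class attribute (as a CSS selector does) is usually preferable.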

I posted the webpage in my description. Thanks for your help –

Answer

I would suggest using a CSS selector.

The CSS selector for what you're looking for is:

.bd h2.product-title a 
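To see why this selector copes with the extra nesting, here is a stdlib-only sketch that hand-rolls a matcher for exactly `.bd h2.product-title a` with `html.parser`. It is not a general CSS engine, and the sample markup, class layout and titles are hypothetical (inferred from the selector, not copied from the Viator page). Because a space in CSS is a descendant combinator, the link matches however deeply it is wrapped, which is what the hard-coded `div[17]` path could not handle.

```python
from html.parser import HTMLParser

# Void elements never get an end tag, so they are not pushed on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}

# Hypothetical markup: the second title is wrapped one level deeper (like the
# question's div[17] case) and the sidebar link sits outside any .bd element.
SAMPLE_HTML = """
<div class="bd">
  <div class="product">
    <h2 class="product-title"><a href="/t1">First tour</a></h2>
  </div>
  <div class="group">
    <div class="product">
      <h2 class="product-title"><a href="/t2">Nested tour</a></h2>
    </div>
  </div>
</div>
<div class="sidebar">
  <h2 class="product-title"><a href="/ad">Outside .bd</a></h2>
</div>
"""


class ProductTitleParser(HTMLParser):
    """Collects the text of <a> elements matching '.bd h2.product-title a'."""

    def __init__(self):
        super().__init__()
        self.stack = []             # open elements as (tag, set of classes)
        self.titles = []
        self._capture_depth = None  # stack depth of the matching <a>, if any
        self._buf = []

    def _current_a_matches(self):
        ancestors = self.stack[:-1]
        for i, (tag, classes) in enumerate(ancestors):
            # Need an h2.product-title ancestor that itself has a .bd ancestor.
            if tag == "h2" and "product-title" in classes:
                if any("bd" in cls for _, cls in ancestors[:i]):
                    return True
        return False

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        classes = set((dict(attrs).get("class") or "").split())
        self.stack.append((tag, classes))
        if tag == "a" and self._capture_depth is None and self._current_a_matches():
            self._capture_depth = len(self.stack)
            self._buf = []

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if tag == "a" and self._capture_depth == len(self.stack):
            self.titles.append("".join(self._buf).strip())
            self._capture_depth = None
        # Pop back to the most recent open element with this tag name.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        if self._capture_depth is not None:
            self._buf.append(data)


parser = ProductTitleParser()
parser.feed(SAMPLE_HTML)
parser.close()
print(parser.titles)  # ['First tour', 'Nested tour']
```

Since the question already mentions Beautiful Soup, `BeautifulSoup(html, "html.parser").select(".bd h2.product-title a")` does the same job with a real selector engine on page source you already have.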

I don't know Python; I know Java. But I'd guess something like:

headlines = driver.find_elements_by_css_selector(".bd h2.product-title a") 

for headline in headlines: 
    print(headline.text)
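As for why the question's `elif` branch never runs: `find_element_by_xpath` raises `NoSuchElementException` when nothing matches (as the traceback shows) rather than returning an empty value, so the exception escapes on the first line of the loop body before `if headline:` is ever evaluated. The `elif headline:` also repeats the `if` condition, so it could not fire anyway. The plural `find_elements_*` methods used above return an empty list on a miss instead of raising. If a per-item fallback between the two XPaths is still wanted, a minimal sketch is to wrap each lookup in try/except. To keep it runnable without a browser, the finder and exception class are parameters; `first_match_text`, `fake_find`, `NoSuchElement` and the sample title are hypothetical stand-ins.

```python
from types import SimpleNamespace


class NoSuchElement(Exception):
    """Hypothetical stand-in for selenium.common.exceptions.NoSuchElementException."""


def first_match_text(find, xpaths, not_found=NoSuchElement):
    """Return the text of the first element found among `xpaths`, else None.

    `find` is assumed to raise `not_found` on a miss (the way
    find_element_by_xpath raises NoSuchElementException), so each attempt is
    wrapped in try/except instead of testing the result with if/elif.
    """
    for xpath in xpaths:
        try:
            return find(xpath).text
        except not_found:
            continue  # nothing at this XPath; fall back to the next one
    return None


# Hypothetical fake page: only the deeper, second-structure XPath exists.
FAKE_PAGE = {
    "/html/body/div[8]/div/div[1]/div/div[3]/div[17]/div[2]/div[2]/h2/a":
        SimpleNamespace(text="Example tour title"),
}


def fake_find(xpath):
    """Mimics find_element_by_xpath: raises on a miss, never returns None."""
    try:
        return FAKE_PAGE[xpath]
    except KeyError:
        raise NoSuchElement(xpath) from None


candidates = [
    # The XPath from the error message, which is missing on the page.
    "/html/body/div[8]/div/div[1]/div/div[3]/div[17]/div[2]/h2/a",
    # The second XPath from the question, used as the fallback.
    "/html/body/div[8]/div/div[1]/div/div[3]/div[17]/div[2]/div[2]/h2/a",
]
print(first_match_text(fake_find, candidates))  # Example tour title
```

With a real driver the call would be `first_match_text(driver.find_element_by_xpath, candidates, NoSuchElementException)`, with `NoSuchElementException` imported from `selenium.common.exceptions`.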