Scrapy XPath for pages that vary slightly

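I am trying to scrape some information from job postings on www.upwork.com.

Most of it can be scraped with simple XPaths, but some pages have extra items (the client's country, the number of hires the client has made, etc.) or are slightly different (fixed-rate jobs add a job price item).

This breaks the XPaths for the items.

The items have no descriptive class names that you could use, as you can see in the page source.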

Fixed-rate job with client details: https://www.upwork.com/job/Education-portal-development_~0151e2b32662a05e13/

Hourly job with many job details and fewer client details: https://www.upwork.com/job/Create-countdown-timer-which-resets-every-night_~01d2dad2d68abd7b8d/

Some examples of items and their XPaths:

l.add_xpath('clientactivehires', '//*[@id="layout"]/div[2]/div[3]/div[2]/p[5]/span/text()', re=r'(\d*) Active')
l.add_xpath('fixedratevariable', '//*[@id="layout"]/div[2]/div[3]/div[1]/div[1]/div[2]/div/div[2]/p/strong/text()')
l.add_xpath('fixedrate', '//*[@id="layout"]/div[2]/div[3]/div[1]/div[1]/div[2]/div/div[2]/p/strong/text()')
l.add_xpath('hired', '//*[@id="layout"]/div[2]/div[3]/div[1]/div[2]/div[2]/div[1]/div/p[3]/span/text()', re=r'(\d*)')
l.add_xpath('interviewing', '//*[@id="layout"]/div[2]/div[3]/div[1]/div[2]/div[2]/div[1]/div[2]/p[3]/text()', re=r'(\d*)')
l.add_xpath('jobdescription', '//*[@id="layout"]/div[2]/div[3]/div[1]/div[2]/div[1]/p/text()')

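I have tried many XPath variations but cannot get this to work: an expression works on one page but not reliably on the others.

What can I do to make this work?

Source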

2016-04-15 Dan Breal

I would rely on the "About the Client" text and then get all of the following p siblings:

$ scrapy shell https://www.upwork.com/job/Education-portal-development_~0151e2b32662a05e13/ 

>>> for item in response.xpath("//p[strong = 'About the Client']/following-sibling::p"): 
...  print(" ".join(t.strip() for t in item.xpath(".//text()").extract()))
... 
India Bangalore 
      07:28 PM 
3 
     Jobs Posted 0% Hire Rate, 
     1 Open Job

You will probably need to improve the logic and group this additional information.

Source
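One possible way to do that grouping, as a minimal sketch: match each piece of text against a small set of label patterns, and simply skip any field that a page does not have. The function name `group_client_info`, the field names, and the exact label wording are assumptions based only on the sample output above; other pages may word things differently.

```python
import re

# Hypothetical patterns: the labels come from the sample output above,
# but the field names and the wording on other pages are assumptions.
FIELD_PATTERNS = {
    "jobs_posted": re.compile(r"(\d+)\s+Jobs? Posted"),
    "hire_rate_percent": re.compile(r"(\d+)%\s+Hire Rate"),
    "open_jobs": re.compile(r"(\d+)\s+Open Jobs?"),
}


def group_client_info(paragraphs):
    """Collect known fields from the text of the sibling <p> elements.

    A field that does not appear on a page is left out of the result,
    so pages with more or fewer client details do not break extraction.
    """
    info = {}
    for raw in paragraphs:
        text = " ".join(raw.split())  # collapse newlines and runs of spaces
        for name, pattern in FIELD_PATTERNS.items():
            match = pattern.search(text)
            if match:
                info[name] = int(match.group(1))
    return info


# Text modeled on the shell output above, stray whitespace included.
sample = [
    "India Bangalore \n      07:28 PM",
    "3 \n     Jobs Posted 0% Hire Rate, \n     1 Open Job",
]
print(group_client_info(sample))
```

Because the matching is driven by label text rather than by element position, an extra or missing paragraph only adds or removes a key in the result.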

2016-04-15 13:59:51 alecxe
