如何使用scrapy獲取職位描述？

我是scrapy和XPath的新手，但在Python中進行了一段時間的編程。我想從頁面https://www.germanystartupjobs.com/job/joblift-berlin-germany-3-working-student-offpage-seo-french-market/使用scrapy獲得email，name of the person making the offer和phone號碼。如您所見，電子郵件和電話是以<p>標籤內的文本形式提供的，因此很難提取。如何使用scrapy獲取職位描述？

我的想法是先把文本中的所有文本談論這個各自工作的Job Overview內或至少和使用ReGex得到email，phone number，如果可能的name of the person。

所以，我用scrapy shell https://www.germanystartupjobs.com/job/joblift-berlin-germany-3-working-student-offpage-seo-french-market/命令激發了scrapy shell，並從那裏得到response。

現在，我嘗試從div job_description中獲取所有文本，但實際上我什麼都沒有。我用

full_des = response.xpath('//div[@class="job_description"]/text()').extract()

返回[u'\t\t\t\n\t\t ']

我如何獲得所有從網頁中的文字提到？顯然，這個任務會在之後得到前面提到的屬性，但首先要做的是第一件事。

更新：此選項僅返回[]response.xpath('//div[@class="job_description"]/div[@class="container"]/div[@class="row"]/text()').extract()

來源

2016-12-16 Chak

[@類=「job_description」]你立刻去的div [@類=「容器」]，讓你跳過了一個元素稱爲「部分」。你可以在xpath查詢中使用它，或者使用//，例如div [@ class =「job_description」] // div [@ class =「container」]/..... – Borna

你是密切與

full_des = response.xpath('//div[@class="job_description"]/text()').extract()

的div - 標籤卻沒有，除了你所得到的任何文本。

<div class="job_description" (...)> 
    "This is the text you are getting" 
    <p>"This is the text you want"</p> 
</div>

正如你看到的，你與response.xpath('//div[@class="job_description"]/text()').extract()獲取文本是在div -tag之間，而不是在div - 標籤內的標籤之間的文本。爲此，您需要：

response.xpath('//div[@class="job_description"]//*/text()').extract()

這樣做是從div[@class="job_description]選擇所有子節點，並返回文本（見here什麼不同的XPath做）。

你會看到，這也返回了無用的文本，因爲你仍然得到所有的\n等等。爲此，我建議您將xpath縮小到您想要的元素，而不是採用廣泛的方法。

例如整個作業的描述是在

response.xpath('//div[@class="col-sm-5 justify-text"]//*/text()').extract()
股利後

來源

2016-12-16 08:49:35 rongon

感謝您的好評。我發佈了一個後續問題，可能有趣的是你回答「http：// stackoverflow.com/questions/41194574 /如何找到所有在網上列出的工作」 – Chak

如何使用scrapy獲取職位描述？

回答

相關問題