Selenium - Getting elements from an unordered list

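I'm not very familiar with HTML, but I want to use Selenium to build a simple web scraper. I'm trying to access the comments on reddit.com, and I'm having a lot of trouble actually picking out each element. The part I'm looking at is here: [screenshot]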

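I've tried far too many things and none of them has worked. One thing that puzzles me is that I used FirePath to copy the XPath, and it still doesn't work. (It returned.) The XPath it produced is .//*[@id='thing_t3_5khd75']/div[2]/ul/li[1]/a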

Source

2016-12-27 Astrum

Show your code. Also clarify what output you want: the number of comments for each post, or the actual comment text of a specific post? – Andersson

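A few posts at the top of the page are hidden, so if you use element.text you will get an empty string for them. Also, I suggest that you don't use FirePath and write your own XPath instead, which makes your selectors more flexible.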
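As a minimal sketch of the empty-string behavior described above: Selenium's `text` only returns text that is rendered on the page, so one option for hidden elements is to fall back to the `textContent` DOM property. The helper name `element_text` and the `FakeElement` class are hypothetical; the stand-in exists only so the sketch runs without a browser, and a real Selenium WebElement would be passed in its place.

```python
def element_text(element):
    """Return the element's visible text, or its raw textContent if it is hidden."""
    visible = element.text
    if visible:
        return visible
    # Hidden elements yield "" from .text; textContent ignores visibility.
    raw = element.get_attribute("textContent")
    return (raw or "").strip()


class FakeElement:
    """Hypothetical stand-in for a Selenium WebElement (illustration only)."""

    def __init__(self, text, text_content):
        self.text = text
        self._text_content = text_content

    def get_attribute(self, name):
        return self._text_content if name == "textContent" else None


hidden = FakeElement(text="", text_content="  217 comments ")
shown = FakeElement(text="1904 comments", text_content="1904 comments")
print(element_text(hidden))
print(element_text(shown))
```

The fallback is only tried when `text` is empty, so visible elements keep the whitespace-normalized text that Selenium already provides.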

To get the actual values you can use the following (I assume you're using Python, as I checked your profile :)):

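from selenium.webdriver.common.by import By

# `driver` is an already-started WebDriver with the Reddit listing page loaded
posts = driver.find_elements(By.XPATH, '//a[@class="bylink comments may-blank"]')
comments = {}
for post in posts:
    # Map each post's comments URL to its link text, e.g. "1904 comments"
    comments[post.get_attribute('href')] = post.get_attribute('innerHTML')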

The resulting comments dictionary will look like this:

{'https://www.reddit.com/r/science/comments/5kfw6w/cheetahs_heading_towards_extinction_as_population/': '1904 comments', 
'https://www.reddit.com/r/pics/comments/5kh5q4/a_cutting_board_made_of_walnut_white_oak_maple/': '217 comments',...}
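If the number of comments per post is what is wanted, the labels in a dictionary like the one above can be turned into integers. This is only a sketch: the helper name `comment_count` is hypothetical, and treating a label without a leading number as zero comments is an assumption about how Reddit labels posts that have no comments yet.

```python
def comment_count(label):
    """Extract the leading integer from a label such as '1904 comments'."""
    parts = label.split()
    try:
        return int(parts[0])
    except (IndexError, ValueError):
        # Assumption: a label with no leading number means zero comments.
        return 0


# Sample data copied from the output shown above.
comments = {
    "https://www.reddit.com/r/science/comments/5kfw6w/cheetahs_heading_towards_extinction_as_population/": "1904 comments",
    "https://www.reddit.com/r/pics/comments/5kh5q4/a_cutting_board_made_of_walnut_white_oak_maple/": "217 comments",
}

counts = {url: comment_count(label) for url, label in comments.items()}
busiest = max(counts, key=counts.get)
print(busiest, counts[busiest])
```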

Source

2016-12-27 12:44:01 Andersson

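I tried modifying it to get something like the titles using XPath, and it didn't work. My XPath is: 'driver.find_elements_by_xpath('//a[@class="title may-blank loggedin outbound srTagged"]')'. I suppose I could use a regular expression, but I'd rather not. – Astrum

Do you want to change the code to get something like '{"post_1_title": "100 comments", "post_2_title": "200 comments", ...}'? – Andersson

No, I'm just saying that if I try to grab the titles, it doesn't work. Here is a picture: http://imgur.com/a/meaR3. Below the title is plain text. – Astrum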
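One possible reason the title XPath quoted in the comments matches nothing (an assumption, since the page markup is not shown here) is that @class="..." compares the whole attribute string, so any link whose class list differs by even one token is skipped. A minimal sketch of a more tolerant expression matches a single class token instead. The helper name `xpath_for_class` is hypothetical, and `title` is simply the first token from the XPath quoted above.

```python
def xpath_for_class(tag, class_token):
    """Build an XPath matching `tag` elements whose class list contains class_token."""
    # Padding with spaces makes 'title' match as a whole token,
    # not as a substring of another class such as 'subtitle'.
    return (
        f"//{tag}[contains(concat(' ', normalize-space(@class), ' '), "
        f"' {class_token} ')]"
    )


title_xpath = xpath_for_class("a", "title")
print(title_xpath)

# With Selenium 4 the string would be used like this (not run here, needs a browser):
#   from selenium.webdriver.common.by import By
#   links = driver.find_elements(By.XPATH, title_xpath)
#   titles = [link.get_attribute("textContent") for link in links]
```

Matching on one token keeps the selector working when the other classes vary from link to link, and it avoids the regular expression the asker wanted to steer clear of.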
