Generating a list from HTML elements with Python

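I have created several lists from Wikipedia pages using Selenium and BeautifulSoup. When I look at the page source, the links I want to extract information from are always structured like this: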

<li><a href="/wiki/town_name,_California" title="town_name, California">town_name, state</a></li>

The tag also contains a link that, when clicked, takes you to the town's wiki page. It is always of the form /wiki/town_name,_California.
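Each such tag therefore carries two useful pieces of information: the href and the visible link text. As a minimal sketch, here is how both could be collected using only the standard library's html.parser.HTMLParser. This is a stand-in for BeautifulSoup, not the asker's code, and the LinkCollector class is a hypothetical helper:

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Hypothetical helper: record (href, visible text) for each <a> that has an href."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None   # href of the <a> currently open, if any
        self._text = []     # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text)))
            self._href = None


snippet = ('<li><a href="/wiki/town_name,_California" '
           'title="town_name, California">town_name, state</a></li>')
parser = LinkCollector()
parser.feed(snippet)
print(parser.links)  # [('/wiki/town_name,_California', 'town_name, state')]
```

Note that the text Selenium calls "link text" is the second element of each pair, not the href.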

I want to use a for loop in Python to find every item with this structure, but I'm not sure how to write the regular expression. I tried:

my_link = "//wiki//*,California"

and

my_link = "//wiki//*,_California"

However, when I try to run:

br.find_element_by_link_text(my_link)

both return a similar error:

NoSuchElementException: Message: no such element: Unable to locate element: {"method":"link text","selector":"//wiki//*,_California"}

I also tried:

import selenium, time
import html5lib
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.common.action_chains import ActionChains
from selenium.webdriver.common.keys import Keys

br = webdriver.Chrome()

url = "http://somewikipage.org"

br.get(url)

# Read the page source only after the browser exists and the page has loaded
pg_src = br.page_source.encode("utf-8")
soup = BeautifulSoup(pg_src, "html5lib")

lnkLst = []
# Note: find_element_* returns a single element; find_elements_* returns a list
for lnk in br.find_element_by_partial_link_text(",_California"):
    lnkLst.append(lnk)

and got this:

NoSuchElementException: Message: no such element: Unable to locate element: {"method":"partial link text","selector":",_California"}

Is there any way to fix this code so that I can build a list of my target links?

2017-10-10 ShaunO

Grab the page source and make soup from it, then loop through the links in the soup. – IamBatman
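That suggestion, looping over the links in the soup, might look like the following. A minimal sketch, assuming bs4 is installed and using the built-in "html.parser"; the Reno, Nevada row is a made-up second entry to show the filter excluding non-California links:

```python
from bs4 import BeautifulSoup

html = """<ul>
<li><a href="/wiki/town_name,_California" title="town_name, California">town_name, state</a></li>
<li><a href="/wiki/Reno,_Nevada" title="Reno, Nevada">Reno, Nevada</a></li>
</ul>"""

soup = BeautifulSoup(html, "html.parser")
ca_links = []
for a in soup.find_all("a", href=True):  # only <a> tags that have an href
    if a["href"].endswith(",_California"):
        ca_links.append(a["href"])
print(ca_links)  # ['/wiki/town_name,_California']
```

With a live browser, the input would be br.page_source instead of the hard-coded string.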

@IamBatman I tried soup.select("a[href*=,_California]") and got ValueError: Unsupported or invalid CSS selector: "a[href*=" – ShaunO
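The error most likely comes from the unquoted attribute value: ,_California starts with a comma, so it is not a valid bare CSS identifier, and the parser gives up right after a[href*=. Quoting the value should fix it. A sketch assuming a reasonably recent bs4 (4.7 or later, where select() is backed by the soupsieve package):

```python
from bs4 import BeautifulSoup

html = ('<li><a href="/wiki/town_name,_California" '
        'title="town_name, California">town_name, state</a></li>')
soup = BeautifulSoup(html, "html.parser")

# Quoted value: *= means "attribute contains this substring"
links = soup.select('a[href*=",_California"]')
print([a["href"] for a in links])  # ['/wiki/town_name,_California']
```

Using $= instead of *= would require the href to end with the value, which is slightly stricter.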

@IamBatman I got this working: soup.find_all("a", href=re.compile(",_California")). Thanks for pointing me in the right direction. – ShaunO
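That working call can be expanded into a complete list builder. It needs import re, which the comment leaves implicit. BeautifulSoup applies a compiled pattern with re.search, so ",_California" may match anywhere in the href. A sketch assuming bs4, with a made-up two-row HTML sample standing in for the real page:

```python
import re

from bs4 import BeautifulSoup

html = """<ul>
<li><a href="/wiki/town_name,_California" title="town_name, California">town_name, state</a></li>
<li><a href="/wiki/Reno,_Nevada" title="Reno, Nevada">Reno, Nevada</a></li>
</ul>"""
soup = BeautifulSoup(html, "html.parser")

# Keep only <a> tags whose href contains ",_California"
ca_tags = soup.find_all("a", href=re.compile(",_California"))
# Build (visible text, href) pairs for the target links
town_list = [(a.get_text(), a["href"]) for a in ca_tags]
print(town_list)  # [('town_name, state', '/wiki/town_name,_California')]
```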

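As you mentioned in your question, br.find_element_by_partial_link_text(",_California") did not work. That is because, according to the HTML you provided, ,_California is not part of the link text: the link text is the visible text town_name, state, while ,_California appears only in the href attribute.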
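To make the distinction concrete, here is a deliberately simplified model (an assumption, not Selenium's actual implementation) of what a partial-link-text lookup compares against: the selector is searched for in the visible text, never in the href.

```python
# Simplified model (assumption): partial link text checks whether the
# selector is a substring of the link's visible text.
link_text = "town_name, state"            # visible text from the question's HTML
href = "/wiki/town_name,_California"      # href from the same tag
selector = ",_California"

in_text = selector in link_text   # what partial link text effectively checks
in_href = selector in href        # where ",_California" actually appears
print(in_text, in_href)           # False True
```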

Based on your question, we need to find the <a> tag whose href attribute is "/wiki/town_name,_California". You can use either of the following options:

css_selector:

br.find_element_by_css_selector("a[href='/wiki/town_name,_California']")

xpath:

br.find_element_by_xpath("//a[@href='/wiki/town_name,_California']")
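The XPath predicate above can be tried offline against the question's snippet. This sketch uses the standard library's xml.etree.ElementTree, which supports only a subset of XPath (including [@attr='value']) and needs well-formed markup, so it is a stand-in for illustration, not how Selenium evaluates XPath. The wrapping <ul> is added only to give the snippet a single root. For matching every town rather than one exact href, full XPath 1.0 offers //a[contains(@href, ',_California')]; ElementTree lacks contains(), so the last step filters in Python instead:

```python
import xml.etree.ElementTree as ET

# The question's <li> snippet, wrapped in <ul> so it has a single root element
snippet = (
    '<ul><li><a href="/wiki/town_name,_California" '
    'title="town_name, California">town_name, state</a></li></ul>'
)
root = ET.fromstring(snippet)

# Exact-match predicate, the same [@href='...'] test as the answer's XPath
exact = root.findall(".//a[@href='/wiki/town_name,_California']")
print([a.text for a in exact])  # ['town_name, state']

# ElementTree has no contains(), so match the suffix in Python instead
ca_hrefs = [a.get("href") for a in root.iter("a")
            if a.get("href", "").endswith(",_California")]
print(ca_hrefs)  # ['/wiki/town_name,_California']
```

In Selenium 4 the find_element_by_* helpers are deprecated (and removed in later 4.x releases); the list-returning equivalent is br.find_elements(By.XPATH, ...), with By imported from selenium.webdriver.common.by.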

2017-10-11 06:47:54 DebanjanB

Read up on CSS selectors; they are your friend. I think the following should work:

hrefs = [a.href for a in soup.select('li a[href^="/wiki/"]')]

2017-10-10 22:29:05

I looked into CSS selectors but had trouble isolating the right ones. I plugged in your code and it returned a list of Nones. – ShaunO
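The list of Nones has a likely explanation: in BeautifulSoup, a.href is shorthand for a.find("href"), which looks for a child tag named <href> instead of reading the attribute, and returns None when there is none. Attributes are read with a["href"] or a.get("href"). A sketch assuming bs4 and the built-in "html.parser", with a made-up Nevada row added to the question's snippet:

```python
from bs4 import BeautifulSoup

html = """<ul>
<li><a href="/wiki/town_name,_California">town_name, state</a></li>
<li><a href="/wiki/Reno,_Nevada">Reno, Nevada</a></li>
</ul>"""
soup = BeautifulSoup(html, "html.parser")
tags = soup.select('li a[href^="/wiki/"]')

# a.href searches for a child <href> tag, finds none, and yields None
print([a.href for a in tags])       # [None, None]
# Subscripting reads the attribute itself
print([a["href"] for a in tags])    # ['/wiki/town_name,_California', '/wiki/Reno,_Nevada']

# Chaining a second attribute test narrows the match to California towns
ca_hrefs = [a["href"] for a in soup.select('li a[href^="/wiki/"][href$=",_California"]')]
print(ca_hrefs)                     # ['/wiki/town_name,_California']
```

The original selector matches every /wiki/ link inside an <li>, which on a real Wikipedia page includes many non-town links; the added [href$=",_California"] test is one way to isolate the target links.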
