Extracting data from Wikipedia

I want to extract data from http://en.wikipedia.org/wiki/List_of_cities_in_China

Specifically, I want to extract all of the city names listed in the table on that page.

I am using the code below (it only tries to extract a single field); the XPath was copied from Chrome.

from lxml import html
import requests

page = requests.get('http://en.wikipedia.org/wiki/List_of_cities_in_China')
tree = html.fromstring(page.text)

# XPath copied from Chrome's developer tools
huabeiTree = tree.xpath('//*[@id="mw-content-text"]/table[3]/tbody/tr[1]/td[1]/a/text()')
print(huabeiTree)

Nothing shows up in the output.

My ultimate goal is to extract every city in the list. How can I achieve that?

Source

2014-10-30 william007

What is your goal? If you just want to get all the cities in China, there is a simpler way to do that. – user3378649 2014-10-30 07:26:25

from lxml import html
import requests

page = requests.get('http://en.wikipedia.org/wiki/List_of_cities_in_China')
tree = html.fromstring(page.text)

# Select every sortable wikitable on the page
huabeiTree = tree.xpath('//table[@class="wikitable sortable"]')
list_of_cities_table = huabeiTree[0]  # table[0] is what we need

# Iterate over the table's <tr> nodes. Header rows start with <th>, so skip them;
# for data rows, tr[0] is the first <td> and tr[0][0] is the link inside it.
for tr in list_of_cities_table:
    if tr[0].tag == 'td':
        print(tr[0][0].text)

It prints a list of 656 city names, from Beijing to Zhuji.
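A likely reason the XPath in the question matched nothing (this is an assumption, not something confirmed in this thread): Chrome adds a <tbody> element to every table in its DOM, so an XPath copied from the developer tools contains /tbody even when the HTML that lxml parses has none. The sketch below uses a small made-up snippet instead of the real page, so SAMPLE and its contents are hypothetical.

```python
from lxml import html

# Hypothetical, heavily simplified stand-in for the served markup.
# Note that the table has no <tbody> element.
SAMPLE = """
<html><body><div id="mw-content-text">
<table class="wikitable sortable">
<tr><th>City</th><th>Province</th></tr>
<tr><td><a href="/wiki/Beijing">Beijing</a></td><td>Beijing</td></tr>
<tr><td><a href="/wiki/Zhuji">Zhuji</a></td><td>Zhejiang</td></tr>
</table>
</div></body></html>
"""

tree = html.fromstring(SAMPLE)

# Path in the style Chrome produces: includes a /tbody step
with_tbody = tree.xpath('//*[@id="mw-content-text"]/table[1]/tbody/tr[2]/td[1]/a/text()')

# Same path with the /tbody step removed
without_tbody = tree.xpath('//*[@id="mw-content-text"]/table[1]/tr[2]/td[1]/a/text()')

print(with_tbody)     # []
print(without_tbody)  # ['Beijing']
```

If the page you download does contain <tbody>, the first form would match instead, so it is worth checking the raw HTML rather than the browser's DOM view.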

P.S. This is probably not very elegant. It could be improved with a better XPath expression.
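As one possible version of the "better XPath" mentioned in the P.S., the loop can be folded into a single expression. This is only a sketch on a made-up snippet: SAMPLE and the helper name first_column_names are hypothetical, and the real page layout may differ. Using //tr rather than /tr keeps the expression working whether or not the rows are wrapped in <tbody>.

```python
from lxml import html

# Hypothetical sample with two sortable tables; only the first one is wanted.
SAMPLE = """
<html><body>
<table class="wikitable sortable">
<tr><th>City</th><th>Province</th></tr>
<tr><td><a href="/wiki/Beijing">Beijing</a></td><td>Beijing</td></tr>
<tr><td><a href="/wiki/Zhuji">Zhuji</a></td><td>Zhejiang</td></tr>
</table>
<table class="wikitable sortable">
<tr><th>Other</th></tr>
<tr><td><a href="/wiki/Other">Other</a></td></tr>
</table>
</body></html>
"""


def first_column_names(tree):
    """Return the link text from the first cell of each data row in the first sortable table."""
    # The parentheses make [1] pick the first matching table in the whole document,
    # not the first matching table under each parent element.
    return tree.xpath(
        '(//table[@class="wikitable sortable"])[1]//tr/td[1]/a[1]/text()'
    )


cities = first_column_names(html.fromstring(SAMPLE))
print(cities)  # ['Beijing', 'Zhuji']
```

Header rows drop out on their own because their cells are <th>, so no explicit tag check is needed.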

Source

2014-10-30 08:43:29 sk11
