2017-06-15 53 views
0

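Extracting columns from XML tables using Beautiful Soup

I have several tables like the ones below in a MySQL data dump, where each table element represents one row in the database. I want to extract the following information so that I can migrate it to a different database.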

<table name="dashboard"> 
    <column name="id">1</column> 
    <column name="timestamp">2009-10-09 15:10:30</column> 
    <column name="config_offline">1</column> 
    <column name="item1">0.00</column> 
    <column name="item2">0.00</column> 
</table> 

<table name="orders"> 
    <column name="id">1</column> 
    <column name="timestamp">2016-08-04 08:39:13</column> 
    <column name="item">1</column> 
    <column name="payment">Check</column> 
    <column name="cost">175.00</column> 
    <column name="paid">175.00</column> 
    <column name="cancel">0</column> 
    <column name="received">1</column> 
</table> 

Here is what I have so far:

from bs4 import BeautifulSoup

with open("test.xml", "r") as markup:
    soup = BeautifulSoup(markup, "xml")

# First attempt: prints every column value, with no hint of which table it came from.
for row in soup.find_all('column'):
    print(row.text)

# Second attempt: I also tried this, but it doesn't work either.
with open("test.xml", "r") as markup:
    soup = BeautifulSoup(markup, "xml")

for row in soup.find_all('table'):
    for c in row.find_all('column'):
        print(c.text)

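The problem with this approach is that I cannot tell which table a value came from. Is there a way to extract the information from the two different tables separately?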

+0

There is a typo in your updated question: change the last line to 'print(c.text)'.

Answers

0

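It seems fairly obvious: iterate over the 'table' tags first, then, for each 'table' tag, iterate over its 'column' tags. Reading the 'name' attribute of each tag tells you which table and column a value belongs to.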
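The nested iteration could look roughly like the sketch below. The helper name iter_cells and the trimmed sample data are made up for illustration, and "html.parser" is used here only because it ships with Python and accepts a fragment with several top-level tags; the question's "xml" parser may behave differently on such input.

```python
from bs4 import BeautifulSoup

# Sample data copied from the question, trimmed to two columns per table.
xml_dump = """
<table name="dashboard">
    <column name="id">1</column>
    <column name="item1">0.00</column>
</table>
<table name="orders">
    <column name="id">1</column>
    <column name="payment">Check</column>
</table>
"""


def iter_cells(markup):
    """Yield (table_name, column_name, value) for every column of every table.

    iter_cells is a hypothetical helper, not part of Beautiful Soup.
    """
    soup = BeautifulSoup(markup, "html.parser")
    for table in soup.find_all("table"):
        # Searching from the table tag only returns columns nested inside it.
        for column in table.find_all("column"):
            yield table["name"], column["name"], column.text


cells = list(iter_cells(xml_dump))
for table_name, column_name, value in cells:
    print(table_name, column_name, value)
```

Because each value is paired with table["name"], the two tables can be told apart even when they share column names such as id or timestamp.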

1

You can find a specific table by one of its attributes:

import bs4 
div_test=""" 
<table name="dashboard"> 
    <column name="id">1</column> 
    <column name="timestamp">2009-10-09 15:10:30</column> 
    <column name="config_offline">1</column> 
    <column name="item1">0.00</column> 
    <column name="item2">0.00</column> 
</table> 
<table name="orders"> 
    <column name="id">1</column> 
    <column name="timestamp">2016-08-04 08:39:13</column> 
    <column name="item">1</column> 
    <column name="payment">Check</column> 
    <column name="cost">175.00</column> 
    <column name="paid">175.00</column> 
    <column name="cancel">0</column> 
    <column name="received">1</column> 
</table> 
""" 
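# Name the parser explicitly; the built-in "html.parser" needs no extra install.
soup = bs4.BeautifulSoup(div_test, "html.parser")
# Pass the attribute filter as a dict: find()'s own 'name' argument is the tag name.
table_dashboard = soup.find('table', {'name': "dashboard"})
table_orders = soup.find('table', {'name': "orders"})
print(table_dashboard)
print('\n')
print(table_orders)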

The output gives you table_dashboard and table_orders:

<table name="dashboard"> 
<column name="id">1</column> 
<column name="timestamp">2009-10-09 15:10:30</column> 
<column name="config_offline">1</column> 
<column name="item1">0.00</column> 
<column name="item2">0.00</column> 
</table> 


<table name="orders"> 
<column name="id">1</column> 
<column name="timestamp">2016-08-04 08:39:13</column> 
<column name="item">1</column> 
<column name="payment">Check</column> 
<column name="cost">175.00</column> 
<column name="paid">175.00</column> 
<column name="cancel">0</column> 
<column name="received">1</column> 
</table> 
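Since find() returns only the first match and the dump holds one 'table' element per database row, a migration would probably need find_all() with the same attribute filter. The following is a sketch under those assumptions: rows_for is a hypothetical helper, and the second 'orders' row is invented sample data that does not appear in the question.

```python
from bs4 import BeautifulSoup

# Trimmed sample; the second "orders" row is invented for illustration.
xml_dump = """
<table name="orders">
    <column name="id">1</column>
    <column name="payment">Check</column>
</table>
<table name="orders">
    <column name="id">2</column>
    <column name="payment">Cash</column>
</table>
<table name="dashboard">
    <column name="id">1</column>
</table>
"""


def rows_for(soup, table_name):
    """Return one {column_name: value} dict per <table> with the given name.

    rows_for is a hypothetical helper, not part of Beautiful Soup.
    """
    rows = []
    # The dict filters on the tag's name="..." attribute, as in the answer above.
    for table in soup.find_all("table", {"name": table_name}):
        rows.append({col["name"]: col.text for col in table.find_all("column")})
    return rows


soup = BeautifulSoup(xml_dump, "html.parser")
orders = rows_for(soup, "orders")
dashboard = rows_for(soup, "dashboard")
print(orders)
print(dashboard)
```

All values come back as strings, so any conversion to numbers or timestamps would still have to happen before inserting into the target database.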