Efficient way to loop over tags with Beautiful Soup

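I want to extract information from several XML tags that share a similar structure. I loop over each child and append its text to a dictionary. Is there a way to avoid a separate for loop for every tag (such as sn and count in my MWE)?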

from bs4 import BeautifulSoup as bs 
import pandas as pd 

xml = """ 
    <info> 
    <tag> 
     <sn>9-542</sn> 
     <count>14</count> 
    </tag> 
    <tag> 
     <sn>3-425</sn> 
     <count>16</count> 
    </tag> 
    </info> 
    """ 

bs_obj = bs(xml, "lxml") 
info = bs_obj.find_all('tag') 


d = {} 

# I want to avoid these multiple for-loops 
d['sn'] = [i.sn.text for i in info] 
d['count'] = [i.count.text for i in info] 

pd.DataFrame(d)


2016-06-09 jnshsrs

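Do you actually need BeautifulSoup for this? You are working with XML, so you could use XPath with lxml. BeautifulSoup itself does not support XPath expressions. lxml has a BeautifulSoup-compatible mode that tries to parse broken HTML. Why are you using BeautifulSoup? It would be something like tree.xpath("//tag/sn"), which finds every child named "sn" of the tag named "tag". – user565447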
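To illustrate the XPath idea from the comment above, here is a minimal sketch. It swaps in the standard library's xml.etree.ElementTree, which only supports a limited XPath subset, so that it runs without extra installs; the variable names sn_values and count_values are made up for the example. With lxml the call would be tree.xpath("//tag/sn") as the comment describes.

```python
import xml.etree.ElementTree as ET

xml = """
<info>
  <tag>
    <sn>9-542</sn>
    <count>14</count>
  </tag>
  <tag>
    <sn>3-425</sn>
    <count>16</count>
  </tag>
</info>
"""

# Parse the document; root is the <info> element.
root = ET.fromstring(xml)

# ".//tag/sn" selects every <sn> that is a direct child of a <tag>.
sn_values = [el.text for el in root.findall(".//tag/sn")]
count_values = [el.text for el in root.findall(".//tag/count")]

print(sn_values)
print(count_values)
```

This still uses one expression per column, so it mainly shows the path syntax rather than removing the repetition.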

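Consider the following approach.
There are two loops only to keep the solution dynamic (the only thing you need to change if you want another tag is the needed_tags list):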

from collections import defaultdict

d = defaultdict(list)

needed_tags = ['sn', 'count']
for i in info:
    for tag in needed_tags:
        # getattr(i, 'sn') is equivalent to i.sn
        d[tag].append(getattr(i, tag).text)

print(d)
# defaultdict(<class 'list'>, {'count': ['14', '16'], 'sn': ['9-542', '3-425']})

For your specific example, this can be simplified to:

from collections import defaultdict 

d = defaultdict(list) 

for i in info: 
    d['sn'].append(i.sn.text) 
    d['count'].append(i.count.text) 

print(d)
# defaultdict(<class 'list'>, {'count': ['14', '16'], 'sn': ['9-542', '3-425']})
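If the child tag names are not known in advance, the needed_tags list could in principle be dropped by iterating over whatever children each tag has. The sketch below is not part of the answer's code: it uses the standard library's xml.etree.ElementTree instead of BeautifulSoup so that it is self-contained, and it assumes every tag element carries the same children.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

xml = """
<info>
  <tag>
    <sn>9-542</sn>
    <count>14</count>
  </tag>
  <tag>
    <sn>3-425</sn>
    <count>16</count>
  </tag>
</info>
"""

root = ET.fromstring(xml)
d = defaultdict(list)

# One pass over every <tag>; each child's element name becomes a key,
# so no list of tag names has to be maintained by hand.
for tag in root.iter("tag"):
    for child in tag:
        d[child.tag].append(child.text)

print(dict(d))
```

If some tag were missing a child, the lists would end up with different lengths, which would matter when passing the result to pd.DataFrame, so that assumption is worth checking on real data.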


2016-06-09 13:51:20 DeepSpace
