2016-11-19 79 views
1

Iterate through a Python dictionary to retrieve only the required row

I receive data as an HTML table from an external source:

from xml.etree import ElementTree as ET 

s = """<table> 
    <tr><th>Release</th><th>REFDB</th><th>URL</th></tr> 
    <tr><td>3.7.3</td><td>12345</td><td>http://google.com</td></tr> 
    <tr><td>3.7.4</td><td>456789</td><td>http://foo.com</td></tr> 
</table> 
""" 

This is the code I use to convert the HTML table into a dictionary:

table = ET.XML(s) 
rows = iter(table) 
headers = [col.text for col in next(rows)] 
for row in rows: 
    values = [col.text for col in row] 
    out = dict(zip(headers, values)) 

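My expected output is shown below. I will pass the Release version as a command-line argument, e.g. $ python myscript.py 3.7.3 (I already have code for that part). I am looking for a way to loop over the dictionaries and stop when a specific release version is found; in my case that is 3.7.3.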

Release Version - 3.7.3 
REFDB - 12345 
URL - http://google.com 
+0

'''out''' only contains the *last row*. – wwii

Answers

1

You don't need a dictionary. Just parse the contents of each row and check whether its release version matches the input:

#coding:utf-8 

import sys 
from lxml import html 

if len(sys.argv) != 2: 
    raise Exception("Please provide release version only") 

release_input = sys.argv[1].strip() 

data = """<table> 
    <tr><th>Release</th><th>REFDB</th><th>URL</th></tr> 
    <tr><td>3.7.3</td><td>12345</td><td>http://google.com</td></tr> 
    <tr><td>3.7.4</td><td>456789</td><td>http://foo.com</td></tr> 
</table> 
""" 

tree = html.fromstring(data) 
for row in tree.xpath('//tr')[1:]:  # skip the header row
    release, refdb, url = row.xpath('.//td/text()')
    if release_input == release:
        print("Release Version - {}".format(release))
        print("REFDB - {}".format(refdb))
        print("URL - {}".format(url))
        break
else:
    # Runs only when the loop finished without hitting break
    print("{} release version wasn't found".format(release_input))
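The same row-by-row check can also be sketched with the standard library only. This is a minimal sketch, assuming the table is well-formed XML (as it is in the question) and that every data row has exactly three cells; `find_row` and `DATA` are illustrative names, not part of the answer above.

```python
from xml.etree import ElementTree as ET

DATA = """<table>
    <tr><th>Release</th><th>REFDB</th><th>URL</th></tr>
    <tr><td>3.7.3</td><td>12345</td><td>http://google.com</td></tr>
    <tr><td>3.7.4</td><td>456789</td><td>http://foo.com</td></tr>
</table>
"""


def find_row(markup, wanted):
    """Return (release, refdb, url) for the first matching row, else None."""
    table = ET.fromstring(markup)
    for row in table.findall('tr')[1:]:  # skip the header row
        # Assumes exactly three <td> cells per data row
        release, refdb, url = (cell.text for cell in row.findall('td'))
        if release == wanted:
            return release, refdb, url
    return None


print(find_row(DATA, '3.7.3'))
print(find_row(DATA, '9.9.9'))
```

ElementTree is stricter than lxml.html here: real-world HTML with unclosed tags would raise a parse error, so lxml is likely the safer choice for markup you do not control.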
+0

Thanks Andre, this is exactly what I was looking for. – vpd

+0

@vpd I'm glad my answer helped. Please don't forget to accept it (as the answer to your question) if it solved your problem :) –

1

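Assuming there is only one row per version and you don't need the other versions at all, you can write a function that parses the HTML and returns a dict representing the version as soon as it is found. If the version isn't found, it can return None instead: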

from xml.etree import ElementTree as ET 

s = """<table> 
    <tr><th>Release</th><th>REFDB</th><th>URL</th></tr> 
    <tr><td>3.7.3</td><td>12345</td><td>http://google.com</td></tr> 
    <tr><td>3.7.4</td><td>456789</td><td>http://foo.com</td></tr> 
</table> 
""" 

def find_version(ver):
    table = ET.XML(s)
    rows = iter(table)
    headers = [col.text for col in next(rows)]  # first row holds the column names
    for row in rows:
        values = [col.text for col in row]
        out = dict(zip(headers, values))
        if out['Release'] == ver:
            return out  # stop at the first matching row

    return None

res = find_version('3.7.3')
if res:
    for x in res.items():
        print(' - '.join(x))
else:
    print('Version not found')

Output (the order of the lines depends on how the Python version in use orders dict keys):

Release - 3.7.3 
URL - http://google.com 
REFDB - 12345 
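To connect this to the command-line usage from the question, the lookup could be wrapped roughly as follows. This is only a sketch: `main`, its return codes, and the usage message are assumptions made for illustration, and the fields are printed in a fixed order so the result does not depend on dict ordering.

```python
from xml.etree import ElementTree as ET

s = """<table>
    <tr><th>Release</th><th>REFDB</th><th>URL</th></tr>
    <tr><td>3.7.3</td><td>12345</td><td>http://google.com</td></tr>
    <tr><td>3.7.4</td><td>456789</td><td>http://foo.com</td></tr>
</table>
"""


def find_version(ver):
    """Return the row for `ver` as a dict, or None if it is not in the table."""
    rows = iter(ET.XML(s))
    headers = [col.text for col in next(rows)]
    for row in rows:
        out = dict(zip(headers, (col.text for col in row)))
        if out['Release'] == ver:
            return out
    return None


def main(argv):
    """Hypothetical entry point; argv is the argument list without the script name."""
    if len(argv) != 1:
        print('usage: myscript.py RELEASE')
        return 2
    res = find_version(argv[0].strip())
    if res is None:
        print('Version not found')
        return 1
    # Fixed key order instead of relying on dict ordering
    for key in ('Release', 'REFDB', 'URL'):
        print('{} - {}'.format(key, res[key]))
    return 0


# In a real script this would be: sys.exit(main(sys.argv[1:]))
main(['3.7.3'])
```

Returning a status code from `main` rather than calling `sys.exit` inside it keeps the function easy to call from tests or an interactive session.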
0
from xml.etree import ElementTree as ET 

s = """<table> 
    <tr><th>Release</th><th>REFDB</th><th>URL</th></tr> 
    <tr><td>3.7.3</td><td>12345</td><td>http://google.com</td></tr> 
    <tr><td>3.7.4</td><td>456789</td><td>http://foo.com</td></tr> 
</table> 
""" 

table = ET.XML(s) 
rows = iter(table) 
headers = [col.text for col in next(rows)] 
master = {} 

for row in rows:
    values = [col.text for col in row]
    out = dict(zip(headers, values))
    if 'Release' in out:
        master[out['Release']] = out  # index each row by its release

# Use the release to get the right dict out of master
in_data = '3.7.3'  # assumed example value; in practice this comes from the user
print(master)
if in_data in master:
    for k, v in master[in_data].items():
        print('{} - {}'.format(k, v))
else:
    print('Error')
0
import lxml.html 
from collections import namedtuple 
s = """<table> 
    <tr><th>Release</th><th>REFDB</th><th>URL</th></tr> 
    <tr><td>3.7.3</td><td>12345</td><td>http://google.com</td></tr> 
    <tr><td>3.7.4</td><td>456789</td><td>http://foo.com</td></tr> 
    <tr><td>3.7.5</td><td>151515</td><td>http://foo.com</td></tr> 
</table> 
""" 
def info_gen(rows):
    # Yield one namedtuple per data row
    info = namedtuple('info', ['Release', 'REFDB', 'URL'])
    for row in rows:
        yield info(*row.xpath('.//text()'))

html = lxml.html.fromstring(s)
rows = html.xpath('//table//tr[td]')  # only rows that contain <td> cells

Release = input("Enter Release:")
for info in info_gen(rows):
    if info.Release == Release:  # compare against the Release field only
        print(info)
        break

Out:

Enter Release:3.7.5 
info(Release='3.7.5', REFDB='151515', URL='http://foo.com') 
0

If you accumulate the dictionaries in a list:

result = [] 
for row in rows: 
    values = [col.text for col in row] 
    result.append(dict(zip(headers, values))) 

you can filter the list -

import operator 
value = '3.7.3' 
release = operator.itemgetter('Release') 
refdb = operator.itemgetter('REFDB') 
url = operator.itemgetter('URL') 
data = [d for d in result if release(d) == value] 

and then print all the dictionaries that made it past the filter -

f_string = 'Release Version - {}\nREFDB - {}\nURL - {}' 
for d in data: 
    print(f_string.format(release(d), refdb(d), url(d)))
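The fragments above rely on `rows` and `headers` from the question, so here is a self-contained sketch that puts them together. It is one possible arrangement, not the only one: `matching` and `fields` are names introduced for illustration, and `fields` uses a single `operator.itemgetter` with several keys, which returns a tuple of the three values.

```python
import operator
from xml.etree import ElementTree as ET

s = """<table>
    <tr><th>Release</th><th>REFDB</th><th>URL</th></tr>
    <tr><td>3.7.3</td><td>12345</td><td>http://google.com</td></tr>
    <tr><td>3.7.4</td><td>456789</td><td>http://foo.com</td></tr>
</table>
"""

# Build one dict per data row, keyed by the header cells
rows = iter(ET.XML(s))
headers = [col.text for col in next(rows)]
result = [dict(zip(headers, (col.text for col in row))) for row in rows]

release = operator.itemgetter('Release')
# With several keys, itemgetter returns a tuple of values in that order
fields = operator.itemgetter('Release', 'REFDB', 'URL')


def matching(records, value):
    """Return every record whose Release equals value (hypothetical helper)."""
    return [d for d in records if release(d) == value]


f_string = 'Release Version - {}\nREFDB - {}\nURL - {}'
for d in matching(result, '3.7.3'):
    print(f_string.format(*fields(d)))
```

Unlike the early-return approaches, filtering the whole list keeps every matching row, which matters only if a release can appear more than once.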