Python: a better solution for repeated function calls in a comprehension

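I have an XML file from which I need to extract the id and title fields (under the page tag). This is what I'm doing, and it works fine. However, I'm not happy with the three calls to `elem.find('title')`. Is there a better way to avoid them within the comprehension? I understand that writing it as a loop would solve this.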

import xml.etree.ElementTree as ET 
tree = ET.parse(some_file)  # some_file: path to the XML file
root = tree.getroot() 
id_title_list = [(elem.find('id').text, elem.find('title').text) 
        for elem in root.findall('page') 
        if elem.find('title').text.startswith('string1') or 
        elem.find('title').text.startswith('string2')]


2014-09-10 Gopala

Are the three calls actually a problem, or is this a case of [premature optimization](https://en.wikipedia.org/wiki/Program_optimization#When_to_optimize) (the root of all evil)? – martineau 2014-09-10 18:15:43

Instead of calling `startswith` twice, call it once with the tuple `('string1', 'string2')` as the argument. – chepner 2014-09-10 19:06:51

There is nothing wrong with breaking it down into a normal loop with an intermediate variable:

id_title_list = [] 
for elem in root.findall('page'): 
    title = elem.find('title').text 
    if title.startswith(('string1', 'string2')): 
        id_title_list.append((elem.find('id').text, title))

Note that `startswith()` accepts a tuple of prefixes and returns True if the string starts with any of them.
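As a rough, self-contained illustration of the loop version, the sketch below runs it against a small in-memory document. The XML content (ids and titles) is made up for the example; only the `page`, `id`, and `title` element names come from the question.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; ids and titles are invented for illustration.
SAMPLE_XML = """
<root>
  <page><id>1</id><title>string1 alpha</title></page>
  <page><id>2</id><title>other beta</title></page>
  <page><id>3</id><title>string2 gamma</title></page>
</root>
"""

root = ET.fromstring(SAMPLE_XML)

id_title_list = []
for elem in root.findall('page'):
    title = elem.find('title').text  # looked up once per page
    # A tuple argument means "starts with any of these prefixes".
    if title.startswith(('string1', 'string2')):
        id_title_list.append((elem.find('id').text, title))

print(id_title_list)
```

Only the first and third pages should pass the filter here, since the second title starts with neither prefix.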

Another option would be to do the `startswith()` check inside the XPath expression:

id_title_list = [(elem.find('id').text, elem.find('title').text) 
        for elem in root.xpath('//page[.//title[starts-with(., "string1") or starts-with(., "string2")]]')]

Note that this will not work with `xml.etree.ElementTree`, because it provides only limited support for XPath expressions. `lxml` handles it; just change the import:

from lxml import etree as ET


2014-09-10 18:07:35 alecxe

Yes. Don't abuse list comprehensions when you don't need them. Nobody appreciates a simple loop anymore. – 2014-09-10 18:33:58

One way to solve this with a comprehension, respecting the request:

id_title_list = [
    (elem.find('id').text, title)
    for elem, title in
        ((elem, elem.find('title').text) for elem in root.findall('page'))
    if title.startswith(('string1', 'string2'))]

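It uses an inner generator expression so that the title `find` is evaluated only once per element. Because the generator is lazily evaluated, it should avoid the overhead of an intermediate list. It also uses the ability of `startswith` to take a tuple of possible prefixes, although with the title text looked up only once, that is more for brevity than for speed.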
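To make the nested-generator idea concrete, here is a minimal runnable sketch. The helper name `ids_and_titles`, its `prefixes` parameter, and the sample XML are assumptions introduced for this example and are not part of the original answer.

```python
import xml.etree.ElementTree as ET


def ids_and_titles(root, prefixes=('string1', 'string2')):
    """Return (id, title) pairs for pages whose title starts with a prefix.

    Hypothetical helper: the name and signature are illustrative only.
    """
    # Inner generator: pairs each page with its title text, so the
    # title lookup happens once per page and no intermediate list is built.
    pairs = ((elem, elem.find('title').text) for elem in root.findall('page'))
    return [(elem.find('id').text, title)
            for elem, title in pairs
            if title.startswith(prefixes)]


# Made-up sample data for illustration.
root = ET.fromstring(
    "<root>"
    "<page><id>10</id><title>string2 first</title></page>"
    "<page><id>11</id><title>unrelated</title></page>"
    "</root>"
)
result = ids_and_titles(root)
print(result)
```

Naming the inner generator (`pairs`) is purely a readability choice; inlining it gives the single-expression form shown above.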

All that said, I agree with alecxe's answer that a for loop is the better choice.


2014-09-10 18:13:42

With some higher-order functions and itertools:

from operator import attrgetter, methodcaller
from itertools import tee, imap, izip  # Python 2 only; on Python 3 use the built-in map and zip

# Broken down into lots of small pieces; recombine as you see fit. 

# Functions for calling various methods on objects 
# Example: find_id(x) is the same as x.find('id') 
find_id = methodcaller('find', 'id') 
find_title = methodcaller('find', 'title') 
is_valid = methodcaller('startswith', ('string1', 'string2')) 
get_text = attrgetter('text') 

found = root.findall('page') # The original results... 
found_iters = tee(found, 2) # ... split into two. 

# Make two iterators resulting from calling `find` on each element... 
ids_iter = imap(get_text, imap(find_id, found_iters[0])) 
titles_iter = imap(get_text, imap(find_title, found_iters[1])) 

# And recombine them into a single iterable of tuples. 
id_title_pairs = izip(ids_iter, titles_iter) 

# Resulting in a nice, simple list comprehension 
id_title_list = [(id, title) 
        for id, title in id_title_pairs if is_valid(title)]
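The code above targets Python 2, where `imap` and `izip` live in `itertools`. A possible Python 3 adaptation, offered as a sketch rather than as the answer's own code, swaps in the lazy built-ins `map` and `zip`. The sample XML and the variable names `pages_a`, `pages_b`, and `page_id` are assumptions made for this example.

```python
import xml.etree.ElementTree as ET
from itertools import tee
from operator import attrgetter, methodcaller

# Small callables wrapping method calls and attribute access.
find_id = methodcaller('find', 'id')        # x -> x.find('id')
find_title = methodcaller('find', 'title')  # x -> x.find('title')
is_valid = methodcaller('startswith', ('string1', 'string2'))
get_text = attrgetter('text')               # x -> x.text

# Made-up sample data for illustration.
root = ET.fromstring(
    "<root>"
    "<page><id>1</id><title>string1 one</title></page>"
    "<page><id>2</id><title>nope</title></page>"
    "<page><id>3</id><title>string2 three</title></page>"
    "</root>"
)

# Split the pages into two independent iterators.
pages_a, pages_b = tee(root.findall('page'), 2)

# In Python 3, map and zip are already lazy, like imap and izip were.
ids_iter = map(get_text, map(find_id, pages_a))
titles_iter = map(get_text, map(find_title, pages_b))

id_title_list = [(page_id, title)
                 for page_id, title in zip(ids_iter, titles_iter)
                 if is_valid(title)]
print(id_title_list)
```

Using `page_id` instead of `id` as the loop variable avoids shadowing the built-in `id` function.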


2014-09-10 19:14:54 chepner
