2015-07-21 49 views
5

我需要合併兩個迭代器。我寫了這個功能:合併兩個已排序迭代器而不替換

def merge_no_repeat(iter1, iter2, key=None): 
    """ 
    a = iter([(2, 'a'), (4, 'a'), (6, 'a')]) 
    b = iter([(1, 'b'), (2, 'b'), (3, 'b'), (4, 'b'), (5, 'b'), (6, 'b'), (7, 'b'), (8, 'b')]) 
    key = lambda item: item[0] 
    fusion_no_repeat(a, b, key) -> 
       iter([(1, 'b'), (2, 'a'), (3, 'b'), (4, 'a'), (5, 'b'), (6, 'a'), (7, 'b'), (8, 'b')]) 
    :param iter1: sorted iterator 
    :param iter2: sorted iterator 
    :param key: lambda get sorted key, default: lambda x: x 
    :return: merged iterator 
    """ 
    if key is None: 
     key = lambda x: x 
    element1 = next(iter1, None) 
    element2 = next(iter2, None) 
    while element1 is not None or element2 is not None: 
     if element1 is None: 
      yield element2 
      element2 = next(iter2, None) 
     elif element2 is None: 
      yield element1 
      element1 = next(iter1, None) 
     elif key(element1) > key(element2): 
      yield element2 
      element2 = next(iter2, None) 
     elif key(element1) == key(element2): 
      yield element1 
      element1 = next(iter1, None) 
      element2 = next(iter2, None) 
     elif key(element1) < key(element2): 
      yield element1 
      element1 = next(iter1, None) 

這個函數有效。但我認爲這太複雜了。是否有可能使用Python標準庫使這個函數最簡單?

+0

['itertools.chain'](https://docs.python.org/2/庫/ itertools.html#itertools.chain)? –

+0

這個問題可能更適合http://codereview.stackexchange.com – jonrsharpe

+0

@MathiasEttinger itertools.chain不會返回一個已排序的迭代器。 –

回答

0

您可以使用:

def merge_no_repeat(iter1, iter2, key=None): 
    if key is None: 
     key = lambda x: x 
    ref = next(iter1, None) 
    for elem in iter2: 
     key_elem = key(elem) # caching value so we won't compute it for each value in iter1 that is before this one 
     while ref is not None and key_elem > key(ref): 
      # Catch up with low values from iter1 
      yield ref 
      ref = next(iter1, None) 
     if ref is None or key_elem < key(ref): 
      # Catch up with low values from iter2, eliminate duplicates 
      yield elem 
    # Update: I forgot to consume iter1 in the first version of this code 
    for elem in iter1: 
     # Use remaining items of iter1 if needed 
     yield elem 

我假定迭代器不會返回None值除非完全消耗掉,因爲你必須在你的原代碼if element1 is None:elif element1 is None:測試。


實例:

>>> from operator import itemgetter 
>>> list(merge_no_repeat(
...  iter([(2, 'a'), (4, 'a'), (6, 'a')]), 
...  iter([(1, 'b')]), 
...  itemgetter(0))) 
[(1, 'b'), (2, 'a'), (4, 'a'), (6, 'a')] 
>>> list(merge_no_repeat(
...  iter([(2, 'a'), (4, 'a'), (6, 'a')]), 
...  iter([(1, 'b'),(7, 'b'), (8, 'b')]), 
...  itemgetter(0))) 
[(1, 'b'), (2, 'a'), (4, 'a'), (6, 'a'), (7, 'b'), (8, 'b')] 
>>> list(merge_no_repeat(
...  iter([(2, 'a'), (4, 'a'), (6, 'a')]), 
...  iter([(1, 'b'),(3, 'b'), (4,'b'),(5,'b'),(7, 'b'), (8, 'b')]), 
...  itemgetter(0))) 
[(1, 'b'), (2, 'a'), (3, 'b'), (4, 'a'), (5, 'b'), (6, 'a'), (7, 'b'), (8, 'b')] 
+0

首先,你需要在最後的'if'中將'None'改爲'不是None',但是它仍然返回錯誤的結果!試試我的例子! – Kasramvd

+0

@Kasramvd在最後如果,'a是None'實際上是一個測試,以檢查是否iter1被消耗。 –

+0

哎呀,'a'實際上是'ref'。編輯 –

1

其中一個,如果其中一個迭代器返回None,則失敗,您應該捕獲StopIteration異常。二,一旦一個迭代器沒有更多的值,你可以返回另一個值的其餘值。

我覺得這是比較容易讓你使用小包裝類各地的迭代器,使可見的下一個值:

class NextValueWrapper(object): 
    def __init__(self, iterator): 
     self.iterator = iterator 
     self.next_value = None 
     self.finished = False 
     self.get() 

    def get(self): 
     if self.finished: return # Shouldn't happen, maybe raise an exception 
     value = self.next_value 
     try: 
      self.next_value = next(self.iterator) 
     except StopIteration: 
      self.finished = True 
     return value 

然後代碼變爲:

def merge(iter1, iter2, key=None): 
    if key is None: 
     key = lambda x: x 

    wrap1 = NextValueWrapper(iter1) 
    wrap2 = NextValueWrapper(iter2) 

    while not (wrap1.finished and wrap2.finished): 
     if (wrap2.finished or 
       (not wrap1.finished and 
       key(wrap1.next_value) <= key(wrap2.next_value))): 
      yield wrap1.get() 
     else: 
      yield wrap2.get() 

這是未經測試。它重複。而且它是Python 2,出於習慣。使它不重複是留給讀者的一個練習,我沒有注意到這也是一個要求...

+0

'NexValueWrapper.get'無條件地返回'None'。因此,如果兩個迭代器都沒有完成,你的代碼將產生「None」。 –

+0

@MathiasEttinger:是的(忘了返回語句)和另一個bug,for循環產生值忘記產生下一個值。幸運的是我只是洗了澡,這會帶來清晰度,會編輯:-) – RemcoGerlich