2015-10-16 130 views

Reduce pairs - Python

How should I write the lambda expression if I have to reduce pairs of values by key?

testr = [('r1', (1, 1)), ('r1', (1, 5)),('r2', (1, 1)),('r3', (1, 1))] 

The desired output is

('r1', (2, 6)),('r2', (1, 1)),('r3', (1, 1)) 

Answers

1

Reduce by key:

.reduceByKey(lambda a, b: (a[0]+b[0], a[1]+b[1])) 

You can make this more generic, so that it handles tuples of arbitrary length, with zip:

.reduceByKey(lambda a, b: tuple(x+y for x,y in zip(a,b))) 
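Both lambdas can be checked without a Spark cluster. The following is a minimal sketch in plain Python; `reduce_by_key` is a hypothetical local stand-in that only mimics the per-key folding of reduceByKey, it is not part of the Spark API, and the sorting of the result is an assumption made for readable output.

```python
def reduce_by_key(pairs, func):
    """Hypothetical local stand-in for reduceByKey (not the Spark API)."""
    acc = {}
    for key, value in pairs:
        # The first value seen for a key is stored as-is;
        # every later value is merged into it with func.
        acc[key] = func(acc[key], value) if key in acc else value
    return sorted(acc.items())

testr = [('r1', (1, 1)), ('r1', (1, 5)), ('r2', (1, 1)), ('r3', (1, 1))]

# The two lambdas from the answer above.
add_pairs = lambda a, b: (a[0] + b[0], a[1] + b[1])
add_tuples = lambda a, b: tuple(x + y for x, y in zip(a, b))

result_pairs = reduce_by_key(testr, add_pairs)
result_tuples = reduce_by_key(testr, add_tuples)
print(result_pairs)   # [('r1', (2, 6)), ('r2', (1, 1)), ('r3', (1, 1))]
```

The zip-based lambda gives the same result for pairs and also works for longer tuples, as long as all values under one key have the same length.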

I will try this. I have strings as keys, and the value pairs always contain integer values. – mhn

Your data showed a mix of types in the value pairs; if they are all integers, you can drop the "int()" calls. – AChampion

0
It is not clear to me how reduce with a lambda can be used to reduce a list of tuples that have different keys. My solution does reduce a list of tuples, but it uses a function, which is probably too cumbersome, if not impossible, to write as a pure lambda.

import functools as fun
import itertools as it
import operator as op

def reduce_tuple_list(tl):
    # sort the list so that groupby sees equal keys next to each other
    tl = sorted(tl, key=op.itemgetter(0))

    # reduce a list of tuples that all share the same key
    def reduce_with_same_key(tl):
        def add_tuple(t1, t2):
            k1, tl1 = t1
            k2, tl2 = t2
            if k1 == k2:
                l1, r1 = tl1
                l2, r2 = tl2
                l = l1 + l2
                r = r1 + r2
                return k1, (l, r)
            else:
                # not reached here: each group holds a single key
                return t1, t2
        return tuple(fun.reduce(add_tuple, tl))

    # group by key
    groups = []
    for k, g in it.groupby(tl, key=op.itemgetter(0)):
        groups.append(list(g))

    new_list = []
    # only groups with more than one tuple need to be reduced
    for el in groups:
        if len(el) > 1:  # reduce
            new_list.append(reduce_with_same_key(el))
        else:  # single tuple, no other tuple with the same key
            new_list.append(el[0])
    return new_list


testr = [('r1', (1, 1)), ('r3', (11, 71)), ('r1', (1, 5)),('r2', (1, 1)),('r3', (1, 1))]

>>> reduce_tuple_list(testr)
[('r1', (2, 6)), ('r2', (1, 1)), ('r3', (12, 72))]
0

You can use the combineByKey method:

testr = sc.parallelize((('r1', (1, 1)), ('r1', (1, 5)),('r2', (1, 1)),('r3', (1, 1)))) 

testr.combineByKey(lambda x:x,lambda x,y:(x[0]+y[0],x[1]+y[1]),lambda x,y:(x[0]+y[0],x[1]+y[1])).collect()
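The three functions passed to combineByKey play different roles: the first creates a combiner from the first value of a key within a partition, the second merges a further value into that combiner, and the third merges two combiners that come from different partitions. Since the third function receives two already-summed tuples, it should add them element-wise. The sketch below models this in plain Python; `combine_by_key` and the two partitions are made up for illustration and are not the Spark API or real data.

```python
def combine_by_key(partitions, create_combiner, merge_value, merge_combiners):
    """Hypothetical local model of combineByKey semantics (not the Spark API)."""
    per_partition = []
    for part in partitions:
        combiners = {}
        for key, value in part:
            if key in combiners:
                # Key already seen in this partition: fold the value in.
                combiners[key] = merge_value(combiners[key], value)
            else:
                # First value for this key in this partition.
                combiners[key] = create_combiner(value)
        per_partition.append(combiners)

    merged = {}
    for combiners in per_partition:
        for key, comb in combiners.items():
            # Combine partial results for the same key across partitions.
            merged[key] = merge_combiners(merged[key], comb) if key in merged else comb
    return sorted(merged.items())

# Two made-up partitions; 'r1' appears in both, so merge_combiners is exercised.
partitions = [
    [('r1', (1, 1)), ('r1', (1, 5))],
    [('r1', (10, 20)), ('r2', (1, 1)), ('r3', (1, 1))],
]
add = lambda x, y: (x[0] + y[0], x[1] + y[1])
combined = combine_by_key(partitions, lambda v: v, add, add)
print(combined)  # [('r1', (12, 26)), ('r2', (1, 1)), ('r3', (1, 1))]
```

When the combiner has the same type as the values, as it does here, reduceByKey with a single function is usually the simpler choice.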