2015-05-04 33 views
2

我有兩個列表,aba包含我想知道b中匹配元素索引的元素。在b中,每個元素都是唯一的,與a不同。匹配元素的索引給出兩個列表,其中一個具有冗餘條目

a = [1993, 1993, 1994, 1995, 1996, 1996, 1998, 2003, 2005, 2005] 
b = [1966, 1967, 1968, 1969, 1970, 1971, 1972, 1973, 1974, 1975, 1976, 1977, 1978, 1979, 1980, 1981, 1982, 1983, 1984, 1985, 1986, 1987, 1988, 1989, 1990, 1991, 1992, 1993, 1994, 1995, 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014] 

利用該解決方案從Finding the indices of matching elements in list in Python

matching = [match for match, element in enumerate(b) if element in a] 

matching但是隻有[27, 28, 29, 30, 32, 37, 39],但我希望它是[27, 27, 28, 29, 30, 30, 32, 37, 39, 39]

回答

6
>>> a = [1993, 1993, 1994, 1995, 1996, 1996, 1998, 2003, 2005, 2005] 
>>> b = [1966, 1967, 1968, 1969, 1970, 1971, 1972, 1973, 1974, 1975, 1976, 1977, 1978, 1979, 1980, 1981, 1982, 1983, 1984, 1985, 1986, 1987, 1988, 1989, 1990, 1991, 1992, 1993, 1994, 1995, 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014] 
>>> [b.index(x) for x in a] 
[27, 27, 28, 29, 30, 30, 32, 37, 39, 39] 
+3

在這種情況下工作,但如果'a'中的任何值不在'b'中,將引發ValueError。如果b.count(x)]中的[b.index(x)for x會更安全... –

+0

當然,這完全取決於你是否希望它失敗。 –

1

什麼

print [b.index(i) for i in a if i in b] 
0

如果你有大的列表製作BA組的效率會更高:

st = set(b) 
print([b.index(x) for x in a if x in st]) 

當你的數據進行排序,並從假定所有元素都在B您也可以使用bisect,以便每個索引查找爲O(log n):

a = [1993, 1993, 1994, 1995, 1996, 1996, 1998, 2003, 2005, 2005] 
b = [1966, 1967, 1968, 1969, 1970, 1971, 1972, 1973, 1974, 1975, 1976, 1977, 1978, 1979, 1980, 1981, 1982, 1983, 1984, 1985, 1986, 1987, 1988, 1989, 1990, 1991, 1992, 1993, 1994, 1995, 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014] 


from bisect import bisect_left 
print [bisect_left(b, x) for x in a] 
[27, 27, 28, 29, 30, 30, 32, 37, 39, 39] 

在小數據集運行快兩倍,只是索引:

In [22]: timeit [bisect_left(b, x) for x in a] 
100000 loops, best of 3: 4.2 µs per loop 

In [23]: timeit [b.index(x) for x in a] 
100000 loops, best of 3: 8.84 µs per loop 

另一種選擇是使用一個字典來存儲這意味着該代碼將在線性時間運行的指標,一個傳過來和一個傳過b:

# store all indexes as values and years as keys 
indexes = {k: i for i, k in enumerate(b)} 
# one pass over a accessing each index in constant time 
print [indexes[x] for x in a] 
[27, 27, 28, 29, 30, 30, 32, 37, 39, 39] 

,甚至連小輸入設定的功能要比索引,更高效的一個增長將是一個很大更有效率:

In [34]: %%timeit                
indexes = {k: i for i, k in enumerate(b)} 
[indexes[x] for x in a] 
    ....: 
100000 loops, best of 3: 7.54 µs per loop 

In [39]: b = list(range(1966,2100)) 
In [40]: samp = list(range(1966,2100)) 
In [41]: a = [choice(samp) for _ in range(100)] 

In [42]: timeit [b.index(x) for x in a 
10000 loops, best of 3: 154 µs per loop 
In [43]: %%timeit      
indexes = {k: i for i, k in enumerate(b)} 
[indexes[x] for x in a] 
    ....: 
10000 loops, best of 3: 22.5 µs per loop 
+0

現在,您只需查看兩次,一次查看一次,一次查看列表。只有當它不在列表中時,這纔是改進。 –

+0

@LennartRegebro。因此,0(1)查找比b中沒有出現的多個值的線性要慢?你認爲如果你在b中有100萬個元素,使用列表更好? –

+0

不,但如果它在列表中,哪個OP的問題假設,那麼你根本不需要if測試。如果它主要在列表中,那麼捕獲錯誤會更好。 –

1

這是Padraic Cunningham的suggestion to use sets的擴展。相反,如果你轉換你關閉索引到字典的列表,就可以實現O(1)查找,爲O(n)的預處理:

a = [1993, 1993, 1994, 1995, 1996, 1996, 1998, 2003, 2005, 2005] 
b = [1966, 1967, 1968, 1969, 1970, 1971, 1972, 1973, 1974, 1975, 1976, 1977, 1978, 1979, 1980, 1981, 1982, 1983, 1984, 1985, 1986, 1987, 1988, 1989, 1990, 1991, 1992, 1993, 1994, 1995, 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014] 

d = {value: index for index, value in enumerate(b)} 
print([d[x] for x in a]) 



>>> timeit("[bisect_left(b, x) for x in a]", "from __main__ import a, b; from bisect import bisect_left") 
3.513558427607279 
>>> timeit("[b.index(x) for x in a]", "from __main__ import a, b") 
8.010070997323822 
>>> timeit("d = {value: index for index, value in enumerate(b)}; [d[x] for x in a]", "from __main__ import a, b") 
5.5277420695707065 
>>> timeit("[d[x] for x in a]", "from __main__ import a, b, ;d = {value : index for index, value in enumerate(b)}") 
1.1214096146165389 

所以,如果你打折的預處理,你幾乎比在實際處理中使用b.index快8倍 - 如果您在a的列表中排列較少的b,那麼效果會更好。如果只做一次,則使用bisect_left會更快,並且可以保證b是單調遞增的。

相關問題