Python字典查找性能，得到vs在

這不是不成熟的優化。我的使用案例在內部循環的最內部對dict的權利進行了雙重檢查，一直運行。此外，它在智力上令人厭煩（見結果）。Python字典查找性能，得到vs在

哪種方法更快？

mydict = { 'hello': 'yes', 'goodbye': 'no' } 
key = 'hello' 

# (A) 
if key in mydict: 
    a = mydict[key] 
    do_things(a) 
else: 
    handle_an_error() 

# vs (B) 
a = mydict.get(key,None) 
if a is not None: 
    do_things(a) 
else: 
    handle_an_error()

編輯：這些是相同的速度。常識告訴我，（B）應該明顯更快，因爲它只有一次字典查找與2次對比，但結果不同。我在撓頭。基準的

結果平均超過12運行，其中1/2都命中，而另一半是失誤：

doing in 
switching to get 
total time for IN: 0.532250006994 
total time for GET: 0.480916659037 
times found: 12000000 
times not found: 12000000

當一個類似的運行（* 10個循環）而沒有找到關鍵，

doing in 
switching to get 
total time for IN: 2.35899998744 
total time for GET: 4.13858334223

爲什麼！？

（正確的）代碼

import time 
smalldict = {} 
for i in range(10): 
    smalldict[str(i*4)] = str(i*18) 

smalldict["8"] = "hello" 

bigdict = {} 
for i in range(10000): 
    bigdict[str(i*100)] = str(i*4123) 
bigdict["hello"] = "yes!" 

timetotal = 0 
totalin = 0 
totalget = 0 
key = "hello" 
found= 0 
notfound = 0 

ddo = bigdict # change to smalldict for small dict gets 
print 'doing in' 


for r in range(12): 
    start = time.time() 
    a = r % 2 
    for i in range(1000000): 
     if a == 0: 
      if str(key) in ddo: 
       found = found + 1 
       foo = ddo[str(key)] 
      else: 
       notfound = notfound + 1 
       foo = "nooo" 
     else: 
      if 'yo' in ddo: 
       found = found + 1 
       foo = ddo['yo'] 
      else: 
       notfound = notfound + 1 
       foo = "nooo" 
    timetotal = timetotal + (time.time() - start) 

totalin = timetotal/12.0 

print 'switching to get' 
timetotal = 0 
for r in range(12): 
    start = time.time() 
    a = r % 2 
    for i in range(1000000): 
     if a == 0: 
      foo = ddo.get(key,None) 
      if foo is not None: 
       found = found + 1 
      else: 
       notfound = notfound + 1 
       foo = "nooo" 
     else: 
      foo = ddo.get('yo',None) 
      if foo is not None: 
       found = found + 1 
       notfound = notfound + 1 
      else: 
       notfound = notfound + 1 
       foo = "oooo" 
    timetotal = timetotal + (time.time() - start) 

totalget = timetotal/12 

print "total time for IN: ", totalin 
print 'total time for GET: ', totalget 
print 'times found:', found 
print 'times not found:', notfound

（原始）碼導入時間 smalldict = {} 對於i在範圍（10）： smalldict [STR（I * 4）] = STR（ⅰ * 18）

smalldict["8"] = "hello" 

bigdict = {} 
for i in range(10000): 
    bigdict[str(i*100)] = str(i*4123) 
bigdict["8000"] = "hello" 

timetotal = 0 
totalin = 0 
totalget = 0 
key = "hello" 
found= 0 
notfound = 0 

ddo = bigdict # change to smalldict for small dict gets 
print 'doing in' 
for r in range(12): 
    start = time.time() 
    a = r % 2 
    for i in range(10000000): 
     if a == 0: 
      if key in ddo: 
       foo = ddo[key] 
      else: 
       foo = "nooo" 
     else: 
      if 'yo' in ddo: 
       foo = ddo['yo'] 
      else: 
       foo = "nooo" 
    timetotal = timetotal + (time.time() - start) 

totalin = timetotal/12.0 

print 'switching to get' 
timetotal = 0 
for r in range(12): 
    start = time.time() 
    a = r % 2 
    for i in range(10000000): 
     if a == 0: 
      foo = ddo.get(key,None) 
      if foo is not None: 
       # yaaay 
       pass 
      else: 
       foo = "nooo" 
     else: 
      foo = ddo.get('yo',None) 
      if foo is not None: 
       #yaaay 
       pass 
      else: 
       foo = "oooo" 
    timetotal = timetotal + (time.time() - start) 

totalget = timetotal/12 

print "total time for IN: ", totalin 
print 'total time for GET: ', totalget

來源

2016-09-19 std' ' OrgnlDave

它看起來並不像你實際上計時的情況，其中的關鍵是在字典中，所以'in'代碼實際上從來沒有實現雙重查找。 – user2357112

@ user2357112：很好的觀察。我猜想，成員函數查找爲'get'做了其餘的工作。我通過預先查看'get'：'g = ddo.get'將時間減少了10％，然後調用'foo = g（'yo'，None）' –

查找屬性並調用必須返回對象的方法*或*默認值永遠不會像單個操作碼那麼快，只會檢查密鑰的存在。儘管使用'timeit'模塊來做適當的時間試驗。 –

我們可以做一些更好的時機：

import timeit 

d = dict.fromkeys(range(10000)) 

def d_get_has(d): 
    return d.get(1) 

def d_get_not_has(d): 
    return d.get(-1) 

def d_in_has(d): 
    if 1 in d: 
     return d[1] 

def d_in_not_has(d): 
    if -1 in d: 
     return d[-1] 


print timeit.timeit('d_get_has(d)', 'from __main__ import d, d_get_has') 
print timeit.timeit('d_get_not_has(d)', 'from __main__ import d, d_get_not_has') 
print timeit.timeit('d_in_has(d)', 'from __main__ import d, d_in_has') 
print timeit.timeit('d_in_not_has(d)', 'from __main__ import d, d_in_not_has')

在我的電腦上，「in」變種比.get變種更快。這可能是因爲.get是字典中的一個屬性查找，並且屬性查找可能與字典上的成員資格測試一樣昂貴。請注意，in和使用dict[x]的項目查找可以直接在字節碼中完成，因此常規方法查找可以被繞過...

也可能值得指出的是，如果我只使用pypy :-)，我會得到一個巨大的優化：

$ python ~/sandbox/test.py 
0.169840812683 
0.1732609272 
0.122044086456 
0.0991759300232 

$ pypy ~/sandbox/test.py 
0.00974893569946 
0.00752687454224 
0.00812077522278 
0.00686597824097

來源

2016-09-19 21:19:08 mgilson

如果我的生產目標很穩定，我會用心跳來使用pypy。 –

我喜歡'in'語法而不是'get'，所以感謝更好的基準測試！ –

這也推動了一個方法查找可能導致多少開銷...... –

Python字典查找性能，得到vs在

回答

相關問題