2016-09-19 38 views
0

這不是不成熟的優化。我的使用案例在內部循環的最內部對dict的權利進行了雙重檢查,一直運行。此外,它在智力上令人厭煩(見結果)。Python字典查找性能,得到vs在

哪種方法更快?

mydict = { 'hello': 'yes', 'goodbye': 'no' } 
key = 'hello' 

# (A) 
if key in mydict: 
    a = mydict[key] 
    do_things(a) 
else: 
    handle_an_error() 

# vs (B) 
a = mydict.get(key,None) 
if a is not None: 
    do_things(a) 
else: 
    handle_an_error() 

編輯:這些是相同的速度。常識告訴我,(B)應該明顯更快,因爲它只有一次字典查找與2次對比,但結果不同。我在撓頭。基準的

結果平均超過12運行,其中1/2都命中,而另一半是失誤:

doing in 
switching to get 
total time for IN: 0.532250006994 
total time for GET: 0.480916659037 
times found: 12000000 
times not found: 12000000 

當一個類似的運行(* 10個循環)而沒有找到關鍵,

doing in 
switching to get 
total time for IN: 2.35899998744 
total time for GET: 4.13858334223 

爲什麼!?

(正確的)代碼

import time 
smalldict = {} 
for i in range(10): 
    smalldict[str(i*4)] = str(i*18) 

smalldict["8"] = "hello" 

bigdict = {} 
for i in range(10000): 
    bigdict[str(i*100)] = str(i*4123) 
bigdict["hello"] = "yes!" 

timetotal = 0 
totalin = 0 
totalget = 0 
key = "hello" 
found= 0 
notfound = 0 

ddo = bigdict # change to smalldict for small dict gets 
print 'doing in' 


for r in range(12): 
    start = time.time() 
    a = r % 2 
    for i in range(1000000): 
     if a == 0: 
      if str(key) in ddo: 
       found = found + 1 
       foo = ddo[str(key)] 
      else: 
       notfound = notfound + 1 
       foo = "nooo" 
     else: 
      if 'yo' in ddo: 
       found = found + 1 
       foo = ddo['yo'] 
      else: 
       notfound = notfound + 1 
       foo = "nooo" 
    timetotal = timetotal + (time.time() - start) 

totalin = timetotal/12.0 

print 'switching to get' 
timetotal = 0 
for r in range(12): 
    start = time.time() 
    a = r % 2 
    for i in range(1000000): 
     if a == 0: 
      foo = ddo.get(key,None) 
      if foo is not None: 
       found = found + 1 
      else: 
       notfound = notfound + 1 
       foo = "nooo" 
     else: 
      foo = ddo.get('yo',None) 
      if foo is not None: 
       found = found + 1 
       notfound = notfound + 1 
      else: 
       notfound = notfound + 1 
       foo = "oooo" 
    timetotal = timetotal + (time.time() - start) 

totalget = timetotal/12 

print "total time for IN: ", totalin 
print 'total time for GET: ', totalget 
print 'times found:', found 
print 'times not found:', notfound 

(原始)碼 導入時間 smalldict = {} 對於i在範圍(10): smalldict [STR(I * 4)] = STR(ⅰ * 18)

smalldict["8"] = "hello" 

bigdict = {} 
for i in range(10000): 
    bigdict[str(i*100)] = str(i*4123) 
bigdict["8000"] = "hello" 

timetotal = 0 
totalin = 0 
totalget = 0 
key = "hello" 
found= 0 
notfound = 0 

ddo = bigdict # change to smalldict for small dict gets 
print 'doing in' 
for r in range(12): 
    start = time.time() 
    a = r % 2 
    for i in range(10000000): 
     if a == 0: 
      if key in ddo: 
       foo = ddo[key] 
      else: 
       foo = "nooo" 
     else: 
      if 'yo' in ddo: 
       foo = ddo['yo'] 
      else: 
       foo = "nooo" 
    timetotal = timetotal + (time.time() - start) 

totalin = timetotal/12.0 

print 'switching to get' 
timetotal = 0 
for r in range(12): 
    start = time.time() 
    a = r % 2 
    for i in range(10000000): 
     if a == 0: 
      foo = ddo.get(key,None) 
      if foo is not None: 
       # yaaay 
       pass 
      else: 
       foo = "nooo" 
     else: 
      foo = ddo.get('yo',None) 
      if foo is not None: 
       #yaaay 
       pass 
      else: 
       foo = "oooo" 
    timetotal = timetotal + (time.time() - start) 

totalget = timetotal/12 

print "total time for IN: ", totalin 
print 'total time for GET: ', totalget 
+1

它看起來並不像你實際上計時的情況,其中的關鍵是在字典中,所以'in'代碼實際上從來沒有實現雙重查找。 – user2357112

+0

@ user2357112:很好的觀察。我猜想,成員函數查找爲'get'做了其餘的工作。我通過預先查看'get':'g = ddo.get'將時間減少了10%,然後調用'foo = g('yo',None)' –

+1

查找屬性並調用必須返回對象的方法*或*默認值永遠不會像單個操作碼那麼快,只會檢查密鑰的存在。儘管使用'timeit'模塊來做適當的時間試驗。 –

回答

2

我們可以做一些更好的時機:

import timeit 

d = dict.fromkeys(range(10000)) 

def d_get_has(d): 
    return d.get(1) 

def d_get_not_has(d): 
    return d.get(-1) 

def d_in_has(d): 
    if 1 in d: 
     return d[1] 

def d_in_not_has(d): 
    if -1 in d: 
     return d[-1] 


print timeit.timeit('d_get_has(d)', 'from __main__ import d, d_get_has') 
print timeit.timeit('d_get_not_has(d)', 'from __main__ import d, d_get_not_has') 
print timeit.timeit('d_in_has(d)', 'from __main__ import d, d_in_has') 
print timeit.timeit('d_in_not_has(d)', 'from __main__ import d, d_in_not_has') 

在我的電腦上,「in」變種比.get變種更快。這可能是因爲.get是字典中的一個屬性查找,並且屬性查找可能與字典上的成員資格測試一樣昂貴。請注意,in和使用dict[x]的項目查找可以直接在字節碼中完成,因此常規方法查找可以被繞過...

也可能值得指出的是,如果我只使用pypy :-),我會得到一個巨大的優化:

$ python ~/sandbox/test.py 
0.169840812683 
0.1732609272 
0.122044086456 
0.0991759300232 

$ pypy ~/sandbox/test.py 
0.00974893569946 
0.00752687454224 
0.00812077522278 
0.00686597824097 
+0

如果我的生產目標很穩定,我會用心跳來使用pypy。 –

+0

我喜歡'in'語法而不是'get',所以感謝更好的基準測試! –

+0

這也推動了一個方法查找可能導致多少開銷...... –