2014-09-22 51 views

回答

2

CPython中使用this check來決定何時調整:

if (!(so->used > n_used && so->fill*3 >= (so->mask+1)*2)) 

這基本上意味着,當2/3滿,容器將調整。

調整大小本身一倍的大集空間量和四倍它小的:

return set_table_resize(so, so->used>50000 ? so->used*2 : so->used*4); 

阿明·裏戈在PyPy實現其套使用詞典的評論所指出的,所以relevant resizing code是:

jit.conditional_call(d.resize_counter <= x * 3, 
        _ll_dict_resize_to, d, num_extra) 

這是相同的策略,因爲resize_counter是詞典中留下的空白空間。


在此之前,我採取了基準。您可以通過查找非常小的暫停來檢測調整大小。對於小尺寸這是有些隨意的,所以你必須小心去除噪音。下面是做一個腳本:

from collections import Counter 
import time 

iterations = 100 
internal_iterations = 100000 

def main(): 
    resizes = Counter() 

    for i in range(iterations): 
     print("Completion: [{:3.0%}]\r".format(i/iterations), end="") 

     s = set() 
     maxtime_so_far = 0 
     for j in range(internal_iterations): 
      start = time.time() 
      s.add(j) 
      end = time.time() 

      if end-start > maxtime_so_far: 
       maxtime_so_far = end-start 
       resizes[j] += 1 

    print() 

    format_string = "{0:<6} = 0b{0:<18b} [found %: {1:2.0%}]" 

    for index in sorted(resizes): 
     chance = resizes[index]/iterations 

     if chance >= 0.05: 
      print(format_string.format(index, chance)) 

main() 

,這裏是對CPython的輸出:

Completion: [99%] 
0  = 0b0     [found %: 100%] 
5  = 0b101    [found %: 83%] 
21  = 0b10101    [found %: 12%] 
85  = 0b1010101   [found %: 94%] 
341 = 0b101010101   [found %: 97%] 
1365 = 0b10101010101  [found %: 100%] 
5461 = 0b1010101010101  [found %: 100%] 
21845 = 0b101010101010101 [found %: 100%] 
87381 = 0b10101010101010101 [found %: 100%] 

你可以看到10101...₂模式,這是你從除以3 2的冪,會得到什麼這對象將調整大小。 (之後它會調整大小,因爲腳本是0索引的)。

在PyPy3,改變迭代的次數是大於(iterations = 1000; internal_iterations = 100000),我得到

Completion: [100%] 
0  = 0b0     [found %: 78%] 
5  = 0b101    [found %: 6%] 
21  = 0b10101    [found %: 5%] 
341 = 0b101010101   [found %: 24%] 
1365 = 0b10101010101  [found %: 66%] 
5461 = 0b1010101010101  [found %: 100%] 
21845 = 0b101010101010101 [found %: 100%] 
87381 = 0b10101010101010101 [found %: 71%] 

這意味着該戰略是PyPy一樣。根據迭代次數

Completion: [100%] 
0  = 0b0     [found %: 13%] 
5  = 0b101    [found %: 11%] 
21  = 0b10101    [found %: 22%] 
22  = 0b10110    [found %: 6%] 
23  = 0b10111    [found %: 5%] 
24  = 0b11000    [found %: 5%] 
341 = 0b101010101   [found %: 30%] 
1365 = 0b10101010101  [found %: 66%] 
5461 = 0b1010101010101  [found %: 98%] 

奇怪的是,可能是由於JIT或GC,有時我得到更多的東西一樣。我想這是由於圍繞該點的迭代次數相對較少,並且這可能並不意味着太多。如果GC收集發生在項目20的附近,則容易造成這種壓力。

+1

調整設置從未在'setobject.py'中。相反,它在[字典​​的RPython實現](https://bitbucket.org/pypy/pypy/raw/default/rpython/rtyper/lltypesystem/rdict.py)中。集合只是RPython的「void」值的字典。 – 2014-09-22 09:43:52

+0

太好了,謝謝:)。這使事情更加明顯。 – Veedrac 2014-09-22 10:16:30

相關問題