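If you want maximum efficiency, one package that should get you very close is bitarray. It is written in C, so it is lightning fast.
To create a bitarray, you can pass the bitarray constructor any iterable of boolean-like values. For example: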
>>> bitarray.bitarray([1, 1, 0, 1])
bitarray('1101')
>>> bitarray.bitarray([True, True, False, True])
bitarray('1101')
>>> bitarray.bitarray([range(3), [5.829048], [], ['can', 'I', 'help', 'you?']])
bitarray('1101')
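I ran some timings, and bitarray is indeed the fastest for longer strings. But there are some surprises:
- bitarray is only about 50% faster than int(''.join(bs)) in most cases.
- int(''.join(bs)) is faster than Tomasz's shift method for a longer than about 3000, and far faster for a longer than about 30000.
- Even for small len(a), the shift method is only a few times faster. The gain that ''.join() offers for longer strings is on the order of seconds or tens of seconds, whereas the shift method only offers gains on the order of milliseconds for small strings. So ''.join() is the clear winner here.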
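To make the comparison above concrete, here is a minimal stdlib-only sketch of the two pure-Python approaches for a single mask. The names mask_join and mask_shift and the test data are illustrative, not taken from the question. The shift version probably falls behind on long inputs because every shift copies the whole (growing) integer, so the loop is quadratic in the number of bits, while the join version parses the string once.

```python
import timeit

def mask_join(values, x):
    # Build a '0'/'1' string, then parse it once as a base-2 integer.
    return int(''.join('1' if v == x else '0' for v in values), 2)

def mask_shift(values, x):
    # Build the same integer one bit at a time with shifts.
    m = 0
    for v in values:
        m = (m << 1) | (v == x)
    return m

# Illustrative data: 2000 values in the range 0..29.
values = [i % 30 for i in range(2000)]

# Both approaches must agree before timing them.
assert mask_join(values, 4) == mask_shift(values, 4)

for name, fn in (('join', mask_join), ('shift', mask_shift)):
    seconds = timeit.timeit(lambda: fn(values, 4), number=20)
    print(name, seconds)
```

Absolute numbers depend on the machine and Python version, so rerun this with sizes that match your own data before drawing conclusions.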
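So if you don't want to use an external library, using ''.join() as you do above is the best solution. In fact, the benefit of using an external library is marginal in this case, so in the end I wouldn't recommend it, unless your main goal is to save memory.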
Finally, a small note: you don't have to prepend '0b' to the string as you did above. Just call int(bitstring, 2); the base argument (2) makes the '0b' redundant.
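A short illustration of that note, using a made-up bit string: with an explicit base of 2, int() accepts the digits with or without the '0b' prefix and returns the same value.

```python
bitstring = '1101'  # illustrative value

# The base argument tells int() how to read the digits.
value = int(bitstring, 2)

# The prefixed form is also accepted with base 2, so adding it is redundant.
prefixed = int('0b' + bitstring, 2)

print(value, prefixed)
```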
>>> import array
>>> import random
>>> import bitarray
>>>
>>> #### Definitions: ####
>>>
>>> def a_mask_join(a):
...     # One mask per distinct value, built from a '0'/'1' string.
...     # Note: `i is b` relies on CPython caching small ints; `i == b` is the portable test.
...     d = dict()
...     for i in set(a):
...         d[i] = int(''.join([str(int(i is b)) for b in a]), 2)
...     return d
...
>>> def mask(values, x):
...     # Build the mask for x one bit at a time.
...     m = 0
...     for v in values:
...         m = (m << 1) + (v == x)
...     return m
...
>>> def a_mask_shift(a):
...     d = dict()
...     for i in set(a):
...         d[i] = mask(a, i)
...     return d
...
>>> def a_mask_bitarray1(a):
...     # Keeps each mask as a bitarray.
...     d = dict()
...     for i in set(a):
...         d[i] = bitarray.bitarray([int(i is b) for b in a])
...     return d
...
>>> def a_mask_bitarray2(a):
...     # Converts each bitarray back to an int via its '0'/'1' string.
...     d = dict()
...     for i in set(a):
...         d[i] = int(bitarray.bitarray([int(i is b) for b in a]).to01(), 2)
...     return d
...
>>> a = array.array('B', [4,5,13,4,4,9,12,13])
>>>
>>> #### Test: ####
>>>
>>> # Check that every int-returning variant defined above yields the same masks.
>>> dicts = (f(a) for f in (a_mask_join, a_mask_shift, a_mask_bitarray2))
>>> sorted_results = (sorted(int(v) for v in d.values()) for d in dicts)
>>> all(r == sorted(a_mask_join(a).values()) for r in sorted_results)
True
>>>
>>> #### Timing: ####
>>>
>>> # For each size, the four timings appear in the order: join, shift, bitarray1, bitarray2.
>>> for size in (int(10 ** (e/2.0)) for e in range(2, 11)):
...     print(size)
...     a = array.array('B', [random.randrange(0, 30) for _ in range(size)])
...     %timeit a_mask_join(a)
...     %timeit a_mask_shift(a)
...     %timeit a_mask_bitarray1(a)
...     %timeit a_mask_bitarray2(a)
...
10
10000 loops, best of 3: 61.2 us per loop
100000 loops, best of 3: 17.5 us per loop
10000 loops, best of 3: 38.4 us per loop
10000 loops, best of 3: 46.7 us per loop
31
1000 loops, best of 3: 343 us per loop
10000 loops, best of 3: 97.9 us per loop
1000 loops, best of 3: 212 us per loop
1000 loops, best of 3: 242 us per loop
100
1000 loops, best of 3: 1.45 ms per loop
1000 loops, best of 3: 486 us per loop
1000 loops, best of 3: 825 us per loop
1000 loops, best of 3: 870 us per loop
316
100 loops, best of 3: 4.53 ms per loop
100 loops, best of 3: 2.46 ms per loop
100 loops, best of 3: 2.53 ms per loop
100 loops, best of 3: 2.65 ms per loop
1000
100 loops, best of 3: 14.5 ms per loop
100 loops, best of 3: 10.8 ms per loop
100 loops, best of 3: 7.78 ms per loop
100 loops, best of 3: 8.04 ms per loop
3162
10 loops, best of 3: 47.4 ms per loop
10 loops, best of 3: 71.8 ms per loop
10 loops, best of 3: 24.1 ms per loop
10 loops, best of 3: 25.6 ms per loop
10000
10 loops, best of 3: 137 ms per loop
1 loops, best of 3: 425 ms per loop
10 loops, best of 3: 75.7 ms per loop
10 loops, best of 3: 78 ms per loop
31622
1 loops, best of 3: 430 ms per loop
1 loops, best of 3: 3.25 s per loop
1 loops, best of 3: 241 ms per loop
1 loops, best of 3: 246 ms per loop
100000
1 loops, best of 3: 1.37 s per loop
1 loops, best of 3: 29.7 s per loop
1 loops, best of 3: 805 ms per loop
1 loops, best of 3: 800 ms per loop
By a length of 32768, do you mean there are at most 32768 bits? Or that the integer representation is at most 32768 (which has at most ~16 bits)? – 2012-02-29 00:26:39