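If certain special conditions apply, you can use NumPy indexing as a fast alternative to dictionary lookups. In the code below, the keys are non-negative integers smaller than a bound M, so an array of length M can hold one slot per possible key.
The idea is to use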
lookup_array = np.empty((M,), dtype=values.dtype)
lookup_array[keys] = values
result = lookup_array[key_set]
instead of
result = {lookup_dict.get(key) for key in key_set}
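To make the correspondence concrete, here is a minimal sketch on toy data. The sizes and values are made up for illustration, and filling the array with NaN via `np.full` is my own substitution for `np.empty`, so that keys which were never assigned are easy to spot rather than holding arbitrary memory contents.

```python
import numpy as np

# Hypothetical toy data: integer keys in [0, M) and one float value per key.
M = 10
keys = np.array([2, 5, 7])
values = np.array([0.2, 0.5, 0.7])

# Dictionary version of the mapping.
lookup_dict = dict(zip(keys.tolist(), values.tolist()))

# Array version: slot i holds the value for key i. NaN marks unassigned keys
# (an assumption of this sketch; np.empty would leave those slots undefined).
lookup_array = np.full((M,), np.nan, dtype=values.dtype)
lookup_array[keys] = values

key_set = np.array([7, 2, 2, 5])

from_dict = [lookup_dict.get(key) for key in key_set.tolist()]
from_array = lookup_array[key_set]  # one vectorized indexing operation

print(from_dict)                  # [0.7, 0.2, 0.2, 0.5]
print(from_array.tolist())        # [0.7, 0.2, 0.2, 0.5]
print(np.isnan(lookup_array[3]))  # True: key 3 was never assigned
```

Note that the array costs memory proportional to M, not to the number of keys, so this only pays off when the largest key is reasonably small.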
For example,
import numpy as np
import pandas as pd
def using_dict(lookup_dict, key_set):
    return {lookup_dict.get(key) for key in key_set}

def using_array(lookup_array, key_set):
    return lookup_array[key_set]

def using_pandas(df, key_set):
    return df.loc[df['a'].isin(key_set)]

M = 10**6
N = 2*10**5
K = 10**4
keys = np.random.randint(M, size=(N,))
values = np.random.random((N,))
lookup_dict = dict(zip(keys, values))
lookup_array = np.empty((M,), dtype=values.dtype)
lookup_array[keys] = values
df = pd.DataFrame(np.column_stack([keys, values]), columns=list('ab'))
key_set = np.random.choice(keys, size=(K,))
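Before timing anything, it may be worth checking that the dict and array lookups agree. The sketch below does so under a few assumptions of mine: the sizes are reduced, a seeded `np.random.default_rng` is used for repeatability, and the keys are drawn without replacement, because `np.random.randint` can produce duplicate keys and NumPy does not guarantee which value wins when the same index is assigned more than once. Also note that the dict version returns a set while the array version returns an array of length K, so the array result is converted to a set for the comparison.

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded only to make the sketch repeatable

# Smaller than the sizes in the answer, to keep the check quick.
M, N, K = 10**4, 2 * 10**3, 10**2

# Unique keys, so the dict and the array cannot disagree about
# which value belongs to a repeated key.
keys = rng.choice(M, size=N, replace=False)
values = rng.random(N)

lookup_dict = dict(zip(keys.tolist(), values.tolist()))
lookup_array = np.empty((M,), dtype=values.dtype)
lookup_array[keys] = values

# The queried keys may repeat; every one of them exists in `keys`.
key_set = rng.choice(keys, size=K)

dict_result = {lookup_dict.get(key) for key in key_set.tolist()}
array_result = set(lookup_array[key_set].tolist())

print(dict_result == array_result)  # True
```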
Here is a timeit benchmark (using IPython) for the methods above:
In [25]: %timeit using_array(lookup_array, key_set)
10000 loops, best of 3: 22.4 µs per loop
In [26]: %timeit using_dict(lookup_dict, key_set)
100 loops, best of 3: 3.73 ms per loop
In [24]: %timeit using_pandas(df, key_set)
10 loops, best of 3: 38.9 ms per loop
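If IPython is not available, a comparable measurement can be sketched with the standard-library `timeit` module. The reduced sizes and the `number` of repetitions are assumptions chosen to keep the run short, and the pandas variant is left out for brevity. Absolute timings will differ from the figures above depending on hardware and library versions.

```python
import timeit
import numpy as np

# Reduced sizes (an assumption of this sketch) so the script finishes quickly.
M, N, K = 10**5, 2 * 10**4, 10**3

keys = np.random.randint(M, size=(N,))
values = np.random.random((N,))
lookup_dict = dict(zip(keys, values))
lookup_array = np.empty((M,), dtype=values.dtype)
lookup_array[keys] = values
key_set = np.random.choice(keys, size=(K,))

def using_dict(lookup_dict, key_set):
    return {lookup_dict.get(key) for key in key_set}

def using_array(lookup_array, key_set):
    return lookup_array[key_set]

number = 100  # calls per measurement
t_array = timeit.timeit(lambda: using_array(lookup_array, key_set), number=number)
t_dict = timeit.timeit(lambda: using_dict(lookup_dict, key_set), number=number)

# Report the mean time per call in microseconds.
print(f"using_array: {t_array / number * 1e6:.1f} us per call")
print(f"using_dict:  {t_dict / number * 1e6:.1f} us per call")
```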
What is 'lookup'? –
In the first example, lookup is a dictionary – triphook
Shouldn't that be 'lookup_array[:, 0]' instead? Also, does 'key_set' contain only unique keys? – Divakar