2017-02-19 76 views
3

我做了一個簡單的函數,當輸入一個向量時,它將返回一個輸出熱點編碼矩陣。如何加速一個熱點編碼器代碼

import numpy as np 

def ohc(x): 
    u = list(set(x)) 
    c = len(u) 
    X = np.zeros((len(x), c)) 
    for idx, val in enumerate(x): 
     for i in range(c): 
      if val == u[i]: 
       X[idx, i] = 1 
    return X 

inputx = np.random.randint(1, 4, 1000000) 
ohc(inputx) 
Out[2]: 
array([[ 0., 1., 0.], 
     [ 0., 1., 0.], 
     [ 0., 1., 0.], 
     ..., 
     [ 0., 0., 1.], 
     [ 0., 1., 0.], 
     [ 0., 1., 0.]]) 

但我想知道是否因爲這兩個for循環有任何方法來加速它?

 1000006 function calls in 1.102 seconds 

    Ordered by: standard name 

    ncalls tottime percall cumtime percall filename:lineno(function) 
    1 0.930 0.930 1.102 1.102 <ipython-input-32-fcf6d323f906>:1(ohc) 
    1 0.000 0.000 1.102 1.102 <string>:1(<module>) 
    2 0.000 0.000 0.000 0.000 {len} 
    1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects} 
    1 0.000 0.000 0.000 0.000 {numpy.core.multiarray.zeros} 
    1000000 0.172 0.000 0.172 0.000 {range} 
+1

你是否分析了代碼,看看瓶頸在哪裏?通過使用vanilla'list'和'set'類,我覺得你可能會否定numpy可以擁有的一些好處。也許有些numpy等價物可以做你想做的事情,並且讓操作更快一些。 –

+0

@PaulRooney不知道分析器,如果我正在讀取它的時間最長的範圍?我已經在問題 – UberStuper

回答

3

看起來像np.unique

uniq, inv = np.unique(x, return_inverse=True) 
result = np.zeros((len(x), len(uniq)), dtype=int) 
result[np.arange(len(x)), inv] = 1 
工作

響應於@ Divakar的基準:這裏是一個提供更多的信息的比較確認輕微的速度優勢dv在小字母,圍繞K=20跨越並反轉成用於在pp一個K=1000幾倍優勢。這是預期的,因爲pp利用了單熱的稀疏性。下面,K是字母的大小,N樣本的長度。

import numpy as np 
from timeit import timeit 

def pp(x): 
    uniq, inv = np.unique(x, return_inverse=True) 
    result = np.zeros((len(x), len(uniq)), dtype=int) 
    result[np.arange(len(x)), inv] = 1 

def dv(x): 
    (x[:,None] == np.unique(x)).astype(int) 


for K in (4, 10, 20, 40, 100, 200, 1000): 
    tpp, tdv = [], [] 
    print('@ K =', K) 
    for N in (1000, 10000, 100000): 
     data = np.random.choice(np.random.random(K), N, replace=True) 
     tdv.append(timeit('f(a)', number=100, globals={'f': dv, 'a': data})) 
     tpp.append(timeit('f(a)', number=100, globals={'f': pp, 'a': data})) 
    print('dv:', '{:.6f}, {:.6f}, {:.6f}'.format(*tdv), 'secs for 100 trials @ N = 1000, 10000, 100000') 
    print('pp:', '{:.6f}, {:.6f}, {:.6f}'.format(*tpp), 'secs for 100 trials @ N = 1000, 10000, 100000') 

打印:

@ K = 4 
dv: 0.003458, 0.038176, 0.421894 secs for 100 trials @ N = 1000, 10000, 100000 
pp: 0.004856, 0.052298, 0.603758 secs for 100 trials @ N = 1000, 10000, 100000 
@ K = 10 
dv: 0.005136, 0.056491, 0.663157 secs for 100 trials @ N = 1000, 10000, 100000 
pp: 0.005955, 0.054069, 0.719152 secs for 100 trials @ N = 1000, 10000, 100000 
@ K = 20 
dv: 0.007201, 0.084867, 0.988886 secs for 100 trials @ N = 1000, 10000, 100000 
pp: 0.007638, 0.084580, 0.891122 secs for 100 trials @ N = 1000, 10000, 100000 
@ K = 40 
dv: 0.010748, 0.130974, 1.498022 secs for 100 trials @ N = 1000, 10000, 100000 
pp: 0.009321, 0.103912, 1.080271 secs for 100 trials @ N = 1000, 10000, 100000 
@ K = 100 
dv: 0.025357, 0.292930, 2.946326 secs for 100 trials @ N = 1000, 10000, 100000 
pp: 0.011916, 0.147117, 1.641588 secs for 100 trials @ N = 1000, 10000, 100000 
@ K = 200 
dv: 0.033651, 0.560753, 6.042001 secs for 100 trials @ N = 1000, 10000, 100000 
pp: 0.022971, 0.221142, 3.580255 secs for 100 trials @ N = 1000, 10000, 100000 
@ K = 1000 
dv: 0.156715, 2.655647, 37.112166 secs for 100 trials @ N = 1000, 10000, 100000 
pp: 0.055516, 0.920938, 10.358050 secs for 100 trials @ N = 1000, 10000, 100000 

使用uint8,並允許@ Divakar的方法,使用更便宜的視圖鑄造:

@ K = 4 
dv: 0.003092, 0.038149, 0.386140 secs for 100 trials @ N = 1000, 10000, 100000 
pp: 0.004392, 0.043327, 0.554253 secs for 100 trials @ N = 1000, 10000, 100000 
@ K = 10 
dv: 0.004604, 0.054215, 0.501708 secs for 100 trials @ N = 1000, 10000, 100000 
pp: 0.004930, 0.051555, 0.607239 secs for 100 trials @ N = 1000, 10000, 100000 
@ K = 20 
dv: 0.006421, 0.067397, 0.665465 secs for 100 trials @ N = 1000, 10000, 100000 
pp: 0.006616, 0.054055, 0.703260 secs for 100 trials @ N = 1000, 10000, 100000 
@ K = 40 
dv: 0.008857, 0.087155, 0.862316 secs for 100 trials @ N = 1000, 10000, 100000 
pp: 0.006945, 0.060408, 0.733966 secs for 100 trials @ N = 1000, 10000, 100000 
@ K = 100 
dv: 0.015660, 0.142464, 1.426929 secs for 100 trials @ N = 1000, 10000, 100000 
pp: 0.008063, 0.070860, 0.908615 secs for 100 trials @ N = 1000, 10000, 100000 
@ K = 200 
dv: 0.025631, 0.235712, 2.401750 secs for 100 trials @ N = 1000, 10000, 100000 
pp: 0.008805, 0.101772, 1.111652 secs for 100 trials @ N = 1000, 10000, 100000 
@ K = 1000 
dv: 0.069953, 1.024585, 11.313402 secs for 100 trials @ N = 1000, 10000, 100000 
pp: 0.011558, 0.182684, 2.201837 secs for 100 trials @ N = 1000, 10000, 100000 
+0

中添加了輸出結果很好,看到了相關的權衡。基準測試也不錯! – Divakar

+0

@Divakar謝謝!隨意在你的文章中使用視圖優化。 –

+0

對不起,我一直在改變我接受的答案,但我想確保我測試了每種情況,我可能會使用。這是一個非常艱難的選擇,考慮簡單的@Divakar答案是什麼,但我主要關心的是速度,所以我認爲接受PP答案是公平的 – UberStuper

1

您的代碼運行在O(N [因爲組()] + NC [由於for循環])。在大多數實際的應用程序中,無論如何,最終都會使用O(nc)*,因爲您需要爲陣列分配空間。有一些技巧可以提高效率:

  1. 使用字典。字典是使用散列來實現的,平均而言,這應該是不變的。
  2. 不要在每一步迭代c個可能的特徵,而是記住每個特徵的索引。

這裏是我的實現:

import numpy as np 

def ohc(x): 
    elem_to_idx = {} 
    for e in x: 
     if e not in elem_to_idx: 
      elem_to_idx[e] = len(elem_to_idx) 
    c = len(elem_to_idx) 
    X = np.zeros((len(x), c)) 
    for idx, val in enumerate(x): 
     X[idx, elem_to_idx[val]] = 1 
    return X 

*取決於你打算與X矩陣做什麼,你可能想使用numpy.sparse矩陣,它不分配這麼大的內存和反過來可以使代替O在O(N)你的代碼運行(NC)

4

下面是使用從np.unique只爲唯一值的量化方法與原始數組進行比較以獲得單熱編碼陣列 -

(inputx[:,None] == np.unique(inputx)).astype(float) 

運行測試

其他方法 -

# Original soln 
def ohc(x): 
    u = list(set(x)) 
    c = len(u) 
    X = np.zeros((len(x), c)) 
    for idx, val in enumerate(x): 
     for i in range(c): 
      if val == u[i]: 
       X[idx, i] = 1 
    return X 

# @Tommalla's soln 
def ohc_dict(x): 
    elem_to_idx = {} 
    for e in x: 
     if e not in elem_to_idx: 
      elem_to_idx[e] = len(elem_to_idx) 
    c = len(elem_to_idx) 
    X = np.zeros((len(x), c)) 
    for idx, val in enumerate(x): 
     X[idx, elem_to_idx[val]] = 1 
    return X 

# @Paul Panzer's soln 
def unique_inverse(x): 
    uniq, inv = np.unique(x, return_inverse=True) 
    result = np.zeros((len(x), len(uniq)), dtype=int) 
    result[np.arange(len(x)), inv] = 1 
    return result 

計時 -

In [42]: inputx = np.random.randint(1, 4, 1000000) 

In [43]: %timeit ohc(inputx) 
1 loops, best of 3: 526 ms per loop 

In [44]: %timeit ohc_dict(inputx) 
1 loops, best of 3: 256 ms per loop 

In [45]: %timeit unique_inverse(inputx) 
10 loops, best of 3: 48.6 ms per loop 

In [46]: %timeit (inputx[:,None] == np.unique(inputx)).astype(float) 
10 loops, best of 3: 34.4 ms per loop 

進一步的性能助推

使用np.int8作爲輸出D型作進一步的性能提升所提出的方法 -

In [58]: %timeit (inputx[:,None] == np.unique(inputx)).astype(np.int8) 
10 loops, best of 3: 27.7 ms per loop 

正如@保羅裝甲建議,我們還可以用view,而不是類型強制轉換陣列上一些進一步推動以更獨特的數字 -

In [23]: inputx = np.random.randint(1, 40, 1000000) 

In [24]: %timeit (inputx[:,None] == np.unique(inputx)).astype(np.int8) 
10 loops, best of 3: 98.4 ms per loop 

In [25]: %timeit (inputx[:,None] == np.unique(inputx)).view(np.int8) 
10 loops, best of 3: 92.5 ms per loop