2010-12-21 118 views
10

我有一個二維數組,也就是一個也是數組的序列數組。對於每個序列,我想計算自相關,因此對於(5,4)數組,我會得到5個結果或維數組(5,7)。numpy中多維數組的自相關

我知道我可以在第一維上循環,但這很慢,也是我的最後一招。有另一種方法嗎?

謝謝!

編輯:

根據所選擇的答案加上mtrw的評論,我有以下功能:

def xcorr(x): 
    """FFT based autocorrelation function, which is faster than numpy.correlate""" 
    # x is supposed to be an array of sequences, of shape (totalelements, length) 
    fftx = fft(x, n=(length*2-1), axis=1) 
    ret = ifft(fftx * np.conjugate(fftx), axis=1) 
    ret = fftshift(ret, axes=1) 
    return ret 

注意長度是在我的代碼一個全局變量,所以一定要申報它。我也沒有將結果限制爲實數,因爲我需要考慮複數。

回答

8

使用FFT-based autocorrelation

import numpy 
from numpy.fft import fft, ifft 

data = numpy.arange(5*4).reshape(5, 4) 
print data 
##[[ 0 1 2 3] 
## [ 4 5 6 7] 
## [ 8 9 10 11] 
## [12 13 14 15] 
## [16 17 18 19]] 
dataFT = fft(data, axis=1) 
dataAC = ifft(dataFT * numpy.conjugate(dataFT), axis=1).real 
print dataAC 
##[[ 14.  8.  6.  8.] 
## [ 126. 120. 118. 120.] 
## [ 366. 360. 358. 360.] 
## [ 734. 728. 726. 728.] 
## [ 1230. 1224. 1222. 1224.]] 

我有點困惑你對有答案維度聲明(5,7),所以也許有什麼東西我不理解很重要。

編輯:在mtrw的建議,即不環繞軟墊版本:

import numpy 
from numpy.fft import fft, ifft 

data = numpy.arange(5*4).reshape(5, 4) 
padding = numpy.zeros((5, 3)) 
dataPadded = numpy.concatenate((data, padding), axis=1) 
print dataPadded 
##[[ 0. 1. 2. 3. 0. 0. 0. 0.] 
## [ 4. 5. 6. 7. 0. 0. 0. 0.] 
## [ 8. 9. 10. 11. 0. 0. 0. 0.] 
## [ 12. 13. 14. 15. 0. 0. 0. 0.] 
## [ 16. 17. 18. 19. 0. 0. 0. 0.]] 
dataFT = fft(dataPadded, axis=1) 
dataAC = ifft(dataFT * numpy.conjugate(dataFT), axis=1).real 
print numpy.round(dataAC, 10)[:, :4] 
##[[ 14.  8.  3.  0.  0.  3.  8.] 
## [ 126. 92. 59. 28. 28. 59. 92.] 
## [ 366. 272. 179. 88. 88. 179. 272.] 
## [ 734. 548. 363. 180. 180. 363. 548.] 
## [ 1230. 920. 611. 304. 304. 611. 920.]] 

必須有這樣做更有效的方式,特別是因爲自相關是對稱的,我不利用這一點。

+4

+1了基於FFT的方法。至於(5,7)形狀的答案,你已經計算了循環相關性(http://en.wikipedia.org/wiki/Discrete_Fourier_transform#Circular_convolution_theorem_and_cross-correlation_theorem)。只需用3個零填充每行,以便譜乘法不會環繞,並且您將得到要求提供的原始問題。 – mtrw 2010-12-21 20:48:36

1

對於非常大的數組,具有n = 2 ** p(其中p是整數)變得非常重要。這將爲您節省大量時間。例如:

def xcorr(x): 
    l = 2 ** int(np.log2(length * 2 - 1)) 
    fftx = fft(x, n = l, axis = 1) 
    ret = ifft(fftx * np.conjugate(fftx), axis = 1) 
    ret = fftshift(ret, axes=1) 
    return ret 

這可能會導致環繞錯誤。儘管如此,對於大型陣列來說,自動關聯在邊緣附近應該是微不足道的。

1

也許這只是一個偏好,但我想從定義中追隨。我個人覺得這樣更容易一些。這是我對任意nd數組的實現。

 
from itertools import product 
from numpy import empty, roll

def autocorrelate(x): """ Compute the multidimensional autocorrelation of an nd array. input: an nd array of floats output: an nd array of autocorrelations """ # used for transposes t = roll(range(x.ndim), 1) # pairs of indexes # the first is for the autocorrelation array # the second is the shift ii = [list(enumerate(range(1, s - 1))) for s in x.shape] # initialize the resulting autocorrelation array acor = empty(shape=[len(s0) for s0 in ii]) # iterate over all combinations of directional shifts for i in product(*ii): # extract the indexes for # the autocorrelation array # and original array respectively i1, i2 = asarray(i).T x1 = x.copy() x2 = x.copy() for i0 in i2: # clip the unshifted array at the end x1 = x1[:-i0] # and the shifted array at the beginning x2 = x2[i0:] # prepare to do the same for # the next axis x1 = x1.transpose(t) x2 = x2.transpose(t) # normalize shifted and unshifted arrays x1 -= x1.mean() x1 /= x1.std() x2 -= x2.mean() x2 /= x2.std() # compute the autocorrelation directly # from the definition acor[tuple(i1)] = (x1 * x2).mean() return acor