numpy: computing the mean and standard deviation over multiple non-consecutive axes (2nd attempt)

[An earlier version of this post got no responses at all, so, in case that was due to a lack of clarity, I have rewritten it with additional explanation and code comments.]

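I would like to compute the mean and standard deviation of those elements of an n-dimensional numpy array that correspond not to a single axis but to k > 1 non-consecutive axes, and collect the results in a new (n - k + 1)-dimensional array.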

Does numpy include standard constructs for performing this operation efficiently?

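The function mu_sigma copied below is my best attempt at solving this problem, but it has two glaring inefficiencies: 1) it requires making a copy of the original data; 2) it computes the mean twice (since computing the standard deviation requires computing the mean).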
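The copy mentioned in point 1 can be observed directly. The following is a minimal sketch, assuming the same 4 x 2 x 4 x 3 x 4 input shape as the example further down and using np.shares_memory as the check; it illustrates why the transpose-then-reshape approach ends up copying the data and is not part of the original function.

```python
import numpy as np

box = np.arange(4 * 2 * 4 * 3 * 4, dtype=float).reshape(4, 2, 4, 3, 4)

# Transposing only rearranges the strides, so the result is a view of box.
permuted = box.transpose(0, 2, 4, 1, 3)
print(np.shares_memory(box, permuted))   # the view shares memory with box

# The last two axes of the view are not laid out contiguously, so merging
# them cannot be expressed with strides alone: reshape allocates a new array.
reshaped = permuted.reshape(4, 4, 4, -1)
print(np.shares_memory(box, reshaped))   # the copy does not share memory
```

The transpose itself is cheap; only the reshape that collapses the moved axes pays for the copy.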

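The mu_sigma function takes two arguments: box and axes. box is an n-dimensional numpy array (aka "ndarray"), and axes is a tuple of k integers, representing (not necessarily consecutive) dimensions of box. The function returns a new (n - k + 1)-dimensional ndarray containing the mean and standard deviation computed over the "hyperslabs" of box represented by the specified axes.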

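The code below also includes an example of mu_sigma in action. In this example, the box argument is a 4 x 2 x 4 x 3 x 4 ndarray of floats, and the axes argument is the tuple (1, 3). (Hence, we have n == len(box.shape) == 5 and k == len(axes) == 2.) The result returned for this example input (which I'll call outbox here) is a 4 x 4 x 4 x 2 ndarray of floats. For each triple of indices i, j, k (where each index ranges over the set {0, 1, 2, 3}), the element outbox[i, j, k, 0] is the mean of the 6 elements specified by the numpy expression box[i, 0:2, j, 0:3, k]. Similarly, outbox[i, j, k, 1] is the standard deviation of the same 6 elements. This means that the first n - k == 3 dimensions of the result range over the same indices as do the n - k non-axis dimensions of the input ndarray box, which in this case are dimensions 0, 2, and 4.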
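To make the indexing concrete, here is a small sketch of a single hyperslab. It assumes the input is filled with consecutive floats, as in the example code below; the indices i, j, k are arbitrary values picked for illustration.

```python
import numpy as np

box = np.arange(4 * 2 * 4 * 3 * 4, dtype=float).reshape(4, 2, 4, 3, 4)

# Arbitrary illustrative indices into the non-axis dimensions 0, 2 and 4.
i, j, k = 1, 2, 3

# Fixing dimensions 0, 2 and 4 leaves a 2 x 3 slab spanning dimensions 1 and 3.
slab = box[i, 0:2, j, 0:3, k]
print(slab.shape, slab.size)

# These two numbers are what outbox[i, j, k, 0] and outbox[i, j, k, 1] should hold.
print(slab.mean(), slab.std())
```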

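The strategy used in mu_sigma is to

1. permute the dimensions (using the transpose method) so that the axes specified in the function's second argument are all put at the end; the remaining (non-axis) dimensions are kept at the beginning (in their original order);
2. collapse the axes-dimensions into one (by using the reshape method); the new "collapsed" dimension is now the last dimension of the reshaped ndarray;
3. compute an ndarray of means using the last "collapsed" dimension as the axis;
4. compute an ndarray of standard deviations using the last "collapsed" dimension as the axis;
5. return an ndarray obtained by concatenating the ndarrays produced in (3) and (4).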

import numpy as np 

def mu_sigma(box, axes): 
    inshape = box.shape 

    # determine the permutation needed to put all the dimensions given in axes 
    # at the end (otherwise preserving the relative ordering of the dimensions) 
    nonaxes = tuple([i for i in range(len(inshape)) if i not in set(axes)]) 

    # permute the dimensions 
    permuted = box.transpose(nonaxes + axes) 

    # determine the shape of the ndarray after permuting the dimensions and 
    # collapsing the axes-dimensions; thanks to Bago for the "+ (-1,)" 
    newshape = tuple(inshape[i] for i in nonaxes) + (-1,) 

    # collapse the axes-dimensions 
    # NB: the next line results in copying the input array 
    reshaped = permuted.reshape(newshape) 

    # determine the shape for the mean and std ndarrays, as required by 
    # the subsequent call to np.concatenate (this reshaping is not necessary 
    # if the available mean and std methods support the keepdims keyword; 
    # instead, just set keepdims to True in both calls). 
    outshape = newshape[:-1] + (1,) 

    # compute the means and standard deviations 
    mean = reshaped.mean(axis=-1).reshape(outshape) 
    std = reshaped.std(axis=-1).reshape(outshape) 

    # collect the results in a single ndarray, and return it 
    return np.concatenate((mean, std), axis=-1) 

inshape = 4, 2, 4, 3, 4 
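# consecutive floats 0.0, 1.0, ...; np.arange/np.prod also work on Python 3
# and current numpy, unlike np.array(map(...)) and np.product
inbuf = np.arange(np.prod(inshape), dtype=float)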
inbox = np.ndarray(inshape, buffer=inbuf) 
outbox = mu_sigma(inbox, tuple(range(len(inshape))[1::2])) 

# "inline tests" 
assert all(outbox[..., 1].ravel() == 
      [inbox[0, :, 0, :, 0].std()] * outbox[..., 1].size) 
assert all(outbox[..., 0].ravel() == [float(4*(v + 3*w) + x) 
             for v in [8*y - 1 
               for y in [3*z + 1 
                  for z in range(4)]] 
             for w in range(4) 
             for x in range(4)])

Source

2012-01-04 kjo

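This approach seems right to me. mean is much faster than std, so I wouldn't worry about computing the mean twice. Creating temporary copies of the data is common when using numpy/matlab-style vectorization. That is the price we pay for numpy's usability and speed, and unless you're running into some kind of memory limit, I wouldn't worry about it. One small note about 'newshape': try 'newshape = tuple(inshape[i] for i in nonaxes) + (-1,)' – 2012-01-04 19:29:02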

@Bago: so a tuple whose last element is -1 is a valid shape tuple! Thanks! – kjo 2012-01-04 19:52:42

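@kjo: By the way, you don't have to repost your question. According to [this meta question](http://meta.stackexchange.com/questions/7046/how-do-i-get-attention-for-my-old-unanswered-questions), you can edit it (with useful information, your progress, or a better explanation) and it will get bumped. – voithos 2012-01-04 20:54:55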

It looks like this got a bit easier as of numpy 2.0.

http://projects.scipy.org/numpy/ticket/1234
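As a rough sketch of what "easier" can look like: to my knowledge, mean and std accept a tuple for the axis argument in NumPy 1.7 and later, which removes the need for the transpose and reshape steps. The function name mu_sigma_tuple_axes is hypothetical and only used for illustration, and np.stack is assumed to be available (it was added in a later NumPy release than tuple-valued axis).

```python
import numpy as np

def mu_sigma_tuple_axes(box, axes):
    """Hypothetical variant of mu_sigma that relies on tuple-valued `axis`."""
    axes = tuple(axes)
    # Reduce over all requested axes at once; the remaining (non-axis)
    # dimensions keep their original relative order.
    mean = box.mean(axis=axes)
    std = box.std(axis=axes)
    # Stack along a new trailing axis: [..., 0] is the mean, [..., 1] the std.
    return np.stack((mean, std), axis=-1)

inbox = np.arange(4 * 2 * 4 * 3 * 4, dtype=float).reshape(4, 2, 4, 3, 4)
outbox = mu_sigma_tuple_axes(inbox, (1, 3))
print(outbox.shape)
```

This still computes the mean twice internally (once in mean, once inside std), so it addresses the convenience of the call more than the second inefficiency listed in the question.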

Source

2012-01-19 01:13:55
