2017-07-17 65 views
1

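numpy.histogram: retrieve the sum of squared weights in each bin

Is it possible in numpy (or scipy) to retrieve the sum of the squared weights in each bin of a histogram? I want to show the error on each bin height in my histogram. For unweighted data the statistical error on each bin height should be sqrt(N), where N is the bin height, but for weighted data I need to sum the squares of the weights. numpy.histogram can't do this, but is there some other function in numpy or scipy that can bin one array (e.g. the weights array) based on a different array (e.g. the array of values that I'm histogramming)? I've read through the documentation but haven't found anything.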

+2

I would start with [numpy.digitize](https://docs.scipy.org/doc/numpy/reference/generated/numpy.digitize.html#numpy.digitize) – FTP

+0

I don't follow. Can you state it in more mathematical terms? – obachtos

+0

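@obachtos Suppose I have an array `x = [2, 9, 4, 8]` and a weights array `w = [0.1, 0.2, 0.3, 0.4]`. I would create a histogram with 2 bins using `numpy.histogram(x, weights=w, bins=[0, 5, 10])`. The 0th bin contains 2 and 4, but because of the weights the total bin height is 0.1 + 0.3 = 0.4. The 1st bin contains 9 and 8, with a bin height of 0.2 + 0.4 = 0.6. I would also like to get the sum of the squared weights _for each bin_. For the 0th bin that would be .1^2 + .3^2. The statistical error on that bin height would be sqrt(.1^2 + .3^2) = 0.316..., not sqrt(bin height) as with unweighted data. – ddavis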

Answers

2

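As Alex suggested, numpy.digitize is what you want. The function returns the index of the bin that each entry of your x array belongs to. You can then use this information to access the correct elements of w: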

import numpy as np

x = np.array([2, 9, 4, 8])
w = np.array([0.1, 0.2, 0.3, 0.4])

# bin index of each entry of x
bins = np.digitize(x, [0, 5, 10])

# access elements for first bin
first_bin_ws = w[np.where(bins == 1)[0]]

# error of first bin
error = np.sqrt(np.sum(first_bin_ws**2.))

The last line then computes the error for the first bin. Note that np.digitize starts counting at 1.
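The same indexing extends to every bin at once. The following is a minimal sketch, assuming the `x`, `w`, and bin edges from the example above; the names `sumw2` and `sumw2_hist` are only illustrative. As a cross-check it also passes the squared weights to `np.histogram` via its `weights` argument, which is an alternative route and not part of the answer above.

```python
import numpy as np

x = np.array([2, 9, 4, 8])
w = np.array([0.1, 0.2, 0.3, 0.4])
edges = [0, 5, 10]

# Bin index of each entry of x; indices start at 1 for the first bin
bins = np.digitize(x, edges)

# Sum of squared weights in each bin, looping over the bin indices 1..n_bins
sumw2 = np.array([np.sum(w[bins == i] ** 2) for i in range(1, len(edges))])

# Statistical error on each bin height
errors = np.sqrt(sumw2)

# Cross-check (assumption: not from the answer above): histogram the
# squared weights directly with np.histogram
sumw2_hist, _ = np.histogram(x, bins=edges, weights=w ** 2)
```

For this input both routes agree. They can differ for a value that sits exactly on the last edge, because `np.histogram` treats the last bin as closed while `np.digitize` does not.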

+0

Perfect - the example with `np.where` is what I needed to see for it to click. – ddavis

1

If I may add a supplement to @obachtos's answer, I've expanded it into a function that handles the full histogram:

def hist_bin_uncertainty(data, weights, bin_edges):
    """
    The statistical uncertainty per bin of the binned data.
    If there are weights then the uncertainty will be the root of the
    sum of the weights squared.
    If there are no weights (weights = 1) this reduces to the root of
    the number of events.

    Args:
        data: `array`, the data being histogrammed.
        weights: `array`, the associated weights of the `data`.
        bin_edges: `array`, the edges of the bins of the histogram.

    Returns:
        bin_uncertainties: `array`, the statistical uncertainty on the bins.

    Example:
    >>> x = np.array([2,9,4,8])
    >>> w = np.array([0.1,0.2,0.3,0.4])
    >>> edges = [0,5,10]
    >>> hist_bin_uncertainty(x, w, edges)
    array([ 0.31622777, 0.4472136 ])
    >>> hist_bin_uncertainty(x, None, edges)
    array([ 1.41421356, 1.41421356])
    >>> hist_bin_uncertainty(x, np.ones(len(x)), edges)
    array([ 1.41421356, 1.41421356])
    """
    import numpy as np 
    data = np.asarray(data)
    bin_edges = np.asarray(bin_edges)
    # Bound the data and weights to be within the bin edges. np.digitize uses
    # half-open bins [low, high), so the lowest edge is kept and the highest
    # edge is dropped.
    in_range = (data >= bin_edges.min()) & (data < bin_edges.max())
    in_range_data = data[in_range]

    if weights is None:
        # Default to weights of 1 and thus uncertainty = sqrt(N)
        in_range_weights = np.ones(len(in_range_data))
    else:
        in_range_weights = np.asarray(weights)[in_range]

    # Bin the weights with the same binning as the data
    bin_index = np.digitize(in_range_data, bin_edges)
    # N.B.: range(1, len(bin_edges)) is used instead of set(bin_index) because
    # a bin skipped by a gap in the data would not appear in the set.
    # A plain list is used because bins can hold different numbers of entries.
    binned_weights = [in_range_weights[bin_index == idx]
                      for idx in range(1, len(bin_edges))]
    bin_uncertainties = np.asarray(
        [np.sqrt(np.sum(np.square(w))) for w in binned_weights])
    return bin_uncertainties
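For large inputs the per-bin loop can be avoided. The following is a rough sketch of a vectorized variant, assuming `np.bincount` with its `weights` and `minlength` arguments is acceptable here; the function name `sumw2_uncertainty` is hypothetical and not from the answers above. Entries below the first edge or at or above the last edge are dropped, following the `np.digitize` convention.

```python
import numpy as np

def sumw2_uncertainty(data, weights, bin_edges):
    """Per-bin sqrt(sum of squared weights), computed with np.bincount."""
    data = np.asarray(data)
    bin_edges = np.asarray(bin_edges)
    if weights is None:
        # Unweighted data: every entry counts with weight 1
        weights = np.ones(len(data))
    weights = np.asarray(weights, dtype=float)

    n_bins = len(bin_edges) - 1
    # np.digitize returns 0 below the first edge and len(bin_edges) at or
    # above the last edge, so valid bin indices run from 1 to n_bins
    idx = np.digitize(data, bin_edges)
    in_range = (idx >= 1) & (idx <= n_bins)

    # Accumulate the squared weights per bin; minlength keeps empty bins
    sumw2 = np.bincount(idx[in_range] - 1,
                        weights=weights[in_range] ** 2,
                        minlength=n_bins)
    return np.sqrt(sumw2)

unc = sumw2_uncertainty([2, 9, 4, 8], [0.1, 0.2, 0.3, 0.4], [0, 5, 10])
```

An empty bin comes out with an uncertainty of 0 rather than being skipped, so the result always has one entry per bin.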