Mean up to the 25th percentile with NaNs

I have a 2D array x in which each row has a different number of NaN values:

array([[ nan, -0.355, -0.036, ..., nan, nan], 
     [ nan, -0.341, -0.047, ..., nan, 0.654], 
     [ .016, -1.147, -0.667, ..., nan, nan], 
     ..., 
     [ nan, 0.294, -0.235, ..., 0.65, nan]])

Given this array, for each row I want to compute the mean of all values that fall within the first 25th percentile. I did the following:

limit = np.nanpercentile(x, 25, axis=1) # output 1D array 
ans = np.nanmean(x * (x < limit[:,None]), axis=1)

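However, this gives the wrong result. Specifically, the count (np.nansum/np.nanmean) stays the same no matter which percentile I choose, because the comparison yields zeros where it is false, and those zeros are counted as valid values for the mean. I cannot simply use x[x>limit[:,None]], because that gives a 1D array and I need a 2D result.

I solved it as follows:

f = x.copy()
f[f > limit[:,None]] = np.nan
ans = np.nanmean(f, axis=1)

Is there a better way to do this?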

Source

2016-11-21 dayum

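Did you mean 'limit' where you wrote 'low'? If so, I think this is exactly the procedure I would have used. What kind of better approach are you looking for? – Praveen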

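Thanks, yes, it should be limit. Ideally I would like an approach without so many intermediate steps, because with this one I need to create a copy every time I want another percentile. – dayum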
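To make the miscount described in the question concrete, here is a minimal sketch on a small made-up array (the values and the 50th percentile are chosen only for illustration and are not from the original post). It also shows np.where as one possible way to avoid the explicit copy-and-assign step, although it still allocates a temporary array.

```python
import numpy as np

# Hypothetical 2x4 input, used only to illustrate the problem.
x = np.array([[1.0, 2.0, 3.0, np.nan],
              [4.0, np.nan, 6.0, 8.0]])
limit = np.nanpercentile(x, 50, axis=1)  # per-row threshold: [2., 6.]

# Comparisons against NaN evaluate to False, so NaNs are never kept.
keep = x <= limit[:, None]

# Multiplying by the mask turns rejected values into 0.0, which nanmean
# still counts, so the divisor stays at the number of non-NaN entries.
wrong = np.nanmean(x * keep, axis=1)

# Replacing rejected values with NaN removes them from the count as well.
right = np.nanmean(np.where(keep, x, np.nan), axis=1)
```

Here wrong divides each row sum by 3 (all non-NaN entries), while right divides by the number of values actually kept.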

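Approach #1: You could build a mask of the valid elements, i.e. those that are not NaN in the original array and that do not exceed limit[:,None] (the complement of the f > limit[:,None] mask from the question). Then use this mask to compute the equivalent of np.nanmean over the valid elements only. The benefit of using masks/boolean arrays is memory: a boolean array takes 1 byte per element, 8 times less than a float64 array. Thus, we would have an implementation like so -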

# Create mask of non-NaNs and thresholded ones 
mask = ~np.isnan(x) & (x <= limit[:,None]) 

# Get the row, col indices. Use the row indices for bin-based summing and 
# finally averaging by using those indices to get the group lengths. 
r,c = np.where(mask) 
out = np.bincount(r,x[mask])/np.bincount(r)
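As a worked illustration of the bincount step, the sketch below runs it on a small made-up array. The input values and the 50th percentile are assumptions for the example, and the minlength argument is an addition that is not part of the snippet above.

```python
import numpy as np

# Hypothetical 3x4 input, used only for illustration.
x = np.array([[1.0, 2.0, 3.0, np.nan],
              [4.0, np.nan, 6.0, 8.0],
              [np.nan, 5.0, 7.0, 9.0]])
limit = np.nanpercentile(x, 50, axis=1)  # per-row threshold: [2., 6., 7.]

mask = ~np.isnan(x) & (x <= limit[:, None])
r, _ = np.where(mask)  # row index of every kept element

# The weighted bincount sums the kept values per row; the unweighted one
# counts them. minlength keeps both results at one entry per row even if
# trailing rows have no kept elements.
sums = np.bincount(r, weights=x[mask], minlength=x.shape[0])
counts = np.bincount(r, minlength=x.shape[0])
out = sums / counts
```

A row with no kept elements, for example an all-NaN row, would still produce a 0/0 division here, so such rows need separate handling.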

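Approach #2: We could also use np.add.reduceat, which helps here because boolean masking returns the kept elements in row order, so the bins are already sorted. A somewhat more efficient version would be like so -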

# Get the valid mask as before 
mask = ~np.isnan(x) & (x <= limit[:,None]) 

# Get valid row count. Use np.add.reduceat to perform grouped summations 
# at intervals separated by row indices. 
rowc = mask.sum(1) 
out = np.add.reduceat(x[mask],np.append(0,rowc[:-1].cumsum()))/rowc
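The offsets passed to np.add.reduceat may be easier to follow on a tiny example. The sketch below uses a made-up array whose rows keep different numbers of elements; the values and the 50th percentile are illustrative assumptions only.

```python
import numpy as np

# Hypothetical 3x5 input in which rows keep 2, 1 and 3 elements.
x = np.array([[1.0, 2.0, 3.0, np.nan, np.nan],
              [4.0, np.nan, np.nan, np.nan, 8.0],
              [5.0, 7.0, 9.0, 11.0, 13.0]])
limit = np.nanpercentile(x, 50, axis=1)  # per-row threshold: [2., 6., 9.]

mask = ~np.isnan(x) & (x <= limit[:, None])
kept = x[mask]            # kept values, flattened in row order
rowc = mask.sum(axis=1)   # number of kept values per row

# Each row's segment starts where the previous rows' kept values end.
starts = np.append(0, rowc[:-1].cumsum())

# reduceat sums kept[starts[i]:starts[i+1]] for each i; the last segment
# runs to the end of the array.
out = np.add.reduceat(kept, starts) / rowc
```

This relies on every row keeping at least one element. When two consecutive offsets are equal, reduceat returns the single element at that index rather than an empty sum, so a row with rowc equal to 0 would silently get a wrong value.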

Benchmarking

Function definitions -

def original_app(x, limit): 
    f = x.copy() 
    f[f > limit[:,None]] = np.nan 
    ans = np.nanmean(f, axis=1) 
    return ans 

def proposed1_app(x, limit): 
    mask = ~np.isnan(x) & (x <= limit[:,None]) 
    r,c = np.where(mask) 
    out = np.bincount(r,x[mask])/np.bincount(r) 
    return out 

def proposed2_app(x, limit): 
    mask = ~np.isnan(x) & (x <= limit[:,None]) 
    rowc = mask.sum(1) 
    out = np.add.reduceat(x[mask],np.append(0,rowc[:-1].cumsum()))/rowc 
    return out

Timings and verification -

In [402]: # Setup inputs 
    ...: x = np.random.randn(400,500) 
    ...: x.ravel()[np.random.randint(0,x.size,x.size//4)] = np.nan # At most a quarter set to NaN
    ...: limit = np.nanpercentile(x, 25, axis=1) 
    ...: 

In [403]: np.allclose(original_app(x, limit),proposed1_app(x, limit)) 
Out[403]: True 

In [404]: np.allclose(original_app(x, limit),proposed2_app(x, limit)) 
Out[404]: True 

In [405]: %timeit original_app(x, limit) 
100 loops, best of 3: 5 ms per loop 

In [406]: %timeit proposed1_app(x, limit) 
100 loops, best of 3: 4.02 ms per loop 

In [407]: %timeit proposed2_app(x, limit) 
100 loops, best of 3: 2.18 ms per loop
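For readers who want to repeat the comparison outside IPython, here is a rough sketch using the standard timeit module. The seed, the repeat counts and the use of np.random.default_rng are choices made for this sketch and are not part of the original benchmark; absolute timings will vary by machine and NumPy version.

```python
import timeit
import numpy as np

def original_app(x, limit):
    f = x.copy()
    f[f > limit[:, None]] = np.nan
    return np.nanmean(f, axis=1)

def proposed1_app(x, limit):
    mask = ~np.isnan(x) & (x <= limit[:, None])
    r, _ = np.where(mask)
    return np.bincount(r, x[mask]) / np.bincount(r)

def proposed2_app(x, limit):
    mask = ~np.isnan(x) & (x <= limit[:, None])
    rowc = mask.sum(1)
    return np.add.reduceat(x[mask], np.append(0, rowc[:-1].cumsum())) / rowc

# Same shape and NaN fraction as the session above; the seed is arbitrary.
rng = np.random.default_rng(0)
x = rng.standard_normal((400, 500))
x.ravel()[rng.integers(0, x.size, x.size // 4)] = np.nan
limit = np.nanpercentile(x, 25, axis=1)

funcs = (original_app, proposed1_app, proposed2_app)
results = [f(x, limit) for f in funcs]

# Best per-call time in seconds over 3 repeats of 20 calls each.
timings = {
    f.__name__: min(timeit.repeat(lambda: f(x, limit), number=20, repeat=3)) / 20
    for f in funcs
}
for name, seconds in timings.items():
    print(f"{name}: {seconds * 1e3:.2f} ms per call")
```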

Source

2016-11-21 10:11:22 Divakar
