2017-04-12 57 views
0

我想在大稀疏矩陣(目前12000條12000)運營的最大值。 我想要做的就是它的塊設置爲零,但保持該塊內的最大值。 我已經有密集矩陣正在運行的解決方案:SciPy的/ numpy的:只保留一個稀疏矩陣塊

import numpy as np 
from scipy.sparse import random 

np.set_printoptions(precision=2) 
#x = random(10,10,density=0.5) 
x = np.random.random((10,10)) 
x = x.T * x 
print(x) 

def keep_only_max(a,b,c,d): 
    sub = x[a:b,c:d] 
    z = np.max(sub) 
    sub[sub < z] = 0 


sizes = np.asarray([0,1,5,4]) 
sizes_sum = np.cumsum(sizes) 

for i in range(1,len(sizes)): 
    current_i_min = sizes_sum[i-1] 
    current_i_max = sizes_sum[i] 
    for j in range(1,len(sizes)): 
    if i >= j: 
     continue 
    current_j_min = sizes_sum[j-1] 
    current_j_max = sizes_sum[j] 

    keep_only_max(current_i_min, current_i_max, current_j_min, current_j_max) 
    keep_only_max(current_j_min, current_j_max, current_i_min, current_i_max) 

print(x) 

然而,這不適用於稀疏矩陣(重試取消註釋在上面的線)。 任何想法我怎麼能有效地實現這一點沒有調用todense()?

回答

0

感謝hpaulj我設法用scipy.sparse.bmat實現解決方案:

from scipy.sparse import coo_matrix 
from scipy.sparse import csr_matrix 
from scipy.sparse import rand 
from scipy.sparse import bmat 
import numpy as np 


np.set_printoptions(precision=2) 

# my matrices are symmetric, so generate random symmetric matrix 
x = rand(10,10,density=0.4) 
x = x.T * x 
x = x 


def keep_only_max(a,b,c,d): 
    sub = x[a:b,c:d] 
    z = np.unravel_index(sub.argmax(),sub.shape) 
    i1 = z[0] 
    j1 = z[1] 
    new = csr_matrix(([sub[i1,j1]],([i1],[j1])),shape=(b-a,d-c)) 
    return new 

def keep_all(a,b,c,d): 
    return x[a:b,c:d].copy() 

# we want to create a chessboard pattern where the first central block is 1x1, the second 5x5 and the last 4x4 
sizes = np.asarray([0,1,5,4]) 
sizes_sum = np.cumsum(sizes) 

# acquire 2D array to store our chessboard blocks 
r = range(len(sizes)-1) 
blocks = [[0 for x in r] for y in r] 


for i in range(1,len(sizes)): 
    current_i_min = sizes_sum[i-1] 
    current_i_max = sizes_sum[i] 

    for j in range(i,len(sizes)): 

     current_j_min = sizes_sum[j-1] 
     current_j_max = sizes_sum[j] 

     if i == j: 
      # keep the blocks at the diagonal completely 
      sub = keep_all(current_i_min, current_i_max, current_j_min, current_j_max) 
      blocks[i-1][j-1] = sub 
     else: 
      # the blocks not on the digonal only keep their maximum value 
      current_j_min = sizes_sum[j-1] 
      current_j_max = sizes_sum[j] 

      # we can leverage the matrix symmetry and only calculate one new matrix. 
      m1 = keep_only_max(current_i_min, current_i_max, current_j_min, current_j_max) 
      m2 = m1.T 

      blocks[i-1][j-1] = m1 
      blocks[j-1][i-1] = m2 


z = bmat(blocks) 
print(z.todense()) 
0
def keep_only_max(a,b,c,d): 
    sub = x[a:b,c:d] 
    z = np.max(sub) 
    sub[sub < z] = 0 

對於稀疏x,該sub切片工程csr格式。這不會是爲等效密片一樣快,但它會創建的x那部分的副本。

我不得不檢查稀疏max功能。但我可以想象將sub轉換爲coo格式,使用np.argmax上的.data屬性,以及相應的rowcol值,構造一個具有相同形狀但只有一個非零值的新矩陣。

如果你的塊以規則,不重疊的方式覆蓋x,我建議用sparse.bmat構建一個新的矩陣。基本上收集所有組件的coo屬性,將它們加入具有適當偏移量的一組陣列中,並製作新的矩陣。

如果塊散在或重疊,您可能需要生成,並通過一個插入它們放回x之一。 csr格式應該適用於此,但會發出稀疏效率警告。 lil應該更快地改變值。我認爲它會接受塊。

我可以想象用稀疏矩陣做這件事,但它需要時間來設置測試用例和調試過程。

+0

塊確實是不重疊的,因爲我或多或少產生棋盤圖案,其中每個細胞只保留它的最大價值。 – Zahlii