Normalizing a sparse.csc_matrix by its diagonal

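I have a scipy.sparse.csc_matrix with dtype = np.int32. I want to efficiently divide each column (or row, whichever is faster for a csc_matrix) of the matrix by the diagonal element in that column, so that mnew[:, i] = m[:, i] / m[i, i]. Note that I need to convert my matrix to np.double (since the elements of mnew will be in [0, 1]), and because the matrix is huge and very sparse, I would like to know whether this can be done in an efficient, loop-free way that never makes the matrix dense.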

Best,

Ilya

Source

2017-07-26 Ilya

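What if the m[i, i] value is 0? Getting the diagonal should be easy, and the multiplication is efficient as well. Give us a small example, say a 10x10 matrix, and demonstrate the result with its dense equivalent. – hpaulj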

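m[i, i] is guaranteed to be nonzero and larger than any other value in its row/column. This is a co-occurrence matrix of items drawn from short lists (5-20 items each), computed over a few hundred thousand such lists. The matrix size is (numUniqueItems, numUniqueItems). So a diagonal element is the number of lists in which a particular item appears, and an off-diagonal element is the number of lists in which both the i-th and j-th items appear. Dividing along a column (or row) by the diagonal element then gives p(j-th item appears | i-th item appears). – Ilya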
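To make the setup in this comment concrete, here is a minimal sketch of how such a co-occurrence matrix could be built and turned into conditional probabilities. The toy lists, the incidence matrix B and the use of sparse.diags are illustrative assumptions, not the asker's actual pipeline.

```python
import numpy as np
from scipy import sparse

# Toy data (assumed): each inner list holds the item ids seen in one list.
lists = [[0, 1], [0, 2], [0, 1, 2], [1]]
n_items = 3

# Binary incidence matrix B: one row per list, one column per item.
rows = np.concatenate([np.full(len(items), r) for r, items in enumerate(lists)])
cols = np.concatenate([np.asarray(items) for items in lists])
vals = np.ones(len(rows), dtype=np.int32)
B = sparse.csr_matrix((vals, (rows, cols)), shape=(len(lists), n_items))

# Co-occurrence counts: C[i, j] = number of lists containing both i and j,
# so C[i, i] is the number of lists containing item i.
C = (B.T @ B).tocsc()

# Divide column j by C[j, j]: P[i, j] = p(item i appears | item j appears).
P = C @ sparse.diags(1.0 / C.diagonal())
```

With column scaling, entry P[i, j] conditions on the item of the column; scaling the rows instead (sparse.diags(...) @ C) conditions on the item of the row.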

Make a sparse matrix:

In [379]: M = sparse.random(5,5,.2, format='csr') 
In [380]: M 
Out[380]: 
<5x5 sparse matrix of type '<class 'numpy.float64'>' 
    with 5 stored elements in Compressed Sparse Row format> 
In [381]: M.diagonal() 
Out[381]: array([ 0., 0., 0., 0., 0.])

Too many zeros on the diagonal, so let's add a nonzero diagonal:

In [382]: D=sparse.dia_matrix((np.random.rand(5),0),shape=(5,5)) 
In [383]: D 
Out[383]: 
<5x5 sparse matrix of type '<class 'numpy.float64'>' 
    with 5 stored elements (1 diagonals) in DIAgonal format> 
In [384]: M1 = M+D 


In [385]: M1 
Out[385]: 
<5x5 sparse matrix of type '<class 'numpy.float64'>' 
    with 10 stored elements in Compressed Sparse Row format> 

In [387]: M1.A 
Out[387]: 
array([[ 0.35786668, 0.81754484, 0.  , 0.  , 0.  ], 
     [ 0.  , 0.41928992, 0.  , 0.01371273, 0.  ], 
     [ 0.  , 0.  , 0.4685924 , 0.  , 0.35724102], 
     [ 0.  , 0.  , 0.77591294, 0.95008721, 0.16917791], 
     [ 0.  , 0.  , 0.  , 0.  , 0.16659141]])

Now it's trivial to divide each column by its diagonal (this is a matrix 'product'):

In [388]: M1/M1.diagonal() 
Out[388]: 
matrix([[ 1.  , 1.94983185, 0.  , 0.  , 0.  ], 
     [ 0.  , 1.  , 0.  , 0.01443313, 0.  ], 
     [ 0.  , 0.  , 1.  , 0.  , 2.1444144 ], 
     [ 0.  , 0.  , 1.65583764, 1.  , 1.01552603], 
     [ 0.  , 0.  , 0.  , 0.  , 1.  ]])

Or divide the rows (multiply by a column vector):

In [391]: M1/M1.diagonal()[:,None]

Oops, these results are dense; let's make the diagonal sparse:

In [408]: md = sparse.csr_matrix(1/M1.diagonal()) # do the inverse here 
In [409]: md 
Out[409]: 
<1x5 sparse matrix of type '<class 'numpy.float64'>' 
    with 5 stored elements in Compressed Sparse Row format> 
In [410]: M.multiply(md) 
Out[410]: 
<5x5 sparse matrix of type '<class 'numpy.float64'>' 
    with 5 stored elements in Compressed Sparse Row format> 
In [411]: M.multiply(md).A 
Out[411]: 
array([[ 0.  , 1.94983185, 0.  , 0.  , 0.  ], 
     [ 0.  , 0.  , 0.  , 0.01443313, 0.  ], 
     [ 0.  , 0.  , 0.  , 0.  , 2.1444144 ], 
     [ 0.  , 0.  , 1.65583764, 0.  , 1.01552603], 
     [ 0.  , 0.  , 0.  , 0.  , 0.  ]])

md.multiply(M) for the column version.
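For the int32 csc_matrix from the question, the same idea can be sketched with a sparse diagonal matrix and an ordinary matrix product. This swaps in sparse.diags in place of the 1x5 csr_matrix used above; the 3x3 matrix is made up for illustration, and a nonzero diagonal is assumed, as stated in the question's comments.

```python
import numpy as np
from scipy import sparse

# Small int32 CSC matrix standing in for the large co-occurrence matrix.
dense = np.array([[4, 1, 0],
                  [1, 5, 2],
                  [0, 2, 8]], dtype=np.int32)
m = sparse.csc_matrix(dense)

# Reciprocal of the diagonal as float64 (assumes no zero on the diagonal).
inv_diag = 1.0 / m.diagonal()
scale = sparse.diags(inv_diag)

# Right-multiplying scales columns: m_cols[:, i] = m[:, i] / m[i, i].
m_cols = m.astype(np.float64) @ scale
# Left-multiplying scales rows: m_rows[i, :] = m[i, :] / m[i, i].
m_rows = scale @ m.astype(np.float64)
```

Both products stay sparse and keep the sparsity pattern of m, so no dense intermediate is created.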

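Division of sparse matrix - a similar question, except that it uses the row sums instead of the diagonal. It deals a bit more with the potential 'zero division' problem.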
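If zeros on the diagonal were possible, one hedged way to guard the reciprocal is np.divide with a where mask. The choice to zero out columns whose diagonal entry is 0 is an assumption made for this sketch; leaving them unscaled (a factor of 1) would be just as defensible.

```python
import numpy as np
from scipy import sparse

# Made-up matrix whose middle diagonal entry is 0.
m = sparse.csc_matrix(np.array([[2, 1, 0],
                                [0, 0, 3],
                                [0, 4, 5]], dtype=np.int32))

d = m.diagonal().astype(np.float64)

# Reciprocal only where the diagonal is nonzero; other slots stay 0.
inv = np.zeros_like(d)
np.divide(1.0, d, out=inv, where=d != 0)

# Column scaling; columns with a zero diagonal are multiplied by 0 (assumed policy).
scaled = m @ sparse.diags(inv)
```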

Source

2017-07-27 01:10:18 hpaulj

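Unfortunately, this raises a memory error. Specifically, the true division calls np.true_divide(self.todense(), other), which makes the matrix dense (and so it cannot possibly fit in my memory). – Ilya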

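Sorry, I didn't notice that my divisions returned 'array' and 'matrix'. Multiplying or dividing by a dense array produces a dense result. The diagonal has to be made sparse. – hpaulj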
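Another way to be sure nothing turns dense is to scale the stored values directly, using the CSC layout in which column j owns the entries data[indptr[j]:indptr[j+1]]. The helper name normalize_columns_by_diagonal is hypothetical, and the sketch assumes a square matrix with a nonzero diagonal.

```python
import numpy as np
from scipy import sparse

def normalize_columns_by_diagonal(m):
    """Hypothetical helper: return a float64 CSC copy of m with every
    column j divided by m[j, j]. Assumes m is square with no zero on
    the diagonal."""
    out = m.tocsc().astype(np.float64)   # astype returns a copy by default
    d = out.diagonal()
    # Number of stored entries in each column.
    counts = np.diff(out.indptr)
    # Repeat each column's diagonal value once per stored entry, then divide
    # the stored values in place; no dense matrix is ever formed.
    out.data /= np.repeat(d, counts)
    return out

dense = np.array([[4, 1, 0],
                  [1, 5, 2],
                  [0, 2, 8]], dtype=np.int32)
m = sparse.csc_matrix(dense)
mnew = normalize_columns_by_diagonal(m)
```

The extra memory used is one float64 array with as many entries as the matrix has stored values, plus the float64 copy of the matrix itself.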

Your edit works perfectly and does what I need, thank you! – Ilya
