How to use truncated SVD to reduce a fully connected (`"InnerProduct"`) layer

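In section 3.1, "Truncated SVD for faster detection", of the paper Girshick, R., Fast R-CNN (ICCV 2015), the author proposes using an SVD trick to reduce the size and computation time of a fully connected layer.

Given a trained model (deploy.prototxt and weights.caffemodel), how can I use this trick to replace a fully connected layer with a truncated one?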


2016-11-08 Shai

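Some linear-algebra background
Singular value decomposition (SVD) factors any matrix W into three matrices:

W = U S V*

where U and V are orthonormal matrices and S is diagonal, with entries of decreasing magnitude along the diagonal. One interesting property of the SVD is that it makes it easy to approximate W with a lower-rank matrix: suppose you truncate S so that it keeps only its k leading entries (instead of all the entries on the diagonal), then

W_app = U S_trunc V*

is a rank-k approximation of W.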
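
To make the rank-k claim concrete, here is a minimal NumPy sketch. The matrix size, the seed and the variable names are assumptions chosen for illustration; they do not come from the answer. It builds the truncated product from the k leading singular values and vectors, then checks its rank and its Frobenius-norm error.

```python
import numpy as np

rng = np.random.default_rng(0)     # arbitrary seed, illustration only
W = rng.standard_normal((50, 30))  # stand-in for a weight matrix
k = 5                              # target rank (assumed value)

# full_matrices=False gives U: (50, 30), s: (30,), Vh: (30, 30); Vh is V*
U, s, Vh = np.linalg.svd(W, full_matrices=False)

# keep only the k leading singular values and the matching vectors
W_app = (U[:, :k] * s[:k]) @ Vh[:k, :]

rank = np.linalg.matrix_rank(W_app)
# the Frobenius error of the truncation equals the norm of the dropped singular values
err = np.linalg.norm(W - W_app)
dropped = np.sqrt(np.sum(s[k:] ** 2))
print(rank, err, dropped)
```

The error depends only on the singular values that were dropped, so a layer whose singular values decay quickly loses little from truncation.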

Using SVD to approximate a fully connected layer
Suppose we have a model deploy_full.prototxt with a fully connected layer

# ... some layers here
layer {
    name: "fc_orig"
    type: "InnerProduct"
    bottom: "in"
    top: "out"
    inner_product_param {
        num_output: 1000
        # more params...
    }
    # some more...
}
# more layers...

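Furthermore, we have trained_weights_full.caffemodel - the trained parameters for the deploy_full.prototxt model.

Copy deploy_full.prototxt to deploy_svd.prototxt and open it in an editor of your choice. Replace the fully connected layer with these two layers: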

layer {
    name: "fc_svd_U"
    type: "InnerProduct"
    bottom: "in" # same input
    top: "svd_interim"
    inner_product_param {
        num_output: 20 # approximate with k = 20 rank matrix
        bias_term: false
        # more params...
    }
    # some more...
}
# NO activation layer here!
layer {
    name: "fc_svd_V"
    type: "InnerProduct"
    bottom: "svd_interim"
    top: "out" # same output
    inner_product_param {
        num_output: 1000 # original number of outputs
        # more params...
    }
    # some more...
}
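
The reason there must be no activation between the two layers is that two stacked linear layers compute a single linear map whose matrix is the product of their weights. The following is a NumPy-only sketch of that identity, not Caffe code. The helper `inner_product`, the sizes and the seed are hypothetical, and it assumes Caffe's (num_output, num_input) weight layout.

```python
import numpy as np

def inner_product(x, weights, bias=None):
    """Mimic an InnerProduct layer; weights has shape (num_output, num_input)."""
    y = weights @ x
    return y if bias is None else y + bias

rng = np.random.default_rng(1)            # arbitrary seed
num_input, num_output, k = 64, 100, 20    # illustrative sizes
W = rng.standard_normal((num_output, num_input))
b = rng.standard_normal(num_output)
x = rng.standard_normal(num_input)

U, s, Vh = np.linalg.svd(W, full_matrices=False)
first = Vh[:k, :]           # (k, num_input): applied to the input, no bias
second = U[:, :k] * s[:k]   # (num_output, k): produces the output, keeps the bias

two_layer = inner_product(inner_product(x, first), second, b)
one_layer = inner_product(x, second @ first, b)   # rank-k W_app as a single layer
print(np.allclose(two_layer, one_layer))
```

A nonlinearity between the two layers would break this equality, and the pair would no longer approximate the original layer.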

In Python, a little net surgery:

import caffe
import numpy as np

orig_net = caffe.Net('deploy_full.prototxt', 'trained_weights_full.caffemodel', caffe.TEST)
svd_net = caffe.Net('deploy_svd.prototxt', 'trained_weights_full.caffemodel', caffe.TEST)
# get the original weight matrix, shape (num_output, num_input)
W = np.array(orig_net.params['fc_orig'][0].data)
# SVD decomposition: W = U * diag(s) * Vh (np.linalg.svd returns V*, not V)
k = 20 # same as num_output of fc_svd_U
U, s, Vh = np.linalg.svd(W, full_matrices=False)
# assign weights to the svd net; InnerProduct weights are (num_output, num_input)
# note: despite its name, fc_svd_U is the first layer (k x num_input),
# so it takes the k leading rows of Vh
svd_net.params['fc_svd_U'][0].data[...] = Vh[:k, :]
# fc_svd_V is the second layer (num_output x k): the k leading columns of U,
# each scaled by its singular value
svd_net.params['fc_svd_V'][0].data[...] = U[:, :k] * s[:k]
svd_net.params['fc_svd_V'][1].data[...] = orig_net.params['fc_orig'][1].data # same bias
# save the new weights
svd_net.save('trained_weights_svd.caffemodel')

Now we have deploy_svd.prototxt with trained_weights_svd.caffemodel, which approximate the original net with far fewer multiplications and weights.
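
As a rough sketch of how much is saved: a fully connected layer with num_input inputs and num_output outputs has num_input * num_output weights, while the two-layer replacement has k * (num_input + num_output). The functions below are hypothetical helpers, and num_input = 4096 is an assumed value, since the answer does not state the input size of fc_orig.

```python
def fc_cost(num_input, num_output):
    """Weights (and multiply-adds per input) of one fully connected layer, bias excluded."""
    return num_input * num_output

def svd_cost(num_input, num_output, k):
    """The same count for the two-layer rank-k replacement."""
    return k * num_input + k * num_output

num_input = 4096    # assumed input size; not given in the answer
num_output = 1000   # num_output of fc_orig
k = 20              # num_output of fc_svd_U

full = fc_cost(num_input, num_output)
compressed = svd_cost(num_input, num_output, k)
print(full, compressed, full / compressed)
```

The replacement only pays off when k is smaller than num_input * num_output / (num_input + num_output); above that threshold the two layers hold more weights than the original one.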


2016-11-08 08:02:19 Shai

Nice solution! – Dale

Amazing solution :) –

@Dale not my solution - it's Ross Girshick's. – Shai

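Actually, Ross Girshick's py-faster-rcnn repo includes an implementation of the SVD step: compress_net.py. You usually need to fine-tune the compressed model to recover the accuracy (or compress in a more sophisticated way; see, for example, "Accelerating Very Deep Convolutional Networks for Classification and Detection" by Zhang et al.).

Also, for me scipy.linalg.svd ran faster than numpy's svd.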


2017-08-24 09:13:43 rkellerm

This is easy to miss, thanks for the reference. – 2017-11-27 07:54:16
