
Some linear algebra background
Singular Value Decomposition (SVD) is a factorization of any matrix W into three matrices:

W = U S V* 

Where U and V are orthonormal matrices, and S is diagonal with its elements in decreasing magnitude on the diagonal. One of the interesting properties of SVD is that it allows to easily approximate W with a lower rank matrix: Suppose you truncate S to have only its k leading elements (instead of all the elements on the diagonal), then

W_app = U S_trunc V* 

is a rank-k approximation of W.
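For intuition, here is a small NumPy sketch (not part of the original answer; the matrix sizes are made up for illustration) showing that truncating S this way yields a rank-k matrix, whose spectral-norm error is exactly the (k+1)-th singular value (the Eckart-Young theorem):

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((50, 30))

# thin SVD: W = U @ diag(s) @ Vt, with Vt already transposed by numpy
U, s, Vt = np.linalg.svd(W, full_matrices=False)

k = 5
S_trunc = np.diag(s[:k])                 # keep only the k leading singular values
W_app = U[:, :k] @ S_trunc @ Vt[:k, :]   # rank-k approximation of W

print(np.linalg.matrix_rank(W_app))      # 5
print(np.isclose(np.linalg.norm(W - W_app, 2), s[k]))  # True
```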

Approximating a fully connected layer with SVD
Suppose we have a model deploy_full.prototxt with a fully connected layer:

# ... some layers here 
layer { 
    name: "fc_orig" 
    type: "InnerProduct" 
    bottom: "in" 
    top: "out" 
    inner_product_param { 
        num_output: 1000 
        # more params... 
    } 
    # some more... 
} 
# more layers... 

Furthermore, we have trained_weights_full.caffemodel - the parameters trained for the deploy_full.prototxt model.

  1. Copy deploy_full.prototxt to deploy_svd.prototxt and open it in an editor of your choice. Replace the fully connected layer with these two layers:

    layer { 
        name: "fc_svd_U" 
        type: "InnerProduct" 
        bottom: "in" # same input 
        top: "svd_interim" 
        inner_product_param { 
            num_output: 20 # approximate with k = 20 rank matrix 
            bias_term: false 
            # more params... 
        } 
        # some more... 
    } 
    # NO activation layer here! 
    layer { 
        name: "fc_svd_V" 
        type: "InnerProduct" 
        bottom: "svd_interim" 
        top: "out" # same output 
        inner_product_param { 
            num_output: 1000 # original number of outputs 
            # more params... 
        } 
        # some more... 
    } 
    
  2. In Python, a little net surgery:

    import caffe 
    import numpy as np 
    
    orig_net = caffe.Net('deploy_full.prototxt', 'trained_weights_full.caffemodel', caffe.TEST) 
    svd_net = caffe.Net('deploy_svd.prototxt', 'trained_weights_full.caffemodel', caffe.TEST) 
    # get the original weight matrix, shaped (num_output, num_input) = (1000, n_in) 
    W = np.array(orig_net.params['fc_orig'][0].data) 
    # SVD decomposition: W = U @ diag(s) @ V, where V is already transposed by numpy 
    k = 20 # same as num_output of fc_svd_U 
    U, s, V = np.linalg.svd(W, full_matrices=False) 
    # assign weights to svd net: 
    # the first layer applies diag(s[:k]) @ V[:k, :], shaped (k, n_in) 
    svd_net.params['fc_svd_U'][0].data[...] = np.dot(np.diag(s[:k]), V[:k, :]) 
    # the second layer applies U[:, :k], shaped (1000, k) 
    svd_net.params['fc_svd_V'][0].data[...] = U[:, :k] 
    svd_net.params['fc_svd_V'][1].data[...] = orig_net.params['fc_orig'][1].data # same bias 
    # save the new weights 
    svd_net.save('trained_weights_svd.caffemodel') 
    

Now we have deploy_svd.prototxt with trained_weights_svd.caffemodel that approximate the original net with far fewer multiplications and weights.
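The savings are easy to check outside Caffe. A minimal NumPy sketch of the same factorization (the dimensions below are made up for illustration; weights follow Caffe's (num_output, num_input) layout):

```python
import numpy as np

rng = np.random.default_rng(1)
n_in, n_out, k = 300, 1000, 20
W = rng.standard_normal((n_out, n_in))   # original fc weight, Caffe layout

U, s, Vt = np.linalg.svd(W, full_matrices=False)
W_u = np.diag(s[:k]) @ Vt[:k, :]         # fc_svd_U weight: (k, n_in)
W_v = U[:, :k]                           # fc_svd_V weight: (n_out, k)

x = rng.standard_normal(n_in)
full_out = W @ x                         # one big matrix-vector product
approx_out = W_v @ (W_u @ x)             # two small ones

# multiplication counts: n_out*n_in vs k*(n_in + n_out)
print(n_out * n_in, k * (n_in + n_out))  # 300000 26000
```

The two small products cost k*(n_in + n_out) multiplications instead of n_out*n_in, a large saving whenever k is much smaller than both layer dimensions.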

A nice solution! – Dale

Amazing solution :) –

@Dale not my solution - it's Ross Girshick's. – Shai