Theano - How to override the gradient for part of an op graph

I have a fairly complex model at hand. The model contains several parts with a linear structure:

y = theano.tensor.dot(W,x) + b

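I want to build an optimization that uses a custom rule to compute the gradient for all of the linear structures, while leaving the other operations untouched. What is the simplest way to override the gradient operator for all linear parts of my model? Preferably without having to write a new Op.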

Source

2016-11-15 Kh40tiK

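So, I spent some time working on a PR for Theano (~~not merged as of Jan 13, 2017~~ now merged) that gives users the ability to partially override the gradient of a theano.OpFromGraph instance. The override is done with a symbolic graph, so you still get the full benefit of Theano's optimizations.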

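Typical use cases:

- Numerical safety considerations
- Rescaling/clipping gradients
- Specialized gradient routines such as the Riemannian natural gradient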
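
As a rough illustration of the rescaling/clipping use case, the rule that an override would express symbolically can be written in plain Python. This is only a sketch of the arithmetic, not Theano code; `rescale_gradient` and `max_norm` are hypothetical names that do not come from the answer.

```python
import math

def rescale_gradient(grad, max_norm=1.0):
    """Scale grad down so that its Euclidean norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    if norm <= max_norm:
        return list(grad)  # small gradients pass through unchanged
    scale = max_norm / norm
    return [g * scale for g in grad]

clipped = rescale_gradient([3.0, 4.0], max_norm=1.0)  # norm 5 is scaled down to norm 1
small = rescale_gradient([0.3, 0.4], max_norm=1.0)    # norm 0.5 is left as it is
print(clipped, small)
```

In an actual override the same expression would be built from symbolic Theano variables, so that the forward value stays untouched and only the backward rule changes.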

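To make an Op with an overridden gradient:

1. Build the computation graph you need
2. Make an OpFromGraph instance (or a Python function) for the gradient of your Op
3. Make an OFG instance for your Op itself, and set the grad_overrides argument
4. Call the OFG instance to build your model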

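Defining an OpFromGraph is much like compiling a Theano function, with some differences:

- updates and givens are not supported (as of Jan 2017)
- You get a symbolic Op instead of a numerical function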

Example:

'''
This creates an atan2_safe Op with smoothed gradient at (0,0)
'''
import theano as th
import theano.tensor as T

# Turn this on if you want theano to build one large graph for your model instead of precompiling the small graph.
USE_INLINE = False
# In a real case you would set EPS to a much smaller value
EPS = 0.01

# define a graph for needed Op
s_x, s_y = T.scalars('xy')
s_darg = T.scalar()  # backpropagated gradient
s_arg = T.arctan2(s_y, s_x)
s_abs2 = T.sqr(s_x) + T.sqr(s_y) + EPS
s_dx = -s_y/s_abs2
s_dy = s_x/s_abs2

# construct OfG with gradient overrides
# NOTE: there are unused inputs in the gradient expression,
#  however the input count must match, so we pass
#  on_unused_input='ignore'
# Because s_darg is unused, this override is only exact when the incoming
#  gradient is 1 (as in the T.grad call below); multiply s_dx and s_dy by
#  s_darg to chain it correctly inside a larger graph.
atan2_safe_grad = th.OpFromGraph([s_x, s_y, s_darg], [s_dx, s_dy], inline=USE_INLINE, on_unused_input='ignore')
atan2_safe = th.OpFromGraph([s_x, s_y], [s_arg], inline=USE_INLINE, grad_overrides=atan2_safe_grad)

# build graph using the new Op
x, y = T.scalar(), T.scalar()
arg = atan2_safe(x, y)
dx, dy = T.grad(arg, [x, y])
fn = th.function([x, y], [dx, dy])
fn(1., 0.) # gives approximately [-0.0, 0.990099], i.e. 1/(1 + EPS)
fn(0., 0.) # gives [0.0, 0.0], no more annoying nan!
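
The two values quoted in the example above can be checked by hand without Theano. The following is a minimal plain-Python sketch of the same arithmetic; the function names are hypothetical and only mirror the symbolic expressions `s_dx` and `s_dy`.

```python
EPS = 0.01  # same smoothing constant as in the example above

def atan2_grad_exact(x, y):
    """Textbook gradient of atan2(y, x) with respect to (x, y)."""
    r2 = x * x + y * y
    return (-y / r2, x / r2)

def atan2_grad_smoothed(x, y, eps=EPS):
    """Same rule with eps added to the denominator, as in s_abs2."""
    r2 = x * x + y * y + eps
    return (-y / r2, x / r2)

print(atan2_grad_smoothed(1.0, 0.0))  # second entry is 1 / 1.01
print(atan2_grad_smoothed(0.0, 0.0))  # both entries are zero

try:
    atan2_grad_exact(0.0, 0.0)
except ZeroDivisionError:
    # Plain Python raises here; a compiled array library would return nan instead.
    print("the exact rule is undefined at the origin")
```

Away from the origin the smoothed rule differs from the exact one by a factor of r2 / (r2 + EPS), which is why a much smaller EPS would be used in practice.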

Note: theano.OpFromGraph is still largely experimental, so expect bugs.

Source

2017-01-13 04:47:09 Kh40tiK
