2016-11-22

How can the Adam optimizer's gradients and variables be shared when using bucketing in TensorFlow?

I used the bucketing technique for a seq2seq task:

# Build one model for each (encoder length, decoder length) bucket
model_map = {}
t = 0
for i in encoder_shape:
    for j in decoder_shape:
        # Reuse the variables for every model after the first one
        with variable_scope.variable_scope(variable_scope.get_variable_scope(),
                                           reuse=True if t > 0 else None):
            model = Seq2SeqModel()
            model.build(encoder[:i], decoder[:j])
            model_map[i * 100 + j] = model
        t += 1

The models share their parameters:

for t in tf.all_variables():
    print t.name, t.get_shape()

Output:
embedding_attention_seq2seq/RNN/EmbeddingWrapper/embedding:0 (50000, 256) 
embedding_attention_seq2seq/RNN/MultiRNNCell/Cell0/GRUCell/Gates/Linear/Matrix:0 (1056, 1600) 
embedding_attention_seq2seq/RNN/MultiRNNCell/Cell0/GRUCell/Gates/Linear/Bias:0 (1600,) 

Each model is optimized like this:

# every model has its own optimizer
params = tf.trainable_variables()
opt = tf.train.AdamOptimizer(1e-3)
gradients = tf.gradients(self.loss, params)
self.optimizer = opt.apply_gradients(zip(gradients, params))

But I found that the optimizers do not share their variables:

embedding_attention_seq2seq/RNN/EmbeddingWrapper/embedding/Adam:0 (50000, 256) 
embedding_attention_seq2seq/RNN/EmbeddingWrapper/embedding/Adam_1:0 (50000, 256) 
embedding_attention_seq2seq/RNN/MultiRNNCell/Cell0/GRUCell/Gates/Linear/Matrix/Adam:0 (1056, 1600) 
embedding_attention_seq2seq/RNN/MultiRNNCell/Cell0/GRUCell/Gates/Linear/Matrix/Adam_1:0 (1056, 1600) 
embedding_attention_seq2seq/RNN/MultiRNNCell/Cell0/GRUCell/Gates/Linear/Bias/Adam:0 (1600,) 
embedding_attention_seq2seq/RNN/MultiRNNCell/Cell0/GRUCell/Gates/Linear/Bias/Adam_1:0 (1600,) 
embedding_attention_seq2seq/RNN/EmbeddingWrapper/embedding/Adam_2:0 (50000, 256) 
embedding_attention_seq2seq/RNN/EmbeddingWrapper/embedding/Adam_3:0 (50000, 256) 
embedding_attention_seq2seq/RNN/MultiRNNCell/Cell0/GRUCell/Gates/Linear/Matrix/Adam_2:0 (1056, 1600) 
embedding_attention_seq2seq/RNN/MultiRNNCell/Cell0/GRUCell/Gates/Linear/Matrix/Adam_3:0 (1056, 1600) 
embedding_attention_seq2seq/RNN/MultiRNNCell/Cell0/GRUCell/Gates/Linear/Bias/Adam_2:0 (1600,) 
embedding_attention_seq2seq/RNN/MultiRNNCell/Cell0/GRUCell/Gates/Linear/Bias/Adam_3:0 (1600,) 

As the number of buckets grows, so does GPU memory usage. At the same time, the model written by tf.train.Saver.save() gets larger.

So, is it possible to share the optimizer's gradients and variables across buckets in TensorFlow?


@Lukasz Kaiser, could you give some help? – Issac

Answer


I believe sharing a single optimizer instance across all models will do what you want. The duplicate `Adam` and `Adam_1` variables appear because each bucket constructs its own `tf.train.AdamOptimizer`, and every optimizer instance creates its own slot variables (the first and second moment accumulators) for each trainable variable. If you create the optimizer once and call `apply_gradients` on it from every bucket, the slots are keyed by the underlying variable and get reused.
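A minimal sketch of the idea. The question uses the 2016-era `tf.train.AdamOptimizer` API; this sketch goes through the `tf.compat.v1` shim so it also runs on modern TensorFlow, and the variable name `w`, the scope name `seq2seq`, and the toy losses are all illustrative, not taken from the original model:

```python
import tensorflow.compat.v1 as tf

tf.disable_eager_execution()

# One shared optimizer instance for all buckets
opt = tf.train.AdamOptimizer(1e-3)

train_ops = []
with tf.variable_scope("seq2seq") as scope:
    for bucket_id in range(2):  # two illustrative "buckets"
        if bucket_id > 0:
            scope.reuse_variables()  # later buckets reuse the same weights
        w = tf.get_variable("w", shape=[4, 4])
        # Each bucket defines its own loss over the shared variable
        loss = tf.reduce_sum(tf.square(w)) * (bucket_id + 1)
        grads = tf.gradients(loss, [w])
        # apply_gradients on the SAME optimizer reuses the Adam slots
        train_ops.append(opt.apply_gradients(zip(grads, [w])))

# Only one pair of slot variables (w/Adam, w/Adam_1) should exist,
# no matter how many buckets called apply_gradients.
adam_slots = [v.name for v in tf.global_variables() if "/Adam" in v.name]
print(adam_slots)
```

Had each bucket built its own `AdamOptimizer`, the list would contain `Adam_2`, `Adam_3`, and so on, exactly as in the question's output, and `Saver` would checkpoint all of them.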
