
Please see this post for the background to the problem: Does the TensorFlow embedding_attention_seq2seq method implement a bidirectional RNN Encoder by default? How do I modify the TensorFlow sequence-to-sequence model to use a bidirectional LSTM instead of a unidirectional one?

I am working on the same model and want to replace the unidirectional LSTM layer with a bidirectional one. I realize I have to use static_bidirectional_rnn instead of static_rnn, but I am getting an error due to a mismatch in the tensor shapes.

I replaced the following line:

encoder_outputs, encoder_state = core_rnn.static_rnn(encoder_cell, encoder_inputs, dtype=dtype) 

with this line:

encoder_outputs, encoder_state_fw, encoder_state_bw = core_rnn.static_bidirectional_rnn(encoder_cell, encoder_cell, encoder_inputs, dtype=dtype) 

This gives me the following error:

InvalidArgumentError (see above for traceback): Incompatible shapes: [32,5,1,256] vs. [16,1,1,256] [[Node: gradients/model_with_buckets/embedding_attention_seq2seq/embedding_attention_decoder/attention_decoder/Attention_0/add_grad/BroadcastGradientArgs = BroadcastGradientArgs[T=DT_INT32, _device="/job:localhost/replica:0/task:0/cpu:0"](gradients/model_with_buckets/embedding_attention_seq2seq/embedding_attention_decoder/attention_decoder/Attention_0/add_grad/Shape, gradients/model_with_buckets/embedding_attention_seq2seq/embedding_attention_decoder/attention_decoder/Attention_0/add_grad/Shape_1)]]

As I understand it, the outputs of the two methods are different, but I don't know how to modify the attention code to account for this. How do I send both the forward and backward states to the attention module - do I concatenate the two hidden states?

Answer


From the error message I can see that the shapes of two tensors don't match somewhere - one dimension is 32 and the other is 16. I think this is because the output of the bidirectional RNN is twice the size of the output of the unidirectional one (the forward and backward outputs are concatenated), and you haven't adjusted the code that follows for that.
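
For reference, a minimal sketch (TF 1.x contrib API, toy dimensions, two separate cells) illustrating the point above: the per-step output of static_bidirectional_rnn is the concatenation of the forward and backward outputs, so its depth is twice that of static_rnn.

import tensorflow as tf
from tensorflow.contrib.rnn import LSTMCell, static_rnn, static_bidirectional_rnn

batch_size, num_steps, input_dim, hidden_dim = 16, 5, 64, 256
inputs = [tf.placeholder(tf.float32, [batch_size, input_dim])
          for _ in range(num_steps)]

with tf.variable_scope('uni'):
    uni_outputs, uni_state = static_rnn(LSTMCell(hidden_dim), inputs, dtype=tf.float32)
with tf.variable_scope('bi'):
    bi_outputs, fw_state, bw_state = static_bidirectional_rnn(
        LSTMCell(hidden_dim), LSTMCell(hidden_dim), inputs, dtype=tf.float32)

print(uni_outputs[0].get_shape())  # (16, 256)
print(bi_outputs[0].get_shape())   # (16, 512): forward and backward outputs concatenated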

How do I send both the forward and backward states to the attention module- do I concatenate both the hidden states?

You can refer to the following code:

def _reduce_states(self, fw_st, bw_st):
    """Add to the graph a linear layer to reduce the encoder's final FW and BW state into a single initial state for the decoder. This is needed because the encoder is bidirectional but the decoder is not.
    Args:
      fw_st: LSTMStateTuple with hidden_dim units.
      bw_st: LSTMStateTuple with hidden_dim units.
    Returns:
      state: LSTMStateTuple with hidden_dim units.
    """
    hidden_dim = self._hps.hidden_dim
    with tf.variable_scope('reduce_final_st'):

        # Define weights and biases to reduce the cell and reduce the state
        w_reduce_c = tf.get_variable('w_reduce_c', [hidden_dim * 2, hidden_dim], dtype=tf.float32, initializer=self.trunc_norm_init)
        w_reduce_h = tf.get_variable('w_reduce_h', [hidden_dim * 2, hidden_dim], dtype=tf.float32, initializer=self.trunc_norm_init)
        bias_reduce_c = tf.get_variable('bias_reduce_c', [hidden_dim], dtype=tf.float32, initializer=self.trunc_norm_init)
        bias_reduce_h = tf.get_variable('bias_reduce_h', [hidden_dim], dtype=tf.float32, initializer=self.trunc_norm_init)

        # Apply linear layer
        old_c = tf.concat(axis=1, values=[fw_st.c, bw_st.c])  # Concatenation of fw and bw cell
        old_h = tf.concat(axis=1, values=[fw_st.h, bw_st.h])  # Concatenation of fw and bw state
        new_c = tf.nn.relu(tf.matmul(old_c, w_reduce_c) + bias_reduce_c)  # Get new cell from old cell
        new_h = tf.nn.relu(tf.matmul(old_h, w_reduce_h) + bias_reduce_h)  # Get new state from old state
        return tf.contrib.rnn.LSTMStateTuple(new_c, new_h)  # Return new cell and state
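
A minimal sketch of how a helper like the one above might be wired into the encoder; apart from the reduction function itself, the names here (build_bidirectional_encoder, reduce_states_fn, encoder_inputs) are hypothetical.

import tensorflow as tf

def build_bidirectional_encoder(encoder_inputs, hidden_dim, reduce_states_fn):
    # `reduce_states_fn` is assumed to behave like `_reduce_states` above:
    # it maps (fw_st, bw_st) to a single LSTMStateTuple of `hidden_dim` units.
    cell_fw = tf.contrib.rnn.LSTMCell(hidden_dim)
    cell_bw = tf.contrib.rnn.LSTMCell(hidden_dim)
    # Each element of `outputs` is [batch, 2 * hidden_dim]: the forward and
    # backward outputs are concatenated per time step, so any attention built
    # over them must expect the doubled depth.
    outputs, fw_st, bw_st = tf.contrib.rnn.static_bidirectional_rnn(
        cell_fw, cell_bw, encoder_inputs, dtype=tf.float32)
    # Reduce the two final states so a unidirectional decoder with
    # `hidden_dim` units can be initialized from them.
    decoder_initial_state = reduce_states_fn(fw_st, bw_st)
    return outputs, decoder_initial_state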

This seems to be exactly what I was looking for. Let me try it, and I'll update if it works. Thanks. –


This seems to work, but I have a question: why can't I simply double the size of the decoder cell instead of projecting the encoder cell state down to half its size? I see that this would reduce the number of parameters in the model, but won't I lose information because of the projection I'm doing? –


@LeenaShekhar Doubling the decoder cell size would also work. Here it is preferable to merge the two states of the bidirectional encoder into one (so that the encoder and decoder have the same cell size and nothing downstream breaks), which is done by applying the projection above to c and h separately. – lerner
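
For comparison, a sketch of the alternative discussed in this comment thread: keep the concatenated state and give the decoder a cell of the doubled size instead of projecting; the surrounding names (fw_st, bw_st, decoder_cell) are illustrative only.

import tensorflow as tf

def concat_states(fw_st, bw_st):
    # Simply concatenate the forward and backward final states; each part
    # of the result then has size 2 * hidden_dim.
    new_c = tf.concat(axis=1, values=[fw_st.c, bw_st.c])
    new_h = tf.concat(axis=1, values=[fw_st.h, bw_st.h])
    return tf.contrib.rnn.LSTMStateTuple(new_c, new_h)

# The decoder cell must then match the doubled size, e.g.:
# decoder_cell = tf.contrib.rnn.LSTMCell(2 * hidden_dim)
# decoder_initial_state = concat_states(fw_st, bw_st)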