I use two stacked dynamic_rnn
calls in my model: the final_state
of the first dynamic_rnn
is fed as the initial_state
of the second. My loss is computed only from the output
of the second dynamic_rnn
. My question is: will gradients propagate back into the first dynamic_rnn
? In other words, does backpropagation work across multiple chained dynamic_rnn calls?
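One way to sanity-check the premise, independent of TensorFlow, is to chain two plain-NumPy RNN segments the same way (final state of the first feeds the second) and verify by finite differences that the loss still depends on the first segment's input. The helper `rnn_segment` and all shapes below are illustrative, not from my actual model:

```python
import numpy as np

def rnn_segment(x, h0, W, U):
    # x: (batch, time, in_dim); h0: (batch, hid); simple tanh RNN
    h = h0
    for t in range(x.shape[1]):
        h = np.tanh(h @ W + x[:, t] @ U)
    return h

rng = np.random.default_rng(0)
hid, ind = 4, 3
W = rng.normal(size=(hid, hid))
U = rng.normal(size=(ind, hid))
common = rng.normal(size=(1, 9, ind))    # shared prefix, batch 1
distinct = rng.normal(size=(3, 1, ind))  # distinct last elements, batch 3

def loss(common):
    h1 = rnn_segment(common, np.zeros((1, hid)), W, U)  # first "dynamic_rnn"
    h1 = np.tile(h1, (3, 1))                            # broadcast state to batch 3
    h2 = rnn_segment(distinct, h1, W, U)                # second "dynamic_rnn"
    return (h2 ** 2).sum()

# Finite-difference gradient of the loss w.r.t. one entry of the common input:
eps = 1e-6
bump = np.zeros_like(common)
bump[0, 0, 0] = eps
grad_fd = (loss(common + bump) - loss(common - bump)) / (2 * eps)
print(grad_fd)
```

If `grad_fd` is nonzero, the loss depends on the first segment's input through the chained state, so an autodiff framework would likewise produce a gradient there.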
You might ask why I use two dynamic_rnn
calls instead of one. The reason is that in my problem, the input sequences are identical except for the last step. So, to save time, I run dynamic_rnn
once over the common part of the sequences and feed its final_state
into another dynamic_rnn
, which consumes only the distinct last input element of each sequence.
Suppose we have 3 sequences of length 10, all identical except for the last step (the 10th element). Simplified code:
cell = tf.nn.rnn_cell.BasicRNNCell(num_units=hidden_state_dim)
# the first dynamic_rnn which handles the common part
first_outputs, first_states = tf.nn.dynamic_rnn(
cell=cell,
dtype=tf.float32,
sequence_length=[9], # only one sample with length 9
inputs=identical_input # input with shape (1, 9, input_element_dim)
)
# tile the first_states to accommodate next dynamic_rnn
# first_states is transformed from shape (1, hidden_state_dim) to (3, hidden_state_dim)
first_states = tf.reshape(tf.tile(first_states, [1, 3]), [3, hidden_state_dim])
# the second dynamic_rnn which handles the distinct last element
second_outputs, second_states = tf.nn.dynamic_rnn(
initial_state=first_states,
cell=cell,
dtype=tf.float32,
sequence_length=[1, 1, 1], # 3 samples with only one element
inputs=distinct_input # input with shape (3, 1, input_element_dim)
)
# calculate loss based on second_outputs
loss = some_loss_function(second_outputs, ground_truth)
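As an aside, the tile-then-reshape used above to broadcast first_states from shape (1, hidden_state_dim) to (3, hidden_state_dim) can be checked in plain NumPy; it produces the same result as tiling directly along the batch axis (the shapes here are stand-ins):

```python
import numpy as np

s = np.arange(5.0).reshape(1, 5)          # stand-in for first_states: (1, hidden_state_dim)
tiled = np.tile(s, (1, 3)).reshape(3, 5)  # mirrors tf.reshape(tf.tile(s, [1, 3]), [3, h])
simple = np.tile(s, (3, 1))               # equivalent: tile along the batch axis
assert np.array_equal(tiled, simple)
```

So `tf.tile(first_states, [3, 1])` would be a simpler way to write the same broadcast.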