I am trying to get my feet wet with TensorFlow by solving this challenge: https://www.kaggle.com/c/integer-sequence-learning. The problem: the loss in my RNN diverges.
My work is based on these blog posts:
- https://danijar.com/variable-sequence-lengths-in-tensorflow/
- https://gist.github.com/evanthebouncy/8e16148687e807a46e3f
A complete working example, including my data, can be found here: https://github.com/bottiger/Integer-Sequence-Learning. Running the example will print a lot of debug output. Execute `rnn-lstm-my.py`. (It requires TensorFlow and pandas.)
The approach is quite simple. I load all the training sequences, store their lengths in a vector, and store the longest length in a variable I call `max_length`.
From my training data I strip out the last element of every sequence and store it in a vector called `train_solutions`.
I store all the remaining sequences, padded with zeros, in a matrix of shape [n_seq, max_length].
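A minimal sketch of what this preparation step looks like (the helper name and details here are hypothetical; the actual loading code is in the repo):

    import numpy as np

    def prepare(sequences):
        """Strip the last element of each sequence as the target, record
        lengths, and zero-pad the rest into a [n_seq, max_length] matrix."""
        train_solutions = np.array([s[-1] for s in sequences], dtype=np.float32)
        trimmed = [s[:-1] for s in sequences]
        lengths = np.array([len(s) for s in trimmed], dtype=np.int32)
        max_length = lengths.max()
        padded = np.zeros((len(trimmed), max_length), dtype=np.float32)
        for i, s in enumerate(trimmed):
            padded[i, :len(s)] = s
        return padded, train_solutions, lengths, max_length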
Since I want to predict the next number in a sequence, my output should be a single number and my input should be a sequence.
I use an RNN (`tf.nn.rnn`) with a `BasicLSTMCell` as the cell, with 24 hidden units. The output is fed into a basic linear model (xW + b), which should produce my prediction.
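For context, in this pre-1.0 TensorFlow API `tf.nn.rnn` takes `inputs` as a Python list with one tensor per time step, each of shape [batch_size, input_size]. A minimal sketch with illustrative values (the constants here are assumptions, not my actual ones):

    import tensorflow as tf

    batch_size, max_length, num_hidden = 10, 50, 24   # illustrative values

    cell = tf.nn.rnn_cell.BasicLSTMCell(num_hidden, state_is_tuple=True)
    # One placeholder per time step, each carrying the whole batch
    step_inputs = [tf.placeholder(tf.float32, shape=[batch_size, 1])
                   for _ in range(max_length)]
    outputs, state = tf.nn.rnn(cell, step_inputs, dtype=tf.float32)
    # outputs: list of max_length tensors, each of shape [batch_size, num_hidden]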
My cost function is based on the difference between my model's prediction and the actual number. I compute the cost like this:
cost = tf.nn.l2_loss(tf_result - prediction)
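For reference, `tf.nn.l2_loss(t)` computes `sum(t ** 2) / 2` as a single scalar, summed over the whole batch rather than averaged, so large integer targets produce a large raw cost. A quick numpy check of that formula:

    import numpy as np

    # tf.nn.l2_loss(t) returns sum(t ** 2) / 2, summed over all elements
    diff = np.array([3.0, 4.0])    # pretend residuals for two examples
    print(np.sum(diff ** 2) / 2)   # 12.5, matching tf.nn.l2_loss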
The basic dimensions seem to be correct, since the code actually runs. However, after one or two iterations some NaNs start to appear, and they quickly propagate until everything becomes NaN.
Here are the important parts of the code that define and run the graph. I have omitted loading and preparing the data; see the git repo for those details, but I am quite sure that part is correct.
import tensorflow as tf

cell = tf.nn.rnn_cell.BasicLSTMCell(num_hidden, state_is_tuple=True)
num_inputs = tf.placeholder(tf.int32, name='NumInputs')
seq_length = tf.placeholder(tf.int32, shape=[batch_size], name='SeqLength')
# Define the input as a list (num elements = batch_size) of sequences
inputs = [tf.placeholder(tf.float32,shape=[1, max_length], name='InputData') for _ in range(batch_size)]
# Result should be a [batch_size, 1] vector
result = tf.placeholder(tf.float32, shape=[batch_size, 1], name='OutputData')
tf_seq_length = tf.Print(seq_length, [seq_length, seq_length.get_shape()], 'SequenceLength: ')
outputs, states = tf.nn.rnn(cell, inputs, dtype=tf.float32)
# Print the output. The NaN first shows up here
outputs2 = tf.Print(outputs, [outputs], 'Last: ', name="Last", summarize=800)
# Define the model
tf_weight = tf.Variable(tf.truncated_normal([batch_size, num_hidden, frame_size]), name='Weight')
tf_bias = tf.Variable(tf.constant(0.1, shape=[batch_size]), name='Bias')
# Debug the model parameters
weight = tf.Print(tf_weight, [tf_weight, tf_weight.get_shape()], "Weight: ")
bias = tf.Print(tf_bias, [tf_bias, tf_bias.get_shape()], "bias: ")
# More debug info
print('bias: ', bias.get_shape())
print('weight: ', weight.get_shape())
print('targets ', result.get_shape())
print('RNN input ', type(inputs))
print('RNN input len()', len(inputs))
print('RNN input[0] ', inputs[0].get_shape())
# Calculate the prediction
tf_prediction = tf.batch_matmul(outputs2, weight) + bias
prediction = tf.Print(tf_prediction, [tf_prediction, tf_prediction.get_shape()], 'prediction: ')
tf_result = result
# Calculate the cost
cost = tf.nn.l2_loss(tf_result - prediction)
#optimizer = tf.train.AdamOptimizer()
learning_rate = 0.05
optimizer = tf.train.GradientDescentOptimizer(learning_rate)
minimize = optimizer.minimize(cost)
# Error rate: fraction of batch entries where argmax(result) != argmax(prediction)
mistakes = tf.not_equal(tf.argmax(result, 1), tf.argmax(prediction, 1))
error = tf.reduce_mean(tf.cast(mistakes, tf.float32))
init_op = tf.initialize_all_variables()
sess = tf.Session()
sess.run(init_op)
no_of_batches = int(len(train_input) / batch_size)
epoch = 1
val_dict = get_input_dict(val_input, val_output, train_length, inputs, batch_size)
for i in range(epoch):
    ptr = 0
    for j in range(no_of_batches):
        print('eval w: ', weight.eval(session=sess))
        # inputs batch
        t_i = train_input[ptr:ptr+batch_size]
        # output batch
        t_o = train_output[ptr:ptr+batch_size]
        # sequence lengths
        t_l = train_length[ptr:ptr+batch_size]
        sess.run(minimize, feed_dict=get_input_dict(t_i, t_o, t_l, inputs, batch_size))
        ptr += batch_size
    print("result: ", tf_result)
    print("result len: ", tf_result.get_shape())
    print("prediction: ", prediction)
    print("prediction len: ", prediction.get_shape())
    c_val = sess.run(error, feed_dict=val_dict)
    print("Validation cost: {}, on Epoch {}".format(c_val, i))
    print("Epoch ", str(i))
print('test input: ', type(test_input))
print('test output: ', type(test_output))
incorrect = sess.run(error,get_input_dict(test_input, test_output, test_length, inputs, batch_size))
sess.close()
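(`get_input_dict` lives in the repo and is not shown above. A hypothetical sketch of what such a feed-dict builder could look like, assuming each entry of `inputs` receives one padded sequence and reusing the `result` and `seq_length` placeholders defined above:)

    import numpy as np

    def get_input_dict(batch_in, batch_out, batch_len, inputs, batch_size):
        """Hypothetical reconstruction: feed one [1, max_length] row per
        placeholder, plus the targets and the sequence lengths."""
        feed = {inputs[i]: np.asarray(batch_in[i]).reshape(1, -1)
                for i in range(batch_size)}
        feed[result] = np.asarray(batch_out, dtype=np.float32).reshape(batch_size, 1)
        feed[seq_length] = np.asarray(batch_len, dtype=np.int32)
        return feed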
Here are (the first lines of) the output it produces. You can see how everything becomes NaN: http://pastebin.com/TnFFNFrr (I could not post it here due to the body length limit).
I first see the NaN here:
I tensorflow/core/kernels/logging_ops.cc:79] Last: [0 0.76159418 0 0 0 0 0 -0.76159418 0 -0.76159418 0 0 0 0.76159418 0.76159418 0 -0.76159418 0.76159418 0 0 0 0.76159418 0 0 0 nan nan nan nan 0 0 nan nan 1 0 nan 0 0.76159418 nan nan nan 1 0 nan 0 0.76159418 nan nan nan ...]
I hope I have made my question clear. Thanks in advance.