I'm trying to train a network, but I always get zero gradients. I'm really confused and out of ideas.

My input data has the shape (batch_size, 120, 10, 3), and the network has six layers (conv1 - pool1 - conv2 - pool2 - fc1 - fc2). I expect an output of size 1x1 (0 or 1). All of that works fine.

But as soon as I try to train the network, I run into trouble: the gradients are always zero. What am I doing wrong?

import tensorflow as tf 
import data_collection as dc 

INPUT_HEIGHT = 120 
INPUT_WIDTH = 10 
INPUT_DEPTH = 3 

KERNEL_HEIGHT = 5 
KERNEL_WIDTH = 5 
KERNEL_1_IN_CHANNEL = 3 
KERNEL_1_OUT_CHANNEL = 32 
KERNEL_2_OUT_CHANNEL = 64 

FULLY_CONNECTED_1_OUTPUTS = 1024 
FULLY_CONNECTED_2_OUTPUTS = 1 


def weight_variable(shape): 
    initial = tf.truncated_normal(shape, stddev=0.1) 
    return tf.Variable(initial) 


def bias_variable(shape): 
    initial = tf.constant(0.1, shape=shape) 
    return tf.Variable(initial) 


def conv2d(x, W): 
    return tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding='SAME') 


def max_pool_2x2(x): 
    return tf.nn.max_pool(x, ksize=[1, 2, 2, 1], 
          strides=[1, 2, 2, 1], padding='SAME') 


def max_pool_2x1(x): 
    return tf.nn.max_pool(x, ksize=[1, 2, 1, 1], 
          strides=[1, 2, 1, 1], padding='SAME') 


if __name__ == '__main__': 

    # Placeholder 
    x = tf.placeholder(tf.float32, [None, INPUT_HEIGHT, INPUT_WIDTH, INPUT_DEPTH]) 
    y_ = tf.placeholder(tf.float32, [None, 1]) 

    # First layer - convolution 
    W_conv1 = weight_variable([KERNEL_HEIGHT, KERNEL_WIDTH, KERNEL_1_IN_CHANNEL, KERNEL_1_OUT_CHANNEL]) 
    b_conv1 = bias_variable([KERNEL_1_OUT_CHANNEL]) 
    h_conv1 = tf.nn.relu(conv2d(x, W_conv1) + b_conv1) 

    # Second layer - 2x2 pooling 
    h_pool1 = max_pool_2x2(h_conv1) 

    # Third layer - convolution 
    W_conv2 = weight_variable([KERNEL_HEIGHT, KERNEL_WIDTH, KERNEL_1_OUT_CHANNEL, KERNEL_2_OUT_CHANNEL]) 
    b_conv2 = bias_variable([KERNEL_2_OUT_CHANNEL]) 
    h_conv2 = tf.nn.relu(conv2d(h_pool1, W_conv2) + b_conv2) 

    # Fourth layer - 2x1 pooling 
    h_pool2 = max_pool_2x1(h_conv2) 

    # Fifth layer - fully connected layer (30*5*64) -> (1024) 
    W_fc1 = weight_variable([30 * 5 * KERNEL_2_OUT_CHANNEL, FULLY_CONNECTED_1_OUTPUTS]) 
    b_fc1 = bias_variable([FULLY_CONNECTED_1_OUTPUTS]) 
    h_pool2_flat = tf.reshape(h_pool2, [-1, 30 * 5 * 64]) 
    h_fc1 = tf.nn.relu(tf.matmul(h_pool2_flat, W_fc1) + b_fc1) 

    # Sixth layer - fully connected layer (1024) -> (1) 
    W_fc2 = weight_variable([FULLY_CONNECTED_1_OUTPUTS, FULLY_CONNECTED_2_OUTPUTS]) 
    b_fc2 = bias_variable([FULLY_CONNECTED_2_OUTPUTS]) 
    y_conv = tf.nn.sigmoid(tf.matmul(h_fc1, W_fc2) + b_fc2) 

    # Training 
    cross_entropy = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(targets=y_, logits=y_conv)) 
    optimizer = tf.train.GradientDescentOptimizer(1e-8) 
    gvs = optimizer.compute_gradients(cross_entropy) 
    train_step = optimizer.apply_gradients(gvs) 

    correct_prediction = tf.equal(tf.round(y_conv), y_) 
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32)) 

    init = tf.initialize_all_variables() 

    sess = tf.Session() 
    sess.run(init) 

    for i in range(200):
        batch_xs, batch_ys = dc.get_train_data(), dc.get_train_labels()
        if i % 100 == 0:
            train_accuracy = accuracy.eval(session=sess, feed_dict={x: batch_xs, y_: batch_ys})
            print("step %d, training accuracy %.3f" % (i, train_accuracy))
            print("Y_conv_train is " + str(
                sess.run(tf.matmul(h_fc1, W_fc2) + b_fc2, feed_dict={x: batch_xs, y_: batch_ys})))

            test_accuracy = accuracy.eval(session=sess, feed_dict={x: dc.get_test_data(), y_: dc.get_test_labels()})
            print("step %d, test accuracy %.3f" % (i, test_accuracy))
            print("Y_conv_test is " + str(sess.run(tf.matmul(h_fc1, W_fc2) + b_fc2,
                                                   feed_dict={x: dc.get_test_data(),
                                                              y_: dc.get_test_labels()})))

        sess.run(train_step, feed_dict={x: batch_xs, y_: batch_ys})

So I get exactly the same output every time:

step 0, training accuracy 0.500 
Y_conv_train is [[ -35.52193451] 
[-252.8659668 ]] 

step 0, test accuracy 0.000 
Y_conv_test is [[ 139.66842651]] 

step 100, training accuracy 0.500 
Y_conv_train is [[ -35.52193451] 
[-252.8659668 ]] 

step 100, test accuracy 0.000 
Y_conv_test is [[ 139.66842651]] 

UPDATE! The problem is solved. I had forgotten to normalize the data.
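
For reference, a minimal standardization sketch (assuming the batches from data_collection come back as NumPy arrays; the helper name and the epsilon are illustrative, not part of the original code):

import numpy as np

def normalize(batch, mean, std):
    # Zero-mean / unit-variance inputs keep the pre-sigmoid activations
    # in a range where the gradients do not saturate to zero.
    return (batch - mean) / (std + 1e-8)

train_xs = dc.get_train_data()
mean, std = train_xs.mean(axis=0), train_xs.std(axis=0)
# Reuse the training statistics for the test data as well.
batch_xs = normalize(train_xs, mean, std)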

Answer

Your learning rate is really small; consider increasing it to 0.01 and then decaying it over time.
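
A minimal sketch of such a schedule using tf.train.exponential_decay (the decay_steps and decay_rate values below are illustrative, not tuned; minimize() is used here in place of the original compute_gradients/apply_gradients pair):

global_step = tf.Variable(0, trainable=False)
# Start at 0.01 and decay by a factor of 0.96 every 1000 steps.
learning_rate = tf.train.exponential_decay(0.01, global_step,
                                           decay_steps=1000, decay_rate=0.96)
optimizer = tf.train.GradientDescentOptimizer(learning_rate)
# Passing global_step makes each training step advance the schedule.
train_step = optimizer.minimize(cross_entropy, global_step=global_step)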

I tried changing the learning rate, but it didn't help. I still get zero gradients. – Vladimir

What is the output of this? var_grad = tf.gradients(cross_entropy, [W_fc2])[0], then sess.run(var_grad). That will show you the gradient for that variable. – Steven
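
A runnable version of that check (note that the placeholders still need a feed_dict, and the gradient tensor is reduced to one summary number here for readability):

var_grad = tf.gradients(cross_entropy, [W_fc2])[0]
grad_val = sess.run(var_grad, feed_dict={x: batch_xs, y_: batch_ys})
# If this prints 0.0, the loss really is flat with respect to W_fc2.
print("max |dL/dW_fc2| =", abs(grad_val).max())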

I debugged it, and the gradient is zero. I deliberately didn't paste the list of gradient values, because the tensors are very large. The output above shows y_conv without the sigmoid function. If the weights were being updated, y_conv would change as well. But that doesn't happen. – Vladimir
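
One minimal way to confirm that suspicion directly (a sketch; it snapshots a single variable around one training step):

w_before = sess.run(W_fc2)
sess.run(train_step, feed_dict={x: batch_xs, y_: batch_ys})
w_after = sess.run(W_fc2)
# A value of 0.0 confirms the optimizer is not moving the weights.
print("max weight change:", abs(w_after - w_before).max())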