
I have tried several versions of batch_normalization in TensorFlow, but none of them work! The results are all incorrect when I set batch_size = 1 at inference time. How do I use batch normalization correctly in TensorFlow?

Version 1: use the official version from tensorflow.contrib directly

from tensorflow.contrib.layers.python.layers.layers import batch_norm 

Used like this:

output = lrelu(batch_norm(tf.nn.bias_add(conv, biases), is_training), 0.5, name=scope.name) 

where is_training = True at training time and False at inference time.

Version 2: from How could I use Batch Normalization in TensorFlow?

def batch_norm_layer(x, train_phase, scope_bn='bn'): 
    bn_train = batch_norm(x, decay=0.999, epsilon=1e-3, center=True, scale=True, 
      updates_collections=None, 
      is_training=True, 
      reuse=None, # is this right? 
      trainable=True, 
      scope=scope_bn) 
    bn_inference = batch_norm(x, decay=0.999, epsilon=1e-3, center=True, scale=True, 
      updates_collections=None, 
      is_training=False, 
      reuse=True, # is this right? 
      trainable=True, 
      scope=scope_bn) 
    z = tf.cond(train_phase, lambda: bn_train, lambda: bn_inference) 
    return z 

Used like this:

output = lrelu(batch_norm_layer(tf.nn.bias_add(conv, biases), is_training), 0.5, name=scope.name) 

where is_training is a placeholder that is True at training time and False at inference time.

Version 3: from slim, https://github.com/tensorflow/models/blob/master/inception/inception/slim/ops.py

def batch_norm_layer(inputs,
                     is_training=True,
                     scope='bn'):
    decay = 0.999
    epsilon = 0.001
    inputs_shape = inputs.get_shape()
    with tf.variable_scope(scope) as t_scope:
        axis = list(range(len(inputs_shape) - 1))
        params_shape = inputs_shape[-1:]
        # Allocate parameters for the beta and gamma of the normalization.
        beta, gamma = None, None
        beta = tf.Variable(tf.zeros_initializer(params_shape),
                           name='beta',
                           trainable=True)
        gamma = tf.Variable(tf.ones_initializer(params_shape),
                            name='gamma',
                            trainable=True)
        moving_mean = tf.Variable(tf.zeros_initializer(params_shape),
                                  name='moving_mean',
                                  trainable=False)
        moving_variance = tf.Variable(tf.ones_initializer(params_shape),
                                      name='moving_variance',
                                      trainable=False)
        if is_training:
            # Calculate the moments based on the individual batch.
            mean, variance = tf.nn.moments(inputs, axis)

            update_moving_mean = moving_averages.assign_moving_average(
                moving_mean, mean, decay)
            update_moving_variance = moving_averages.assign_moving_average(
                moving_variance, variance, decay)
        else:
            # Just use the moving_mean and moving_variance.
            mean = moving_mean
            variance = moving_variance
        # Normalize the activations.
        outputs = tf.nn.batch_normalization(
            inputs, mean, variance, beta, gamma, epsilon)
        outputs.set_shape(inputs.get_shape())
        return outputs

Used like this:

output = lrelu(batch_norm_layer(tf.nn.bias_add(conv, biases), is_training), 0.5, name=scope.name) 

where is_training = True at training time and False at inference time.

Version 4: the same as version 3, but with tf.control_dependencies added

def batch_norm_layer(inputs,
                     decay=0.999,
                     center=True,
                     scale=True,
                     epsilon=0.001,
                     moving_vars='moving_vars',
                     activation=None,
                     is_training=True,
                     trainable=True,
                     restore=True,
                     scope='bn',
                     reuse=None):
    inputs_shape = inputs.get_shape()
    with tf.variable_op_scope([inputs], scope, 'BatchNorm', reuse=reuse):
        axis = list(range(len(inputs_shape) - 1))
        params_shape = inputs_shape[-1:]
        # Allocate parameters for the beta and gamma of the normalization.
        beta = tf.Variable(tf.zeros(params_shape), name='beta')
        gamma = tf.Variable(tf.ones(params_shape), name='gamma')
        # Create moving_mean and moving_variance and add them to the
        # GraphKeys.MOVING_AVERAGE_VARIABLES collection.
        moving_mean = tf.Variable(tf.zeros(params_shape), name='moving_mean',
                                  trainable=False)
        moving_variance = tf.Variable(tf.ones(params_shape), name='moving_variance',
                                      trainable=False)
        control_inputs = []
        if is_training:
            # Calculate the moments based on the individual batch.
            mean, variance = tf.nn.moments(inputs, axis)

            update_moving_mean = moving_averages.assign_moving_average(
                moving_mean, mean, decay)
            update_moving_variance = moving_averages.assign_moving_average(
                moving_variance, variance, decay)
            control_inputs = [update_moving_mean, update_moving_variance]
        else:
            # Just use the moving_mean and moving_variance.
            mean = moving_mean
            variance = moving_variance
        # Normalize the activations.
        with tf.control_dependencies(control_inputs):
            return tf.nn.batch_normalization(
                inputs, mean, variance, beta, gamma, epsilon)

Used like this:

output = lrelu(batch_norm_layer(tf.nn.bias_add(conv, biases), is_training), 0.5, name=scope.name)

where is_training = True at training time and False at inference time.

None of these 4 versions of batch_normalization works correctly. So, how do I use batch normalization correctly?

Another strange phenomenon is that if I replace batch_norm_layer with a no-op like this, the inference results are all the same.

def batch_norm_layer(inputs, is_training): 
    return inputs 

I'm a firm believer in knowing the basic concepts behind what I'm using. I suggest you read the batch normalization paper to really understand why and how it helps: https://arxiv.org/pdf/1502.03167.pdf –


When you say "all incorrect", what exactly do you mean? – etarion


It means "they are all wrong". – widgetxp

Answers


I have tested that the following simplified implementation of batch normalization gives the same results as tf.contrib.layers.batch_norm, as long as the settings are the same.

from tensorflow.python.training import moving_averages  # needed for assign_moving_average below

def initialize_batch_norm(scope, depth):
    with tf.variable_scope(scope) as bnscope:
        gamma = tf.get_variable("gamma", depth, initializer=tf.constant_initializer(1.0))
        beta = tf.get_variable("beta", depth, initializer=tf.constant_initializer(0.0))
        moving_avg = tf.get_variable("moving_avg", depth, initializer=tf.constant_initializer(0.0), trainable=False)
        moving_var = tf.get_variable("moving_var", depth, initializer=tf.constant_initializer(1.0), trainable=False)
        bnscope.reuse_variables()


def BatchNorm_layer(x, scope, train, epsilon=0.001, decay=.99):
    # Perform a batch normalization after a conv layer or a fc layer
    # gamma: a scale factor
    # beta: an offset
    # epsilon: the variance epsilon - a small float number to avoid dividing by 0
    with tf.variable_scope(scope, reuse=True):
        with tf.variable_scope('BatchNorm', reuse=True) as bnscope:
            gamma, beta = tf.get_variable("gamma"), tf.get_variable("beta")
            moving_avg, moving_var = tf.get_variable("moving_avg"), tf.get_variable("moving_var")
            shape = x.get_shape().as_list()
            control_inputs = []
            if train:
                avg, var = tf.nn.moments(x, list(range(len(shape) - 1)))
                update_moving_avg = moving_averages.assign_moving_average(moving_avg, avg, decay)
                update_moving_var = moving_averages.assign_moving_average(moving_var, var, decay)
                control_inputs = [update_moving_avg, update_moving_var]
            else:
                avg = moving_avg
                var = moving_var
            with tf.control_dependencies(control_inputs):
                output = tf.nn.batch_normalization(x, avg, var, offset=beta, scale=gamma, variance_epsilon=epsilon)
    return output
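
For context, one possible way to wire these two helpers together is sketched below; the scope name 'fc1', the depth 256, and the tensors x and fc1_weights are illustrative assumptions, not part of the answer.

# Hypothetical wiring of the two helpers above; names and shapes are made up.
with tf.variable_scope('fc1'):
    initialize_batch_norm('BatchNorm', 256)            # creates fc1/BatchNorm/{gamma, beta, moving_avg, moving_var}

fc1 = tf.matmul(x, fc1_weights)                         # x and fc1_weights assumed to be defined elsewhere
fc1 = BatchNorm_layer(fc1, scope='fc1', train=True)     # use train=False at validation/test time
fc1 = tf.nn.relu(fc1)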

The main tips for using the official batch normalization implementation, tf.contrib.layers.batch_norm, are: (1) set is_training=True for training time and is_training=False for validation and test time; (2) set updates_collections=None to make sure that moving_variance and moving_mean are updated in place; (3) be aware of and careful with the scope settings; (4) set decay to a smaller value (decay=0.9 or decay=0.99) than the default (which is 0.999) if your dataset is small or your total number of training updates/steps is not that large.
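
As a concrete illustration of these tips, here is a minimal sketch of how tf.contrib.layers.batch_norm could be wired up; the placeholder, weight shapes, decay value, and the train_op/output names in the comments are illustrative assumptions, not part of the answer.

import tensorflow as tf

# Illustrative sketch only.
is_training = tf.placeholder(tf.bool, name='is_training')     # fed True for training, False for inference

x = tf.placeholder(tf.float32, [None, 128])
w = tf.get_variable('w', [128, 64])
h = tf.matmul(x, w)                                            # no bias: the BN beta provides the shift
h = tf.contrib.layers.batch_norm(h,
                                 decay=0.9,                    # tip (4): smaller decay for small datasets / short training
                                 center=True, scale=True,
                                 is_training=is_training,      # tip (1)
                                 updates_collections=None,     # tip (2): update moving stats in place
                                 scope='bn1')                  # tip (3): one scope per BN layer
h = tf.nn.relu(h)

# Training step:  sess.run(train_op, feed_dict={x: bx, is_training: True})
# Inference step: sess.run(output,   feed_dict={x: bx, is_training: False})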


Thanks Zhongyu Kuang. Apart from item 4, I reached the same conclusions as you. – widgetxp


I've been having problems with 'tf.contrib.layers.batch_norm'. My network converges when I train it, but when I test it with 'is_training=False' it gives me nonsense. However, the test results with 'is_training=True' make more sense to me (even though the accuracy gain over a network without batch_norm is almost zero). Any ideas? I asked about it here: Tensorflow batch_norm does not work properly when testing (is_training=False): http://stackoverflow.com/questions/42770757/tensorflow-batch-norm-does-not-work-properly-when-testing-is-training-false – user3157047


@Zhongyu Kuang Can you explain more about updates_collections? We update them with tf.GraphKeys.UPDATE_OPS, but how do we use them at inference time? –


I found Zhongyu Kuang's code really useful, but I was stuck on how to dynamically switch between the train and test ops, i.e. how to move from a Python boolean is_training to a TensorFlow boolean placeholder is_training. I need this capability to be able to test the network on the validation set during training.

Starting from his code and inspired by this, I wrote the following code:

import tensorflow as tf
from tensorflow.python.framework import ops
from tensorflow.python.ops import variables
from tensorflow.python.training import moving_averages


def batch_norm(x, scope, is_training, epsilon=0.001, decay=0.99):
    """
    Returns a batch normalization layer that automatically switches between train and test phases based on the
    tensor is_training

    Args:
        x: input tensor
        scope: scope name
        is_training: boolean tensor or variable
        epsilon: epsilon parameter - see batch_norm_layer
        decay: decay parameter - see batch_norm_layer

    Returns:
        The correct batch normalization layer based on the value of is_training
    """
    assert isinstance(is_training, (ops.Tensor, variables.Variable)) and is_training.dtype == tf.bool

    return tf.cond(
        is_training,
        lambda: batch_norm_layer(x=x, scope=scope, epsilon=epsilon, decay=decay, is_training=True, reuse=None),
        lambda: batch_norm_layer(x=x, scope=scope, epsilon=epsilon, decay=decay, is_training=False, reuse=True),
    )


def batch_norm_layer(x, scope, is_training, epsilon=0.001, decay=0.99, reuse=None):
    """
    Performs a batch normalization layer

    Args:
        x: input tensor
        scope: scope name
        is_training: python boolean value
        epsilon: the variance epsilon - a small float number to avoid dividing by 0
        decay: the moving average decay

    Returns:
        The ops of a batch normalization layer
    """
    with tf.variable_scope(scope, reuse=reuse):
        shape = x.get_shape().as_list()
        # gamma: a trainable scale factor
        gamma = tf.get_variable("gamma", shape[-1], initializer=tf.constant_initializer(1.0), trainable=True)
        # beta: a trainable shift value
        beta = tf.get_variable("beta", shape[-1], initializer=tf.constant_initializer(0.0), trainable=True)
        moving_avg = tf.get_variable("moving_avg", shape[-1], initializer=tf.constant_initializer(0.0), trainable=False)
        moving_var = tf.get_variable("moving_var", shape[-1], initializer=tf.constant_initializer(1.0), trainable=False)
        if is_training:
            # tf.nn.moments == calculate the mean and the variance of the tensor x
            avg, var = tf.nn.moments(x, list(range(len(shape) - 1)))
            update_moving_avg = moving_averages.assign_moving_average(moving_avg, avg, decay)
            update_moving_var = moving_averages.assign_moving_average(moving_var, var, decay)
            control_inputs = [update_moving_avg, update_moving_var]
        else:
            avg = moving_avg
            var = moving_var
            control_inputs = []
        with tf.control_dependencies(control_inputs):
            output = tf.nn.batch_normalization(x, avg, var, offset=beta, scale=gamma, variance_epsilon=epsilon)

    return output

Then I use the batch_norm layer this way:

fc1_weights = tf.Variable(...) 
fc1 = tf.matmul(x, fc1_weights) 
fc1 = batch_norm(fc1, 'fc1_bn', is_training=is_training) 
fc1 = tf.nn.relu(fc1) 

where is_training is a boolean placeholder. Note that the bias addition is not needed, since it is replaced by the beta parameter, as explained in the Batch Normalization paper.
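
The same idea carries over to convolutional layers. Below is a short sketch using the batch_norm defined above; the 4-D input x_img, the filter shape, and the scope name are illustrative assumptions, and the tf.nn.bias_add used in the question is dropped because beta provides the shift.

# Sketch only: a conv layer without a bias term, normalized by the batch_norm defined above.
# x_img is assumed to be a 4-D tensor [batch, height, width, 64]; names and shapes are made up.
kernel = tf.Variable(tf.truncated_normal([3, 3, 64, 128], stddev=0.1), name='kernel')
conv = tf.nn.conv2d(x_img, kernel, strides=[1, 1, 1, 1], padding='SAME')
conv = batch_norm(conv, 'conv1_bn', is_training=is_training)   # no tf.nn.bias_add: beta supplies the shift
conv = tf.nn.relu(conv)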

During execution:

# Training phase 
sess.run(loss, feed_dict={x: bx, y: by, is_training: True}) 

# Testing phase 
sess.run(loss, feed_dict={x: bx, y: by, is_training: False}) 

Note that you can use tf.contrib.layers.batch_norm() - and the is_training arg it accepts can be a boolean placeholder! – ZeDuS


Actually, the question is about alternative ways of implementing batch normalization in TensorFlow, so I provided the code I wrote to implement it without using any function from the contrib module. Officially, the "contrib module contains volatile or experimental code", so in some cases it can be useful to avoid it. –