TensorFlow: pinning variables to the CPU does not work in multi-GPU training

I am using TensorFlow to train my first multi-GPU model. As described in the tutorial, the variables are pinned to the CPU and the ops are run on each GPU inside a name_scope.

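When I run a small test and log the device placement, I can see that the ops are placed on their respective GPUs under the TOWER_0/TOWER_1 prefixes, but the variables are not placed on the CPU.

Am I missing something, or am I reading the device placement log incorrectly?

The test code is attached below, and here is the device placement log.

Thanks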

TEST CODE

import tensorflow as tf
import tensorflow.contrib.slim as slim
from nets import inception_v3  # from the TF-slim models repository

with tf.device('cpu:0'):
    imgPath = tf.placeholder(tf.string)
    imageString = tf.read_file(imgPath)
    imageJpeg = tf.image.decode_jpeg(imageString, channels=3)
    inputImage = tf.image.resize_images(imageJpeg, [299, 299])
    inputs = tf.expand_dims(inputImage, 0)
    for i in range(2):
        with tf.device('/gpu:%d' % i):
            with tf.name_scope('%s_%d' % ('TOWER', i)) as scope:
                # Intended to force every variable onto the CPU
                with slim.arg_scope([tf.contrib.framework.python.ops.variables.variable], device='/cpu:0'):
                    with slim.arg_scope(inception_v3.inception_v3_arg_scope()):
                        logits, endpoints = inception_v3.inception_v3(inputs, num_classes=1001, is_training=False)
                # Share the Inception weights between the two towers
                tf.get_variable_scope().reuse_variables()

with tf.Session(config=tf.ConfigProto(allow_soft_placement=True, log_device_placement=True)) as sess:
    tf.initialize_all_variables().run()
exit(0)

EDIT: Basically, the line 'with slim.arg_scope([tf.contrib.framework.python.ops.variables.variable], device='/cpu:0'):' should force all variables onto the CPU, but they are created on 'gpu:0'.


2016-11-12 Ashish Kumar
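Reading a long placement log by eye is error-prone, so a small filter for variable ops that did not land on a CPU can help answer "am I reading the log correctly?". This is a sketch under assumptions: it assumes the TF 1.x log line shape 'name: (OpType): device' (some releases print the device right after the closing parenthesis, which the pattern also accepts), and it assumes the variable op types are 'Variable', 'VariableV2' or 'VarHandleOp'. Adjust both for the release in use. Note that variables created through tf.get_variable, which slim uses, ignore name_scope, so they show up as 'InceptionV3/...' rather than 'TOWER_0/InceptionV3/...'.

```python
import re

# Assumed shape of a TF 1.x placement line: "<op name>: (<op type>): <device>".
# Some releases log "(<op type>)<device>" with nothing in between, so both are accepted.
PLACEMENT_RE = re.compile(r"(?P<name>\S+): \((?P<type>\w+)\):? ?(?P<device>/\S+)\s*$")

# Assumed op types that own variable storage (varies by TF version and variable kind).
VARIABLE_OP_TYPES = {"Variable", "VariableV2", "VarHandleOp"}


def variables_off_cpu(log_lines):
    """Return (name, device) pairs for variable ops that were not placed on a CPU."""
    offenders = []
    for line in log_lines:
        match = PLACEMENT_RE.search(line)
        if match is None:
            continue  # not a placement line
        if match.group("type") not in VARIABLE_OP_TYPES:
            continue  # e.g. '.../weights/read' (Identity) or ordinary compute ops
        if "cpu:" not in match.group("device").lower():
            offenders.append((match.group("name"), match.group("device")))
    return offenders


# Hypothetical lines in the assumed format, NOT taken from the question's log.
sample = [
    "InceptionV3/Conv2d_1a_3x3/weights: (VariableV2): /job:localhost/replica:0/task:0/gpu:0",
    "InceptionV3/Conv2d_1a_3x3/weights/read: (Identity): /job:localhost/replica:0/task:0/gpu:0",
    "TOWER_1/InceptionV3/Conv2d_1a_3x3/Conv2D: (Conv2D): /job:localhost/replica:0/task:0/gpu:1",
    "ExpandDims: (ExpandDims): /job:localhost/replica:0/task:0/cpu:0",
]
for name, device in variables_off_cpu(sample):
    print(name, "->", device)
```

Saving the session's stderr to a file and passing its lines to variables_off_cpu gives a short list instead of the full log.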

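Well, everything up to 'expand_dims' is placed on the 'cpu', because you asked for that with 'with tf.device('cpu:0'):'. All the variables belonging to the 'inception' model are placed on the 'gpu'. – sygi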

Thanks, I get that. But then what is 'with slim.arg_scope([tf.contrib.framework.python.ops.variables.variable], device='/cpu:0'):' for? –

Isn't 'allow_soft_placement' interfering? If you set it to 'False', it should place things where you told it to (or fail). See the inception_train example (https://github.com/tensorflow/models/blob/master/inception/inception/inception_train.py). – drpng
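A different technique from the arg_scope approach, not taken from the question or answers: in TF 1.x, tf.device also accepts a function that is called with each newly created Operation and returns a device string. The hypothetical helper assign_to_device below sends variable ops to the CPU and everything else to the tower's GPU. The tuple of variable op types is an assumption, and FakeOp is only a stand-in so the logic can be exercised without TensorFlow.

```python
from collections import namedtuple

# Assumed op types for variables: 'Variable'/'VariableV2' for classic
# variables, 'VarHandleOp' for resource variables.
VARIABLE_OP_TYPES = ("Variable", "VariableV2", "VarHandleOp")


def assign_to_device(compute_device, variable_device="/cpu:0"):
    """Hypothetical helper: build a device function for use with tf.device()."""
    def _device_fn(op):
        # tf.device() calls this for every new op; op.type is the op type string.
        if op.type in VARIABLE_OP_TYPES:
            return variable_device
        return compute_device
    return _device_fn


# Stand-in for tf.Operation (only .name and .type are used here).
FakeOp = namedtuple("FakeOp", ["name", "type"])

device_fn = assign_to_device("/gpu:1")
print(device_fn(FakeOp("InceptionV3/Conv2d_1a_3x3/weights", "VariableV2")))  # prints /cpu:0
print(device_fn(FakeOp("TOWER_1/InceptionV3/Conv2d_1a_3x3/Conv2D", "Conv2D")))  # prints /gpu:1
```

In the question's loop this would replace 'with tf.device('/gpu:%d' % i):' with 'with tf.device(assign_to_device('/gpu:%d' % i)):', which no longer depends on which slim function ends up creating each variable.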

Try:

with slim.arg_scope([slim.model_variable, slim.variable], device='/cpu:0'):

This is taken from model_deploy.


2016-11-27 23:16:41
