
Accessing the epoch value across multiple threads using the input_producer/limit_epochs/epochs:0 local variable

I am trying to extract the current epoch number while reading data with multiple CPU threads. However, while testing the code, I observed output that makes no sense. Consider the code below:

import threading
import tensorflow as tf

# trainimgs: a list of filename strings (each stored as "path label"),
# built elsewhere in the script.
with tf.Session() as sess:
    train_filename_queue = tf.train.string_input_producer(
        trainimgs, num_epochs=4, shuffle=True)
    value = train_filename_queue.dequeue()
    init_op = tf.group(tf.global_variables_initializer(),
                       tf.local_variables_initializer())
    sess.run(init_op)
    coord = tf.train.Coordinator()
    tf.train.start_queue_runners(sess=sess, coord=coord)
    # Collect the name of the epoch-counting local variable; sess.run()
    # can fetch the variable directly by this name string.
    collections = [v.name for v in tf.get_collection(
        tf.GraphKeys.LOCAL_VARIABLES,
        scope='input_producer/limit_epochs/epochs:0')]
    print(collections)

    threads = [threading.Thread(target=work, args=(coord, value, sess, collections))
               for _ in range(20)]
    for t in threads:
        t.start()
    coord.join(threads)
    coord.request_stop()

The work function is defined as follows:

def work(coord, val, sess, collections):
    while not coord.should_stop():
        try:
            # Two separate run calls: the counter and the filename are
            # not read atomically.
            epoch = sess.run(collections[0])
            filename = sess.run(val).decode(encoding='UTF-8')
            print(filename + ' ' + str(epoch))
        except tf.errors.OutOfRangeError:
            coord.request_stop()
    return None
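
One detail worth noting: work() reads the counter and the filename in two separate sess.run() calls, so the two values are not sampled at the same moment. Fetching both in a single call would at least pair them consistently (a minimal sketch; as the answer below explains, it still would not identify the dequeued element's epoch):

    # Single run call: the counter and the filename come from the same step.
    # epochs:0 still reflects the enqueue-side counter, though.
    epoch, filename = sess.run([collections[0], val])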

The output I get is the following:

I tensorflow/core/common_runtime/gpu/gpu_device.cc:885] Found device 0 with properties: 
name: GeForce GTX TITAN X 
major: 5 minor: 2 memoryClockRate (GHz) 1.076 
pciBusID 0000:84:00.0 
Total memory: 11.92GiB 
Free memory: 11.80GiB 
I tensorflow/core/common_runtime/gpu/gpu_device.cc:906] DMA: 0 
I tensorflow/core/common_runtime/gpu/gpu_device.cc:916] 0: Y 
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX TITAN X, pci bus id: 0000:84:00.0) 
I tensorflow/compiler/xla/service/platform_util.cc:58] platform CUDA present with 1 visible devices 
I tensorflow/compiler/xla/service/platform_util.cc:58] platform Host present with 20 visible devices 
I tensorflow/compiler/xla/service/service.cc:180] XLA service executing computations on platform Host. Devices: 
I tensorflow/compiler/xla/service/service.cc:187] StreamExecutor device (0): <undefined>, <undefined> 
I tensorflow/compiler/xla/service/platform_util.cc:58] platform CUDA present with 1 visible devices 
I tensorflow/compiler/xla/service/platform_util.cc:58] platform Host present with 20 visible devices 
I tensorflow/compiler/xla/service/service.cc:180] XLA service executing computations on platform CUDA. Devices: 
I tensorflow/compiler/xla/service/service.cc:187] StreamExecutor device (0): GeForce GTX TITAN X, Compute Capability 5.2 
['input_producer/limit_epochs/epochs:0'] 
/local/ujjwal/ILSVRC2015/Data/CLS-LOC/train/n01768244/n01768244_4760.JPEG 0 2 
/local/ujjwal/ILSVRC2015/Data/CLS-LOC/train/n01768244/n01768244_703.JPEG 0 4 
/local/ujjwal/ILSVRC2015/Data/CLS-LOC/train/n01768244/n01768244_11768.JPEG 0 4 
/local/ujjwal/ILSVRC2015/Data/CLS-LOC/train/n01768244/n01768244_3271.JPEG 0 4 
/local/ujjwal/ILSVRC2015/Data/CLS-LOC/train/n01768244/n01768244_1015.JPEG 0 4 
/local/ujjwal/ILSVRC2015/Data/CLS-LOC/train/n01768244/n01768244_730.JPEG 0 4 
/local/ujjwal/ILSVRC2015/Data/CLS-LOC/train/n01768244/n01768244_1945.JPEG 0 4 
/local/ujjwal/ILSVRC2015/Data/CLS-LOC/train/n01768244/n01768244_3149.JPEG 0 4 
/local/ujjwal/ILSVRC2015/Data/CLS-LOC/train/n01768244/n01768244_4209.JPEG 0 4 
/local/ujjwal/ILSVRC2015/Data/CLS-LOC/train/n01768244/n01768244_40.JPEG 0 4 
/local/ujjwal/ILSVRC2015/Data/CLS-LOC/train/n01768244/n01768244_11768.JPEG 0 4 
/local/ujjwal/ILSVRC2015/Data/CLS-LOC/train/n01768244/n01768244_4760.JPEG 0 4 
/local/ujjwal/ILSVRC2015/Data/CLS-LOC/train/n01768244/n01768244_703.JPEG 0 4 
/local/ujjwal/ILSVRC2015/Data/CLS-LOC/train/n01768244/n01768244_4209.JPEG 0 4 
/local/ujjwal/ILSVRC2015/Data/CLS-LOC/train/n01768244/n01768244_40.JPEG 0 4 
/local/ujjwal/ILSVRC2015/Data/CLS-LOC/train/n01768244/n01768244_730.JPEG 0 4 
/local/ujjwal/ILSVRC2015/Data/CLS-LOC/train/n01768244/n01768244_3271.JPEG 0 4 
/local/ujjwal/ILSVRC2015/Data/CLS-LOC/train/n01768244/n01768244_1015.JPEG 0 4 
/local/ujjwal/ILSVRC2015/Data/CLS-LOC/train/n01768244/n01768244_3149.JPEG 0 4 
/local/ujjwal/ILSVRC2015/Data/CLS-LOC/train/n01768244/n01768244_1945.JPEG 0 4 
/local/ujjwal/ILSVRC2015/Data/CLS-LOC/train/n01768244/n01768244_40.JPEG 0 4 
/local/ujjwal/ILSVRC2015/Data/CLS-LOC/train/n01768244/n01768244_4209.JPEG 0 4 
/local/ujjwal/ILSVRC2015/Data/CLS-LOC/train/n01768244/n01768244_730.JPEG 0 4 
/local/ujjwal/ILSVRC2015/Data/CLS-LOC/train/n01768244/n01768244_1945.JPEG 0 4 
/local/ujjwal/ILSVRC2015/Data/CLS-LOC/train/n01768244/n01768244_4760.JPEG 0 4 
/local/ujjwal/ILSVRC2015/Data/CLS-LOC/train/n01768244/n01768244_3271.JPEG 0 4 
/local/ujjwal/ILSVRC2015/Data/CLS-LOC/train/n01768244/n01768244_703.JPEG 0 4 
/local/ujjwal/ILSVRC2015/Data/CLS-LOC/train/n01768244/n01768244_1015.JPEG 0 4 
/local/ujjwal/ILSVRC2015/Data/CLS-LOC/train/n01768244/n01768244_11768.JPEG 0 4 
/local/ujjwal/ILSVRC2015/Data/CLS-LOC/train/n01768244/n01768244_3149.JPEG 0 4 
/local/ujjwal/ILSVRC2015/Data/CLS-LOC/train/n01768244/n01768244_4209.JPEG 0 4 
/local/ujjwal/ILSVRC2015/Data/CLS-LOC/train/n01768244/n01768244_11768.JPEG 0 4 
/local/ujjwal/ILSVRC2015/Data/CLS-LOC/train/n01768244/n01768244_4760.JPEG 0 4 
/local/ujjwal/ILSVRC2015/Data/CLS-LOC/train/n01768244/n01768244_730.JPEG 0 4 
/local/ujjwal/ILSVRC2015/Data/CLS-LOC/train/n01768244/n01768244_703.JPEG 0 4 
/local/ujjwal/ILSVRC2015/Data/CLS-LOC/train/n01768244/n01768244_3149.JPEG 0 4 
/local/ujjwal/ILSVRC2015/Data/CLS-LOC/train/n01768244/n01768244_3271.JPEG 0 4 
/local/ujjwal/ILSVRC2015/Data/CLS-LOC/train/n01768244/n01768244_1945.JPEG 0 4 
/local/ujjwal/ILSVRC2015/Data/CLS-LOC/train/n01768244/n01768244_1015.JPEG 0 4 
/local/ujjwal/ILSVRC2015/Data/CLS-LOC/train/n01768244/n01768244_40.JPEG 0 4 

The last number in each line corresponds to the value of the local variable input_producer/limit_epochs/epochs:0.

For this first trial, I kept only 10 images in the queue, which means I should get 40 lines of output in total (10 images × 4 epochs).

  • However, I expected the last number to be 1, 2, 3 and 4 in equal proportions across the lines, since each filename should be dequeued once in each of the 4 epochs.

Why am I getting the same number 4 in all the lines?

Additional information

  • I tried using range(1) (i.e., a single thread) and still observed the same behaviour.
  • Don't worry about the number '0'; it is just the label of the corresponding file. I stored the image filenames that way, as sketched below.
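
For reference, trainimgs is laid out roughly like this under that convention (a sketch; the two entries are taken from the output above):

    # Each entry is "path label"; the label travels through the queue as
    # part of the string and is split out downstream.
    trainimgs = [
        '/local/ujjwal/ILSVRC2015/Data/CLS-LOC/train/n01768244/n01768244_40.JPEG 0',
        '/local/ujjwal/ILSVRC2015/Data/CLS-LOC/train/n01768244/n01768244_703.JPEG 0',
        # ... the remaining 8 of the 10 trial images
    ]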

Answer


I ran many experiments and finally arrived at the following conclusions:

I used to think that:

tf.train.string_input_producer() enqueues the filenames epoch by epoch; that is, the first epoch is enqueued in full (in multiple stages if the capacity is smaller than the number of filenames), and only then are further epochs enqueued.

That is not the case.

When tf.train.start_queue_runners() is executed, all the epochs are enqueued together (in multiple stages if the capacity is smaller than the number of filenames). tf.train.string_input_producer uses the local variable epochs:0 to keep track of the epoch currently being enqueued. Once epochs:0 reaches num_epochs, it stays there; it does not change no matter how many threads dequeue from the queue.
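
This is easy to verify with a tiny file list. Here is a minimal sketch (the filenames are placeholders and are never actually opened, since string_input_producer only enqueues the strings) that polls epochs:0 right after the queue runners start:

    import time
    import tensorflow as tf

    filenames = ['a.jpg', 'b.jpg', 'c.jpg']  # placeholders, never read
    queue = tf.train.string_input_producer(filenames, num_epochs=4, shuffle=True)

    with tf.Session() as sess:
        sess.run(tf.local_variables_initializer())
        coord = tf.train.Coordinator()
        threads = tf.train.start_queue_runners(sess=sess, coord=coord)
        time.sleep(1)  # give the enqueuing thread time to run
        # 3 filenames x 4 epochs = 12 elements fit in the default capacity
        # (32), so every epoch is already enqueued and the counter already
        # sits at num_epochs, before a single element has been dequeued.
        print(sess.run('input_producer/limit_epochs/epochs:0'))  # prints 4
        coord.request_stop()
        coord.join(threads)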

When you fetch epochs:0, you get the instantaneous value of the epochs counter, which tells you which epoch of the dataset is being enqueued at that moment. It does not tell you which epoch the element you just dequeued belongs to.

Therefore, reading the current epoch from the epochs:0 local variable is a bad idea.
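
If the dequeue-side epoch is what you need, one workaround is to count dequeued elements yourself. A sketch, reusing sess, value and trainimgs from the question's code, and assuming a single consuming thread (with several threads the shared counter would need a lock); this works because each epoch is enqueued in full before the next one starts, so elements come out in epoch order:

    # Derive the epoch of each dequeued element from a running count.
    num_files = len(trainimgs)
    dequeued = 0
    try:
        while True:
            filename = sess.run(value).decode('UTF-8')
            epoch = dequeued // num_files + 1  # 1-based epoch of this element
            dequeued += 1
            print(filename, epoch)
    except tf.errors.OutOfRangeError:
        pass  # queue exhausted after num_epochs passes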