2017-04-23 57 views

TF slice_input_producer not keeping tensors in sync

I'm reading images into my TF network, but I also need the labels that go with them.

So I tried following this answer, but the labels that come out don't actually match the images I get in each batch.

My image names have the format dir/3.jpg, so I simply extract the label from the image filename.
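For reference, here is a minimal plain-Python sketch of what this extraction yields. Note that `rsplit("/", 1)[1]` keeps the extension, so the label for dir/3.jpg is the string `3.jpg`. The helper `numeric_label` is a hypothetical name added purely for illustration, showing one possible way to get an integer label instead.

```python
import os


def filename_label(path):
    """Return the part after the last '/' (e.g. 'dir/3.jpg' -> '3.jpg')."""
    return path.rsplit("/", 1)[1]


def numeric_label(path):
    """Hypothetical helper: strip directory and extension, parse as int."""
    stem = os.path.splitext(os.path.basename(path))[0]
    return int(stem)


paths = ["dir/3.jpg", "dir/10.jpg", "dir/0.jpg"]
print([filename_label(p) for p in paths])
print([numeric_label(p) for p in paths])
```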

truth_filenames_np = ... 
truth_filenames_tf = tf.convert_to_tensor(truth_filenames_np) 

# get the labels 
labels = [f.rsplit("/", 1)[1] for f in truth_filenames_np] 

labels_tf = tf.convert_to_tensor(labels) 

# *** This line should make sure both input tensors are synced (from my limited understanding) 
# My list is also already shuffled, so I set shuffle=False 
truth_image_name, truth_label = tf.train.slice_input_producer([truth_filenames_tf, labels_tf], shuffle=False) 


truth_image_value = tf.read_file(truth_image_name) 
truth_image = tf.image.decode_jpeg(truth_image_value) 
truth_image.set_shape([IMAGE_DIM, IMAGE_DIM, 3]) 
truth_image = tf.cast(truth_image, tf.float32) 
truth_image = truth_image/255.0 

# Another key step, where I batch them together 
truth_images_batch, truth_label_batch = tf.train.batch([truth_image, truth_label], batch_size=mb_size) 


with tf.Session() as sess: 
    sess.run(tf.global_variables_initializer()) 

    coord = tf.train.Coordinator() 
    threads = tf.train.start_queue_runners(coord=coord) 

    for i in range(epochs):
        print "Epoch ", i
        X_truth_batch = truth_images_batch.eval()
        X_label_batch = truth_label_batch.eval()

        # Here I display all the images in this batch, and then I check which file numbers they actually are.
        # BUT, the images that are displayed don't correspond with what is printed by X_label_batch!
        print X_label_batch
        plot_batch(X_truth_batch)



    coord.request_stop() 
    coord.join(threads) 

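Am I doing something wrong, or does slice_input_producer not actually ensure that its input tensors stay in sync?

Aside:

I also noticed that when I get a batch from tf.train.batch, the elements within that batch are adjacent to each other in the original list I gave it, but the batches themselves don't come in the original order. For example, if my data is ["dir/1.jpg", "dir/2.jpg", "dir/3.jpg", "dir/4.jpg", "dir/5.jpg", "dir/6.jpg"], then I may get the batch (batch_size=2) ["dir/3.jpg", "dir/4.jpg"], then the batch ["dir/1.jpg", "dir/2.jpg"], and then the last one. This makes it hard to even just use a FIFO queue for the labels, because its order would not match the order of the batches.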

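Could you edit the code down to a minimal enqueue attempt that reproduces the problem? As in, remove all the image processing and see whether the images/labels still get mixed up. As it stands, we can't run this code unless we have the files.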

Answer

Here is a complete, runnable example that reproduces the problem:

import tensorflow as tf 

truth_filenames_np = ['dir/%d.jpg' % j for j in range(66)] 
truth_filenames_tf = tf.convert_to_tensor(truth_filenames_np) 
# get the labels 
labels = [f.rsplit("/", 1)[1] for f in truth_filenames_np] 
labels_tf = tf.convert_to_tensor(labels) 

# My list is also already shuffled, so I set shuffle=False 
truth_image_name, truth_label = tf.train.slice_input_producer(
    [truth_filenames_tf, labels_tf], shuffle=False) 

# # Another key step, where I batch them together 
# truth_images_batch, truth_label_batch = tf.train.batch(
#  [truth_image_name, truth_label], batch_size=11) 

epochs = 7 

with tf.Session() as sess: 
    sess.run(tf.global_variables_initializer()) 
    coord = tf.train.Coordinator() 
    threads = tf.train.start_queue_runners(coord=coord) 
    for i in range(epochs): 
        print("Epoch ", i)
        X_truth_batch = truth_image_name.eval()
        X_label_batch = truth_label.eval()
        # Here I display all the images in this batch, and then I check
        # which file numbers they actually are.
        # BUT, the images that are displayed don't correspond with what is
        # printed by X_label_batch!
        print(X_truth_batch)
        print(X_label_batch)
    coord.request_stop() 
    coord.join(threads) 

This is what it prints:

Epoch 0 
b'dir/0.jpg' 
b'1.jpg' 
Epoch 1 
b'dir/2.jpg' 
b'3.jpg' 
Epoch 2 
b'dir/4.jpg' 
b'5.jpg' 
Epoch 3 
b'dir/6.jpg' 
b'7.jpg' 
Epoch 4 
b'dir/8.jpg' 
b'9.jpg' 
Epoch 5 
b'dir/10.jpg' 
b'11.jpg' 
Epoch 6 
b'dir/12.jpg' 
b'13.jpg' 

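So basically each eval call runs the operation again, dequeuing a fresh element every time! Adding batching makes no difference: it just prints batches (the first 11 filenames, followed by the next 11 labels, and so on).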
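To make the mechanism concrete without needing TensorFlow or any image files, here is a plain-Python sketch. `FakeSliceQueue` is a hypothetical stand-in, not a TensorFlow API, and it assumes that each `eval()` call is a separate run that dequeues one (filename, label) slice and returns only the requested component.

```python
from collections import deque


class FakeSliceQueue:
    """Toy stand-in for the queue behind slice_input_producer (assumption)."""

    def __init__(self, filenames, labels):
        # Slices are enqueued as aligned (filename, label) pairs.
        self._queue = deque(zip(filenames, labels))

    def eval_component(self, index):
        """Mimic tensor.eval(): one run, one dequeue, one component returned."""
        pair = self._queue.popleft()
        return pair[index]


filenames = ["dir/%d.jpg" % j for j in range(6)]
labels = [f.rsplit("/", 1)[1] for f in filenames]
queue = FakeSliceQueue(filenames, labels)

mismatched = []
for step in range(3):
    name = queue.eval_component(0)   # first run: consumes slice 2*step
    label = queue.eval_component(1)  # second run: consumes slice 2*step + 1
    mismatched.append((name, label))
    print(step, name, label)
```

In this sketch the pairs inside the queue stay aligned; the mismatch comes only from consuming two slices per loop iteration and keeping half of each.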

The workaround I see is:

for i in range(epochs): 
    print("Epoch ", i) 
    pair = tf.convert_to_tensor([truth_image_name, truth_label]).eval() 
    print(pair[0]) 
    print(pair[1]) 

which prints the correct result:

Epoch 0 
b'dir/0.jpg' 
b'0.jpg' 
Epoch 1 
b'dir/1.jpg' 
b'1.jpg' 
# ... 

But this does nothing about the violation of the principle of least surprise.

EDIT: Yet another way to do this:

import tensorflow as tf 

truth_filenames_np = ['dir/%d.jpg' % j for j in range(66)] 
truth_filenames_tf = tf.convert_to_tensor(truth_filenames_np) 
labels = [f.rsplit("/", 1)[1] for f in truth_filenames_np] 
labels_tf = tf.convert_to_tensor(labels) 
truth_image_name, truth_label = tf.train.slice_input_producer(
    [truth_filenames_tf, labels_tf], shuffle=False) 
epochs = 7 
with tf.Session() as sess: 
    sess.run(tf.global_variables_initializer()) 
    tf.train.start_queue_runners(sess=sess) 
    for i in range(epochs): 
        print("Epoch ", i)
        X_truth_batch, X_label_batch = sess.run(
            [truth_image_name, truth_label])
        print(X_truth_batch)
        print(X_label_batch)

This is the better approach, since tf.convert_to_tensor and co. only accept tensors of the same type/shape, etc.
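The single-fetch approach can be sketched in plain Python. `run_together` is a hypothetical stand-in for one `sess.run([...])` call, assuming that a single run dequeues one slice and returns every requested component with its own type. The integer labels are only an illustration and are not part of the code above.

```python
from collections import deque


def make_queue(filenames, labels):
    """Toy queue of aligned (filename, label) slices (assumption, not TF)."""
    return deque(zip(filenames, labels))


def run_together(queue, indices):
    """Mimic sess.run([a, b]): one dequeue, all requested components."""
    pair = queue.popleft()
    return [pair[i] for i in indices]


filenames = ["dir/%d.jpg" % j for j in range(4)]
int_labels = list(range(4))  # mixed types: str filenames, int labels
queue = make_queue(filenames, int_labels)

fetched = [run_together(queue, [0, 1]) for _ in range(3)]
for name, label in fetched:
    print(name, label, type(label).__name__)
```

Packing both components into one tensor would require a common dtype and shape, whereas fetching them as a list in one run leaves each component as it is.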

Note that I removed the coordinator for simplicity, which however results in a warning:

W c:\tf_jenkins\home\workspace\release-win\device\cpu\os\windows\tensorflow\core\kernels\queue_base.cc:294] _0_input_producer/input_producer/fraction_of_32_full/fraction_of_32_full: Skipping cancelled enqueue attempt with queue not closed
