Fully convolutional ResNets with TF-Slim run very slowly

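I am porting code for pixel labeling (FCN-style), originally implemented in Caffe, to TensorFlow. I use the Slim implementation of ResNet (ResNet-101) with a stride of 16 px and upsample it with an up-convolution layer to reach a final stride of 8 px. I use batch_size = 1 because the input images are of arbitrary size. The problem is that training is really slow: it processes 100 images in about 3.5 minutes, whereas my original Caffe implementation does the same in 30 seconds on the same hardware (Tesla K40m). Here is a reduced version of my code: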

import datetime as dt 

import tensorflow as tf 
import tensorflow.contrib.slim as slim 
from tensorflow.contrib.slim.nets import resnet_v1 

from MyDataset import MyDataset 
from TrainParams import TrainParams 

dataset = MyDataset() 
train_param = TrainParams() 

#tf.device('/gpu:0') 

num_classes = 15 

inputs = tf.placeholder(tf.float32, shape=[1, None, None, 3]) 

with slim.arg_scope(resnet_v1.resnet_arg_scope(False)):
    # subtract the per-channel mean from the input image
    mean = tf.constant([123.68, 116.779, 103.939],
                       dtype=tf.float32, shape=[1, 1, 1, 3], name='img_mean')
    im_centered = inputs - mean
    net, end_points = resnet_v1.resnet_v1_101(im_centered,
                                              global_pool=False,
                                              output_stride=16)

    # upsample by a factor of 2 to reach a final stride of 8
    pred_upconv = slim.conv2d_transpose(net, num_classes,
                                        kernel_size=[3, 3],
                                        stride=2,
                                        padding='SAME')

    targets = tf.placeholder(tf.float32, shape=[1, None, None, num_classes])

    loss = slim.losses.sigmoid_cross_entropy(pred_upconv, targets)


log_dir = 'logs/' 

variables_to_restore = slim.get_variables_to_restore(include=["resnet_v1"]) 
restorer = tf.train.Saver(variables_to_restore) 

with tf.Session() as sess: 

    sess.run(tf.initialize_all_variables()) 
    sess.run(tf.initialize_local_variables()) 

    restorer.restore(sess, '/path/to/ResNet-101.ckpt') 

    optimizer = tf.train.GradientDescentOptimizer(learning_rate=.001) 
    train_step = optimizer.minimize(loss) 
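    t1 = dt.datetime.now()
    for it in range(10000):
        n1 = dt.datetime.now()
        batch = dataset.next_batch()  # my function that prepares the training batch
        sess.run(train_step, feed_dict={inputs: batch['inputs'],
                                        targets: batch['targets']})
        n2 = dt.datetime.now()
        # total_seconds() covers the whole interval; .microseconds holds only
        # the sub-second part and under-reports iterations longer than 1 s
        elapsed_ms = (n2 - n1).total_seconds() * 1000
        print("iteration ", it, "time", elapsed_ms)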

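I am only just learning the framework and put this code together in two days, so I understand it may not be optimal. As you can see, I also tried to measure the actual time spent on data preparation and on the forward and backward passes. That time is actually much smaller: summed over 100 iterations it is only 50 seconds, compared to the actual running time. I suspect there may be some thread/process synchronization going on that is not being measured, but I find it strange. The top command shows about 10 processes with the same name as the main one, which it presumably spawned. I also get warnings like these: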

I tensorflow/core/common_runtime/gpu/pool_allocator.cc:245] PoolAllocator: After 1692 get requests, put_count=1316 evicted_count=1000 eviction_rate=0.759878 and unsatisfied allocation rate=0.87234 
I tensorflow/core/common_runtime/gpu/pool_allocator.cc:257] Raising pool_size_limit_ from 100 to 110
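
A note on measuring iteration time with `datetime`, offered as a possible contributing factor rather than a confirmed cause: `timedelta.microseconds` holds only the sub-second component of an interval, so a timing loop that reads it under-reports any iteration that takes longer than one second. A minimal stdlib sketch of the difference (the 2.5 s interval is made up for illustration):

```python
import datetime as dt

# Hypothetical interval of 2.5 seconds, chosen only for illustration.
delta = dt.timedelta(seconds=2, microseconds=500000)

# .microseconds is just the sub-second field of the timedelta.
partial_ms = delta.microseconds / 1000

# total_seconds() covers the whole interval.
full_ms = delta.total_seconds() * 1000

print(partial_ms)  # 500.0
print(full_ms)     # 2500.0
```

Summing the first quantity over many iterations can therefore come out well below the wall-clock time.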

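Could you point me to how I can speed this up?

Thanks.

UPDATE. After more research I found that 'feeding' data is probably slow compared to queues, so I reimplemented the code with a queue in a separate thread: https://gist.github.com/eldar/0ecc058670be340b92e5a1044dc8a089, but the running time is still about the same.

UPDATE2. It looks like I have figured out what the speed problem is. I train fully convolutionally, and my images have arbitrary sizes and aspect ratios. If I feed dummy random numpy tensors of a fixed size, it runs fast. If I generate input tensors of 10 predefined sizes, the first 10 iterations are slow but then it speeds up. It looks like resizing all tensors at every iteration is not as efficient in TensorFlow as it is in Caffe. I will file a ticket on the project's GitHub.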


2016-09-29 SimpleMan

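Keep in mind that this is a huge model. The "101" in resnet_v1_101 comes from the fact that it is 101 layers deep. – Julius

Not sure what you are expecting to get, though. – Julius

afaik they used several different machines to train it. – Julius

The problem was caused by the arbitrarily sized input images. TensorFlow has a feature called autotuning: at run time it profiles various algorithms for each particular input size and decides which one is best. In my case this was happening at every iteration.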
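
To make the effect concrete, here is a toy model of a per-shape tuning cache. It is not TensorFlow's actual implementation; the class `ShapeTuningCache` and its `lookup` method are hypothetical names invented for this sketch. It only illustrates why a small, fixed set of input sizes pays the tuning cost a few times, while arbitrary sizes can pay it at almost every iteration.

```python
class ShapeTuningCache:
    """Toy stand-in for a cache of tuned algorithms keyed by input shape."""

    def __init__(self):
        self._best = {}       # shape -> chosen algorithm (placeholder value)
        self.tuning_runs = 0  # how many times the expensive search ran

    def lookup(self, shape):
        if shape not in self._best:
            # Cache miss: this is where the expensive profiling would happen.
            self.tuning_runs += 1
            self._best[shape] = "algo_for_%dx%d" % shape
        return self._best[shape]


# Case 1: ten predefined sizes, visited over 100 iterations.
fixed = ShapeTuningCache()
fixed_shapes = [(200 + 10 * i, 300) for i in range(10)]
for it in range(100):
    fixed.lookup(fixed_shapes[it % 10])

# Case 2: a different size at every one of 100 iterations.
arbitrary = ShapeTuningCache()
for it in range(100):
    arbitrary.lookup((200 + it, 300 + 2 * it))

print(fixed.tuning_runs)      # 10
print(arbitrary.tuning_runs)  # 100
```

This matches the observation that with 10 predefined sizes only the first 10 iterations were slow.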

The solution is to set the environment variable TF_CUDNN_USE_AUTOTUNE=0:

export TF_CUDNN_USE_AUTOTUNE=0 
python myscript.py

More in this GitHub issue: https://github.com/tensorflow/tensorflow/issues/5048
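
If exporting the variable in the shell is inconvenient, it can presumably also be set from inside the script, as long as that happens before TensorFlow is imported. This is a sketch under that assumption, not something confirmed in this thread; the commented-out import only marks where the real import would go.

```python
import os

# Assumption: the variable has to be in the process environment before
# TensorFlow initializes, so set it ahead of the first TensorFlow import.
os.environ["TF_CUDNN_USE_AUTOTUNE"] = "0"

# import tensorflow as tf  # import only after the variable is set

print(os.environ["TF_CUDNN_USE_AUTOTUNE"])  # 0
```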


2016-10-19 08:44:50 SimpleMan

Code for the linked question: https://gist.github.com/eldar/0ecc058670be340b92e5a1044dc8a089 –

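In general, the TensorFlow ResNet implementation should not be (much) slower than Caffe. I just compared the implementations in caffe/barrista (https://github.com/classner/barrista/tree/master/examples/residual-nets) and in the TensorFlow example (https://github.com/tensorflow/models/tree/master/resnet), and the difference in speed over a full training run is negligible.

I did run into a problem with the TensorFlow implementation, which is what brought me to this page. The reason was that the GitHub version I had built was not a stable one and was very slow because of intermediate development code. A git pull and a recompile solved the problem.

However, if you are reimplementing it yourself, pay attention to how the BatchNorm update ops are triggered. In the TensorFlow example this is done in resnet_model.py, l. 172. They are added directly to the "fetches" of the run operation and are therefore executed in parallel and as early as possible.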


2016-10-16 20:53:15 Chris

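Thanks for your reply! I also used a nightly build, because support for ResNets is not in the stable release. Which version did you use? Also, did you use your own dataset, and if so, how did you load the data? I suspect my data loading code may not be optimal. – SimpleMan

So I updated to version 0.11.0rc0, and I can see that there are no longer 10 other python processes running at the same time, which is a good sign, but it is still just as slow. – SimpleMan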
