tf.nn.depthwise_conv2d is too slow. Is this normal?

I am trying out a recent arXiv paper called "Factorized CNN", and tf.nn.depthwise_conv2d is too slow. Is this normal?

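The main idea is that a spatially separated convolution (depthwise convolution), combined with a channel-wise linear projection (1x1 conv), can speed up the convolution operation.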
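To make the factorization concrete, here is a minimal pure-Python sketch (not TensorFlow, and not the paper's code) of a depthwise convolution followed by a 1x1 pointwise projection. It assumes stride 1, SAME padding, channel multiplier 1, and no bias or ReLU between the two steps, and it checks that the result equals a standard convolution whose kernel is the product `dw[di][dj][ci] * pw[ci][co]`. All function names and the small test tensors are hypothetical, chosen only for illustration.

```python
from itertools import product


def pad_same(x, k):
    """Zero-pad an H x W x C nested list by k//2 on each spatial side."""
    h, w, c = len(x), len(x[0]), len(x[0][0])
    p = k // 2
    out = [[[0.0] * c for _ in range(w + 2 * p)] for _ in range(h + 2 * p)]
    for i, j in product(range(h), range(w)):
        out[i + p][j + p] = list(x[i][j])
    return out


def depthwise_conv(x, dw):
    """dw is k x k x C: each input channel is filtered by its own k x k kernel."""
    k = len(dw)
    h, w, c = len(x), len(x[0]), len(x[0][0])
    xp = pad_same(x, k)
    return [[[sum(xp[i + di][j + dj][ch] * dw[di][dj][ch]
                  for di in range(k) for dj in range(k))
              for ch in range(c)]
             for j in range(w)]
            for i in range(h)]


def pointwise_conv(x, pw):
    """1x1 conv: pw is C_in x C_out, i.e. a matrix multiply at every pixel."""
    c_in, c_out = len(pw), len(pw[0])
    return [[[sum(px[ci] * pw[ci][co] for ci in range(c_in))
              for co in range(c_out)]
             for px in row]
            for row in x]


def standard_conv(x, kern):
    """kern is k x k x C_in x C_out: every output channel mixes all inputs."""
    k = len(kern)
    h, w = len(x), len(x[0])
    c_in, c_out = len(kern[0][0]), len(kern[0][0][0])
    xp = pad_same(x, k)
    return [[[sum(xp[i + di][j + dj][ci] * kern[di][dj][ci][co]
                  for di in range(k) for dj in range(k) for ci in range(c_in))
              for co in range(c_out)]
             for j in range(w)]
            for i in range(h)]


def flatten(t):
    return [v for row in t for px in row for v in px]


# Small deterministic example tensors (hypothetical values).
H, W, C_IN, C_OUT, K = 4, 4, 2, 3, 3
x = [[[((i * 7 + j * 3 + c * 5) % 5) - 2.0 for c in range(C_IN)]
      for j in range(W)] for i in range(H)]
dw = [[[(di - dj + c) * 0.5 for c in range(C_IN)]
       for dj in range(K)] for di in range(K)]
pw = [[(ci + 1) * (co - 1) * 0.25 for co in range(C_OUT)] for ci in range(C_IN)]

# Depthwise then pointwise, versus one standard conv with the factored kernel.
separable = pointwise_conv(depthwise_conv(x, dw), pw)
factored = [[[[dw[di][dj][ci] * pw[ci][co] for co in range(C_OUT)]
              for ci in range(C_IN)] for dj in range(K)] for di in range(K)]
full = standard_conv(x, factored)

max_err = max(abs(a - b) for a, b in zip(flatten(separable), flatten(full)))
print(max_err < 1e-9)  # True
```

With a ReLU between the two steps, as in the implementation in this question, the exact equivalence no longer holds, but the cost argument is the same: the depthwise step only does k*k multiplies per channel, and all channel mixing is left to the cheap 1x1 projection.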

[Figure: the conv layer architecture from the paper]

I found that I can implement this architecture either with tf.nn.depthwise_conv2d followed by a 1x1 convolution, or with tf.nn.separable_conv2d. Here is my implementation:

import numpy as np
import tensorflow as tf  # TensorFlow 1.x API (tf.get_variable)

# `tensor` is the input feature map, shape [batch, height, width, 64]

#conv filter for depthwise convolution: [height, width, in_channels, channel_multiplier]
depthwise_filter = tf.get_variable("depth_conv_w", [3,3,64,1], initializer=tf.random_normal_initializer(stddev=np.sqrt(2.0/9/32)))
#conv filter for linear channel projection: [1, 1, in_channels, out_channels]
pointwise_filter = tf.get_variable("point_conv_w", [1,1,64,64], initializer=tf.random_normal_initializer(stddev=np.sqrt(2.0/1/64)))
conv_b = tf.get_variable("conv_b", [64], initializer=tf.constant_initializer(0))

#depthwise convolution, with multiplier 1
conv_tensor = tf.nn.relu(tf.nn.depthwise_conv2d(tensor, depthwise_filter, [1,1,1,1], padding='SAME'))
#linear channel projection with 1x1 convolution
conv_tensor = tf.nn.bias_add(tf.nn.conv2d(conv_tensor, pointwise_filter, [1,1,1,1], padding='VALID'), conv_b)
#residual
tensor = tf.add(tensor, conv_tensor)

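In theory this needs about 8 to 9 times fewer multiply-adds than the original 3x3x64 -> 64 channel convolution, so it should be correspondingly faster.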
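The expected saving can be checked with a quick multiply-add count. This is a sketch assuming stride 1, SAME padding and a hypothetical 32x32 feature map (the ratio does not depend on the spatial size); bias and ReLU are ignored. The ratio works out to 1 / (1/c_out + 1/k²) = 1 / (1/64 + 1/9), about 7.9, which approaches 9 only as the number of output channels grows.

```python
def conv_macs(h, w, k, c_in, c_out):
    """Multiply-adds of a standard k x k conv (stride 1, SAME padding)."""
    return h * w * k * k * c_in * c_out


def separable_macs(h, w, k, c_in, c_out):
    """Depthwise k x k conv (multiplier 1) followed by a 1x1 projection."""
    depthwise = h * w * k * k * c_in
    pointwise = h * w * c_in * c_out
    return depthwise + pointwise


H = W = 32  # hypothetical feature-map size
full = conv_macs(H, W, 3, 64, 64)
sep = separable_macs(H, W, 3, 64, 64)
print(full, sep, round(full / sep, 2))  # 37748736 4784128 7.89
```

Fewer multiply-adds does not by itself guarantee a wall-clock speedup: the depthwise step does very little arithmetic per byte of memory it touches, so its speed depends heavily on how well the kernel is implemented.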

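However, I don't see any performance improvement at all.

I have to assume that either I am doing something wrong, or something is wrong with TensorFlow's implementation.

Since there are very few examples that use depthwise_conv2d, I am posting this question here.

Is this slowness normal, or is there a mistake somewhere?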

2016-09-07 Jong Chan Park

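The current implementation of depthwise_conv2d does not fully exploit the GPU's parallelism, so you will have to wait for a faster implementation. For example, a faster third-party implementation of this kernel exists for Caffe: https://github.com/yonghenglh6/DepthwiseConvolution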

2018-01-03 14:57:23
