2017-09-02 108 views
0

我正在研究有關數值優化的工程問題。通常我會使用像Kriging這樣的元模型方法 - 但現在我想嘗試新的流行元素。 這裏的數據顯示了一個具有7個幾何參數的機器零件,因此我得到一個功率值。後來,當數據合適時,我想查看是否可以使用網絡上運行的簡單粒子羣算法對部件進行優化。TFlearn/Tensorflow方式比預期的要慢

我寫了一個自己的簡單的ANN,其中有幾個隱藏的層塊,小規模的網絡(例如7,30,30,1)的性能是每秒數百個時代。較大的網絡(例如7,999,999,1)經歷了明顯的放緩。現在我想利用我的GPU(Quadro M4000)的強大功能。這裏是我的代碼,修改MNIST例子,我在網上找到了這裏https://github.com/tflearn/tflearn/blob/master/examples/images/dnn.py

# -*- coding: utf-8 -*- 

from __future__ import division, print_function, absolute_import 
import tflearn 
import tensorflow as tf 
import numpy as np 

with tf.device('/gpu:0'): 
    # Data loading and preprocessing. 7 geometric parameters with one solution (last row) each 
    Train =np.array([[0.848,0.994,0.475,0.120,0.141,0.670,0.156,-0.29750], 
      [0.761,0.221,0.414,0.825,0.429,0.174,0.804,12.28900], 
      [0.915,0.891,0.177,0.936,0.503,0.820,0.028,4.46918], 
      [0.730,0.203,0.660,0.065,0.124,0.326,0.183,10.51110], 
      [0.071,0.247,0.512,0.568,0.395,0.149,0.881,12.31750], 
      [0.650,0.836,0.165,0.712,0.334,0.764,0.134,10.23280], 
      [0.144,0.261,0.605,0.208,0.557,0.640,0.281,12.26680], 
      [0.584,0.301,0.959,0.460,0.987,0.300,0.107,9.39676], 
      [0.384,0.555,0.276,0.755,0.718,0.200,0.249,12.49240], 
      [0.278,0.080,0.376,0.857,0.462,0.900,0.559,10.70610], 
      [0.740,0.514,0.086,0.968,0.443,0.923,0.641,9.75352], 
      [0.483,0.103,0.945,0.547,0.822,0.784,0.614,10.08490], 
      [0.871,0.732,0.117,0.881,0.883,0.594,0.759,10.22090], 
      [0.334,0.332,0.547,0.392,0.150,0.075,0.726,12.01180], 
      [0.558,0.464,0.083,0.486,0.651,0.975,0.065,8.93398], 
      [0.154,0.775,0.781,0.047,0.944,0.689,0.676,10.59200], 
      [0.616,0.145,0.195,0.337,0.269,0.266,0.746,11.55840], 
      [0.260,0.296,0.922,0.005,0.028,0.465,0.379,1.95064], 
      [0.005,0.757,0.554,0.731,0.906,0.163,0.581,12.48950], 
      [0.542,0.876,0.025,0.138,0.250,0.747,0.783,8.96552], 
      [0.353,0.363,0.844,0.175,0.803,0.287,0.905,10.94560], 
      [0.785,0.701,0.491,0.067,0.076,0.656,0.079,11.47360], 
      [0.826,0.181,0.530,0.789,0.305,0.395,0.843,12.08160], 
      [0.450,0.153,0.790,0.609,0.006,0.768,0.626,9.92177], 
      [0.212,0.192,0.595,0.245,0.776,0.866,0.093,11.61500], 
      [0.066,0.279,0.866,0.219,0.789,0.574,0.899,11.29240], 
      [0.037,0.428,0.361,0.564,0.378,0.946,0.967,11.27440], 
      [0.690,0.903,0.285,1.000,0.058,0.233,0.476,11.69370], 
      [0.027,0.450,0.228,0.899,0.955,0.449,0.346,11.38700], 
      [0.240,0.032,0.687,0.372,0.329,0.844,0.309,11.53640], 
      [0.187,0.665,0.301,0.423,0.114,0.868,0.216,11.86770], 
      [0.505,0.499,0.201,0.737,0.179,0.120,0.484,12.45780], 
      [0.578,0.397,0.821,0.661,0.212,0.337,0.036,4.31481], 
      [0.949,0.646,0.892,0.586,0.702,0.536,0.416,1.93674], 
      [0.885,0.789,0.994,0.312,0.227,0.037,0.350,0.05622], 
      [0.957,0.123,0.317,0.848,0.549,0.510,0.705,11.63030], 
      [0.110,0.692,0.384,0.269,0.933,0.113,0.423,12.74620], 
      [0.127,0.678,0.008,0.351,0.649,0.620,0.860,10.38200], 
      [0.228,0.940,0.247,0.099,0.689,0.093,0.956,11.63780], 
      [0.403,0.063,0.464,0.904,0.091,0.401,0.288,11.86670], 
      [0.976,0.088,0.718,0.966,0.261,0.816,0.992,10.76000], 
      [0.801,0.475,0.871,0.510,0.613,0.885,0.261,8.64352], 
      [0.710,0.413,0.262,0.299,0.864,0.426,0.517,12.07250], 
      [0.421,0.919,0.764,0.627,0.620,0.729,0.539,1.79113], 
      [0.446,0.971,0.338,0.692,0.187,0.061,0.121,12.02700], 
      [0.861,0.539,0.901,0.319,0.481,0.369,0.231,1.28605], 
      [0.367,0.809,0.675,0.112,0.574,0.701,0.200,10.91850], 
      [0.288,0.042,0.618,0.264,0.848,0.355,0.513,12.04380]]) 

    Y = Train[:, 7].reshape([48,1]) 
    X = np.delete(Train, (7), axis=1) 

    Test =np.array([[0.088,0.857,0.448,0.156,0.364,0.487,0.007,12.21500], 
      [0.653,0.371,0.645,0.767,0.034,0.526,0.386,11.5270], 
      [0.618,0.008,0.573,0.929,0.284,0.556,0.588,11.55110], 
      [0.531,0.348,0.109,0.190,0.674,0.985,0.785,9.66132], 
      [0.768,0.609,0.744,0.024,0.969,0.009,0.331,9.49345], 
      [0.177,0.523,0.974,0.814,0.736,0.611,0.451,9.61419], 
      [0.919,0.817,0.057,0.530,0.415,0.467,0.693,10.44850], 
      [0.310,0.628,0.804,0.672,0.598,0.244,0.652,10.37430], 
      [0.430,0.900,0.480,0.700,0.620,0.300,0.600,11.77790], 
      [0.318,0.741,0.142,0.405,0.532,0.189,0.936,11.86930]]) 

    testY = Test[:, 7].reshape([10,1]) 
    testX = np.delete(Test, (7), axis=1)  


    # Building deep neural network 

    input_layer = tflearn.input_data(shape=[None, 7]) 
    dense1 = tflearn.fully_connected(input_layer, 70, activation='relu', 
            regularizer='L2', weight_decay=0.001) 
    dropout1 = tflearn.dropout(dense1, 0.8) 
    dense2 = tflearn.fully_connected(dropout1, 70, activation='tanh', 
            regularizer='L2', weight_decay=0.001) 
    dropout2 = tflearn.dropout(dense2, 0.8) 
    linear = tflearn.fully_connected(dropout2, 1, activation='linear') 

    #Regression using SGD 
    sgd = tflearn.optimizers.SGD(learning_rate=0.01, lr_decay=0.96, decay_step=100) 
    net = tflearn.regression(linear, optimizer=sgd, loss='mean_square', 
            metric='R2', learning_rate=0.01) 
with tf.device('/cpu:0'):  
    # Training 
    model = tflearn.DNN(net) 
    model.fit(X, Y, n_epoch=200, validation_set=(testX, testY),batch_size = None, 
      show_metric=True, run_id="dense_model") 

    print("\nTest prediction") 
    print(testY) 
    print(model.predict(testX)) 

現在我有2個問題:

  1. 基本上不管我選擇什麼樣的隱藏層的大小,我得到這樣的表現。每個時代超過一秒!即使它僅在CPU上運行,它應該會更快,不是嗎?

Training Step: 19 | total loss: 87.36314 | time: 1.019s

Training Step: 20 | total loss: 86.11715 | time: 1.013s

  • 不知爲什麼,使用「with tf.device('/gpu:0'):」時完整的代碼會崩潰,並出現以下錯誤消息:

    tensorflow.python.framework.errors_impl.InvalidArgumentError: Node
    'init_3/NoOp': Unknown input node '^is_training/Assign'

  • 編輯:關於問題2,我通過添加以下幾行解決了崩潰:

    tflearn.init_graph(num_cores=6,gpu_memory_fraction=0.2) 
    config = tf.ConfigProto() 
    config.gpu_options.allow_growth = True 
    

    10個數據點與10000個數據點之間的耗時差異小得不成比例(約1.1秒對1.7秒),所以我只能假設別的地方存在某種固定的開銷?

    回答

    0

    好的,答案是:每個epoch都執行驗證花費了大量時間。老實說我並沒有料到這一點,因爲據我所知驗證只是一次前向計算,不應該花費太多時間。改成只在每n步驗證一次(而不是每個epoch都驗證)之後,速度改善了很多:

    model.fit(X, Y, n_epoch=2000, validation_set=(testX, testY),batch_size = None,snapshot_step=100, snapshot_epoch=False, 
         show_metric=True, run_id="dense_model")