
When I try to run my 3D convolutional neural network, I get the following error (a ResourceExhaustedError from the CNN). What could be the cause?

ResourceExhaustedError (see above for traceback): OOM when allocating tensor with shape[54080,1024]
	 [[Node: Variable_10/Adam/Assign = Assign[T=DT_FLOAT, _class=["loc:@Variable_10"], use_locking=true, validate_shape=true, _device="/job:localhost/replica:0/task:0/gpu:0"](Variable_10/Adam, zeros_4)]]

Here is the code I am using:

import tensorflow as tf 
import numpy as np 

IMG_SIZE_PX = 50 
SLICE_COUNT = 20 

n_classes = 2 
batch_size = 10 

x = tf.placeholder('float') 
y = tf.placeholder('float') 

keep_rate = 0.8 
def conv3d(x, W): 
    return tf.nn.conv3d(x, W, strides=[1,1,1,1,1], padding='SAME') 

def maxpool3d(x): 
    return tf.nn.max_pool3d(x, ksize=[1,2,2,2,1], strides=[1,2,2,2,1], padding='SAME') 

def convolutional_neural_network(x): 

    weights = {'W_conv1': tf.Variable(tf.random_normal([3,3,3,1,32])),   # 3x3x3 conv, 1 -> 32 channels
               'W_conv2': tf.Variable(tf.random_normal([3,3,3,32,64])),  # 3x3x3 conv, 32 -> 64 channels
               'W_fc': tf.Variable(tf.random_normal([54080,1024])),      # 13*13*5*64 = 54080 flattened features
               'out': tf.Variable(tf.random_normal([1024, n_classes]))}

    biases = {'b_conv1': tf.Variable(tf.random_normal([32])),
              'b_conv2': tf.Variable(tf.random_normal([64])),
              'b_fc': tf.Variable(tf.random_normal([1024])),
              'out': tf.Variable(tf.random_normal([n_classes]))}


    x = tf.reshape(x, shape=[-1, IMG_SIZE_PX, IMG_SIZE_PX, SLICE_COUNT, 1]) 

    conv1 = tf.nn.relu(conv3d(x, weights['W_conv1']) + biases['b_conv1']) 
    conv1 = maxpool3d(conv1) 


    conv2 = tf.nn.relu(conv3d(conv1, weights['W_conv2']) + biases['b_conv2']) 
    conv2 = maxpool3d(conv2) 

    fc = tf.reshape(conv2, [-1, 54080])  # flatten: after two 2x2x2 max-pools, 50->13, 20->5, so 13*13*5*64 = 54080
    fc = tf.nn.relu(tf.matmul(fc, weights['W_fc']) + biases['b_fc'])
    fc = tf.nn.dropout(fc, keep_rate)

    output = tf.matmul(fc, weights['out'])+biases['out'] 

    return output 

much_data = np.load('muchdata-50-50-20.npy') 
# If you are working with the basic sample data, use maybe 2 instead of 100 here... you don't have enough data to really do this 
train_data = much_data[:-100] 
validation_data = much_data[-100:] 


def train_neural_network(x):
    prediction = convolutional_neural_network(x)
    cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=prediction, labels=y))
    optimizer = tf.train.AdamOptimizer(learning_rate=1e-3).minimize(cost)

    hm_epochs = 10
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())

        successful_runs = 0
        total_runs = 0

        for epoch in range(hm_epochs):
            epoch_loss = 0
            for data in train_data:
                total_runs += 1
                try:
                    X = data[0]
                    Y = data[1]
                    _, c = sess.run([optimizer, cost], feed_dict={x: X, y: Y})
                    epoch_loss += c
                    successful_runs += 1
                except Exception as e:
                    # I am passing for the sake of notebook space, but we are getting one shaping issue
                    # from one input tensor. Not sure why; will have to look into it. Guessing it's
                    # one of the depths that doesn't come to 20.
                    pass
                    # print(str(e))

            print('Epoch', epoch + 1, 'completed out of', hm_epochs, 'loss:', epoch_loss)

            correct = tf.equal(tf.argmax(prediction, 1), tf.argmax(y, 1))
            accuracy = tf.reduce_mean(tf.cast(correct, 'float'))

            print('Accuracy:', accuracy.eval({x: [i[0] for i in validation_data], y: [i[1] for i in validation_data]}))

        print('Done. Finishing accuracy:')
        print('Accuracy:', accuracy.eval({x: [i[0] for i in validation_data], y: [i[1] for i in validation_data]}))

        print('fitment percent:', successful_runs / total_runs)

train_neural_network(x)

I am running this with the tensorflow-gpu build. I am on a GTX 970M, with CUDA installed and the cuDNN files correctly in place. When the last command runs, I get the error above. Please help!


How much memory does your GTX 970M have? – Jason

Answer


You are running out of memory for some reason. It may be that some other application is using the GPU (for example, another TensorFlow session that is still active). Please check whether that is the case (you can monitor GPU usage with nvidia-smi).

If that is not it, it mostly comes down to the size of your model versus the size of your GPU's memory. What you can do is try launching it in CPU mode, list all the variables created with tf.Variable, do the math on how much memory they represent, and see whether that fits on your GPU.
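
To make that math concrete, here is a rough sketch, assuming TF 1.x and float32 variables, that sums the sizes of all trainable variables after the graph has been built. Note that Adam keeps two extra slot variables (m and v) per parameter, so the real footprint is roughly three times the raw parameter memory, before counting activations and gradients; the tensor named in the OOM above, Variable_10/Adam with shape [54080,1024], is exactly one of those slots for W_fc.

import numpy as np
import tensorflow as tf

def estimate_variable_memory():
    # Sum parameter counts over all trainable variables in the default graph.
    total_params = 0
    for v in tf.trainable_variables():
        total_params += int(np.prod(v.get_shape().as_list()))
    mb = total_params * 4 / 1024.0 ** 2  # 4 bytes per parameter, assuming float32
    # Adam adds two slots (m and v) per variable, so budget roughly 3x this.
    print('%d parameters, ~%.0f MB raw, ~%.0f MB with Adam slots' % (total_params, mb, 3 * mb))

For the model above, W_fc alone holds 54080 × 1024 ≈ 55.4M parameters, about 211 MB in float32, so roughly 630 MB once Adam's slots exist. To force CPU mode for this experiment, you can hide the GPU by setting the environment variable CUDA_VISIBLE_DEVICES to an empty string before launching.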

Until you have done that, I have no further suggestions to offer.


Yes, the problem was that TensorFlow from a previous session was still using my GPU even though the process had stopped. I had to close the cmd window and try again before it started using the GPU. –
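
For anyone hitting the same thing: by default a TF 1.x session reserves nearly all GPU memory up front, which is why a stale process can block the next run. Below is a minimal sketch of the documented TF 1.x ConfigProto options that soften this (not part of the answer above, just the standard settings):

import tensorflow as tf

config = tf.ConfigProto()
# Allocate GPU memory on demand instead of grabbing it all at session start.
config.gpu_options.allow_growth = True
# Or cap how much of the GPU this process may take, e.g. half:
# config.gpu_options.per_process_gpu_memory_fraction = 0.5

with tf.Session(config=config) as sess:
    sess.run(tf.global_variables_initializer())
    # ... training loop as above ...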