tensorflow: CUDA_ERROR_OUT_OF_MEMORY always occurs

I am trying to train a seq2seq model with the tf-seq2seq package on a 1080 Ti (11GB) GPU, and I always get the following error:

I tensorflow/core/common_runtime/gpu/gpu_device.cc:885] Found device 0 with properties: 
name: Graphics Device 
major: 6 minor: 1 memoryClockRate (GHz) 1.582 
pciBusID 0000:03:00.0 
Total memory: 10.91GiB 
Free memory: 10.75GiB 
I tensorflow/core/common_runtime/gpu/gpu_device.cc:906] DMA: 0 
I tensorflow/core/common_runtime/gpu/gpu_device.cc:916] 0: Y 
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:0) -> (device: 0, name: Graphics Device, pci bus id: 0000:03:00.0) 
E tensorflow/stream_executor/cuda/cuda_driver.cc:1002] failed to allocate 10.91G (11715084288 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY 
I tensorflow/core/common_runtime/gpu/pool_allocator.cc:247] PoolAllocator: After 12337 get requests, put_count=10124 evicted_count=1000 eviction_rate=0.0987752 and unsatisfied allocation rate=0.268542 
I tensorflow/core/common_runtime/gpu/pool_allocator.cc:259] Raising pool_size_limit_ from 100 to 110 
INFO:tensorflow:Saving checkpoints for 1 into ../model/model.ckpt. 
INFO:tensorflow:step = 1, loss = 5.07399

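It seems that TensorFlow tries to allocate the GPU's total memory (10.91GiB), but clearly only 10.75GiB is available. I always get the error above, whatever network size I use (even nmt_small).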


2017-04-18 AmirHJ

In addition to the suggestions already made about memory growth, you can also try:

import tensorflow as tf

# Cap this process at 90% of the GPU memory.
sess_config = tf.ConfigProto()
sess_config.gpu_options.per_process_gpu_memory_fraction = 0.90

with tf.Session(config=sess_config) as sess:
    ...

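With this you can limit the amount of GPU memory that gets allocated, in this case to 90% of the available GPU memory. That may be enough to stop TensorFlow from trying to allocate more memory than is available. If it is not enough, you will have to reduce the batch size or the size of the network.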
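As a rough sanity check on the 0.90 value, the arithmetic can be worked through with the numbers from the log in the question. This is only a sketch: it assumes the fraction is applied to the card's total memory (the 11715084288 bytes TensorFlow tried to allocate), which is my reading and not something the log confirms.

```python
GIB = 1024 ** 3  # bytes per GiB

# Values copied from the log in the question.
requested_bytes = 11715084288  # "failed to allocate 10.91G (11715084288 bytes)"
free_gib = 10.75               # "Free memory: 10.75GiB"

total_gib = requested_bytes / GIB
# Assumption: the fraction is taken of the total memory, not of the free memory.
capped_gib = 0.90 * total_gib

print(f"total:  {total_gib:.2f} GiB")   # total:  10.91 GiB
print(f"capped: {capped_gib:.2f} GiB")  # capped: 9.82 GiB
print("fits in free memory:", capped_gib <= free_gib)  # fits in free memory: True
```

Under that assumption, a 0.90 cap asks for about 9.82GiB, which is below the 10.75GiB reported as free, so the initial allocation should no longer fail.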


2017-04-18 11:26:01 ml4294

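A few things to check:

1- Use memory growth. From the TensorFlow documentation: "In some cases it is desirable for the process to only allocate a subset of the available memory, or to only grow the memory usage as is needed by the process. TensorFlow provides two Config options on the Session to control this."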

import tensorflow as tf

# Start with a small allocation and grow GPU memory usage as needed.
config = tf.ConfigProto()
config.gpu_options.allow_growth = True
session = tf.Session(config=config, ...)

2- Are you training in batches, or feeding the whole dataset at once? If you are using batches, reduce your batch size.
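The two session options mentioned in this thread (allow_growth and per_process_gpu_memory_fraction) can be set on the same config. Below is a minimal sketch of that. The helper name make_session_config and its defaults are hypothetical, made up for illustration; the fallback to tf.compat.v1 is an assumption for newer TensorFlow versions where tf.ConfigProto is no longer at the top level.

```python
def make_session_config(memory_fraction=0.90, allow_growth=True):
    """Build a session config that grows GPU memory on demand, up to a cap.

    Hypothetical helper for illustration; not part of TensorFlow or tf-seq2seq.
    """
    if not 0.0 < memory_fraction <= 1.0:
        raise ValueError("memory_fraction must be in (0, 1]")

    # Imported lazily so the argument check above works without TensorFlow.
    import tensorflow as tf

    # Assumption: newer releases expose the 1.x API under tf.compat.v1.
    config_cls = tf.ConfigProto if hasattr(tf, "ConfigProto") else tf.compat.v1.ConfigProto

    config = config_cls()
    config.gpu_options.allow_growth = allow_growth
    config.gpu_options.per_process_gpu_memory_fraction = memory_fraction
    return config
```

It would be used the same way as the configs above, e.g. tf.Session(config=make_session_config(0.90)). If the error persists with both options set, the remaining things to try are the batch size and the network size.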


2017-04-18 09:36:53

I am training in batches. The batch size is 32, and reducing it to 16 did not help. The problem is that my GPU simply cannot allocate 10.91GiB. – AmirHJ

Try this, it worked for me: 'with tf.Session(config=tf.ConfigProto(allow_soft_placement=True, log_device_placement=True)) as sess:' –
