遠投: 的Ubuntu 16.04的Nvidia 1070 8Gig在船上?該機擁有64千兆的RAM和數據集爲1萬條記錄和當前的CUDA,CDNN庫,TensorFlow 1.0的Python 3.6TensorFlow Nvidia 1070 GPU內存分配錯誤如何排除故障?
不知道如何解決?
我一直在努力得到一些車型了TensorFlow並已運行到這一現象多次:我不知道以外的任何其他TensorFlow使用GPU內存?
I tensorflow/core/common_runtime/gpu/gpu_device.cc:885] Found device 0 with properties: name: GeForce GTX 1070 major: 6 minor: 1 memoryClockRate (GHz) 1.645 pciBusID 0000:01:00.0 Total memory: 7.92GiB Free memory: 7.56GiB I tensorflow/core/common_runtime/gpu/gpu_device.cc:906] DMA: 0 I tensorflow/core/common_runtime/gpu/gpu_device.cc:916] 0: Y I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 1070, pci bus id: 0000:01:00.0) E tensorflow/stream_executor/cuda/cuda_driver.cc:1002] failed to allocate 7.92G (8499298304 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY E tensorflow/stream_executor/cuda/cuda_driver.cc:1002] failed to allocate 1.50G (1614867456 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY E tensorflow/stream_executor/cuda/cuda_driver.cc:1002] failed to allocate 1.50G (1614867456 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY E tensorflow/stream_executor/cuda/cu
我得到這個下面這表明某種內存分配是怎麼回事?但仍然失敗。
`I tensorflow/core/common_runtime/bfc_allocator.cc:696] 5 Chunks of size 899200000 totalling 4.19GiB
I tensorflow/core/common_runtime/bfc_allocator.cc:696] 1 Chunks of size 1649756928 totalling 1.54GiB
I tensorflow/core/common_runtime/bfc_allocator.cc:700] Sum Total of in-use chunks: 6.40GiB
I tensorflow/core/common_runtime/bfc_allocator.cc:702] Stats:
Limit: 8499298304
InUse: 6875780608
MaxInUse: 6878976000
NumAllocs: 338
MaxAllocSize: 1649756928
W tensorflow/core/common_runtime/bfc_allocator.cc:274] ******************************************************************************************xxxxxxxxxx
W tensorflow/core/common_runtime/bfc_allocator.cc:275] Ran out of memory trying to allocate 6.10MiB. See logs for memory state.
W tensorflow/core/framework/op_kernel.cc:993] Internal: Dst tensor is not initialized.
[[Node: linear/linear/marital_status/marital_status_weights/embedding_lookup_sparse/strided_slice/_1055 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/gpu:0", send_device="/job:localhost/replica:0/task:0/cpu:0", send_device_incarnation=1, tensor_name="edge_1643_linear/linear/marital_status/marital_status_weights/embedding_lookup_sparse/strided_slice", tensor_type=DT_INT64, _device="/job:localhost/replica:0/task:0/gpu:0"]()]]
` 更新:我從減少數以百萬計的記錄計數至40,000有一個基本模型運行至結束。我仍然收到一條錯誤消息,但不是連續的。我在模型輸出中獲得了一堆文本,提示重構模型,我懷疑數據結構是問題的一個重要部分。仍然可以使用一些更好的提示如何調試的全過程..下面是剩下的控制檯輸出
I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:910] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
I tensorflow/core/common_runtime/gpu/gpu_device.cc:885] Found device 0 with properties:
name: GeForce GTX 1070
major: 6 minor: 1 memoryClockRate (GHz) 1.645
pciBusID 0000:01:00.0
Total memory: 7.92GiB
Free memory: 7.52GiB
I tensorflow/core/common_runtime/gpu/gpu_device.cc:906] DMA: 0
I tensorflow/core/common_runtime/gpu/gpu_device.cc:916] 0: Y
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 1070, pci bus id: 0000:01:00.0)
E tensorflow/stream_executor/cuda/cuda_driver.cc:1002] failed to allocate 7.92G (8499298304 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
[I 09:13:09.297 NotebookApp] Saving file at /Documents/InfluenceH/Working_copies/Cond_fcast_wkg/TensorFlow+DNNLinearCombinedClassifier+for+Influence.ipynb
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 1070, pci bus id: 0000:01:00.0)
這個未回答的問題很相似:http://stackoverflow.com/questions/42495930/tensorflow-oom-on-gpu?rq=1 – dartdog