2017-02-23 539 views

I am trying to run the following on a server whose GPU VRAM is more than 96% in use. Why does the TensorFlow session fail to start with an out-of-memory error on the GPU, even though I specify `device_count={'CPU': 1, 'GPU': 0}`?

import tensorflow as tf

a = tf.constant(1, name='a')
b = tf.constant(3, name='b')
c = tf.constant(9, name='c')
d = tf.add(a, b, name='d')
e = tf.add(d, c, name='e')

session_conf = tf.ConfigProto(
    device_count={'CPU': 1, 'GPU': 0},
    allow_soft_placement=True
)
sess = tf.Session(config=session_conf)
print(sess.run([d, e]))

It gives me a CUDA_ERROR_OUT_OF_MEMORY error that stops the program:

[email protected]:/scratch/test$ python3.5 shape.py 
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcublas.so.8.0 locally 
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcudnn.so.5 locally 
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcufft.so.8.0 locally 
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcuda.so.1 locally 
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcurand.so.8.0 locally 
E tensorflow/core/common_runtime/direct_session.cc:137] Internal: failed initializing StreamExecutor for CUDA device ordinal 0: Internal: failed call to cuDevicePrimaryCtxRetain: CUDA_ERROR_OUT_OF_MEMORY; total memory reported: 18446744073709551615 
Traceback (most recent call last):
  File "shape.py", line 20, in <module>
    sess = tf.Session(config=session_conf)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1187, in __init__
    super(Session, self).__init__(target, graph, config=config)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 552, in __init__
    self._session = tf_session.TF_NewDeprecatedSession(opts, status)
  File "/usr/lib/python3.5/contextlib.py", line 66, in __exit__
    next(self.gen)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/errors_impl.py", line 469, in raise_exception_on_not_ok_status
    pywrap_tensorflow.TF_GetCode(status))
tensorflow.python.framework.errors_impl.InternalError: Failed to create session.

Why does the server's VRAM usage interfere with my program, given that I specified `device_count={'CPU': 1, 'GPU': 0}, allow_soft_placement=True` when creating the TensorFlow session?

Answer


I am not sure that device_count={'GPU': 0} prevents GPU memory allocation; I have not seen it used that way before. It may not work because the GPU allocator is a process-level concept, shared between sessions, so you are trying to configure a process-level setting through session-level configuration. The most reliable way is to make the GPUs invisible to the whole process by setting the environment variable: export CUDA_VISIBLE_DEVICES=
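The same masking can be done from inside Python, as long as it happens before TensorFlow is imported (a minimal sketch; the TensorFlow lines are shown commented out so that only the masking step itself is executed here):

```python
import os

# The GPU allocator is process-level, so CUDA_VISIBLE_DEVICES must be set
# before `import tensorflow` ever runs in this process. An empty value
# tells the CUDA runtime that no devices are visible.
os.environ["CUDA_VISIBLE_DEVICES"] = ""

# import tensorflow as tf   # TF now sees no GPUs and allocates no VRAM
# sess = tf.Session()       # runs entirely on CPU

print(os.environ["CUDA_VISIBLE_DEVICES"])  # -> empty string
```

Setting the variable after TensorFlow has already initialised CUDA has no effect, which is why the answer recommends the environment variable over session-level configuration.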


Note that on Windows `export CUDA_VISIBLE_DEVICES=` will not work (as I found out the hard way [here](https://stackoverflow.com/questions/44500733/tensorflow-allocating-gpu-memory-when-using-tf-device-cpu0/44513295?noredirect=1#comment76027592_44513295)). To effectively mask all GPUs you have to set `CUDA_VISIBLE_DEVICES=-1` (or any other invalid device number). – GPhilo
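A portable variant of GPhilo's workaround uses an invalid device ordinal instead of an empty string (a sketch; `-1` is simply an ordinal that no CUDA device can ever have, so any other invalid number would work too):

```python
import os

# An invalid ordinal masks every GPU on both Linux and Windows, whereas
# an empty string is not honoured everywhere. As before, this must run
# before TensorFlow is imported in this process.
os.environ["CUDA_VISIBLE_DEVICES"] = "-1"

# import tensorflow as tf   # must come after the assignment above
```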


It works for me, and it is very widely used; your CUDA driver must be doing something unusual in your case. –


Which CUDA SDK are you using? I am using version 8, and their documentation does not specify the behaviour of an empty string – GPhilo
