2017-02-23 539 views

I am trying to run the following on a server whose GPU VRAM is more than 96% in use. Why does the TensorFlow session fail to start with an out-of-memory error on the GPU, even though I specify `device_count={'CPU': 1, 'GPU': 0}`?

import tensorflow as tf

a = tf.constant(1, name='a')
b = tf.constant(3, name='b')
c = tf.constant(9, name='c')
d = tf.add(a, b, name='d')
e = tf.add(d, c, name='e')

session_conf = tf.ConfigProto(
    device_count={'CPU': 1, 'GPU': 0},
    allow_soft_placement=True
)
sess = tf.Session(config=session_conf)
print(sess.run([d, e]))

It gives me a CUDA_ERROR_OUT_OF_MEMORY error that stops the program:

[email protected]:/scratch/test$ python3.5 shape.py 
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcublas.so.8.0 locally 
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcudnn.so.5 locally 
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcufft.so.8.0 locally 
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcuda.so.1 locally 
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcurand.so.8.0 locally 
E tensorflow/core/common_runtime/direct_session.cc:137] Internal: failed initializing StreamExecutor for CUDA device ordinal 0: Internal: failed call to cuDevicePrimaryCtxRetain: CUDA_ERROR_OUT_OF_MEMORY; total memory reported: 18446744073709551615 
Traceback (most recent call last):
  File "shape.py", line 20, in <module>
    sess = tf.Session(config=session_conf)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1187, in __init__
    super(Session, self).__init__(target, graph, config=config)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 552, in __init__
    self._session = tf_session.TF_NewDeprecatedSession(opts, status)
  File "/usr/lib/python3.5/contextlib.py", line 66, in __exit__
    next(self.gen)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/errors_impl.py", line 469, in raise_exception_on_not_ok_status
    pywrap_tensorflow.TF_GetCode(status))
tensorflow.python.framework.errors_impl.InternalError: Failed to create session.

Why does the server's VRAM usage interfere with my program, given that I specified `device_count={'CPU': 1, 'GPU': 0}, allow_soft_placement=True` when creating the TensorFlow session?

Answer


I am not sure that device_count={'GPU': 0} prevents GPU memory allocation; I have not seen it used that way before. It may not work because the GPU allocator is a process-level concept, shared between sessions, so you are trying to configure a process-level setting through session-level configuration. The most reliable way is to make the GPUs invisible to the whole process by setting the environment variable: export CUDA_VISIBLE_DEVICES=
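The same masking can be done from inside Python, as long as it happens before TensorFlow is imported (a minimal sketch; the TensorFlow lines are shown commented out so that only the masking step itself is executed here):

```python
import os

# The GPU allocator is process-level, so CUDA_VISIBLE_DEVICES must be set
# before `import tensorflow` ever runs in this process. An empty value
# tells the CUDA runtime that no devices are visible.
os.environ["CUDA_VISIBLE_DEVICES"] = ""

# import tensorflow as tf   # TF now sees no GPUs and allocates no VRAM
# sess = tf.Session()       # runs entirely on CPU

print(os.environ["CUDA_VISIBLE_DEVICES"])  # -> empty string
```

Setting the variable after TensorFlow has already initialised CUDA has no effect, which is why the answer recommends the environment variable over session-level configuration.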


Note that on Windows `export CUDA_VISIBLE_DEVICES=` will not work (as I found out the hard way [here](https://stackoverflow.com/questions/44500733/tensorflow-allocating-gpu-memory-when-using-tf-device-cpu0/44513295?noredirect=1#comment76027592_44513295)). To effectively mask all GPUs you have to set `CUDA_VISIBLE_DEVICES=-1` (or any other invalid device number). – GPhilo
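A portable variant of GPhilo's workaround uses an invalid device ordinal instead of an empty string (a sketch; `-1` is simply an ordinal that no CUDA device can ever have, so any other invalid number would work too):

```python
import os

# An invalid ordinal masks every GPU on both Linux and Windows, whereas
# an empty string is not honoured everywhere. As before, this must run
# before TensorFlow is imported in this process.
os.environ["CUDA_VISIBLE_DEVICES"] = "-1"

# import tensorflow as tf   # must come after the assignment above
```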


It works for me, and it is very widely used; your CUDA driver must be doing something unusual in your case. –


Which CUDA SDK are you using? I am using version 8, and their documentation does not specify the behaviour of an empty string – GPhilo
