2016-09-22

redis.exceptions.ConnectionError after Celery has been running for about a day

Here is my full traceback:

Traceback (most recent call last): 
    File "/home/server/backend/venv/lib/python3.4/site-packages/celery/app/trace.py", line 283, in trace_task 
    uuid, retval, SUCCESS, request=task_request, 
    File "/home/server/backend/venv/lib/python3.4/site-packages/celery/backends/base.py", line 256, in store_result 
    request=request, **kwargs) 
    File "/home/server/backend/venv/lib/python3.4/site-packages/celery/backends/base.py", line 490, in _store_result 
    self.set(self.get_key_for_task(task_id), self.encode(meta)) 
    File "/home/server/backend/venv/lib/python3.4/site-packages/celery/backends/redis.py", line 160, in set 
    return self.ensure(self._set, (key, value), **retry_policy) 
    File "/home/server/backend/venv/lib/python3.4/site-packages/celery/backends/redis.py", line 149, in ensure 
    **retry_policy 
    File "/home/server/backend/venv/lib/python3.4/site-packages/kombu/utils/__init__.py", line 243, in retry_over_time 
    return fun(*args, **kwargs) 
    File "/home/server/backend/venv/lib/python3.4/site-packages/celery/backends/redis.py", line 169, in _set 
    pipe.execute() 
    File "/home/server/backend/venv/lib/python3.4/site-packages/redis/client.py", line 2593, in execute 
    return execute(conn, stack, raise_on_error) 
    File "/home/server/backend/venv/lib/python3.4/site-packages/redis/client.py", line 2447, in _execute_transaction 
    connection.send_packed_command(all_cmds) 
    File "/home/server/backend/venv/lib/python3.4/site-packages/redis/connection.py", line 532, in send_packed_command 
    self.connect() 
    File "/home/pserver/backend/venv/lib/python3.4/site-packages/redis/connection.py", line 436, in connect 
    raise ConnectionError(self._error_message(e)) 
redis.exceptions.ConnectionError: Error 0 connecting to localhost:6379. Error. 
[2016-09-21 10:47:18,814: WARNING/Worker-747] Data collector is not contactable. This can be because of a network issue or because of the data collector being restarted. In the event that contact cannot be made after a period of time then please report this problem to New Relic support for further investigation. The error raised was ConnectionError(ProtocolError('Connection aborted.', BlockingIOError(11, 'Resource temporarily unavailable')),). 

I did search for ConnectionError, but none of the existing questions matched my problem.

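My platform is Ubuntu 14.04. Here is part of my Redis configuration. (If you need the whole redis.conf file I can share it; all of the parameters in the LIMITS section are commented out.)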

# By default Redis listens for connections from all the network interfaces 
# available on the server. It is possible to listen to just one or multiple 
# interfaces using the "bind" configuration directive, followed by one or 
# more IP addresses. 
# 
# Examples: 
# 
# bind 192.168.1.100 10.0.0.1 
bind 127.0.0.1 

# Specify the path for the unix socket that will be used to listen for 
# incoming connections. There is no default, so Redis will not listen 
# on a unix socket when not specified. 
# 
# unixsocket /var/run/redis/redis.sock 
# unixsocketperm 755 

# Close the connection after a client is idle for N seconds (0 to disable) 
timeout 0 

# TCP keepalive. 
# 
# If non-zero, use SO_KEEPALIVE to send TCP ACKs to clients in absence 
# of communication. This is useful for two reasons: 
# 
# 1) Detect dead peers. 
# 2) Take the connection alive from the point of view of network 
# equipment in the middle. 
# 
# On Linux, the specified value (in seconds) is the period used to send ACKs. 
# Note that to close the connection the double of the time is needed. 
# On other kernels the period depends on the kernel configuration. 
# 
# A reasonable value for this option is 60 seconds. 
tcp-keepalive 60 
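To confirm which directives are actually active in the full redis.conf (for example, that everything in the LIMITS section such as maxclients really is commented out), a small sketch like the one below lists the non-comment directives. It assumes the simple "directive arg1 arg2 ..." line format shown above and does not handle quoted arguments or include files; parse_redis_conf is a hypothetical helper name. On a running server, redis-cli CONFIG GET timeout shows the value the server is actually using.

```python
def parse_redis_conf(text):
    """Return {directive: [args, ...]} for every non-comment line.

    Simplified: arguments are split on whitespace (no quote handling),
    and a directive that appears twice keeps only its last value.
    """
    directives = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments, including commented-out directives
        name, *args = line.split()
        directives[name.lower()] = args
    return directives


sample = """
# bind 192.168.1.100 10.0.0.1
bind 127.0.0.1
# unixsocket /var/run/redis/redis.sock
timeout 0
tcp-keepalive 60
# maxclients 10000
"""

active = parse_redis_conf(sample)
for name, args in sorted(active.items()):
    print(name, " ".join(args))
```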

Here is my small Redis wrapper:

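import redis

from django.conf import settings


# Module-level pool: created once, when this module is first imported,
# and shared by every client that get_redis_server() returns in this process.
REDIS_POOL = redis.ConnectionPool(host=settings.REDIS_HOST, port=settings.REDIS_PORT)


def get_redis_server():
    # Each call returns a lightweight client that borrows connections from REDIS_POOL.
    return redis.Redis(connection_pool=REDIS_POOL)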

And this is how I use it:

from celery import shared_task

from redis_wrapper import get_redis_server

# The view and the task run in different, independent processes.

def sample_view(request):
    rs = get_redis_server()
    # some get/set operations with Redis


@shared_task
def sample_celery_task():
    rs = get_redis_server()
    # some get/set operations with Redis

Package versions:

celery==3.1.18 
django-celery==3.1.16 
kombu==3.0.26 
redis==2.10.3 

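So the problem is this: the connection error appears some time after the Celery workers are started. Once it appears, every task fails with this error until I restart all of the Celery workers. (Interestingly, Celery Flower also fails while the problem is occurring.)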

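I suspect the way I use my Redis connection pool, or the Redis configuration, or, less likely, a network issue. Any ideas about the cause? What am I doing wrong?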

(PS: I will add the output of redis-cli info when I see this error again today.)

UPDATE:

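I have temporarily worked around the problem by adding the --maxtasksperchild argument to my worker start command. I set it to 200. Of course this is not the right way to solve the problem; it only treats the symptom. It basically recycles the worker instances periodically (shutting down an old process and creating a new one when the old process reaches 200 tasks), which also refreshes my global Redis pool and its connections. So I think I should focus on how the global Redis connection pool is used. I am still waiting for new ideas and comments.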
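For reference, in Celery 3.1 the same workaround can be set in configuration instead of on the command line (celery -A proj worker --maxtasksperchild=200, where proj is a placeholder for your app module). A minimal sketch, assuming it goes in the settings module Celery reads its configuration from (the Django settings when using django-celery):

```python
# Celery 3.1 setting: replace each worker child process after it has executed
# this many tasks (same effect as --maxtasksperchild=200 on the command line).
CELERYD_MAX_TASKS_PER_CHILD = 200
```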

Sorry for my bad English, and thanks in advance.

Answer


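Have you enabled the RDB background saving method in Redis?
If so, check the size of the /var/lib/redis/dump.rdb file.
Sometimes the file grows large and fills up the root filesystem, and then the Redis instance can no longer save to that file.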
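To check both things mentioned above (the size of the dump file and whether its filesystem is running out of space), a sketch using only the standard library could look like this. The path is an assumption: the usual Ubuntu default is dir /var/lib/redis with dbfilename dump.rdb, so check the dir and dbfilename directives in your redis.conf. rdb_disk_report is a hypothetical helper name.

```python
import os
import shutil


def rdb_disk_report(rdb_path):
    """Return the RDB file size and the total/free space of its filesystem, in bytes."""
    size = os.path.getsize(rdb_path)
    # shutil.disk_usage (Python 3.3+) reports total/used/free for the
    # filesystem that contains the given directory.
    usage = shutil.disk_usage(os.path.dirname(rdb_path) or ".")
    return {"rdb_bytes": size, "free_bytes": usage.free, "total_bytes": usage.total}


# Assumed Ubuntu default location; adjust to your redis.conf.
RDB_PATH = "/var/lib/redis/dump.rdb"

if os.path.exists(RDB_PATH):
    for key, value in sorted(rdb_disk_report(RDB_PATH).items()):
        print("{}: {:.1f} MB".format(key, value / 1024 ** 2))
```

Running it a few times a day, alongside the worker logs, would show whether free space drops before the ConnectionError appears.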

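You can stop the background save process from redis-cli by issuing the command

config set stop-writes-on-bgsave-error no

+0

I just checked: it is 589 MB. But that is the file Redis uses for persistence, right? I mean, if I disable it, I will lose my queue when the machine restarts, won't I? I also check the disk usage several times a day. Finally, could restarting the workers reduce the size of the dump file? I mean, I think the symptoms don't match. –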

+0

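The command I gave is not for stopping background saving; it is for preventing Redis from stopping writes when an error related to background saving occurs. –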
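With stop-writes-on-bgsave-error yes (the default), Redis refuses write commands after a failed background save, so it can be worth checking the save status before changing the setting. A sketch with redis-py, assuming client is a redis.Redis instance such as the one returned by get_redis_server(): INFO persistence reports rdb_last_bgsave_status ("ok" or "err"), and config_set sends the same CONFIG SET as the redis-cli command above. check_bgsave is a hypothetical helper name.

```python
def check_bgsave(client, allow_writes_on_error=False):
    """Return the last RDB background-save status ("ok" or "err").

    If the last save failed and allow_writes_on_error is True, also disable
    stop-writes-on-bgsave-error so the server keeps accepting writes.
    """
    info = client.info("persistence")  # same data as `redis-cli info persistence`
    status = info.get("rdb_last_bgsave_status")
    if status == "err" and allow_writes_on_error:
        # Same effect as: redis-cli config set stop-writes-on-bgsave-error no
        client.config_set("stop-writes-on-bgsave-error", "no")
    return status
```

For example, check_bgsave(get_redis_server()) only reports the status; passing allow_writes_on_error=True also changes the server setting, which does not turn off RDB persistence itself.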

+0

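I see. But we have to take into account that while the error was happening, I could still access the Redis data through redis-cli. So Redis had not stopped completely; it was only blocking my Celery clients. –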