2017-09-19

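Apache Airflow Celery Redis DecodeError

I am using the latest version of Apache Airflow. I started with the LocalExecutor, and in that mode everything worked fine, except that some interactions in the web UI require the CeleryExecutor. I then installed and configured the Celery executor with Redis, setting Redis as both the broker URL and the result backend.

It appears to work at first, until a task is scheduled, at which point it fails with the following error: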

  File "/bin/airflow", line 28, in <module>
    args.func(args)
  File "/usr/lib/python2.7/site-packages/airflow/bin/cli.py", line 882, in scheduler
    job.run()
  File "/usr/lib/python2.7/site-packages/airflow/jobs.py", line 201, in run
    self._execute()
  File "/usr/lib/python2.7/site-packages/airflow/jobs.py", line 1311, in _execute
    self._execute_helper(processor_manager)
  File "/usr/lib/python2.7/site-packages/airflow/jobs.py", line 1444, in _execute_helper
    self.executor.heartbeat()
  File "/usr/lib/python2.7/site-packages/airflow/executors/base_executor.py", line 132, in heartbeat
    self.sync()
  File "/usr/lib/python2.7/site-packages/airflow/executors/celery_executor.py", line 91, in sync
    state = async.state
  File "/usr/lib/python2.7/site-packages/celery/result.py", line 436, in state
    return self._get_task_meta()['status']
  File "/usr/lib/python2.7/site-packages/celery/result.py", line 375, in _get_task_meta
    return self._maybe_set_cache(self.backend.get_task_meta(self.id))
  File "/usr/lib/python2.7/site-packages/celery/backends/base.py", line 352, in get_task_meta
    meta = self._get_task_meta_for(task_id)
  File "/usr/lib/python2.7/site-packages/celery/backends/base.py", line 668, in _get_task_meta_for
    return self.decode_result(meta)
  File "/usr/lib/python2.7/site-packages/celery/backends/base.py", line 271, in decode_result
    return self.meta_from_decoded(self.decode(payload))
  File "/usr/lib/python2.7/site-packages/celery/backends/base.py", line 278, in decode
    accept=self.accept)
  File "/usr/lib/python2.7/site-packages/kombu/serialization.py", line 263, in loads
    return decode(data)
  File "/usr/lib64/python2.7/contextlib.py", line 35, in __exit__
    self.gen.throw(type, value, traceback)
  File "/usr/lib/python2.7/site-packages/kombu/serialization.py", line 54, in _reraise_errors
    reraise(wrapper, wrapper(exc), sys.exc_info()[2])
  File "/usr/lib/python2.7/site-packages/kombu/serialization.py", line 50, in _reraise_errors
    yield
  File "/usr/lib/python2.7/site-packages/kombu/serialization.py", line 263, in loads
    return decode(data)
  File "/usr/lib/python2.7/site-packages/kombu/serialization.py", line 59, in pickle_loads
    return load(BytesIO(s))
kombu.exceptions.DecodeError: invalid load key, '{'.

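It looks like a pickle serialization error, but I am not sure how to track down the cause. Any suggestions?

The problem consistently affects a workflow in which I use the subdag feature, so it may be related to that.

Note: I also tested with RabbitMQ and hit a different problem there; the client reports "connection reset by peer" and crashes. The RabbitMQ log shows "client unexpectedly closed TCP connection".

Answers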

0

I stumbled across this question after seeing exactly the same traceback in our scheduler logs:

  File "/usr/lib/python2.7/site-packages/kombu/serialization.py", line 59, in pickle_loads
    return load(BytesIO(s))
kombu.exceptions.DecodeError: invalid load key, '{'.

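Celery trying to unpickle something that starts with '{' looked suspicious, so I took a tcpdump of the traffic and triggered a task through the web UI. The resulting capture included this exchange at almost exactly the instant the traceback above appeared in the scheduler logs: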

05:03:49.145849 IP <scheduler-ip-addr>.ec2.internal.45597 > <redis-ip-addr>.ec2.internal.6379: Flags [P.], seq 658:731, ack 46, win 211, options [nop,nop,TS val 654768546 ecr 4219564282], length 73: RESP "GET" "celery-task-meta-b0d3a29e-ac08-4e77-871e-b4d553502cc2" 
05:03:49.146086 IP <redis-ip-addr>.ec2.internal.6379 > <scheduler-ip-addr>.ec2.internal.45597: Flags [P.], seq 46:177, ack 731, win 210, options [nop,nop,TS val 4219564282 ecr 654768546], length 131: RESP "{"status": "SUCCESS", "traceback": null, "result": null, "task_id": "b0d3a29e-ac08-4e77-871e-b4d553502cc2", "children": []}" 
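The same check can be made without a packet capture by looking at how the stored value begins. The sketch below is a minimal illustration, not part of Airflow or Celery: `guess_serializer` is a hypothetical helper, and it assumes pickles were written with protocol 2 or later (which start with the byte 0x80). The sample payload is the one from the capture above.

```python
import json
import pickle


def guess_serializer(payload):
    """Guess the format of a raw result payload (hypothetical helper).

    Assumes pickles use protocol 2 or later, which begin with byte 0x80.
    """
    if payload[:1] == b"\x80":
        return "pickle"
    try:
        json.loads(payload.decode("utf-8"))
        return "json"
    except (UnicodeDecodeError, ValueError):
        return "unknown"


# Payload as seen in the tcpdump capture.
captured = (
    b'{"status": "SUCCESS", "traceback": null, "result": null, '
    b'"task_id": "b0d3a29e-ac08-4e77-871e-b4d553502cc2", "children": []}'
)

print(guess_serializer(captured))
print(guess_serializer(pickle.dumps({"status": "SUCCESS"}, protocol=2)))
```

The raw bytes would come from reading the `celery-task-meta-<task id>` key in Redis; how you fetch them is up to your tooling.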

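The response from Redis is clearly JSON, so why is Celery trying to unpickle the payload? We are migrating from Airflow 1.7 to 1.8, and during the rollout we had one Airflow worker running v1.7 and another running v1.8. The workers were supposed to pull from separate queues, but because of a bug in our DAGs, a TaskInstance was scheduled by Airflow 1.8 and then executed by a Celery worker launched through Airflow 1.7.

AIRFLOW-1038 changed the serializer used for Celery task status from JSON (the default) to pickle. Workers running a version of the code from before this change serialize results as JSON, while a scheduler running a version that includes the change tries to deserialize those results by unpickling them, which produces the error above.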
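The mismatch is easy to reproduce with the standard library alone. This is a minimal sketch that does not involve Airflow, Celery, or kombu; `try_unpickle` is a hypothetical helper and the metadata dictionary is made up for illustration.

```python
import json
import pickle

# What a worker from before the change would store: metadata encoded as JSON.
stored = json.dumps({"status": "SUCCESS", "result": None}).encode("utf-8")


def try_unpickle(payload):
    """Return (ok, detail) instead of raising, so the failure is easy to inspect."""
    try:
        return True, pickle.loads(payload)
    except pickle.UnpicklingError as exc:
        return False, str(exc)


ok, detail = try_unpickle(stored)
print(ok, detail)

# Reading the same bytes with the serializer that wrote them works.
print(json.loads(stored.decode("utf-8")))
```

The unpickling attempt fails on the very first byte, because '{' is not a valid pickle opcode; this is the same "invalid load key" message that kombu wraps in its DecodeError.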

0

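Please check which celery_result_backend you have configured in airflow.cfg. If it is not already a database backend (MySQL, etc.), try switching to one.

We have seen occasional problems with the amqp backend (only available in Celery 3.1 and earlier), as well as with the redis and rpc backends.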
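A quick way to see which kind of result backend is configured is to parse the file and look at the URL scheme. The sketch below is only an illustration: `result_backend_kind` is a hypothetical helper, the sample config and its host names and credentials are placeholders, and it assumes the setting lives in the `[celery]` section and that database backends use Celery's `db+` URL prefix.

```python
import configparser

# Placeholder config; host names and credentials are made up.
SAMPLE_CFG = """
[celery]
broker_url = redis://redis-host:6379/0
celery_result_backend = db+mysql://airflow:airflow@db-host/airflow
"""


def result_backend_kind(cfg_text):
    """Return 'database' for db+ URLs, otherwise the URL scheme (hypothetical helper)."""
    # Interpolation is disabled so '%' characters in URLs are read literally.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(cfg_text)
    url = parser.get("celery", "celery_result_backend")
    scheme = url.split("://", 1)[0]
    return "database" if scheme.startswith("db+") else scheme


print(result_backend_kind(SAMPLE_CFG))
```

For a real installation, read the text of your own airflow.cfg and pass it in instead of `SAMPLE_CFG`.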