2016-11-22 77 views

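When does subprocess.Popen return None in Python?

In the code below, I am trying to fork a process that runs a command, and then capture the result when the child process exits.

At the end, the script loops on a global variable to wait for the child to finish, so that the parent does not exit before the child, while the command itself still runs without blocking. The code works fine 9 times out of 10, but occasionally fails.

The failure occurs in the case where subprocess.Popen appears to return None, but I am not sure why this happens at random.

Can someone help figure out what is going wrong here?

Machine details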

[[email protected] /]# uname -a 
Linux 1-0-0-9 3.10.0-229.el7.x86_64 #1 SMP Thu Jan 29 18:37:38 EST 2015 x86_64 x86_64 x86_64 GNU/Linux 

Code:

#!/usr/bin/env python 

import os 
import subprocess 
import signal 
import time 

flag = False 
class Utils(object): 

    def __init__(self): 
     self.child_pid = None 
     signal.signal(signal.SIGCHLD, self.sigchld_handler) 

    def sigchld_handler(self, *args): 
     print "handling SIGCHLD" 
     p = self.child_pid 

     stdout_val = p.communicate()[0] 
     retcode = p.returncode 
     print p.returncode, stdout_val.strip() 
     self.child_pid = None 
     global flag 
     flag = False 


    def run_command(self, cmnd, env=None, cwd=None, timeout=0): 
     global flag 
     flag = True 
     cmnd = cmnd.split() 
     self.child_pid =subprocess.Popen(cmnd, stdin=None, bufsize=-1, env=env, 
          stdout=subprocess.PIPE, stderr=subprocess.STDOUT, 
          close_fds=True, cwd=cwd, preexec_fn=os.setsid) 
     print "Invoked child process " , self.child_pid.pid 

print "Running command .." 
Utils().run_command("ls -lrt") 
for i in xrange(10000): 
    if not i % 1000: 
     print i 
print flag 
i = 0 
while flag: 
    i = i + 1 

Correct (expected) output:

Running command .. 
Invoked child process 9703 
0 
1000 
2000 
3000 
4000 
5000 
handling SIGCHLD 
0 total 52 
drwxr-xr-x. 2 root root 6 Mar 13 2014 srv 
drwxr-xr-x. 2 root root 6 Mar 13 2014 mnt 
drwxr-xr-x. 2 root root 6 Mar 13 2014 media 
drwxr-xr-x. 2 root root 6 Mar 13 2014 home 
lrwxrwxrwx. 1 root root 7 Jan 9 2016 bin -> usr/bin 
lrwxrwxrwx. 1 root root 9 Jan 9 2016 lib64 -> usr/lib64 
lrwxrwxrwx. 1 root root 7 Jan 9 2016 lib -> usr/lib 
lrwxrwxrwx. 1 root root 8 Jan 9 2016 sbin -> usr/sbin 
drwxr-xr-x. 13 root root 4096 Jan 9 2016 usr 
drwxr-xr-x. 4 root root 28 Nov 18 16:03 opt 
dr-xr-xr-x. 4 root root 4096 Nov 18 16:06 boot 
dr-xr-xr-x 178 root root 0 Nov 22 21:53 proc 
dr-xr-xr-x 13 root root 0 Nov 22 21:53 sys 
drwxr-xr-x. 22 root root 4096 Nov 22 21:53 var 
drwxr-xr-x 19 root root 3060 Nov 22 21:53 dev 
drwxr-xr-x. 124 root root 8192 Nov 22 21:53 etc 
dr-xr-x---. 8 root root 4096 Nov 22 21:53 root 
-rw-r--r-- 1 root root 573 Nov 22 22:15 a.py 
-rw-r--r-- 1 root root 1108 Nov 22 22:15 cmnd.py 
-rw-r--r-- 1 root root 1800 Nov 22 22:15 fork.py 
-rw-r--r-- 1 root root 1368 Nov 22 22:15 ipc_pipe.py 
-rw-r--r-- 1 root root 491 Nov 22 22:15 threads.py 
drwxr-xr-x 35 root root 1000 Nov 22 22:35 run 
drwxrwxrwt. 8 root root 4096 Nov 22 22:35 tmp 
6000 
7000 
8000 
9000 
False 

Error (failure case):

Running command .. 
handling SIGCHLD 
handling SIGCHLD 
handling SIGCHLD 
Traceback (most recent call last): 
    File "cmnd.py", line 37, in <module> 
    Utils().run_command("ls -lrt") 
    File "cmnd.py", line 33, in run_command 
    close_fds=True, cwd=cwd, preexec_fn=os.setsid) 
    File "/usr/lib64/python2.7/subprocess.py", line 711, in __init__ 
    errread, errwrite) 
    File "/usr/lib64/python2.7/subprocess.py", line 1296, in _execute_child 
    data = _eintr_retry_call(os.read, errpipe_read, 1048576) 
    File "/usr/lib64/python2.7/subprocess.py", line 478, in _eintr_retry_call 
    return func(*args) 
    File "cmnd.py", line 19, in sigchld_handler 
    stdout_val = p.communicate()[0] 
AttributeError: 'NoneType' object has no attribute 'communicate' 

Interestingly, "handling SIGCHLD" was printed several times. Only one process was forked, so why does the parent receive SIGCHLD 3 times? – ViFI


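Not sure, but I think you have run into a race condition: the child process is spawned (and can exit) before the `Popen` object has been assigned to `self.child_pid`. I suggest you find a different way to do something when the child finishes, for example by waiting for its output, perhaps in another thread. – 2rs2ts

Answer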


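I was able to reproduce the NoneType error, and it is clearly a race condition. To prove it, I imported traceback and added print traceback.print_stack(args[1]) to the signal handler. The stack trace shows that when the signal arrives, Popen is still inside os.fdopen and self.child_pid has not been assigned yet. (The stray None in the output is print echoing the return value of print_stack, which is None.)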

Running command .. 
handling SIGCHLD 
    File "c.py", line 39, in <module> 
    Utils().run_command("ls -lrt") 
    File "c.py", line 35, in run_command 
    close_fds=True, cwd=cwd, preexec_fn=os.setsid) 
    File "/usr/lib/python2.7/subprocess.py", line 740, in __init__ 
    self.stdout = os.fdopen(c2pread, 'rb', bufsize) 
None 
Traceback (most recent call last): 
    File "c.py", line 39, in <module> 
    Utils().run_command("ls -lrt") 
    File "c.py", line 35, in run_command 
    close_fds=True, cwd=cwd, preexec_fn=os.setsid) 
    File "/usr/lib/python2.7/subprocess.py", line 740, in __init__ 
    self.stdout = os.fdopen(c2pread, 'rb', bufsize) 
    File "c.py", line 21, in sigchld_handler 
    stdout_val = p.communicate()[0] 
AttributeError: 'NoneType' object has no attribute 'communicate' 
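The diagnostic above relies on the second argument that Python passes to a signal handler, which is the stack frame that was executing when the signal was handled. A minimal sketch of the same technique follows, written in Python 3 syntax (the code in this thread is Python 2). The names `make_stack_logger` and `slow_assignment` are hypothetical, and SIGUSR1 sent to the process itself stands in for the SIGCHLD that arrives while `Popen.__init__` is still running.

```python
import io
import os
import signal
import time
import traceback


def make_stack_logger(log):
    """Return a signal handler that writes the interrupted call stack to `log`."""
    def handler(signum, frame):
        # `frame` is the Python frame that was running when the signal was handled.
        log.write("signal %d arrived while executing:\n" % signum)
        traceback.print_stack(frame, file=log)
    return handler


def slow_assignment(log):
    # Hypothetical stand-in for the window inside Popen.__init__:
    # the signal arrives before this function has returned a value.
    os.kill(os.getpid(), signal.SIGUSR1)
    # Give the interpreter a bounded amount of time to run the handler.
    deadline = time.monotonic() + 1.0
    while not log.getvalue() and time.monotonic() < deadline:
        pass
    return "done"


log = io.StringIO()
previous = signal.signal(signal.SIGUSR1, make_stack_logger(log))
try:
    result = slow_assignment(log)
finally:
    # Restore whatever handler was installed before.
    signal.signal(signal.SIGUSR1, previous)

print(log.getvalue())
```

The printed stack should end inside `slow_assignment`, in the same way the trace above ends inside `subprocess.py`: the handler runs before the caller has received the return value it wants to store.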

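I cannot think of a good way to solve this with signals. Your code also has other problems: for example, a deadlock can occur if the child fills the stdout/stderr pipe, because nothing reads from the pipe until the child has exited. Instead of using signals, you can call Popen.communicate in a background thread and use the poll and wait methods to check whether the process has finished.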

#!/usr/bin/env python 

import os 
import subprocess 
import time 
import threading 

flag = False 
class Utils(object): 

    def __init__(self): 
     self.child = None 
     self._thread = None 

    def run_command(self, cmnd, env=None, cwd=None, timeout=0): 
     global flag 
     flag = True 
     cmnd = cmnd.split() 
     self.child = subprocess.Popen(cmnd, stdin=None, bufsize=-1, env=env, 
          stdout=subprocess.PIPE, stderr=subprocess.STDOUT, 
          close_fds=True, cwd=cwd, preexec_fn=os.setsid) 
     self._thread = threading.Thread(target=self._communicate_thread) 
     self._thread.start() 
     print "Invoked child process " , self.child.pid 
     return self 

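    def _communicate_thread(self): 
     # Runs in the background thread: communicate() drains the pipe until EOF, 
     # so the child cannot block on a full pipe buffer. 
     self.out, self.err = self.child.communicate() 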

    def poll(self): 
     return self.child.poll() 

    def wait(self): 
     rc = self.child.wait() 
     if self._thread: 
      self._thread.join() 
      self._thread = None 
     return rc 

print "Running command .." 
cmd = Utils().run_command("ls -lrt") 
while True: 
    print 'poll', cmd.poll() 
    if cmd.poll() is not None: 
     break 
    else: 
     time.sleep(.1) 

print 'done', cmd.wait() 
print cmd.out 
print cmd.err # always None here, because stderr was redirected into stdout 
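For readers on Python 3, the same idea might look roughly like the sketch below. It is an adaptation under stated assumptions, not the code from the answer: the class name `Command` is hypothetical, the command is passed as a list instead of being split from a string, and the `preexec_fn=os.setsid` argument is left out (Python 3 offers `start_new_session=True` for that purpose). The child here is a Python one-liner that writes about 200 KB, used only so the example is self-contained.

```python
import subprocess
import sys
import threading


class Command:
    """Run a command and collect its output in a background thread."""

    def __init__(self, argv):
        self.out = None
        self.child = subprocess.Popen(
            argv,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # stderr is merged into stdout
        )
        # The thread starts only after self.child has been assigned,
        # so it can never observe a missing Popen object.
        self._thread = threading.Thread(target=self._drain)
        self._thread.start()

    def _drain(self):
        # communicate() reads until EOF and then reaps the child.
        self.out, _ = self.child.communicate()

    def poll(self):
        """Return the exit code, or None while the child is still running."""
        return self.child.poll()

    def wait(self):
        """Block until all output has been read and the child has exited."""
        self._thread.join()
        return self.child.returncode


# The child writes far more than a typical pipe buffer holds; the
# background thread keeps reading, so the child is never stuck on a full pipe.
cmd = Command([sys.executable, "-c", "print('x' * 200000)"])
returncode = cmd.wait()
print(returncode, len(cmd.out))
```

Because the thread is started after the assignment to `self.child`, there is no window comparable to the one in the signal-based version. `cmd.out` is a bytes object unless text mode is requested when the process is created.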