2016-12-14 71 views

Python regex: capture multiple groups N times

I am parsing /proc/PID/stat for a process. The file contains a single line like this:

25473 (firefox) S 25468 25465 25465 0 -1 4194304 149151169 108282 32 15 2791321 436115 846 86 20 0 84 0 9648305 2937786368 209665 18446744073709551615 93875088982016 93875089099888 140722931705632 140722931699424 140660842079373 0 0 4102 33572009 0 0 0 17 1 0 0 175 0 0 93875089107104 93875089109128 93875116752896 140722931707410 140722931707418 140722931707418 140722931707879 0 

I came up with:

import re 

def get_stats(pid):
    with open('/proc/{}/stat'.format(pid)) as fh:
        stats_raw = fh.read()
    stat_pattern = r'(\d+\s)(\(.+\)\s)(\w+\s)(-?\d+\s?)'
    return re.findall(stat_pattern, stats_raw)

This matches the first three groups, but the last group, (-?\d+\s?), returns only a single field:

[('25473 ', '(firefox) ', 'S ', '25468 ')] 

I have been looking for a way to make only the last group match a set number of times:

'(\d+\s)(\(.+\)\s)(\w+\s)(-?\d+\s?){49}' 

Can you use the PyPI regex module? Then you could keep your approach. Otherwise, you need two steps.


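@WiktorStribiżew Good to know about that module, but this code is part of another module and adding a further dependency would not be ideal. Still, it would not be a bad idea to show the difference in an answer in case someone else runs into this. – tijko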


OK, then match with '(\d+\s)(\(.+\)\s)(\w+\s)((?:-?\d+\s?){49})' and split the fourth group on whitespace.

Answer


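With the re module you cannot access each individual capture of a repeated group. Instead, capture the whole remainder of the string into Group 4 and then split it on whitespace: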

import re 
s = r'25473 (firefox) S 25468 25465 25465 0 -1 4194304 149151169 108282 32 15 2791321 436115 846 86 20 0 84 0 9648305 2937786368 209665 18446744073709551615 93875088982016 93875089099888 140722931705632 140722931699424 140660842079373 0 0 4102 33572009 0 0 0 17 1 0 0 175 0 0 93875089107104 93875089109128 93875116752896 140722931707410 140722931707418 140722931707418 140722931707879 0' 
stat_pattern = r'(\d+)\s+(\([^)]+\))\s+(\w+)\s*(.*)' 
res = [] 
for m in re.finditer(stat_pattern, s): 
    res.append(m.group(1)) 
    res.append(m.group(2)) 
    res.append(m.group(3)) 
    res.extend(m.group(4).split()) 
print(res) 

Output:

['25473', '(firefox)', 'S', '25468', '25465', '25465', '0', '-1', '4194304', '149151169', '108282', '32', '15', '2791321', '436115', '846', '86', '20', '0', '84', '0', '9648305', '2937786368', '209665', '18446744073709551615', '93875088982016', '93875089099888', '140722931705632', '140722931699424', '140660842079373', '0', '0', '4102', '33572009', '0', '0', '0', '17', '1', '0', '0', '175', '0', '0', '93875089107104', '93875089109128', '93875116752896', '140722931707410', '140722931707418', '140722931707418', '140722931707879', '0'] 
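
To illustrate the limitation mentioned above, here is a small sketch on a shortened, made-up input (not a real stat line). When a capturing group is quantified, re keeps only the text of its last iteration; wrapping the repetition in an outer group keeps the whole run, which can then be split.

```python
import re

# Shortened sample for illustration only (not a real /proc/PID/stat line).
sample = "1 (demo) S 10 -1 20"

# The quantified group (-?\d+\s?) is overwritten on every repetition,
# so only its final iteration is available afterwards.
repeated = re.match(r"(\d+)\s+(\([^)]+\))\s+(\w+)\s+(-?\d+\s?){3}", sample)
last_only = repeated.group(4)

# An outer capturing group around a non-capturing repetition keeps the
# whole run of numbers, which is then split on whitespace.
wrapped = re.match(r"(\d+)\s+(\([^)]+\))\s+(\w+)\s+((?:-?\d+\s?){3})", sample)
all_fields = wrapped.group(4).split()

print(last_only)   # 20
print(all_fields)  # ['10', '-1', '20']
```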

If you literally need to get exactly 49 numbers into Group 4, use

r'(\d+)\s+(\([^)]+\))\s+(\w+)\s*((?:-?\d+\s?){49})'
                                ^^^^^^^^^^^^^^^^^^
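
As a rough sketch, the 49-field pattern could be wrapped in a helper like the one below. The name parse_stat_line and the extra length check are assumptions made for illustration and are not part of the original answer. The check is there because \s? is optional, so backtracking may split one number in two in order to reach 49 repetitions on a line that has fewer fields.

```python
import re

# Pattern from the answer: pid, (comm), state, then 49 numeric fields
# captured together as one block in group 4.
STAT_PATTERN = re.compile(
    r"(\d+)\s+(\([^)]+\))\s+(\w+)\s*((?:-?\d+\s?){49})"
)


def parse_stat_line(line):
    """Return [pid, comm, state, then 49 numeric fields] as strings.

    Hypothetical helper for illustration; raises ValueError on bad input.
    """
    m = STAT_PATTERN.match(line)
    if m is None:
        raise ValueError("line does not look like a stat record")
    rest = m.group(4).split()
    # Guard against a match obtained by splitting a multi-digit number.
    if len(rest) != 49:
        raise ValueError("expected 49 numeric fields, got {}".format(len(rest)))
    return [m.group(1), m.group(2), m.group(3)] + rest
```

Calling parse_stat_line(fh.read()) inside get_stats would return a flat list of 52 strings. Note that [^)]+ will not match a process name that itself contains a closing parenthesis.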

With the PyPI regex module, you can use r'(?P<o>\d+)\s+(?P<o>\([^)]+\))\s+(?P<o>\w+)\s+(?P<o>-?\d+\s?){49}' and, after running regex.search(pattern, s), access the .captures("o") stack, which holds all the values you need.

>>> import regex 
>>> s = '25473 (firefox) S 25468 25465 25465 0 -1 4194304 149151169 108282 32 15 2791321 436115 846 86 20 0 84 0 9648305 2937786368 209665 18446744073709551615 93875088982016 93875089099888 140722931705632 140722931699424 140660842079373 0 0 4102 33572009 0 0 0 17 1 0 0 175 0 0 93875089107104 93875089109128 93875116752896 140722931707410 140722931707418 140722931707418 140722931707879 0' 
>>> stat_pattern = r'(?P<o>\d+)\s+(?P<o>\([^)]+\))\s+(?P<o>\w+)\s+(?P<o>-?\d+\s?){49}' 
>>> m = regex.search(stat_pattern, s) 
>>> if m:
...     print(m.captures("o"))
...

Output:

['25473', '(firefox)', 'S', '25468 ', '25465 ', '25465 ', '0 ', '-1 ', '4194304 ', '149151169 ', '108282 ', '32 ', '15 ', '2791321 ', '436115 ', '846 ', '86 ', '20 ', '0 ', '84 ', '0 ', '9648305 ', '2937786368 ', '209665 ', '18446744073709551615 ', '93875088982016 ', '93875089099888 ', '140722931705632 ', '140722931699424 ', '140660842079373 ', '0 ', '0 ', '4102 ', '33572009 ', '0 ', '0 ', '0 ', '17 ', '1 ', '0 ', '0 ', '175 ', '0 ', '0 ', '93875089107104 ', '93875089109128 ', '93875116752896 ', '140722931707410 ', '140722931707418 ', '140722931707418 ', '140722931707879 ', '0']