2013-04-23 90 views

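Python - Regexp - Find function names, but not function calls

OK, so I have a bunch of C and C++ code that I need to sift through to find function definitions. I don't know the function type/return value, and I don't know the number of parameters in the function definition or in the function calls, etc.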

So far I have:

import re, sys 
from os.path import abspath 
from os import walk 

function = 'msg' 
regexp = r"(" + function + ".*[^;]){" 

found = False 
for root, folders, files in walk('C:\\codepath\\'):
    for filename in files:
        with open(abspath(root + '/' + filename)) as fh:
            data = fh.read()
            result = re.findall(regexp, data)
            if len(result) > 0:
                sys.stdout.write('\n Found function "' + function + '" in ' + filename + ':\n\t' + str(result))
                sys.stdout.flush()
    break

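However, this produces some unwanted results. The regular expression must be fault tolerant, for example with combinations like the ones below.

Find all variants of, say, the "msg" definition, but not the "msg()" call: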

void 
shapex_msg (struct shaper *s) 
{ 
    msg (M_INFO, "Output Traffic Shaping initialized at %d bytes per second", 
     s->bytes_per_second); 
} 

void shapex_msg (struct shaper *s) 
{ 
    msg (M_INFO, "Output Traffic Shaping initialized at %d bytes per second", 
     s->bytes_per_second); 
} 

void shapex_msg (struct shaper *s) { 
    msg (M_INFO, "Output Traffic Shaping initialized at %d bytes per second", 
     s->bytes_per_second); 
} 

Answers

Perhaps something like the following regular expression:

import re

def make_regex_for_function(name):
    return re.compile(r'\s*%s\s*\([^;)]*\)\s*\{' % re.escape(name))

Testing it on your examples:

>>> text = ''' 
... void 
... shapex_msg (struct shaper *s) 
... { 
... msg (M_INFO, "Output Traffic Shaping initialized at %d bytes per second", 
...  s->bytes_per_second); 
... } 
... 
... void shapex_msg (struct shaper *s) 
... { 
... msg (M_INFO, "Output Traffic Shaping initialized at %d bytes per second", 
...  s->bytes_per_second); 
... } 
... 
... void shapex_msg (struct shaper *s) { 
... msg (M_INFO, "Output Traffic Shaping initialized at %d bytes per second", 
...  s->bytes_per_second); 
... }''' 
>>> shapex_msg = make_regex_for_function('shapex_msg') 
>>> shapex_msg.findall(text) 
['\nshapex_msg (struct shaper *s)\n{', ' shapex_msg (struct shaper *s)\n{', ' shapex_msg (struct shaper *s) {'] 

It also works with multi-line definitions:

>>> shapex_msg.findall('''int 
     shapex_msg  (
int a, 
int b 
) 

     {''')
['\n \tshapex_msg \t(\nint a,\nint b\n) \n\n\t{'] 

With function calls, however, it finds nothing:

>>> shapex_msg.findall('shapex_msg(1,2,3);') 
[] 
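To tie this back to the directory walk in the question, here is a minimal sketch that applies the same pattern to every source file under a folder. The helper name `find_definitions`, the `extensions` filter, and the line-number reporting are assumptions added for illustration; they are not part of the original code.

```python
import os
import re


def make_regex_for_function(name):
    # Same pattern as above: the name, a parameter list containing
    # neither ';' nor ')', then the opening brace of the body.
    return re.compile(r'\s*%s\s*\([^;)]*\)\s*\{' % re.escape(name))


def find_definitions(root, name, extensions=('.c', '.h', '.cpp', '.hpp')):
    """Yield (path, line_number, matched_text) for each apparent definition."""
    pattern = make_regex_for_function(name)
    for dirpath, _dirs, files in os.walk(root):
        for filename in files:
            # Hypothetical filter: only look at files that look like C/C++.
            if not filename.endswith(extensions):
                continue
            path = os.path.join(dirpath, filename)
            with open(path, errors='replace') as fh:
                data = fh.read()
            for match in pattern.finditer(data):
                text = match.group()
                # The pattern consumes leading whitespace; skip past it so
                # the line number points at the function name itself.
                start = match.start() + len(text) - len(text.lstrip())
                line = data.count('\n', 0, start) + 1
                yield path, line, text.strip()
```

Calling `list(find_definitions('C:\\codepath\\', 'shapex_msg'))` would then collect every hit. Note that the pattern has no word boundary in front of the name, so searching for `'msg'` also reports definitions whose names merely end in `msg`, which seems to be what the question asks for.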

As a side note, your regular expression does not work because .* is greedy, so it does not stop at the correct closing parenthesis.
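The difference is easy to see on a single line that contains both a definition and a later brace. The following is only a sketch: the sample strings and the names `greedy` and `strict` are made up for the demonstration, with `greedy` being the pattern from the question built for `msg`.

```python
import re

# Pattern from the question, for function = 'msg'.
greedy = re.compile(r'(msg.*[^;]){')
# Pattern from this answer, for the same name.
strict = re.compile(r'\s*msg\s*\([^;)]*\)\s*\{')

definition = 'void shapex_msg (struct shaper *s) { if (s) { msg (M_INFO, "hi"); } }'
call_only = 'msg (M_INFO, "hi"); if (x) {'

# '.*' runs to the end of the line and backtracks only as far as the
# LAST '{', so the match swallows part of the function body.
greedy_hits = greedy.findall(definition)

# '[^;)]*' cannot cross ')' or ';', so the match ends at the first '{'.
strict_hits = strict.findall(definition)

# A plain call followed by an unrelated '{' fools the greedy pattern
# but not the strict one.
greedy_false_positive = greedy.findall(call_only)
strict_on_call = strict.findall(call_only)

print(greedy_hits)
print(strict_hits)
print(greedy_false_positive)
print(strict_on_call)
```

Separately, `.` does not match a newline unless `re.DOTALL` is set, so the pattern from the question also cannot span a definition whose brace sits on the next line.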

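Your last edit gave me a working copy! Thanks! Indeed, the greedy matching was messing things up. I had been trying so many combinations and could not wrap my head around it, so thank you! – Torxed 2013-04-23 14:55:05

@Torxed Yes, sorry. I forgot to put a '*' when I wrote it down :s – Bakuriu 2013-04-23 14:56:03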