2011-09-28 93 views
0

Regex traceback

Say I have a regular expression:

match = re.search(pattern, content)
if not match:
    raise Exception('regex traceback')  # here I want to report how the regex matching went

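If the regular expression fails to match, I want the exception to report how the matching went: which part of the pattern matched, and at what point and at what stage it stopped matching. Is it even possible to achieve this?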

+0

It looks like what you have works. Have you tested it?

+1

Take a look at [Get the Python regex parse tree to debug your regex](http://stackoverflow.com/questions/101268/hidden-features-of-python/143636#143636) – agf
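The linked answer appears to refer to the `re.DEBUG` flag, which makes `re.compile` print the parsed form of a pattern to standard output. A minimal sketch, assuming a made-up pattern chosen only for illustration; the layout of the dump differs between Python versions, so none is reproduced here:

```python
import re

# Hypothetical pattern, used only to illustrate the flag.
pattern = r'\d{5} [a-i]+'

# re.DEBUG makes re.compile() print the parse tree of the pattern to stdout.
# The dump shows how the pattern was parsed, not how a given match attempt ran.
regx = re.compile(pattern, re.DEBUG)

# The compiled object is otherwise used as usual.
print(regx.search('5 12478 abdefgcd').group())
```

This helps to check that a pattern means what you think it means, but it does not say where a match attempt against a particular string gave up.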

Answers

0

I have something that helps me debug complex regex patterns in my code.
Would this help you?

import re 

li = ('ksjdhfqsd\n' 
     '5 12478 abdefgcd ocean__12  ty--\t\t ghtr789\n' 
     'qfgqrgqrg', 

     '6 48788 bcfgdebc atlantic__7899 %fg#\t\t ghtu12340\n', 

     '2 47890 bbcedefg arctic__124 **juyf\t\t ghtr89877', 

     '9 54879 bbdecddf antarctic__13 18:13pomodoro\t\t ghtr6798', 


     'ksjdhfqsd\n' 
     '5 12478 abdefgcd ocean__1247101247887 ty--\t\t ghtr789\n' 
     'qfgqrgqrg', 

     '6 48788 bcfgdebc atlantic__7899 %fg#\t\t ghtu12940\n', 

     '25 47890 bbcedefg arctic__124 **juyf\t\t ghtr89877', 

     '9 54879 bbdeYddf antarctic__13 18:13pomodoro\t\t ghtr6798') 


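# The pattern is split into pieces so that it can be rebuilt one piece at a time.
# Backslashes before 'd' are doubled so that \d reaches the regex engine unchanged;
# '\t' is left as a literal tab character.
tupleRE = ('^\\d',
           ' ',
           '\\d{5}',
           ' ',
           '[abcdefghi]+',
           ' ',
           '(?=[a-z\\d_ ]{14} [^ ]+\t\t ght)',
           '[a-z]+',
           '__',
           '[\\d]+',
           ' +',
           '[^\t]+',
           '\t\t',
           ' ',
           'ght',
           '(r[5-9]+|u[0-4]+)',
           '$')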



def REtest(ch, tupleRE, flags=re.MULTILINE):
    """Rebuild the regex piece by piece and report the first piece that breaks the match."""
    for n in range(len(tupleRE)):
        regx = re.compile(''.join(tupleRE[:n+1]), flags)
        if regx.search(ch):
            continue

        # Pieces 0..n-1 still match; adding piece n breaks the match.
        print('\n -*- tupleRE :\n')
        print('\n'.join(str(i).zfill(2) + ' ' + repr(u)
                        for i, u in enumerate(tupleRE[:n])))
        print(' --------------------------------')
        print(str(n).zfill(2) + ' ' + repr(tupleRE[n])
              + " doesn't match anymore from this ligne "
              + str(n) + ' of tupleRE')
        # Show the piece that follows the failing one, if there is one.
        print('\n'.join(str(n+1+j).zfill(2) + ' ' + repr(u)
                        for j, u in enumerate(tupleRE[n+1:n+2])))

        # tupleRE[:n] is the longest prefix that still matches
        # (it is the empty pattern when the very first piece fails).
        match = re.search(''.join(tupleRE[:n]), ch, flags)
        matching_portion = match.group()
        matching_li = '\n'.join(map(repr,
                                    matching_portion.splitlines(True)[-5:]))
        fin_matching_portion = match.end()
        print('\n\n -*- Part of the tested string which is concerned :\n\n'
              '######### matching_portion ########\n' + matching_li + '\n'
              '##### end of matching_portion #####\n'
              '-----------------------------------\n'
              '######## unmatching_portion #######')
        # Show at most 300 characters of what follows the matched portion.
        print('\n'.join(map(repr,
                            ch[fin_matching_portion:
                               fin_matching_portion+300].splitlines(True))))
        break
    else:
        # The loop ended without a break: every prefix matched.
        print('\n SUCCESS . The regex integrally matches.')


for x in li:
    print(' -*- Analyzed string :\n%r' % x)
    REtest(x, tupleRE)
    print('\nmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwm')

Result

-*- Analyzed string : 
'ksjdhfqsd\n5 12478 abdefgcd ocean__12  ty--\t\t ghtr789\nqfgqrgqrg' 

    SUCCESS . The regex integrally matches. 

mwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwm 
    -*- Analyzed string : 
'6 48788 bcfgdebc atlantic__7899 %fg#\t\t ghtu12340\n' 

    SUCCESS . The regex integrally matches. 

mwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwm 
    -*- Analyzed string : 
'2 47890 bbcedefg arctic__124 **juyf\t\t ghtr89877' 

    SUCCESS . The regex integrally matches. 

mwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwm 
    -*- Analyzed string : 
'9 54879 bbdecddf antarctic__13 18:13pomodoro\t\t ghtr6798' 

    SUCCESS . The regex integrally matches. 

mwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwm 
    -*- Analyzed string : 
'ksjdhfqsd\n5 12478 abdefgcd ocean__1247101247887 ty--\t\t ghtr789\nqfgqrgqrg' 

    -*- tupleRE : 

00 '^\\d' 
01 ' ' 
02 '\\d{5}' 
03 ' ' 
04 '[abcdefghi]+' 
05 ' ' 
    -------------------------------- 
06 '(?=[a-z\\d_ ]{14} [^ ]+\t\t ght)' doesn't match anymore from this ligne 6 of tupleRE 
07 '[a-z]+' 


    -*- Part of the tested string which is concerned : 

######### matching_portion ######## 
'5 12478 abdefgcd ' 
##### end of matching_portion ##### 
----------------------------------- 
######## unmatching_portion ####### 
'ocean__1247101247887 ty--\t\t ghtr789\n' 
'qfgqrgqrg' 

mwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwm 
    -*- Analyzed string : 
'6 48788 bcfgdebc atlantic__7899 %fg#\t\t ghtu12940\n' 

    -*- tupleRE : 

00 '^\\d' 
01 ' ' 
02 '\\d{5}' 
03 ' ' 
04 '[abcdefghi]+' 
05 ' ' 
06 '(?=[a-z\\d_ ]{14} [^ ]+\t\t ght)' 
07 '[a-z]+' 
08 '__' 
09 '[\\d]+' 
10 ' +' 
11 '[^\t]+' 
12 '\t\t' 
13 ' ' 
14 'ght' 
15 '(r[5-9]+|u[0-4]+)' 
    -------------------------------- 
16 '$' doesn't match anymore from this ligne 16 of tupleRE 



    -*- Part of the tested string which is concerned : 

######### matching_portion ######## 
'6 48788 bcfgdebc atlantic__7899 %fg#\t\t ghtu12' 
##### end of matching_portion ##### 
----------------------------------- 
######## unmatching_portion ####### 
'940\n' 

mwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwm 
    -*- Analyzed string : 
'25 47890 bbcedefg arctic__124 **juyf\t\t ghtr89877' 

    -*- tupleRE : 

00 '^\\d' 
    -------------------------------- 
01 ' ' doesn't match anymore from this ligne 1 of tupleRE 
02 '\\d{5}' 


    -*- Part of the tested string which is concerned : 

######### matching_portion ######## 
'2' 
##### end of matching_portion ##### 
----------------------------------- 
######## unmatching_portion ####### 
'5 47890 bbcedefg arctic__124 **juyf\t\t ghtr89877' 

mwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwm 
    -*- Analyzed string : 
'9 54879 bbdeYddf antarctic__13 18:13pomodoro\t\t ghtr6798' 

    -*- tupleRE : 

00 '^\\d' 
01 ' ' 
02 '\\d{5}' 
03 ' ' 
04 '[abcdefghi]+' 
    -------------------------------- 
05 ' ' doesn't match anymore from this ligne 5 of tupleRE 
06 '(?=[a-z\\d_ ]{14} [^ ]+\t\t ght)' 


    -*- Part of the tested string which is concerned : 

######### matching_portion ######## 
'9 54879 bbde' 
##### end of matching_portion ##### 
----------------------------------- 
######## unmatching_portion ####### 
'Yddf antarctic__13 18:13pomodoro\t\t ghtr6798' 

mwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwm 
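Since the question asks for an exception rather than printed output, the same prefix-by-prefix idea can be packaged so that the failure details travel with the exception. This is only a sketch: `RegexTraceback` and `search_or_raise` are hypothetical names, not part of the `re` module, and the pieces below are a shortened version of the tuple above.

```python
import re


class RegexTraceback(Exception):
    """Hypothetical exception type recording where the pattern stopped matching."""

    def __init__(self, index, piece, matched, rest):
        super().__init__('piece %d (%r) breaks the match after %r'
                         % (index, piece, matched))
        self.index = index      # position of the failing piece in the sequence
        self.piece = piece      # the failing sub-pattern
        self.matched = matched  # text matched by the pieces before it
        self.rest = rest        # text that follows the matched portion


def search_or_raise(pieces, text, flags=re.MULTILINE):
    """Return the match of the joined pieces, or raise RegexTraceback."""
    for n in range(len(pieces)):
        if re.search(''.join(pieces[:n + 1]), text, flags) is None:
            # pieces[:n] is the longest prefix that still matches
            # (the empty pattern when n == 0).
            prefix = re.search(''.join(pieces[:n]), text, flags)
            raise RegexTraceback(n, pieces[n], prefix.group(),
                                 text[prefix.end():])
    return re.search(''.join(pieces), text, flags)


# Shortened, illustrative version of the pieces used above.
pieces = (r'^\d', ' ', r'\d{5}', ' ', '[a-i]+', ' ')
try:
    search_or_raise(pieces, '25 47890 bbcedefg ')
except RegexTraceback as exc:
    print(exc)
```

As in the printed version, the report is a hint rather than a true trace of the engine: each prefix is searched independently, so a prefix may match at a different place in the string than the full pattern would have.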
+0

Yes, I've used it and found it helpful, though it's a bit complicated :p

0

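If you need to test the regex, you could use groups followed by `*`, as in `(sometext)*`, alongside your desired regular expression. You should then be able to pick out the places where it fails,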

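and then make use of the following match object attributes, as described on python.org:

pos: The value of pos which was passed to the search() or match() method of the RegexObject. This is the index into the string at which the RE engine started looking for a match.

endpos: The value of endpos which was passed to the search() or match() method of the RegexObject. This is the index into the string beyond which the RE engine will not go.

lastindex: The integer index of the last matched capturing group, or None if no group was matched at all. For example, the expressions (a)b, ((a)(b)), and ((ab)) will have lastindex == 1 if applied to the string 'ab', while the expression (a)(b) will have lastindex == 2 if applied to the same string.

lastgroup: The name of the last matched capturing group, or None if the group didn't have a name, or if no group was matched at all.

re: The regular expression object whose match() or search() method produced this MatchObject instance.

string: The string passed to match() or search().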
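To make these attributes concrete, here is a small sketch that reads each of them from a match object. The `lastindex` patterns are the ones from the description above; the `word`/`num` pattern and the sample string are made up for illustration.

```python
import re

# lastindex examples from the description above, applied to the string 'ab'.
for pattern in ('(a)b', '((a)(b))', '((ab))', '(a)(b)'):
    m = re.match(pattern, 'ab')
    print(pattern, m.lastindex)       # 1, 1, 1 and 2 respectively

# pos and endpos are accepted by the methods of a compiled pattern.
regx = re.compile(r'(?P<word>[a-z]+)(?P<num>\d+)?')
m = regx.search('  abc123  ', 2, 6)   # only look at indexes 2 to 5, i.e. 'abc1'
print(m.pos, m.endpos)                # 2 6
print(m.group(), m.lastgroup)         # abc1 num
print(m.re is regx, repr(m.string))   # True '  abc123  '
```

Note that `lastgroup` is only useful when the groups are named; with unnamed groups it is always None and `lastindex` has to be used instead.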

So, a very simple example:

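>>> import re
>>> mytextvar = 'the real thin'  # sample input, assumed here for illustration
>>> m1 = re.compile(r'the real thing')
>>> # same pattern, but each part is a named group that is allowed to be absent
>>> m2 = re.compile(r'(?P<the>the)*(?P<real> real)*(?P<thing> thing)*')
>>> if not m1.search(mytextvar):
...     res = m2.search(mytextvar)
...     print(res.lastgroup)  # name of the last group that matched
...     # raise my exception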
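The "raise my exception" step could look roughly like the sketch below. It assumes a relaxed pattern whose first group is mandatory and whose later groups are optional and named; `NoMatch` and `search_strict` are hypothetical names introduced only for this example.

```python
import re


class NoMatch(Exception):
    """Hypothetical exception reporting how far a relaxed pattern got."""


def search_strict(strict, relaxed, text):
    """Search with `strict`; on failure, raise NoMatch described via `relaxed`."""
    found = strict.search(text)
    if found:
        return found
    partial = relaxed.search(text)
    if partial is None:
        raise NoMatch('no part of the pattern matched')
    raise NoMatch('matched up to group %r, stopped at index %d: %r'
                  % (partial.lastgroup, partial.end(), text[partial.end():]))


strict = re.compile(r'the real thing')
# Relaxed variant: same parts, each later part optional and named.
relaxed = re.compile(r'(?P<the>the)(?P<real> real)?(?P<thing> thing)?')

try:
    search_strict(strict, relaxed, 'this is the real thin')
except NoMatch as exc:
    print(exc)
```

The relaxed pattern has to be written by hand for each strict pattern, so this approach suits a few important patterns better than it suits arbitrary ones.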