2012-04-19 133 views
2

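Understanding regular expressions

I'm trying to parse a map file generated by gcc to get function addresses. There is a possible solution here (Python), but it doesn't work for me.

I want to understand the provided solution. It uses two complicated regular expressions: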

m = re.search(r'^\[([0-9 ]+)\]\s+(.+)\s*$', line)
m = re.search(r'^([0-9A-Fx]+)\s+([0-9A-Fx]+)\s+(\[([ 0-9]+)\]|\w+)\s+(.*?)\s*$', line)

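Can anyone explain what these re.search calls are matching?

Is there any other working solution for getting function addresses from a GCC-generated map file?

Answers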

9
^\[([0-9 ]+)\]\s+(.+)\s*$ 

^     start of the line 
\[     literal [ 
([0-9 ]+)   group of 0-9 or space, one or more times 
\]     literal ] 
\s+    one or more spaces 
(.+)    group of anything, one or more times 
\s*    zero or more spaces 
$     end of line 


eg: "[5 5 5] blah" 

gives: 
    group1 = "5 5 5" 
    group2 = blah 
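The example above can be checked directly. This is a minimal sketch; the name BRACKET_LINE is made up for illustration and does not come from the linked solution.

```python
import re

# First pattern from the question, written as a raw string so the
# backslashes reach the regex engine unchanged.
BRACKET_LINE = re.compile(r'^\[([0-9 ]+)\]\s+(.+)\s*$')

m = BRACKET_LINE.search("[5 5 5] blah")
# groups() returns one entry per capturing group, in order.
groups = m.groups() if m else None
print(groups)
```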

^([0-9A-Fx]+)\s+([0-9A-Fx]+)\s+(\[([ 0-9]+)\]|\w+)\s+(.*?)\s*$ 

^     start of line 
([0-9A-Fx]+)  group of chars one or more times 
\s+    one or more spaces 
([0-9A-Fx]+)  group of chars one or more times 
\s+    one or more spaces 
(
    \[    literal [ 
    ([ 0-9]+)  group of char 1 or more times 
    \]    literal ] 
    |    or 
    \w+   word char, one or more times 
) 
\s+    one or more spaces 
(.*?)    any char zero or more times, non greedy 
\s*    zero or more spaces 
$     end of line 
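To see how the five groups are filled, here is a small sketch that runs the second pattern on two made-up lines. The names SYMBOL_LINE and split_fields are hypothetical, and the sample lines are not taken from a real map file.

```python
import re

# Second pattern from the question. Group 3 is the whole alternation,
# group 4 is only set when the bracketed branch matched.
SYMBOL_LINE = re.compile(
    r'^([0-9A-Fx]+)\s+([0-9A-Fx]+)\s+(\[([ 0-9]+)\]|\w+)\s+(.*?)\s*$')

def split_fields(line):
    """Return the five captured groups, or None if the line does not match."""
    m = SYMBOL_LINE.search(line)
    return m.groups() if m else None

bracketed = split_fields("0xF00 0x1234 [89] Foo")   # bracketed branch
word = split_fields("78x9 023 Foobar ")             # \w+ branch
print(bracketed)
print(word)
```

In the second call group 4 is None, because the `\w+` branch of the alternation was taken, and group 5 is an empty string.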
+2

(It should be noted that the trailing '\s*' will always match nothing, because the '+' in '(.+)' is greedy.) – huon 2012-04-19 11:56:36
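The point of this comment is easy to confirm with a line that has trailing spaces. The lazy variant `(.+?)` below is not part of the original solution; it is only there for comparison.

```python
import re

line = "[5 5 5] blah   "   # note the three trailing spaces

# Original pattern: the greedy (.+) swallows the trailing spaces,
# so the final \s* has nothing left to match.
greedy = re.search(r'^\[([0-9 ]+)\]\s+(.+)\s*$', line)

# Variant with a lazy (.+?): the trailing spaces are left to \s*.
lazy = re.search(r'^\[([0-9 ]+)\]\s+(.+?)\s*$', line)

print(repr(greedy.group(2)))
print(repr(lazy.group(2)))
```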

6

One way to debug Python regular expressions is to pass the little-known re.DEBUG flag when creating the pattern object.

>>> import re 
>>> re.compile('^\[([0-9 ]+)\]\s+(.+)\s*$', re.DEBUG) 
at at_beginning 
literal 91 
subpattern 1 
    max_repeat 1 65535 
    in 
     range (48, 57) 
     literal 32 
literal 93 
max_repeat 1 65535 
    in 
    category category_space 
subpattern 2 
    max_repeat 1 65535 
    any None 
max_repeat 0 65535 
    in 
    category category_space 
at at_end 
<_sre.SRE_Pattern object at 0x01CE8950> 

This is obviously not 100% straightforward to read, but it can help if you know a bit about how matching works and find the indentation useful.
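On Python 3 the dump is printed to standard output at compile time, and its exact wording differs between versions (for example, the node names are upper-case). If you want the dump as a string, one possible sketch is the following; debug_dump is a hypothetical helper name.

```python
import contextlib
import io
import re

def debug_dump(pattern):
    """Compile `pattern` with re.DEBUG and return the printed parse tree."""
    buf = io.StringIO()
    # re.DEBUG writes via print(), so redirecting stdout captures it.
    with contextlib.redirect_stdout(buf):
        re.compile(pattern, re.DEBUG)
    return buf.getvalue()

dump = debug_dump(r'^\[([0-9 ]+)\]\s+(.+)\s*$')
print(dump)
```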

+1

+1 - That's a handy trick. – 2012-04-19 12:01:02

1
import re

pattern1 = re.compile(
r""" 
^      # start of string 
\[      # literal [ 
([0-9 ]+)    # Collection of numbers and spaces 
\]      # literal ] 
\s+      # whitespace 
(.+)     # any string of at least one character 
\s*      # possible whitespace 
$      # end of string 
""", re.VERBOSE) 

pattern2 = re.compile (
r""" 
^      # Start of string 
([0-9A-Fx]+)   # Collection of hexadecimal digits or 'x' 
\s+      # Whitespace 
([0-9A-Fx]+)   # Collection of hexadecimal digits or 'x' 
\s+      # Whitespace 
(\[([ 0-9]+)\]|\w+)  # Digits and spaces inside [] brackets, or a word 
\s+      # Whitespace 
(.*?)     # Any string 
\s*      # Possible whitespace 
$      # End of string 
""", re.VERBOSE) 

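These are actually poorly written regular expressions.

I'd bet the ([0-9A-Fx]+) subgroups are meant to match hexadecimal numbers such as 0x1234DEADBEEF. As written, however, they also match nonsense such as xxxxxxxxxx. 0x[0-9A-F]+ would be more appropriate here.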

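The second regex also uses a non-greedy match, (.*?), which is forced to run to the end of the text anyway because the regex must match the whole line. Its only practical effect is that trailing whitespace is left to the final \s* instead of being captured.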
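Both remarks can be tried out with a short sketch. The names ORIGINAL, STRICT and GREEDY_TAIL are made up for this comparison, and the stricter pattern keeps the upper-case-only hex digits suggested above; if your map file prints addresses in lower case, the class would need a-f as well.

```python
import re

TAIL = r'\s+(\[([ 0-9]+)\]|\w+)\s+'

# Pattern from the question.
ORIGINAL = re.compile(r'^([0-9A-Fx]+)\s+([0-9A-Fx]+)' + TAIL + r'(.*?)\s*$')
# Same pattern with the suggested 0x[0-9A-F]+ for both numbers.
STRICT = re.compile(r'^(0x[0-9A-F]+)\s+(0x[0-9A-F]+)' + TAIL + r'(.*?)\s*$')
# Strict pattern with a greedy last group, for comparison only.
GREEDY_TAIL = re.compile(r'^(0x[0-9A-F]+)\s+(0x[0-9A-F]+)' + TAIL + r'(.*)\s*$')

nonsense = "xxxx xxxx foo bar"
valid = "0xF00 0x1234 [89] Foo  "   # two trailing spaces

loose_accepts = ORIGINAL.search(nonsense) is not None
strict_accepts = STRICT.search(nonsense) is not None
lazy_tail = STRICT.search(valid).group(5)
greedy_tail = GREEDY_TAIL.search(valid).group(5)

print(loose_accepts, strict_accepts)
print(repr(lazy_tail), repr(greedy_tail))
```

The original pattern accepts the nonsense line while the stricter one rejects it, and the lazy last group drops the trailing spaces that the greedy one keeps.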

0

The first one is:

^   start of string 
\[  a '[' 
([0-9 ]+) one or more digits and spaces 
\]  a ']' 
\s+  whitespace 
(.+)  anything 
\s*  optional whitespace 
$   end of string 

Examples:

"[12345] Hello" 
"[06 7] \t Foo.Bar! " 

The second one is:

^   start of string 
([0-9A-Fx]+) hex digits and x 
\s+   whitespace 
([0-9A-Fx]+) hex digits and x 
\s+   whitespace 
(   either: 
\[    a '[' 
([ 0-9]+)  digits and spaces 
\]    a ']' 
|   or: 
\w+   a word 
)   end group 
\s+   whitespace 
(.*?)  optional anything (non-greedy) 
\s*   optional whitespace 
$   end string 

Examples:

"0xF00 0x1234 [89] Foo" 
"78x9 023 Foobar " 
0

Let me give you a valuable link for figuring out these regular expressions.

Click on this

Your first regular expression is parsed and explained as:

NODE      EXPLANATION 
-------------------------------------------------------------------------------- 
^      the beginning of the string 
-------------------------------------------------------------------------------- 
    \[      '[' 
-------------------------------------------------------------------------------- 
    (      group and capture to \1: 
-------------------------------------------------------------------------------- 
    [0-9 ]+     any character of: '0' to '9', ' ' (1 or 
          more times (matching the most amount 
          possible)) 
-------------------------------------------------------------------------------- 
)      end of \1 
-------------------------------------------------------------------------------- 
    \]      ']' 
-------------------------------------------------------------------------------- 
    \s+      whitespace (\n, \r, \t, \f, and " ") (1 or 
          more times (matching the most amount 
          possible)) 
-------------------------------------------------------------------------------- 
    (      group and capture to \2: 
-------------------------------------------------------------------------------- 
    .+      any character except \n (1 or more times 
          (matching the most amount possible)) 
-------------------------------------------------------------------------------- 
)      end of \2 
-------------------------------------------------------------------------------- 
    \s*      whitespace (\n, \r, \t, \f, and " ") (0 or 
          more times (matching the most amount 
          possible)) 
-------------------------------------------------------------------------------- 
    $      before an optional \n, and the end of the 
          string 

I'm sure you can figure out how to get the second one parsed.

Cheers.