Understanding regular expressions

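I am trying to parse a map file generated by gcc to get function addresses. There is a possible solution here (Python), but it doesn't work for me.

I would like to understand the provided solution. It has two complex regular expressions: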

m = re.search(r'^\[([0-9 ]+)\]\s+(.+)\s*$', line)
m = re.search(r'^([0-9A-Fx]+)\s+([0-9A-Fx]+)\s+(\[([ 0-9]+)\]|\w+)\s+(.*?)\s*$', line)

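Can anyone explain what these regular expressions search for?

Is there any other working solution for getting function addresses from a map file generated by GCC?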

2012-04-19 Kamath

^\[([0-9 ]+)\]\s+(.+)\s*$ 

^     start of the line
\[     literal [
([0-9 ]+)   group of digits (0-9) or spaces, one or more times
\]     literal ]
\s+    one or more whitespace characters
(.+)    group of any characters, one or more times
\s*    zero or more whitespace characters
$     end of line


eg: "[5 5 5] blah" 

gives: 
    group1 = "5 5 5" 
    group2 = "blah"
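
The example above can be checked directly in Python. This is a minimal sketch; the names BRACKET_LINE, index_part and rest_part are made up for illustration and do not come from the original solution.

```python
import re

# First pattern from the question, written as a raw string.
BRACKET_LINE = re.compile(r'^\[([0-9 ]+)\]\s+(.+)\s*$')

m = BRACKET_LINE.search("[5 5 5] blah")

index_part = m.group(1)  # text between the square brackets
rest_part = m.group(2)   # everything after the separating whitespace

print(repr(index_part), repr(rest_part))
```

A line that does not start with a bracketed group makes search() return None, so real code should test the result before calling group().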

^([0-9A-Fx]+)\s+([0-9A-Fx]+)\s+(\[([ 0-9]+)\]|\w+)\s+(.*?)\s*$ 

^     start of line
([0-9A-Fx]+)  group of hex digits (0-9, A-F) or 'x', one or more times
\s+    one or more whitespace characters
([0-9A-Fx]+)  group of hex digits (0-9, A-F) or 'x', one or more times
\s+    one or more whitespace characters
(
    \[    literal [
    ([ 0-9]+)  group of digits or spaces, one or more times
    \]    literal ]
    |    or
    \w+   word char, one or more times
)
\s+    one or more whitespace characters
(.*?)    any char zero or more times, non-greedy
\s*    zero or more whitespace characters
$     end of line
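
To see which group captures what, the second pattern can be run against a sample line. This is only a sketch: the two input lines below are invented for illustration and are not taken from a real gcc map file, and the name MAP_LINE is hypothetical.

```python
import re

# Second pattern from the question, written as a raw string.
MAP_LINE = re.compile(
    r'^([0-9A-Fx]+)\s+([0-9A-Fx]+)\s+(\[([ 0-9]+)\]|\w+)\s+(.*?)\s*$'
)

# Hypothetical line using the bracketed alternative for group 3.
m = MAP_LINE.search("0x08000100 0x24 [ 12] main.o")
print(m.groups())

# Hypothetical line using the \w+ alternative; group 4 does not take part.
w = MAP_LINE.search("0x08000100 0x24 text main")
print(w.groups())
```

Group 4 is nested inside group 3, so it is only set when the bracketed form matched; otherwise it is None.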

2012-04-19 11:54:31

(It should be noted that the '\s*' at the end will always match nothing, because the '+' in '(.+)' is greedy.) – huon 2012-04-19 11:56:36
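
The effect described in this comment can be observed with a line that ends in spaces. A minimal sketch, with an invented input line and illustrative variable names:

```python
import re

pattern = re.compile(r'^\[([0-9 ]+)\]\s+(.+)\s*$')

line = "[5 5 5] blah   "  # three trailing spaces
m = pattern.search(line)

captured = m.group(2)        # the greedy (.+) runs to the end of the line
trimmed = captured.rstrip()  # strip the trailing whitespace afterwards

print(repr(captured), repr(trimmed))
```

Because the trailing spaces end up inside group 2, code that relies on this pattern has to strip them itself.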

One way to debug Python regular expressions is to pass the little-known re.DEBUG flag when creating the pattern object.

>>> import re 
>>> re.compile('^\[([0-9 ]+)\]\s+(.+)\s*$', re.DEBUG) 
at at_beginning 
literal 91 
subpattern 1 
    max_repeat 1 65535 
    in 
     range (48, 57) 
     literal 32 
literal 93 
max_repeat 1 65535 
    in 
    category category_space 
subpattern 2 
    max_repeat 1 65535 
    any None 
max_repeat 0 65535 
    in 
    category category_space 
at at_end 
<_sre.SRE_Pattern object at 0x01CE8950>

This is obviously not 100% straightforward to read, but it can help if you know a little about how matching works, and the indentation is useful.

2012-04-19 11:56:25

+1 - that's a handy trick. – 2012-04-19 12:01:02

import re

pattern1 = re.compile (
r""" 
^      # start of string 
\[      # literal [ 
([0-9 ]+)    # Collection of numbers and spaces 
\]      # literal ] 
\s+      # whitespace 
(.+)     # any string of at least one character 
\s*      # possible whitespace 
$      # end of string 
""", re.VERBOSE) 

pattern2 = re.compile (
r""" 
^      # Start of string 
([0-9A-Fx]+)   # Collection of hexadecimal digits or 'x' 
\s+      # Whitespace 
([0-9A-Fx]+)   # Collection of hexadecimal digits or 'x' 
\s+      # Whitespace 
(\[([ 0-9]+)\]|\w+)  # Digits and spaces inside [] brackets, or a word 
\s+      # Whitespace 
(.*?)     # Any string (non-greedy) 
\s*      # Possible whitespace 
$      # End of string 
""", re.VERBOSE)

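These are actually rather poorly written regular expressions.

I would bet that the ([0-9A-Fx]+) subgroups are meant to match hexadecimal numbers such as 0x1234DEADBEEF. However, the way they are written, they can also match nonsense such as xxxxxxxxxx. 0x[0-9A-F]+ would be more appropriate here.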
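
The difference between the two character-class patterns can be checked on a few strings. A small sketch with hypothetical names LOOSE and STRICT; note that the strict form also rejects hex values without a 0x prefix, so whether it suits a given map file is an assumption to verify.

```python
import re

LOOSE = re.compile(r'^[0-9A-Fx]+$')     # class used in the original patterns
STRICT = re.compile(r'^0x[0-9A-F]+$')   # suggested replacement

results = {}
for text in ("0x1234DEADBEEF", "xxxxxxxxxx", "78x9"):
    # Record (loose matches?, strict matches?) for each candidate.
    results[text] = (bool(LOOSE.match(text)), bool(STRICT.match(text)))
    print(text, results[text])
```

Both forms only accept upper-case hex digits; passing re.IGNORECASE would be needed for lower-case input.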

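The second regex also uses a non-greedy match (.*?), which is forced to expand to the end of the line anyway because the regex has to match the whole line. Its only remaining effect is that trailing whitespace is left to the \s* instead of being captured.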
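
A side-by-side comparison shows what the non-greedy quantifier changes in practice. This is a sketch: the input line is invented, and GREEDY is a hypothetical variant of the original pattern with (.*) in place of (.*?).

```python
import re

HEAD = r'^([0-9A-Fx]+)\s+([0-9A-Fx]+)\s+(\[([ 0-9]+)\]|\w+)\s+'

LAZY = re.compile(HEAD + r'(.*?)\s*$')   # as in the original solution
GREEDY = re.compile(HEAD + r'(.*)\s*$')  # hypothetical greedy variant

line = "0xF00 0x1234 [89] Foo bar   "  # three trailing spaces

lazy_tail = LAZY.search(line).group(5)
greedy_tail = GREEDY.search(line).group(5)

print(repr(lazy_tail), repr(greedy_tail))
```

Both variants consume the whole line; they differ only in whether the trailing spaces land inside group 5.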

2012-04-19 11:56:59

The first one is:

^   start of string 
\[  a '[' 
([0-9 ]+) one or more digits and spaces 
\]  a ']' 
\s+  whitespace 
(.+)  anything 
\s*  optional whitespace 
$   end of string

Examples:

"[12345] Hello" 
"[06 7] \t Foo.Bar! "

The second one is:

^   start of string 
([0-9A-Fx]+) hex digits and x 
\s+   whitespace 
([0-9A-Fx]+) hex digits and x 
\s+   whitespace 
(   either: 
\[    a '[' 
([ 0-9]+)  digits and spaces 
\]    a ']' 
|   or: 
\w+   a word 
)   end group 
\s+   whitespace 
(.*?)  optional anything (non-greedy) 
\s*   optional whitespace 
$   end string

Examples:

"0xF00 0x1234 [89] Foo" 
"78x9 023 Foobar "

2012-04-19 12:04:27

Let me give you a valuable link for figuring out these regular expressions.

Click on this

Your first regular expression is parsed and explained as:

NODE                     EXPLANATION
--------------------------------------------------------------------------------
  ^                        the beginning of the string
--------------------------------------------------------------------------------
  \[                       '['
--------------------------------------------------------------------------------
  (                        group and capture to \1:
--------------------------------------------------------------------------------
    [0-9 ]+                  any character of: '0' to '9', ' ' (1 or
                             more times (matching the most amount
                             possible))
--------------------------------------------------------------------------------
  )                        end of \1
--------------------------------------------------------------------------------
  \]                       ']'
--------------------------------------------------------------------------------
  \s+                      whitespace (\n, \r, \t, \f, and " ") (1 or
                           more times (matching the most amount
                           possible))
--------------------------------------------------------------------------------
  (                        group and capture to \2:
--------------------------------------------------------------------------------
    .+                       any character except \n (1 or more times
                             (matching the most amount possible))
--------------------------------------------------------------------------------
  )                        end of \2
--------------------------------------------------------------------------------
  \s*                      whitespace (\n, \r, \t, \f, and " ") (0 or
                           more times (matching the most amount
                           possible))
--------------------------------------------------------------------------------
  $                        before an optional \n, and the end of the
                           string

I'm sure you can work out how to get the second one parsed.

Cheers.

2012-04-19 15:48:17
