2013-04-29 70 views

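Pyparsing: detecting tokens with a specific ending

I'm wondering what I'm doing wrong here; maybe someone can give me a hint. I want to use pyparsing to detect tokens that end with the string _Init.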

As an example, I have the following lines stored in text:

one 
two_Init 
threeInit 
four_foo_Init 
five_foo_bar_Init 

From these lines I want to extract the following:

two_Init 
four_foo_Init 
five_foo_bar_Init 

So far I have reduced my problem to the following lines:

import pyparsing as pp

ident = pp.Word(pp.alphas, pp.alphanums + "_")
ident_init = pp.Combine(ident + pp.Literal("_Init"))

for detected, s, e in ident_init.scanString(text):
    print(detected)

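This code produces no results. If I remove the "_" from the Word statement, I can at least detect the lines that end in _Init, but the results are incomplete: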

['two_Init'] 
['foo_Init'] 
['bar_Init'] 

Does anyone have an idea what I'm doing wrong here?

Answer


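The problem is that you want to accept '_' as long as it is not the '_' of the terminating '_Init'. Here are two pyparsing solutions: one is "purer" pyparsing, and the other just gives in and uses an embedded regular expression.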
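To see the underlying issue first, here is a minimal sketch using the grammar from the question. It assumes the cause is that Word is greedy and pyparsing does not backtrack into it, which is consistent with the empty result reported above; the names consumed and matched are only for illustration.

```python
import pyparsing as pp

# Same Word as in the question: '_' is allowed as a body character.
ident = pp.Word(pp.alphas, pp.alphanums + "_")

# On its own, the Word takes the whole token, '_Init' suffix included.
consumed = ident.parseString("two_Init")[0]
print(consumed)

# Nothing is left for the literal, and pyparsing does not back up
# into the Word to retry with a shorter match.
ident_init = pp.Combine(ident + pp.Literal("_Init"))
try:
    ident_init.parseString("two_Init")
    matched = True
except pp.ParseException:
    matched = False
print(matched)
```

The two solutions below work around this in different ways.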

samples = """\
one
two_Init
threeInit
four_foo_Init
six_seven_Init_eight_Init
five_foo_bar_Init"""


from pyparsing import Combine, OneOrMore, Word, alphas, alphanums, Literal, WordEnd, Regex

# implement explicit lookahead: allow '_' as part of your Combined OneOrMore,
# as long as it is not followed by "Init" and the end of the word
option1 = Combine(OneOrMore(Word(alphas, alphanums) |
                            '_' + ~(Literal("Init") + WordEnd()))
                  + "_Init")

# sometimes regular expressions and their implicit lookahead/backtracking do
# make things easier
option2 = Regex(r'\b[a-zA-Z_][a-zA-Z0-9_]*_Init\b')

for expr in (option1, option2):
    print('\n'.join(t[0] for t in expr.searchString(samples)))
    print()

Both options print:

two_Init 
four_foo_Init 
six_seven_Init_eight_Init 
five_foo_bar_Init
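If it helps to see the backtracking that option2 relies on in isolation, the sketch below runs the same pattern through the standard library re module instead of pyparsing's Regex. This is only an illustration of the regex behaviour, not a replacement for the pyparsing expressions; the names pattern and matches are made up for the example.

```python
import re

samples = """\
one
two_Init
threeInit
four_foo_Init
six_seven_Init_eight_Init
five_foo_bar_Init"""

# Same pattern as option2. The greedy [a-zA-Z0-9_]* first swallows the
# whole word, then the regex engine backs off until '_Init' followed by
# a word boundary can still match.
pattern = re.compile(r'\b[a-zA-Z_][a-zA-Z0-9_]*_Init\b')

matches = pattern.findall(samples)
for m in matches:
    print(m)
```

Because the character class is greedy, six_seven_Init_eight_Init is returned as one token rather than being cut at the first _Init.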