Pyparsing: detecting tokens with a specific ending

I'm wondering what I'm doing wrong here; maybe someone can give me a hint. I want to use pyparsing to detect certain tokens that end with the string _Init.

As an example, I have the following lines stored in text:

one 
two_Init 
threeInit 
four_foo_Init 
five_foo_bar_Init

From these lines I want to extract the following:

two_Init 
four_foo_Init 
five_foo_bar_Init

So far I have reduced my problem to the following lines:

    import pyparsing as pp

    ident = pp.Word(pp.alphas, pp.alphanums + "_")
    ident_init = pp.Combine(ident + pp.Literal("_Init"))

    # scanString yields (tokens, start, end) for each match found in text
    for detected, s, e in ident_init.scanString(text):
        print(detected)

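With this code I get no results. If I remove the "_" from the Word statement, I can at least detect the lines that end in _Init, but the results are incomplete: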

['two_Init'] 
['foo_Init'] 
['bar_Init']

Does anyone have an idea what I am doing completely wrong here?

2013-04-29 daniel

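The problem is that you want to accept '_' as long as it is not the '_' in the terminating '_Init'. In your version, Word consumes the whole identifier, trailing '_Init' included, and pyparsing does not backtrack, so the Literal never gets a chance to match. Here are two pyparsing solutions: one is more "pure" pyparsing, the other just says the heck with it and uses an embedded regular expression.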

samples = """\
one
two_Init
threeInit
four_foo_Init
six_seven_Init_eight_Init
five_foo_bar_Init"""


from pyparsing import Combine, OneOrMore, Word, alphas, alphanums, Literal, WordEnd, Regex

# implement explicit lookahead: allow '_' as part of your Combined OneOrMore,
# as long as it is not followed by "Init" and the end of the word
option1 = Combine(OneOrMore(Word(alphas, alphanums) |
                            '_' + ~(Literal("Init") + WordEnd()))
                  + "_Init")

# sometimes regular expressions and their implicit lookahead/backtracking do
# make things easier
option2 = Regex(r'\b[a-zA-Z_][a-zA-Z0-9_]*_Init\b')

for expr in (option1, option2):
    print('\n'.join(t[0] for t in expr.searchString(samples)))
    print()

Both options print:

two_Init 
four_foo_Init 
six_seven_Init_eight_Init 
five_foo_bar_Init
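
To see the original failure in isolation, here is a small sketch that reproduces both symptoms from the question. The names `matched`, `short_ident` and `fragments` are only illustrative and do not appear in the original code.

```python
import pyparsing as pp

# Word is greedy: with "_" among its body characters it swallows the
# whole identifier, including the trailing "_Init".
ident = pp.Word(pp.alphas, pp.alphanums + "_")
print(ident.parseString("two_Init")[0])

# pyparsing does not backtrack into Word, so the Literal that follows
# has nothing left to match and the combined expression fails.
ident_init = pp.Combine(ident + pp.Literal("_Init"))
try:
    ident_init.parseString("two_Init")
    matched = True
except pp.ParseException:
    matched = False
print(matched)

# Without "_" in Word, a match can only cover the last run of
# alphanumerics before "_Init", which explains the fragments.
short_ident = pp.Combine(pp.Word(pp.alphas, pp.alphanums) + pp.Literal("_Init"))
fragments = [tokens[0] for tokens, start, end in short_ident.scanString("four_foo_Init")]
print(fragments)
```

This is why option1 spells out the lookahead by hand: the '_' alternative has to refuse the final "_Init" itself, because nothing will hand those characters back later.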

2013-04-30 04:32:35 PaulMcG
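
Since option2 wraps an ordinary Python regular expression, the same pattern can be tried with the standard `re` module alone. This is a minimal sketch using the samples from the answer above; the names `init_name` and `matches` are only illustrative.

```python
import re

samples = """\
one
two_Init
threeInit
four_foo_Init
six_seven_Init_eight_Init
five_foo_bar_Init"""

# Same pattern as option2. The greedy [a-zA-Z0-9_]* first runs to the end
# of the word, then the regex engine backtracks until "_Init\b" fits.
init_name = re.compile(r'\b[a-zA-Z_][a-zA-Z0-9_]*_Init\b')

matches = init_name.findall(samples)
print('\n'.join(matches))
```

If the tokens are only needed as plain strings, this may be enough on its own. Wrapping the pattern in pyparsing's Regex becomes useful once it has to be combined with other pyparsing expressions.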
