2017-06-21 66 views
1

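Find a string pattern using a regular expression in Python 3

I have a string like the one below:

string = "your invoice number IVR/20170531/XVII/V/12652967 and IVR/20170531/XVII/V/13652967"

I want to extract the invoice numbers IVR/20170531/XVII/V/12652967 and IVR/20170531/XVII/V/13652967 into a list with a regular expression, using this pattern: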

 result = re.findall(r'INV[/]\d{8}[/](M{1,4}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})|M{0,4}(CM|C?D|D?C{1,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})|M{0,4}(CM|CD|D?C{0,3})(XC|X?L|L?X{1,3})(IX|IV|V?I{0,3})|M{0,4}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|I?V|V?I{1,3}))[/](M{1,4}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})|M{0,4}(CM|C?D|D?C{1,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})|M{0,4}(CM|CD|D?C{0,3})(XC|X?L|L?X{1,3})(IX|IV|V?I{0,3})|M{0,4}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|I?V|V?I{1,3}))[/]\d{7,9}',string) 

But the result is:

[('XVII', '', '','', '', '', '', '', 'X', 'VII', '', '', '', 'V','','','', '', '', '', '', '', '', '', '', 'V')] 

When I try this pattern at http://regexr.com/ the result is what I expect, but in Python it is not.

+0

Which part of the sample string you gave is the invoice number? –

+0

'IVR[^\s]+', right? – depperm

+0

I think you need some more '|' in your pattern... https://regex101.com/r/OZNSem/1 –

Answers

0

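You should fix your pattern, wrap the whole regular expression in parentheses so that it forms a single capturing group, and then read the matched text from the first group (backreference 1). You can read more about backreferences here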

import re

invoices = []
# Your pattern was slightly incorrect; the outer parentheses make the whole invoice group 1
pattern = re.compile(r'(IVR[/]\d{8}[/](M{1,4}(CM|CD|D?C{0,3})|(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})|M{0,4}(CM|C?D|D?C{1,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})|M{0,4}(CM|CD|D?C{0,3})(XC|X?L|L?X{1,3})(IX|IV|V?I{0,3})|M{0,4}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|I?V|V?I{1,3}))[/](M{1,4}(CM|CD|D?C{0,3})|(XC|XL|L?X{0,3})|(IX|IV|V?I{0,3})|M{0,4}(CM|C?D|D?C{1,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})|M{0,4}(CM|CD|D?C{0,3})(XC|X?L|L?X{1,3})(IX|IV|V?I{0,3})|M{0,4}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|I?V|V?I{1,3}))[/]\d{7,9})')

# For each invoice the pattern finds in string, append the full match (group 1) to the list
for invoice in pattern.finditer(string):
    invoices.append(invoice.group(1))

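Note:

You should also use pattern.finditer(), because it lets you iterate over every match of the pattern in the text you called string. From the re.finditer documentation:

re.finditer(pattern, string, flags=0) Return an iterator yielding MatchObject instances over all non-overlapping matches for the RE pattern in string. The string is scanned left-to-right, and matches are returned in the order found. Empty matches are included in the result unless they touch the beginning of another match.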
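
As the documentation excerpt describes, each item yielded by finditer is a match object, so the whole matched text is always available through group(0) even when the pattern contains capturing groups. A minimal sketch, assuming a deliberately simplified pattern in which the character class [MDCLXVI]+ stands in for the full Roman numeral expression (that simplification is an assumption made for brevity, not the pattern from the answer):

```python
import re

string = "your invoice number IVR/20170531/XVII/V/12652967 and IVR/20170531/XVII/V/13652967"

# Simplified stand-in pattern (assumption): [MDCLXVI]+ accepts any run of
# Roman numeral letters instead of validating the numeral strictly.
pattern = re.compile(r'IVR/\d{8}/([MDCLXVI]+)/([MDCLXVI]+)/\d{7,9}')

invoices = []
for match in pattern.finditer(string):
    # group(0) is the entire match; groups() holds only the captured parts.
    invoices.append(match.group(0))
    print(match.start(), match.group(0), match.groups())
```

Collecting match.group(0) avoids having to restructure the groups in a pattern that already exists.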

0
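import re

string = "your invoice number IVR/20170531/XVII/V/12652967 and IVR/20170531/XVII/V/13652967"
# Assumption: regexpattern was not defined in the original answer; this simplified
# stand-in is for illustration only.
regexpattern = r'IVR/\d{8}/[MDCLXVI]+/[MDCLXVI]+/\d{7,9}'
results = []
for match in re.finditer(regexpattern, string):
    # group() with no argument returns the entire match
    results.append(match.group())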
0

You need to add ?: at the start of every group so that they become non-capturing groups.

Try this regular expression:

IVR[/]\d{8}[/](?:M{0,4}(?:CM|CD|D?C{0,3})|(?:XC|XL|L?X{0,3})(?:IX|IV|V?I{0,3}))[/](?:M{0,4}(?:CM|CD|D?C{0,3})|(?:XC|XL|L?X{0,3})(?:IX|IV|V?I{0,3}))[/]\d{8} 

Basically, you need to add ?: to every group.
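
The reason this works is that re.findall returns the captured groups instead of the whole match whenever the pattern contains capturing groups. A minimal sketch of the difference, using a simplified pattern in which [MDCLXVI]+ stands in for the full Roman numeral expression (an assumption made to keep the example short):

```python
import re

string = "your invoice number IVR/20170531/XVII/V/12652967 and IVR/20170531/XVII/V/13652967"

# With capturing groups, findall returns one tuple of group contents per match.
capturing = re.findall(r'IVR/\d{8}/([MDCLXVI]+)/([MDCLXVI]+)/\d{7,9}', string)

# With non-capturing groups, findall returns the full matched text.
non_capturing = re.findall(r'IVR/\d{8}/(?:[MDCLXVI]+)/(?:[MDCLXVI]+)/\d{7,9}', string)

print(capturing)      # list of (numeral, numeral) tuples
print(non_capturing)  # list of complete invoice strings
```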

0

You can try this to capture the number, the two Roman numerals, and the numeric value as separate groups:

IVR\/(\d{8})\/(M{0,4}(?:CM|CD|D?C{0,3})(?:XC|XL|L?X{0,3})(?:IX|IV|V?I{0,3}))\/(M{0,4}(?:CM|CD|D?C{0,3})(?:XC|XL|L?X{0,3})(?:IX|IV|V?I{0,3}))\/(\d{7,9})

Demo

Snippet

import re 

string = "your invoice number IVR/20170531/XVII/V/12652967 and IVR/20170531/XVII/V/13652967" 

pattern = r"IVR\/(\d{8})\/(M{0,4}(?:CM|CD|D?C{0,3})(?:XC|XL|L?X{0,3})(?:IX|IV|V?I{0,3}))\/(M{0,4}(?:CM|CD|D?C{0,3})(?:XC|XL|L?X{0,3})(?:IX|IV|V?I{0,3}))\/(\d{7,9})" 

for match in re.findall(pattern, string): 
    print(match) 

Run online
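
Because this pattern has four capturing groups, re.findall yields one 4-tuple per invoice rather than the invoice text itself. If the complete invoice strings are wanted, the parts can be joined back together. A small sketch under that assumption (the names parts and invoices are hypothetical):

```python
import re

string = "your invoice number IVR/20170531/XVII/V/12652967 and IVR/20170531/XVII/V/13652967"

pattern = r"IVR\/(\d{8})\/(M{0,4}(?:CM|CD|D?C{0,3})(?:XC|XL|L?X{0,3})(?:IX|IV|V?I{0,3}))\/(M{0,4}(?:CM|CD|D?C{0,3})(?:XC|XL|L?X{0,3})(?:IX|IV|V?I{0,3}))\/(\d{7,9})"

# Each element is a tuple: (number, first numeral, second numeral, numeric value).
parts = re.findall(pattern, string)

# Rebuild the full invoice text from the captured fields.
invoices = ["IVR/" + "/".join(fields) for fields in parts]

print(parts)
print(invoices)
```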
