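Regex combined with a list of numbers written as words

I want to extract information about injured people from several articles. The problem is that news language conveys this information in different ways, because the count can be written either in digits or in words.

For example: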

`Security forces had *wounded two* gunmen inside the museum but that two or three accomplices might still be at large.` 

`The suicide bomber has wounded *four men* last night.` 

`*Dozens* were wounded in a terrorist attack.`

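I noticed that, most of the time, the numbers 1-10 are written as words rather than digits. I would like to know how to extract them without ending up with convoluted code, for example by simply listing the words for 1-10 in the regex.

Should I use a list? How would it be included in the pattern?

This is the pattern I have used so far to extract the number of wounded people when it is written in digits: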

import re

# Read the whole news file ("News" is the file name used in the question)
with open("News") as text_open:
    text_read = text_open.read()
pattern = r"wounded (\d+)|(\d+) were wounded|(\d+) injured|(\d+) people were wounded|wounding (\d+)|wounding at least (\d+)"
result = re.findall(pattern, text_read)
print(result)

Source

2016-12-02 M.Huntz

Try this:

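import re

# Alternative 1: a word followed by " were" (e.g. "Dozens were ...")
# Alternative 2: a word of 3+ characters right after "wounded" or "injured"
regex = r"\w+\s(?=were)|(?<=wounded|injured)\s\w{3,}"

test_str = ("`Security forces had wounded two gunmen inside the museum but that two or three accomplices might still be at large.`\n\n"
    "`The suicide bomber has wounded four men last night.`\n\n"
    "`Dozens were wounded in a terrorist attack.`")

matches = re.finditer(regex, test_str)

for match in matches:
    # strip() drops the space that each alternative matches along with the word
    print(match.group().strip())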

Output:

two 
four 
Dozens

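\w+\s(?=were): ?= is a lookahead that requires "were" to come next; \w+ matches the word in front of it.

| means "or".

(?<=wounded|injured)\s\w{3,}: ?<= is a lookbehind that requires "wounded" or "injured" to come immediately before the match, and {3,} requires the word to be 3 or more characters long. This is only there to avoid matching short words such as "in". Every number word is at least 3 characters long, so the restriction is safe to use.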
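To address the "should I use a list?" part of the question more directly, here is a minimal sketch that builds the alternation from a Python list and accepts either digits or a listed word. The names NUMBER_WORDS, COUNT, PATTERN, extract_counts and to_int are made up for this illustration, and the phrasing templates are only a guess adapted from the patterns in the question, so treat it as a starting point rather than a complete solution.

```python
import re

# Number words 1-10; extend the list (e.g. with "dozens") if you need more.
NUMBER_WORDS = ["one", "two", "three", "four", "five",
                "six", "seven", "eight", "nine", "ten"]

# A count is either a run of digits or one of the listed words.
COUNT = r"(?:\d+|" + "|".join(map(re.escape, NUMBER_WORDS)) + r")"

# Assumed phrasings: "<verb> [at least] <count>" or
# "<count> [people] [were] wounded/injured".
PATTERN = re.compile(
    rf"\b(?:wounded|wounding|injured|injuring)\s+(?:at\s+least\s+)?(?P<after>{COUNT})\b"
    rf"|\b(?P<before>{COUNT})\s+(?:people\s+)?(?:were\s+)?(?:wounded|injured)\b",
    re.IGNORECASE,
)

# Lookup table used to turn a matched word into an integer.
WORD_TO_INT = {word: value for value, word in enumerate(NUMBER_WORDS, start=1)}


def extract_counts(text):
    """Return the matched counts as written in the text (digits or words)."""
    return [m.group("after") or m.group("before") for m in PATTERN.finditer(text)]


def to_int(token):
    """Convert a matched count ("4" or "four") to an int."""
    return int(token) if token.isdigit() else WORD_TO_INT[token.lower()]


sample = ("Security forces had wounded two gunmen inside the museum. "
          "The suicide bomber has wounded four men last night. "
          "Dozens were wounded in a terrorist attack.")

counts = extract_counts(sample)
print(counts)                       # ['two', 'four']
print([to_int(c) for c in counts])  # [2, 4]
```

"Dozens" is not matched here because it is not in the list; adding it to NUMBER_WORDS would match it, but it has no exact integer value, so to_int would need a separate rule for it. The sketch also avoids lookbehind on purpose: Python's re module only accepts fixed-width lookbehind, which the pattern above satisfies because "wounded" and "injured" are both seven characters long, but that would stop working as soon as a verb of a different length were added.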

Source

2016-12-02 18:28:06
