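I want to extract information about wounded people from several news articles. The problem is that news language conveys this information in different ways: the count can be written in digits or in words. In short, how do I combine a regex with a list of numbers written as words?
For example: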
`Security forces had *wounded two* gunmen inside the museum but that two or three accomplices might still be at large.`
`The suicide bomber has wounded *four men* last night.`
`*Dozens* were wounded in a terrorist attack.`
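I've noticed that, most of the time, the numbers 1-10 are written as words rather than digits. I'd like to know how to extract them without ending up with convoluted code, ideally by just listing the words for 1-10 in the regular expression.
Should I use a list? How would it be included in the pattern?
This is the pattern I've used so far to extract the number of wounded people when it is written in digits: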
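import re

# Read the whole news file into one string
with open("News") as text_open:
    text_read = text_open.read()
# Each alternative captures a digit count next to a "wounded"/"injured" keyword
pattern = r"wounded (\d+)|(\d+) were wounded|(\d+) injured|(\d+) people were wounded|wounding (\d+)|wounding at least (\d+)"
result = re.findall(pattern, text_read)
print(result)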
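
To make the idea concrete, here is a rough sketch of how a word list might be folded into the pattern: the list is joined with `|` and placed next to `\d+` in one reusable "number" fragment. The names `NUMBER_WORDS`, `VAGUE_WORDS`, `NUM` and `extract_counts` are made up for illustration, and the keyword alternatives are only a guess at what the articles contain, so treat this as a starting point rather than a finished solution.

```python
import re

# Hypothetical word lists; extend them to match the articles being processed.
NUMBER_WORDS = ["one", "two", "three", "four", "five",
                "six", "seven", "eight", "nine", "ten"]
VAGUE_WORDS = ["dozens", "scores", "hundreds"]

# One "number" fragment: either digits or any word from the lists.
NUM = r"(?:\d+|" + "|".join(NUMBER_WORDS + VAGUE_WORDS) + r")"

pattern = re.compile(
    # keyword first: "wounded two", "wounding at least 5"
    rf"\b(?:wounded|wounding|injured|injuring)\s+(?:at least\s+)?(?P<after>{NUM})\b"
    # number first: "12 people were wounded", "Dozens were wounded", "3 injured"
    rf"|\b(?P<before>{NUM})\b(?:\s+\w+)?\s+(?:were\s+)?(?:wounded|injured)\b",
    re.IGNORECASE,
)

def extract_counts(text):
    """Return the matched counts (digits or words), lowercased, in order."""
    return [(m.group("after") or m.group("before")).lower()
            for m in pattern.finditer(text)]

print(extract_counts("The suicide bomber has wounded four men last night."))
print(extract_counts("Dozens were wounded in a terrorist attack."))
```

Named groups are used so that `finditer` yields a single value per match, instead of the tuples of mostly empty strings that `findall` returns when the pattern has several capture groups.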