2016-12-29 58 views
-3

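Python regular expression to extract date patterns

I have sentences of the following form (a keyword, followed by an opening parenthesis, followed by an arbitrary string, followed by two dates separated by a hyphen):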

Mohandas Karamchand Gandhi (/ˈɡɑːndi, ˈɡæn-/; Hindustani: [ˈmoːɦənd̪aːs ˈkərəmtʃənd̪ ˈɡaːnd̪ʱi]; 2 October 1869 – 30 January 1948) was the preeminent leader of the Indian independence movement in British-ruled India. 

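I need to extract the date of birth (2 October 1869) and the date of death (30 January 1948) from this sentence using a regular expression. I have already written a regex that matches the date pattern: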

date_pattern="(\d{1,2}(\s|-|/)?(Jan(uary)?|Feb(ruary)?|Mar(ch)?|Apr(il)?|May?|June?|July?|Aug(ust)?|Sep(t(ember)?)?|Oct(ober)?|Nov(ember)?|Dec(ember)?|\d{1,2})(\s|-|/)?\d{2,4})" 

I need to pull the dates out of sentences of the above form and print the birth date and the death date separately.

+0

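Is this a "debug my code for me" type of question, or is there a specific part of the problem you are struggling with? Will the dates come in several different formats or only one specific format? Do you need to worry about case? –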

+0

Where are you stuck, and where do the [previous answers](http://stackoverflow.com/search?q= [python] +正則表達式+日期) fail to solve your problem? – Prune

+0

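If the input format is as regular as the example in the question, you don't need a regex at all. Each piece of information is delimited by parentheses and semicolons, so you can use 'str.split' to get the parts and then 'datetime.strptime' to parse the dates. – ekhumoro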
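A minimal sketch of the split-based approach suggested in the comment above. It assumes the date range is always the last semicolon-separated field inside the first pair of parentheses, that the two dates are separated by an en dash as in the question's example, and that month names are in English (`%B` depends on the locale). The helper name `parse_life_dates` and the shortened sample sentence are illustrative, not from the original thread.

```python
from datetime import datetime

# Shortened version of the sentence from the question (assumption: same layout).
sentence = ("Mohandas Karamchand Gandhi (/ˈɡɑːndi/; 2 October 1869 – 30 January 1948) "
            "was the preeminent leader of the Indian independence movement.")

def parse_life_dates(text):
    """Hypothetical helper: return (born, died) as datetime.date objects."""
    # Keep only what sits between the first "(" and the following ")".
    inside = text.split("(", 1)[1].split(")", 1)[0]
    # The date range is the last semicolon-separated field.
    date_range = inside.split(";")[-1]
    # The two dates are separated by an en dash (U+2013), not an ASCII hyphen.
    born_text, died_text = (part.strip() for part in date_range.split("–"))
    fmt = "%d %B %Y"  # e.g. "2 October 1869"; %B depends on the locale
    return (datetime.strptime(born_text, fmt).date(),
            datetime.strptime(died_text, fmt).date())

born, died = parse_life_dates(sentence)
print("Born:", born)
print("Died:", died)
```

Parsing into `date` objects rather than keeping strings makes later comparisons or reformatting straightforward, but it will raise `ValueError` on any sentence that deviates from the assumed layout.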

Answers

0
import re 
# Raw string so the backslashes reach the regex engine unchanged; "May" has no abbreviated form.
date_pattern = r"(\d{1,2}(?:\s|-|/)?(?:Jan(?:uary)?|Feb(?:ruary)?|Mar(?:ch)?|Apr(?:il)?|May|June?|July?|Aug(?:ust)?|Sep(?:t(?:ember)?)?|Oct(?:ober)?|Nov(?:ember)?|Dec(?:ember)?|\d{1,2})(?:\s|-|/)?\d{2,4})"

bio = "Mohandas Karamchand Gandhi (/ˈɡɑːndi, ˈɡæn-/; Hindustani: [ˈmoːɦənd̪aːs ˈkərəmtʃənd̪ ˈɡaːnd̪ʱi]; 2 October 1869 – 30 January 1948) was the preeminent leader of the Indian independence movement in British-ruled India." 

matches = re.findall(date_pattern, bio) 
if len(matches) >= 2:  # need both a birth date and a death date
    born = matches[0] 
    died = matches[1] 
    print("Born:", born) 
    print("Died:", died) 
0
import re 

text = '''Mohandas Karamchand Gandhi (/ˈɡɑːndi, ˈɡæn-/; Hindustani: [ˈmoːɦənd̪aːs ˈkərəmtʃənd̪ ˈɡaːnd̪ʱi]; 2 October 1869 – 30 January 1948) was the preeminent leader of the Indian independence movement in British-ruled India.''' 
# Day, month name, four-digit year; the unpacking assumes exactly two dates in the text.
birth, death = re.findall(r'\d{1,2} \w+ \d{4}', text)
print(birth) 
print(death) 

Output:

2 October 1869 
30 January 1948 
+0

Hi, the form is: a keyword followed by an opening parenthesis, followed by any string, followed by 2 dates separated by a hyphen – Anu
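One way to use the structure described in this comment is to anchor the pattern on the parentheses and the separator instead of collecting every date-like string in the text. The sketch below is only an illustration under a few assumptions: dates look like "2 October 1869", the separator is an ASCII hyphen or an en dash, and the second date is immediately followed by the closing parenthesis. The names `DATE`, `LIFE_SPAN`, and `extract_life_span` are hypothetical.

```python
import re

# One date such as "2 October 1869": day, month name, four-digit year.
DATE = r"\d{1,2}\s+[A-Za-z]+\s+\d{4}"

# "(" any text, <born>, a hyphen or en dash, <died>, ")".
LIFE_SPAN = re.compile(
    r"\([^()]*?(?P<born>" + DATE + r")\s*[-\u2013]\s*(?P<died>" + DATE + r")\s*\)"
)

def extract_life_span(text):
    """Hypothetical helper: return (born, died) strings, or None if absent."""
    match = LIFE_SPAN.search(text)
    if match is None:
        return None
    return match.group("born"), match.group("died")

# Shortened version of the sentence from the question (assumption: same layout).
sentence = ("Mohandas Karamchand Gandhi (/ˈɡɑːndi/; 2 October 1869 – 30 January 1948) "
            "was the preeminent leader of the Indian independence movement.")

print(extract_life_span(sentence))
print(extract_life_span("No dates here (just text)."))
```

Because the two dates are captured as named groups, the order of birth and death comes from the pattern itself rather than from the position of matches in a list, and a sentence without the expected form yields `None` instead of an unpacking error.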