re_newspeaker = r'^(<bullet> | )(?P<name>(%s|(((Mr)|(Ms)|(Mrs))\. [-A-Za-z \']+(of [A-Z][a-z]+)?))|((The ((VICE|ACTING|Acting))?(PRESIDENT|SPEAKER|CHAIR(MAN)?)(pro tempore)?)|(The PRESIDING OFFICER)|(The CLERK)|(The CHIEF JUSTICE)|(The VICE PRESIDENT)|(Mr\. Counsel [A-Z]+))(\([A-Za-z.\'\- ]+\))?)\.'
re_speaking = r'^(<bullet> | )((((((Mr)|(Ms)|(Mrs))\. [A-Za-z \'\-]+(of [A-Z][a-z]+)?)|((The (VICE |Acting |ACTING)?(PRESIDENT|SPEAKER)(pro tempore)?)|(The PRESIDING OFFICER)|(The CLERK))(\([A-Za-z.\'\- ]+\))?))\.)?(?P<start>.)'
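For some reason, the Python regular expressions above do not capture names that contain an apostrophe.
For example, Mr. D'STALL is not matched. Any help with the regex pattern would be much appreciated.
The code takes the input and marks it up as XML, like this: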
<speaker>Mr. D'STALL</speaker><speaking>Mr. President, I have been seeking to obtain a report on
this bill. I am not on the Budget Committee, and I am not on the
Government Relations Committee. But from what I understand, this is a
very important bill, a big bill, a complex bill, far reaching in its
contents. I have been queried, along with all other Senators, I
suppose, as to whether or not they would have any objection to the
adoption of the committee amendments, en bloc. I am going to object to
the adoption of the committee amendments, en bloc, until I see the
committee report.</speaking>
Mr. D'STALL. Mr. President, I have been seeking to obtain a report on
this bill. I am not on the Budget Committee, and I am not on the
Government Relations Committee. But from what I understand, this is a
very important bill, a big bill, a complex bill, far reaching in its
contents. I have been queried, along with all other Senators, I
suppose, as to whether or not they would have any objection to the
adoption of the committee amendments, en bloc. I am going to object to
the adoption of the committee amendments, en bloc, until I see the
committee report.
The regex does not match the paragraph above.
What a horribly unmaintainable pattern you have there. I assume the problem affects both patterns? – 2014-09-23 08:47:42
http://regex101.com/r/dT6dN8/1 – 2014-09-23 08:47:58
Your regex requires a 'space' or 'bullet' at the start; is it in your input? – vks 2014-09-23 08:49:51
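To illustrate the point about the leading 'space' or 'bullet', here is a minimal sketch using a cut-down stand-in for `re_newspeaker`. The names `PREFIX_REQUIRED`, `PREFIX_OPTIONAL` and `speaker_name` are made up for this example, and the patterns only cover the `Mr./Ms./Mrs.` branch. It suggests the apostrophe itself is accepted by the character class, and that the match fails only when the line lacks the required prefix. Making the prefix optional is an assumption about what the input needs, not a confirmed fix.

```python
import re

# Cut-down stand-in for re_newspeaker: same mandatory prefix, same name character class.
PREFIX_REQUIRED = re.compile(r"^(<bullet> | )(?P<name>(Mr|Ms|Mrs)\. [-A-Za-z ']+)\.")

# Assumed variant: identical, except the leading "<bullet> " or " " is optional.
PREFIX_OPTIONAL = re.compile(r"^(<bullet> | )?(?P<name>(Mr|Ms|Mrs)\. [-A-Za-z ']+)\.")


def speaker_name(pattern, line):
    """Return the captured speaker name, or None if the line does not match."""
    match = pattern.match(line)
    return match.group("name") if match else None


line = "Mr. D'STALL. Mr. President, I have been seeking to obtain a report on"

print(speaker_name(PREFIX_REQUIRED, line))        # None: no leading space or <bullet>
print(speaker_name(PREFIX_REQUIRED, " " + line))  # Mr. D'STALL
print(speaker_name(PREFIX_OPTIONAL, line))        # Mr. D'STALL
```

If the real input does start each speaker line with whitespace, the original patterns may already work and the issue is more likely in how the sample was pasted or stripped before matching.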