2011-01-25

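Regex question: capture until the next match or the end of the document

I'm extracting data from some documents with a document parser I'm writing in C#. The documents are in the following form: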


(Type 1): (potentially multi-lined string) 
(Type 2): (potentially multi-lined string) 
(Type 3): (potentially multi-lined string) 
... 
(Type N): (potentially multi-lined string) 
(Type 1): (potentially multi-lined string) 
... 
End Of Document. 

The document repeats (Type 1) - (Type N) M times in the same format.

I'm having trouble with the multi-lined strings, and with finding the last iteration of (Type 1) - (Type N).

What I need to do is capture each (potentially multi-lined string), named by the (Type #) that precedes it.

Here is a snippet of the file I'm trying to match:

 
Name: John Dow 
Position: VP. over Development 
Bio: Here is a really long string of un important stuff 
that could include words like "Bio" or "Name". Some times I have problems 
here, but for the most part it should be normal Bio information 
Position History: Vp. over Development 
Sr. Project Manager 
Jr. Project Manager 
Developer 
Peon 
Notes: Here are some notes that may or may not be multilined 
and if it is, all the lines need to be captured for this person. 
Name: Joe Noob 
Position: Peon 
Bio: I'm a peon, so I have little bio 
Position History: Peon 
Notes: few notes 
Name: Jane Smith 
Position: VP. over Sales 
Bio: Here is a really long string of more un important stuff 
that could include words like "Bio" or "Name". Some times I have problems 
here, but for the most part it should be normal Bio information 
Position History: Vp. over Sales 
Sales Manager 
Secretary 
Notes: Here are some notes that may or may not be multilined 
and if it is, all the lines need to be captured for this person. 



The (Type #) labels always come in the same order, and each one is always preceded by a newline.

What I have:

 
Name:\s(?:(?.*?)\r\n)+?Position:\s(?:(?.*?)\r\n)+?Bio:\s(?:(?.*?)\r\n)+?Position History:\s(?:(?.*?)\r\n)+?Notes:\s(?:(?.*?)\r\n)+? 



Any help would be great!

Answers


The simplest fix would be to do the match in right-to-left mode:

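using System.Text.RegularExpressions;

// RegexOptions.RightToLeft makes the engine scan from the end of the input toward the start.
Regex r = new Regex(@"Name:\s(?:(.*?)\r\n)+?" +
        @"Position:\s(?:(.*?)\r\n)+?" +
        @"Bio:\s(?:(.*?)\r\n)+?" +
        @"Position History:\s(?:(.*?)\r\n)+?" +
        @"Notes:\s(?:(.*?)\r\n)+?",
        RegexOptions.Singleline | RegexOptions.RightToLeft);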

By the way, I had to remove a bunch of inappropriate question marks to make it work. You do want those groups to capture, don't you?
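A minimal C# sketch of the right-to-left approach (top-level statements, C# 9 or later). Assumptions: the text uses CRLF line endings and ends with a line break, since the pattern's last element is \r\n; the two-record sample is copied from the question; doc and inOrder are illustrative names. Because the engine scans from the end, Matches returns the last record first, so the sketch reverses the collection to get document order.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Two records copied from the question's sample, joined with CRLF.
// Assumption: the text ends with "\r\n", which the pattern's last element needs.
string doc = string.Join("\r\n", new[]
{
    "Name: John Dow",
    "Position: VP. over Development",
    "Bio: Here is a really long string of un important stuff",
    "that could include words like \"Bio\" or \"Name\". Some times I have problems",
    "here, but for the most part it should be normal Bio information",
    "Position History: Vp. over Development",
    "Sr. Project Manager",
    "Jr. Project Manager",
    "Developer",
    "Peon",
    "Notes: Here are some notes that may or may not be multilined",
    "and if it is, all the lines need to be captured for this person.",
    "Name: Jane Smith",
    "Position: VP. over Sales",
    "Bio: Here is a really long string of more un important stuff",
    "that could include words like \"Bio\" or \"Name\". Some times I have problems",
    "here, but for the most part it should be normal Bio information",
    "Position History: Vp. over Sales",
    "Sales Manager",
    "Secretary",
    "Notes: Here are some notes that may or may not be multilined",
    "and if it is, all the lines need to be captured for this person.",
}) + "\r\n";

// The pattern from the answer, unchanged.
var r = new Regex(@"Name:\s(?:(.*?)\r\n)+?" +
                  @"Position:\s(?:(.*?)\r\n)+?" +
                  @"Bio:\s(?:(.*?)\r\n)+?" +
                  @"Position History:\s(?:(.*?)\r\n)+?" +
                  @"Notes:\s(?:(.*?)\r\n)+?",
                  RegexOptions.Singleline | RegexOptions.RightToLeft);

// Right-to-left scanning returns the last record first.
MatchCollection matches = r.Matches(doc);

// Reverse to document order; group 1 is the Name value.
var inOrder = matches.Cast<Match>().Reverse().ToList();
foreach (Match m in inOrder)
    Console.WriteLine(m.Groups[1].Value);
```

Only the Name group is read here. For fields that span several lines, a repeated group such as (?:(.*?)\r\n)+? can record one capture per iteration, so checking Group.Captures before relying on Group.Value is probably wise.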


Try this:

(?'tag'[\w\s]+):\s*(?'val'.*([\r\n][^:]*)*) 

I just grouped the tag before the ":" into the named group 'tag', and the (potentially) multi-line text of the value into 'val'.


To add to that: you'll need to restructure your code to handle the different tag values. – 2011-01-25 16:56:56
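A sketch of how the 'tag' and 'val' groups might be consumed in C#, collecting one (tag, value) pair per label, which is one way to handle the different tag values the comment mentions. It swaps in a line-anchored adaptation rather than the exact pattern above: [\w\s] also matches line breaks, so here the tag is restricted to [\w ]+ at the start of a line (RegexOptions.Multiline), and the value runs over the following lines until one that itself begins with word characters or spaces followed by a colon. The sample is two records from the question with LF line endings; doc, tagged and pairs are illustrative names.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Two records copied from the question's sample, joined with LF.
string doc = string.Join("\n", new[]
{
    "Name: John Dow",
    "Position: VP. over Development",
    "Bio: Here is a really long string of un important stuff",
    "that could include words like \"Bio\" or \"Name\". Some times I have problems",
    "here, but for the most part it should be normal Bio information",
    "Position History: Vp. over Development",
    "Sr. Project Manager",
    "Jr. Project Manager",
    "Developer",
    "Peon",
    "Notes: Here are some notes that may or may not be multilined",
    "and if it is, all the lines need to be captured for this person.",
    "Name: Jane Smith",
    "Position: VP. over Sales",
    "Bio: Here is a really long string of more un important stuff",
    "that could include words like \"Bio\" or \"Name\". Some times I have problems",
    "here, but for the most part it should be normal Bio information",
    "Position History: Vp. over Sales",
    "Sales Manager",
    "Secretary",
    "Notes: Here are some notes that may or may not be multilined",
    "and if it is, all the lines need to be captured for this person.",
});

// Line-anchored adaptation of the answer's pattern (not the exact regex):
//   tag = word characters/spaces at the start of a line, followed by ':'
//   val = rest of that line, plus every following line that does NOT
//         itself start with "something:" (the negative lookahead)
var tagged = new Regex(@"^(?'tag'[\w ]+):[ \t]*(?'val'.*(?:\n(?![\w ]+:).*)*)",
                       RegexOptions.Multiline);

var pairs = new List<(string Tag, string Value)>();
foreach (Match m in tagged.Matches(doc))
    pairs.Add((m.Groups["tag"].Value, m.Groups["val"].Value.Trim()));

// Each record contributes five (tag, value) pairs; multi-line values keep their "\n".
foreach (var (tag, value) in pairs)
    Console.WriteLine($"{tag} => {value.Replace("\n", " | ")}");
```

The stopping rule is a heuristic: a Bio or Notes line that happens to start with something like "Note:" would be read as a new tag. Since the labels are fixed and always in the same order, listing them explicitly in the lookahead would be stricter.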


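Because you're using lazy matching, the last tag takes only as little as it needs. You can fix this by adding a lookahead at the end of your pattern so that it matches up to the next tag:

(?=^Name:|$)

Here is the complete regex: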

Name:\s(?:(.*?)\s+)Position:\s(?:(.*?)\s+)Bio:\s(?:(.*?)\s+)Position History:\s(?:(.*?)\s+)Notes:\s(?:(.*?)\s+)(?=^Name:|$) 

Example: http://regexhero.net/tester/?id=92982feb-806f-4d0a-96a3-5ef6689a0e01


This is what I was looking for :) Here is the final working product: Name:\s(?(?:.*?$\s?)+?)Position:\s(?(?:.*…Position History:\s(?(?:.*?$\s?)+?)Notes:\s(?(?:.*?$\s?)+?)(?=^Name:|$) – 2011-01-25 17:28:57
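The complete regex from the lookahead answer can be tried in C# roughly as below. The options are an assumption, since the answer does not state them: RegexOptions.Singleline lets .*? run across lines, and RegexOptions.Multiline lets ^ and $ in the lookahead match at line boundaries. The two-record sample from the question uses LF line endings and ends with a newline, because the final \s+ needs at least one whitespace character after the last Notes value. Tracing the pattern suggests that with CRLF endings the lazy Notes group would stop after its first line, because in .NET $ matches before \n but not before \r. The names doc and matches are illustrative.

```csharp
using System;
using System.Text.RegularExpressions;

// Two records copied from the question's sample, joined with LF and ending with "\n".
string doc = string.Join("\n", new[]
{
    "Name: John Dow",
    "Position: VP. over Development",
    "Bio: Here is a really long string of un important stuff",
    "that could include words like \"Bio\" or \"Name\". Some times I have problems",
    "here, but for the most part it should be normal Bio information",
    "Position History: Vp. over Development",
    "Sr. Project Manager",
    "Jr. Project Manager",
    "Developer",
    "Peon",
    "Notes: Here are some notes that may or may not be multilined",
    "and if it is, all the lines need to be captured for this person.",
    "Name: Jane Smith",
    "Position: VP. over Sales",
    "Bio: Here is a really long string of more un important stuff",
    "that could include words like \"Bio\" or \"Name\". Some times I have problems",
    "here, but for the most part it should be normal Bio information",
    "Position History: Vp. over Sales",
    "Sales Manager",
    "Secretary",
    "Notes: Here are some notes that may or may not be multilined",
    "and if it is, all the lines need to be captured for this person.",
}) + "\n";

// The complete regex from the answer. Assumed options: Singleline lets .*? cross
// line breaks; Multiline lets ^ and $ in the lookahead match at line boundaries.
var regex = new Regex(
    @"Name:\s(?:(.*?)\s+)Position:\s(?:(.*?)\s+)Bio:\s(?:(.*?)\s+)" +
    @"Position History:\s(?:(.*?)\s+)Notes:\s(?:(.*?)\s+)(?=^Name:|$)",
    RegexOptions.Singleline | RegexOptions.Multiline);

// Groups 1..5 are Name, Position, Bio, Position History, Notes.
MatchCollection matches = regex.Matches(doc);
foreach (Match m in matches)
    Console.WriteLine($"{m.Groups[1].Value}: {m.Groups[5].Value.Replace("\n", " | ")}");
```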
