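Regex question: matching until the next match or the end of the document

I am extracting data from some documents with a document parser that I wrote in C#. The documents have the following form: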


(Type 1): (potentially multi-lined string) 
(Type 2): (potentially multi-lined string) 
(Type 3): (potentially multi-lined string) 
... 
(Type N): (potentially multi-lined string) 
(Type 1): (potentially multi-lined string) 
... 
End Of Document.

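The document repeats (Type 1) - (Type N) M times in the same format.

I am having trouble with the multi-line strings and with finding the last iteration of (Type 1) - (Type N).

What I need to do is capture each (potentially multi-lined string) in a group named after the (Type #) that precedes it.

Here is a snippet of the document I want to match: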

 
Name: John Dow 
Position: VP. over Development 
Bio: Here is a really long string of un important stuff 
that could include words like "Bio" or "Name". Some times I have problems 
here, but for the most part it should be normal Bio information 
Position History: Vp. over Development 
Sr. Project Manager 
Jr. Project Manager 
Developer 
Peon 
Notes: Here are some notes that may or may not be multilined 
and if it is, all the lines need to be captured for this person. 
Name: Joe Noob 
Position: Peon 
Bio: I'm a peon, so I have little bio 
Position History: Peon 
Notes: few notes 
Name: Jane Smith 
Position: VP. over Sales 
Bio: Here is a really long string of more un important stuff 
that could include words like "Bio" or "Name". Some times I have problems 
here, but for the most part it should be normal Bio information 
Position History: Vp. over Sales 
Sales Manager 
Secretary 
Notes: Here are some notes that may or may not be multilined 
and if it is, all the lines need to be captured for this person.

The order of the (Type #) tags is always the same, and they are always preceded by a newline.

What I have:

 
Name:\s(?:(?.*?)\r\n)+?Position:\s(?:(?.*?)\r\n)+?Bio:\s(?:(?.*?)\r\n)+?Position History:\s(?:(?.*?)\r\n)+?Notes:\s(?:(?.*?)\r\n)+?

Any help would be great!

Source

2011-01-25 joe_coolish

The simplest fix would be to match in right-to-left mode:

Regex r = new Regex(@"Name:\s(?:(.*?)\r\n)+?" + 
        @"Position:\s(?:(.*?)\r\n)+?" + 
        @"Bio:\s(?:(.*?)\r\n)+?" + 
        @"Position History:\s(?:(.*?)\r\n)+?" + 
        @"Notes:\s(?:(.*?)\r\n)+?", 
        RegexOptions.Singleline | RegexOptions.RightToLeft);

By the way, I had to remove a bunch of inappropriate question marks to get it to work. You do want those groups to capture, don't you?

Source
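As a rough illustration of how the right-to-left pattern above might be used, here is a minimal sketch. The names `RightToLeftSketch`, `Parse` and `JoinLines` are hypothetical, and the sketch assumes that every line, including the last one, ends with `\r\n`. Because each line of a multi-line value is captured by a separate iteration of the repeated group, the sketch reads `Group.Captures` rather than `Group.Value`, and it sorts by index so that the result does not depend on the order in which a right-to-left search reports matches and captures.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class RightToLeftSketch
{
    // Same pattern as in the answer above. Groups 1-5 are Name, Position,
    // Bio, Position History and Notes, in that order.
    private static readonly Regex RecordRegex = new Regex(
        @"Name:\s(?:(.*?)\r\n)+?" +
        @"Position:\s(?:(.*?)\r\n)+?" +
        @"Bio:\s(?:(.*?)\r\n)+?" +
        @"Position History:\s(?:(.*?)\r\n)+?" +
        @"Notes:\s(?:(.*?)\r\n)+?",
        RegexOptions.Singleline | RegexOptions.RightToLeft);

    // Hypothetical helper: joins every capture of a group in document order.
    private static string JoinLines(Group group)
    {
        var captures = new List<Capture>();
        foreach (Capture c in group.Captures) captures.Add(c);
        captures.Sort((a, b) => a.Index.CompareTo(b.Index));

        var lines = new List<string>();
        foreach (Capture c in captures) lines.Add(c.Value);
        return string.Join("\n", lines);
    }

    // Returns one string[5] per record, sorted into document order.
    public static List<string[]> Parse(string text)
    {
        var matches = new List<Match>();
        foreach (Match m in RecordRegex.Matches(text)) matches.Add(m);
        matches.Sort((a, b) => a.Index.CompareTo(b.Index));

        var records = new List<string[]>();
        foreach (Match m in matches)
        {
            var fields = new string[5];
            for (int i = 0; i < 5; i++)
                fields[i] = JoinLines(m.Groups[i + 1]);
            records.Add(fields);
        }
        return records;
    }
}
```

Calling `RightToLeftSketch.Parse(text)` on the snippet from the question should give one array per person, with the multi-line Bio, Position History and Notes values joined by `\n`. Trailing spaces before a line break are kept, so trim the values if that matters.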

2011-01-25 17:29:59

Try this:

(?'tag'[\w\s]+):\s*(?'val'.*([\r\n][^:]*)*)

I just captured the label before the ":" in a named group "tag" and the (potentially) multi-line text in a named group "val".

Source

2011-01-25 16:54:57

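To add: you will need to refactor your code to handle the different tag values. – 2011-01-25 16:56:56

Because you are using lazy matching, the last tag matches only as little as it needs to. You can solve this by adding a lookahead at the end of your pattern, so that it matches until the next tag:

(?=^Name:|$)

Here is the complete regex: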

Name:\s(?:(.*?)\s+)Position:\s(?:(.*?)\s+)Bio:\s(?:(.*?)\s+)Position History:\s(?:(.*?)\s+)Notes:\s(?:(.*?)\s+)(?=^Name:|$)

Example: http://regexhero.net/tester/?id=92982feb-806f-4d0a-96a3-5ef6689a0e01

Source
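For readers who want to try the lookahead idea in code, here is a minimal sketch. It is a variant of the pattern above rather than a copy of it: the group names, the `LookaheadSketch` class and the `Parse` method are hypothetical, the tags are anchored with `^`, and `\z` is used in place of `$` so that `RegexOptions.Multiline` cannot end the last field at the first line break. It assumes `Singleline` (so `.` crosses line breaks) together with `Multiline` (so `^` matches at the start of each line).

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class LookaheadSketch
{
    // Hypothetical group names, one per tag in the document.
    private static readonly string[] Fields =
        { "Name", "Position", "Bio", "History", "Notes" };

    private static readonly Regex RecordRegex = new Regex(
        @"^Name:\s(?<Name>.*?)\s+" +
        @"^Position:\s(?<Position>.*?)\s+" +
        @"^Bio:\s(?<Bio>.*?)\s+" +
        @"^Position History:\s(?<History>.*?)\s+" +
        // The lazy Notes group grows until the lookahead sees the next
        // record ("Name:" at the start of a line) or the end of the input.
        @"^Notes:\s(?<Notes>.*?)\s*(?=^Name:|\z)",
        RegexOptions.Singleline | RegexOptions.Multiline);

    // Returns one dictionary per record, keyed by the group names above.
    public static List<Dictionary<string, string>> Parse(string text)
    {
        var records = new List<Dictionary<string, string>>();
        foreach (Match m in RecordRegex.Matches(text))
        {
            var record = new Dictionary<string, string>();
            foreach (string field in Fields)
                record[field] = m.Groups[field].Value;
            records.Add(record);
        }
        return records;
    }
}
```

With this variant each field is a single capture, so `record["Bio"]` holds the whole multi-line value including its internal line breaks. A value that contains a line starting with `Name:` would still end the record early.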

2011-01-25 17:08:01 Kobi

That's what I was looking for :) Here is the final working product: Name:\s(?(?:.*?$\s?)+?)Position:\s(?(?:.*?$\s?)+?)Bio:\s(?(?:.*?$\s?)+?)Position History:\s(?(?:.*?$\s?)+?)Notes:\s(?(?:.*?$\s?)+?)(?=^Name:|$) – 2011-01-25 17:28:57
