2011-09-28 81 views
Multi-line regex replace

OK, there are plenty of regexes out there, but as always, none of them match what I'm trying to do.

I have a text file:

F00220034277909272011         
H001500020003000009272011        
D001500031034970000400500020000000025000000515000000000 
D001500001261770008003200010000000025000000132500000000 
H004200020001014209272011        
D004200005355800007702200005142000013420000000000000000 
D004200031137360000779000005000000012000000000000000000 
H050100180030263709272011        
D050100001876700006000300019500000025000000250000001500 
D050100001247060000071500030000000025000000280000000000 
D050100002075670000430400020000000025000000515000000000 
D050100008342500007702600005700000010000000000000000700 
D050100009460270000702100015205000025000000000000006205 
D050100008135120000702400015000000010000000000000001000 
D050100006938430000702200026700000010000000000000001000 
D050100006423710008000200025700000000000000000000001000 
D050100009488040008000600007175000000000000000000001000 
D050100001299190000800100016300000000000000000000003950 
D050100001244850000800400005407000000000000000000001607 
D050100001216280000840200020000000000000001000000006200 
D050100001216840000479000008175000000000000100000001000 
D050100001265880000410200014350000000000000100000001000 
D050100007402650002000300026700000000000000100000001000 
D050100001305150002000200016175000000000001000000000000 
D050100005435430000899700022350000000000001000000000000 
D050100031113850000500200008200000000250000100000001000 

and using a multi-line regex (.NET flavor), I want to do a replace so that I get:

H050100180030263709272011        
D050100001876700006000300019500000025000000250000001500 
D050100001247060000071500030000000025000000280000000000 
D050100002075670000430400020000000025000000515000000000 
D050100008342500007702600005700000010000000000000000700 
D050100009460270000702100015205000025000000000000006205 
D050100008135120000702400015000000010000000000000001000 
D050100006938430000702200026700000010000000000000001000 
D050100006423710008000200025700000000000000000000001000 
D050100009488040008000600007175000000000000000000001000 
D050100001299190000800100016300000000000000000000003950 
D050100001244850000800400005407000000000000000000001607 
D050100001216280000840200020000000000000001000000006200 
D050100001216840000479000008175000000000000100000001000 
D050100001265880000410200014350000000000000100000001000 
D050100007402650002000300026700000000000000100000001000 
D050100001305150002000200016175000000000001000000000000 
D050100005435430000899700022350000000000001000000000000 
D050100031113850000500200008200000000250000100000001000 

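so that, basically, I keep everything that starts with [HD]0501 and nothing else.

I know this seems better suited to a match than a replace, but I am going through a pre-built engine that accepts only a regex pattern string and a regex replacement string.

What pattern and replacement string can I supply to get the result I want? Multi-line regex is a hard-coded configuration.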

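I would have thought something like this would work:

Search: (?<Match>^[HD]0501\d+$), but this matches nothing.

Search: (?!^[HD]0501\d+$), but this matches a bunch of empty strings, and I can't figure out what to use as the replacement string.

Search: (?!(?<Omit>^[HD]0501\d+$)), but this gives "Group 'Omit' not found".

It seems like this should be simple, but as always, regex manages to make me feel stupid. Any help would be greatly appreciated.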

Answer

Try matching the following pattern:

(?m)^(?![HD]0501).+(\r?\n)? 

and replace it with an empty string.

A demo:

using System; 
using System.Text.RegularExpressions; 

namespace Test
{
    class MainClass
    {
        public static void Main(string[] args)
        {
            string input = @"F00220034277909272011         
H001500020003000009272011        
D001500031034970000400500020000000025000000515000000000 
D001500001261770008003200010000000025000000132500000000 
H004200020001014209272011        
D004200005355800007702200005142000013420000000000000000 
D004200031137360000779000005000000012000000000000000000 
H050100180030263709272011        
D050100001876700006000300019500000025000000250000001500 
D050100001247060000071500030000000025000000280000000000 
D050100002075670000430400020000000025000000515000000000 
D050100008342500007702600005700000010000000000000000700 
D050100009460270000702100015205000025000000000000006205 
D050100008135120000702400015000000010000000000000001000 
D050100006938430000702200026700000010000000000000001000 
D050100006423710008000200025700000000000000000000001000 
D050100009488040008000600007175000000000000000000001000 
D050100001299190000800100016300000000000000000000003950 
D050100001244850000800400005407000000000000000000001607 
D050100001216280000840200020000000000000001000000006200 
D050100001216840000479000008175000000000000100000001000 
D050100001265880000410200014350000000000000100000001000 
D050100007402650002000300026700000000000000100000001000 
D050100001305150002000200016175000000000001000000000000 
D050100005435430000899700022350000000000001000000000000 
D050100031113850000500200008200000000250000100000001000"; 

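            // Match every line that does NOT start with H0501 or D0501,
            // including its line break, and replace it with nothing.
            string regex = @"(?m)^(?![HD]0501).+(\r?\n)?";

            Console.WriteLine(Regex.Replace(input, regex, ""));
        }
    }
}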

This prints:

H050100180030263709272011        
D050100001876700006000300019500000025000000250000001500 
D050100001247060000071500030000000025000000280000000000 
D050100002075670000430400020000000025000000515000000000 
D050100008342500007702600005700000010000000000000000700 
D050100009460270000702100015205000025000000000000006205 
D050100008135120000702400015000000010000000000000001000 
D050100006938430000702200026700000010000000000000001000 
D050100006423710008000200025700000000000000000000001000 
D050100009488040008000600007175000000000000000000001000 
D050100001299190000800100016300000000000000000000003950 
D050100001244850000800400005407000000000000000000001607 
D050100001216280000840200020000000000000001000000006200 
D050100001216840000479000008175000000000000100000001000 
D050100001265880000410200014350000000000000100000001000 
D050100007402650002000300026700000000000000100000001000 
D050100001305150002000200016175000000000001000000000000 
D050100005435430000899700022350000000000001000000000000 
D050100031113850000500200008200000000250000100000001000 

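A quick explanation:

  • (?m)
    • enables multi-line mode, so that ^ matches at the start of every line;
  • ^
    • matches the start of a line;
  • (?![HD]0501)
    • negative look-ahead: asserts that the line does not start with "H0501" or "D0501";
  • .+
    • matches one or more characters other than a line break (in .NET, any character except \n);
  • (\r?\n)?
    • matches an optional line break.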
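
Since the question says multi-line mode is hard-coded in the engine, the pieces above can also be sketched without the inline (?m), passing RegexOptions.Multiline instead. This is only a minimal sketch: the class name LineFilterSketch, the helper KeepPrefixed, and the short sample string are made up for illustration and are not part of the original engine or data.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LineFilterSketch
{
    // Same pattern as in the answer, minus the inline (?m);
    // multi-line mode is supplied through RegexOptions instead.
    private const string Pattern = @"^(?![HD]0501).+(\r?\n)?";

    // Hypothetical helper: deletes every non-empty line that does not
    // start with H0501 or D0501, together with its line break.
    public static string KeepPrefixed(string input)
    {
        return Regex.Replace(input, Pattern, "", RegexOptions.Multiline);
    }

    public static void Main()
    {
        // Made-up sample with Windows line endings (\r\n).
        string sample = "A1\r\nH0501x\r\nD0501y\r\nZ9";

        // Only the H0501 and D0501 lines should remain.
        Console.WriteLine(KeepPrefixed(sample));
    }
}
```

With \r\n line endings, .+ consumes the \r (in .NET the dot excludes only \n) and the optional group then consumes the \n, so the whole line is removed either way.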

That did it. Thanks! I'll have to work out how it does that, but this gets me going.

You're welcome @Jeremy, I've also added a brief explanation of the pattern.