2012-04-03 41 views
2

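Regex needed to remove custom markup tags

I have never written a regular expression before, and my knowledge falls well short. I am hoping the experts here can help me with a regular expression in C# that removes markup tags.

Each piece of markup starts with one of the following opening tags: <AI>!, <AH>! or <AG>!, and ends with another !

Example: the quick brown <AI>!fox jumps! over the lazy dog!

After the markup is removed it should read: the quick brown fox jumps over the lazy dog!

Sample data: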

NOT MORE THAN 85 % OF H<AH>!3!BO<AH>!3! CALCULATED ON THE DRY WEIGHT 
- Uranium ores and pitchblende, and concentrates thereof, with a uranium content of more than 5 % by weight (<AI>!Euratom!) 
- Monazite; urano-thorianite and other thorium ores and concentrates, with a thorium content of more than 20 % by weight (<AI>!Euratom!) 
- - - - -94% or more, but not more than 98.5% of a-Al<AH>!2!O<AH>!3! -2% (+/-1.5%) of magnesium spinel, -1% (+/-0.6%) of yttrium oxide and -2% (+/-1.2%) of each lanthanum oxide and neodymium oxide with less than 50% of the total weight having a particle size of more than 10mm 
- Activated alumina with a specific surface area of at least 350 m<AG>!2!g 
IRON OXIDES AND HYDROXIDES; EARTH COLOURS CONTAINING 70 % OR MORE BY WEIGHT OF COMBINED IRON EVALUATED AS FE<AH>!2!O<AH>!3!: 
- <AI>!o!-Xylene 
- <AI>!m!-Xylene 
- <AI>!p!-Xylene 
- - - 1,6,7,8,9,14,15,16,17,18,18-Dodecachloropentcyclo[12.2.1.1<AG>!6,9!.0<AG>!2,13!.0<AG>!5,10!]octadeca-7,15-diene, (CAS RN 13560-89-9) 
- Chlorobenzene, <AI>!o!-dichlorobenzene and <AI>!p!-dichlorobenzene 
- - - Di- or tetrachlorotricyclo[8.2.2.2<AG>!4,7!]xadeca-1(12),4,6,10,13,15-hexaene, mixed isomers 
- Butan-1-ol (<AI>!n!-butyl alcohol) 
- - 2-Methylpropan-2-ol (<AI>!tert!-butyl alcohol) 
- <AI>!n!-Butyl acetate 
- <AI>!O!-Acetylsalicylic acid, its salts and esters 
- - <AI>!O!-Acetylsalicylic acid (CAS RN 50-78-2) 
- 1-Naphthylamine (<AH>!alpha!-naphthylamine), 2-naphthylamine (<AI>!beta!-naphthylamine) and their derivatives; salts thereof 
- <AI>!o!-, <AG>!m!-, <AH>!p!-Phenylenediamine, diaminotoluenes, and their derivatives; salts thereof: 
- - <AI>!o!-, <AI>!m!-, <AI>!p!-Phenylenediamine, diaminotoluenes and their halogenated, sulphonated, nitrated and nitrosated derivatives; salts thereof: 
- - Indole, 3-methylindole (skatole), 6-allyl-6,7-dihydro-5<AI>!H!-dibenz[<AI>!c,e!] azepinne (azapetine), phenindamine (INN) and their salts; imipramine hydrochloride (INNM) 
- Vitamin B<AH>!1! and its derivatives 
- Vitamin B<AH>!2! and its derivatives 

Thanks in advance.

Answers

5

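Use a regular expression that looks for an A followed by one of [GHI], surrounded by <> and followed by an !. Once it finds that, it does a lazy search (indicated by the ?) for one or more (+) of anything (.), followed by an exclamation mark. Because the search is lazy, it does not keep going until it reaches the last exclamation mark in the sample; it stops at the first one and replaces what it has found. It then uses grouping (the parentheses in the pattern) to store the value enclosed by the tag and uses that value in the replacement ($1 refers to the first group).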

var r = new Regex("<A[GHI]>!(.+?)!"); 
var actual = r.Replace(xml, "$1"); 
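
To make the lazy-versus-greedy point concrete, here is a minimal sketch that runs the pattern from this answer next to a greedy variant on text taken from the question. The class and method names (LazyVersusGreedy, StripLazy, StripGreedy) are made up for illustration, and the greedy pattern is shown only for contrast; it is not something anyone in this thread suggested using.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LazyVersusGreedy
{
    // Pattern from the answer: .+? is lazy, so the match ends at the first closing '!'.
    public static string StripLazy(string text) =>
        Regex.Replace(text, "<A[GHI]>!(.+?)!", "$1");

    // Hypothetical greedy variant, for contrast only: .+ runs on to the last '!' on the line.
    public static string StripGreedy(string text) =>
        Regex.Replace(text, "<A[GHI]>!(.+)!", "$1");

    public static void Main()
    {
        string sample = "the quick brown <AI>!fox jumps! over the lazy dog!";

        // prints: the quick brown fox jumps over the lazy dog!
        Console.WriteLine(StripLazy(sample));

        // prints: the quick brown fox jumps! over the lazy dog
        Console.WriteLine(StripGreedy(sample));

        // Two tags on one line are handled separately; prints: H3BO3
        Console.WriteLine(StripLazy("H<AH>!3!BO<AH>!3!"));
    }
}
```

With the greedy variant the sentence's own final exclamation mark is consumed as the closing delimiter, which is exactly the failure the lazy quantifier avoids.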
+3

Breaking the regex down and explaining it to the OP and others would be a good thing. – Oded 2012-04-03 14:52:45

+0

Good call! Is this enough? – 2012-04-03 14:57:48

+0

Perfect, thank you. – Shurugwi 2012-04-03 15:21:28

0

The regular expression to use should look like this:

\<..\>!([^!]*)! 

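because you have to match a <, two letters, a >, a !, then a run of characters that are not !, and finally another !.

Then you replace the match (the whole text matched by the expression above) with the captured group (i.e. the text matched between the parentheses).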
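
As a rough sketch of how that replacement might look in C#, assuming the pattern above is passed to Regex.Replace with "$1" as the replacement string. The class and method names (NegatedClassDemo, Strip) are hypothetical, and the backslashes before < and > are dropped because those characters need no escaping in .NET patterns.

```csharp
using System;
using System.Text.RegularExpressions;

public static class NegatedClassDemo
{
    // '<', any two characters, '>', '!', a run of non-'!' characters (captured), then '!'.
    private static readonly Regex Tag = new Regex("<..>!([^!]*)!");

    // Replace each whole match with the text captured by the parentheses.
    public static string Strip(string text) => Tag.Replace(text, "$1");

    public static void Main()
    {
        // prints: - o-Xylene
        Console.WriteLine(Strip("- <AI>!o!-Xylene"));

        // prints: at least 350 m2g
        Console.WriteLine(Strip("at least 350 m<AG>!2!g"));

        // The two dots accept any two characters, not only AG, AH and AI; prints: x
        Console.WriteLine(Strip("<ZZ>!x!"));
    }
}
```

Because [^!]* can never run past an exclamation mark, no lazy quantifier is needed here. Note that the pattern is looser than the three tags listed in the question, and that * also accepts an empty body such as <AI>!!.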

0
using System; 
using System.Text.RegularExpressions; 

public class Example 
{ 
    public static void Main() 
    { 
        // Group 1 captures the tag letter (G, H or I); group 2 captures the text between the '!' marks.
        string pattern = @"\<A(G|H|I)\>\!([^\!]*)\!";
        string input = "<AI>!n!-Butyl acetate the quick brown "
            + "<AI>!fox jumps! over the lazy dog!";
        // $2 keeps only the tagged text, dropping the tag and both '!' delimiters.
        string replacement = "$2";
        Regex rgx = new Regex(pattern);
        string result = rgx.Replace(input, replacement);

        Console.WriteLine("Original String: '{0}'", input);
        Console.WriteLine("Replacement String: '{0}'", result);
    } 
} 

Original String: '<AI>!n!-Butyl acetate the quick brown <AI>!fox jumps! over the lazy dog!' 
Replacement String: 'n-Butyl acetate the quick brown fox jumps over the lazy dog!' 

http://ideone.com/z0fbL