2014-10-20 87 views
0

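I want to use sed (or another tool) to remove XML tags, but only in specific places marked with "{{''}}" placeholders. Example: remove the XML tags inside the placeholder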

<ok><ok2>{{TextShouldStay<not_ok>this_should_be_out</not_ok> 
<sthelse/>ThisShouldBeAgain}}</ok2></ok> 

Expected result:

<ok><ok2>{{TextShouldStayThisShouldBeAgain}}</ok2></ok> 

Any ideas on how to achieve this?

+0

Does the {{}} block contain newline characters? Would you like a Perl answer? – 2014-10-20 15:44:49

+1

Obligatory link: http://stackoverflow.com/a/1732454/7552 – 2014-10-20 16:29:24

+0

@AvinashRaj: No newlines, and a Perl answer would be fine too! – 2014-10-21 01:07:21

Answers

1

Command:

tr '\n' ' ' < file.xml | sed -r 's/(.*\{\{)([A-Za-z0-9]*)(<.*\/>)(.*)/\1\2\4\n/g' 

Output:

$ cat file.xml
<ok><ok2>{{TextShouldStay<not_ok>this_should_be_out</not_ok>
<sthelse/>ThisShouldBeAgain}}</ok2></ok>
$ tr '\n' ' ' < file.xml | sed -r 's/(.*\{\{)([A-Za-z0-9]*)(<.*\/>)(.*)/\1\2\4\n/g'
<ok><ok2>{{TextShouldStayThisShouldBeAgain}}</ok2></ok>
$


Here we first replace the newlines with spaces using 'tr', so that sed sees the whole input as one line, and then group the pattern using '(' and ')'.
First group - from the beginning of the line up to and including '{{'
Second group - the letters and digits that directly follow '{{'
Third group - everything from the next '<' through the last '/>' (the match is greedy)
Fourth group - the remaining characters

The replacement keeps groups 1, 2 and 4, which drops the third group, and appends a newline.
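
The command above relies on there being a single placeholder whose last inner tag is self-closing. As a rough sketch of a more general variant (not part of the original answer), the same idea can be written as a sed loop that works on one tag at a time. The function name `strip_placeholder_tags` is made up for illustration, and the sketch assumes GNU sed, placeholders that sit on one line and contain no stray '{' or '}', and paired tags with simple names and no attributes.

```shell
# Hypothetical helper (name chosen for this example): reads text on stdin and
# removes tags that sit between '{{' and '}}' on the same line.
strip_placeholder_tags() {
  sed -E \
    -e ':pair' \
    -e 's/(\{\{[^{}]*)<([A-Za-z_][A-Za-z0-9_]*)>[^<>{}]*<\/\2>([^{}]*\}\})/\1\3/' \
    -e 't pair' \
    -e ':tag' \
    -e 's/(\{\{[^{}]*)<[^<>{}]*>([^{}]*\}\})/\1\2/' \
    -e 't tag'
}
# Loop 'pair': drop one <name>text</name> element, including its text, per pass.
# Loop 'tag' : drop any tag still left inside the placeholder, e.g. <sthelse/>.

# Usage with the sample from the question; the lines are joined first
# because the sample placeholder spans two lines.
printf '%s\n' \
  '<ok><ok2>{{TextShouldStay<not_ok>this_should_be_out</not_ok>' \
  '<sthelse/>ThisShouldBeAgain}}</ok2></ok>' |
  tr -d '\n' | strip_placeholder_tags
```

Because each substitution must start at '{{' and end at '}}' without crossing a brace, tags outside the placeholders such as <ok> and <ok2> are left alone, and several placeholders on one line are handled independently. Elements with attributes are not removed together with their text in this sketch; only their tags are stripped.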
+0

Thanks for your help! – 2014-10-21 08:10:14