Remove XML tags inside placeholders

I want to use sed (or another tool) to remove XML tags, but only inside specially marked "{{ }}" placeholders. Example:

<ok><ok2>{{TextShouldStay<not_ok>this_should_be_out</not_ok> 
<sthelse/>ThisShouldBeAgain}}</ok2></ok>

Expected result:

<ok><ok2>{{TextShouldStayThisShouldBeAgain}}</ok2></ok>

Any ideas on how to achieve this?

Source

2014-10-20 John Smith

Does the {{}} block contain newline characters? Would you like a Perl answer? – 2014-10-20 15:44:49

Obligatory link: http://stackoverflow.com/a/1732454/7552 – 2014-10-20 16:29:24

@AvinashRaj: No newlines, and a Perl answer would be fine too! – 2014-10-21 01:07:21

Command:

tr '\n' ' ' < file.xml | sed -r 's/(.*\{\{)([A-Za-z0-9]*)(<.*\/>)(.*)/\1\2\4\n/g'

Output:

~/AMD$ cat file.xml
<ok><ok2>{{TextShouldStay<not_ok>this_should_be_out</not_ok> 
<sthelse/>ThisShouldBeAgain}}</ok2></ok>
~/AMD$ tr '\n' ' ' < file.xml | sed -r 's/(.*\{\{)([A-Za-z0-9]*)(<.*\/>)(.*)/\1\2\4\n/g'
<ok><ok2>{{TextShouldStayThisShouldBeAgain}}</ok2></ok>
~/AMD$


Here we first replace the newlines with spaces using 'tr', then split the line into groups using '(' and ')':
First group - from the beginning of the line up to and including '{{'
Second group - any letters or digits immediately after '{{'
Third group - everything from the next '<' through the last '/>'
Fourth group - the remaining characters

Once grouped, we drop the third group and append a newline.

Source

2014-10-21 06:59:36

Thanks for your help! – 2014-10-21 08:10:14
