2016-01-10

How to remove all Wiki templates from a string?

I have Wikipedia article content like this:

{{Use mdy dates|date=June 2014}} 
{{Infobox person 
| name  = Richard Matthew Stallman 
| image  = Richard Stallman - Fête de l'Humanité 2014 - 010.jpg 
| caption  = Richard Stallman, 2014 
| birth_date = {{Birth date and age|1953|03|16}} 
| birth_place = New York City 
| nationality = American 
| other_names = RMS, rms 
| known_for = Free software movement, GNU, Emacs, GNU Compiler Collection|GCC 
| alma_mater = Harvard University,<br />Massachusetts Institute of Technology 
| occupation = President of the Free Software Foundation 
| website  = {{URL|https://www.stallman.org/}} 
| awards  = MacArthur Fellowship<br />EFF Pioneer Award<br />''... see #Honors and awards|Honors and awards'' 
}} 

{{Citation needed|date=May 2011}} 

How can I remove them? I can use this regex: /\{\{[^}]+\}\}/g, but it doesn't work for nested templates like Infobox.

I tried using this code to remove the nested templates first and then the infobox, but I got the wrong result:

var input = document.getElementById('input'); 
 
input.innerHTML = input.innerHTML.replace(/\{\{[^}]+\}\}/g, '');
<pre id="input">{{Use mdy dates|date=June 2014}}
{{Infobox person
| name  = Richard Matthew Stallman
| image  = Richard Stallman - Fête de l'Humanité 2014 - 010.jpg
| caption  = Richard Stallman, 2014
| birth_date = {{Birth date and age|1953|03|16}}
| birth_place = New York City
| nationality = American
| other_names = RMS, rms
| known_for = Free software movement, GNU, Emacs, GNU Compiler Collection|GCC
| alma_mater = Harvard University,<br />Massachusetts Institute of Technology
| occupation = President of the Free Software Foundation
| website  = {{URL|https://www.stallman.org/}}
| awards  = MacArthur Fellowship<br />EFF Pioneer Award<br />''... see #Honors and awards|Honors and awards''
}}</pre>
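To see why the single-pass regex fails on nested templates, here is a small check on a shortened, made-up sample string (not the real article text). `[^}]+` cannot cross the first `}`, so the match that starts at the outer `{{` ends at the inner template's `}}`, and the tail of the outer template is left behind.

```javascript
// Hypothetical, shortened sample: one template nested inside another.
const sample = 'A {{Infobox | born = {{Birth date|1953}} | x = y}} B';

// The regex from the question: "{{", one or more non-"}" chars, then "}}".
const naive = sample.replace(/\{\{[^}]+\}\}/g, '');

// The match runs from the outer "{{" to the inner "}}",
// so " | x = y}}" from the outer template survives.
console.log(naive); // "A  | x = y}} B"
```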


@yurzui This won't work for text that contains {{}} in more than one place: https://regex101.com/r/kG7bO0/2 – jcubic


@jcubic Do you mean 'foo' shouldn't be matched? – tchelidze


If you can do it in two steps, you could match the inner templates first and then the outer ones; this works for the inner ones: https://regex101.com/r/pG5sS0/1 –

Answer


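JavaScript regular expressions don't have the features (such as recursion or balancing groups) needed to match nested braces. One way to do it with a regex is to apply the pattern to the string several times, each pass removing the innermost templates, until there is nothing left to replace: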

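var cnt;
do {
    cnt = 0;
    // Remove every innermost {{...}} (one that contains no nested "{{" or "}}")
    txt = txt.replace(/{{[^{}]*(?:{(?!{)[^{}]*|}(?!})[^{}]*)*}}/g, function () {
        cnt++;
        return '';
    });
} while (cnt); // repeat until a pass makes no replacement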
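The loop above can be wrapped into a reusable function. This is a minimal sketch: the name `removeTemplatesIteratively` and the sample strings are hypothetical, and the braces in the pattern are escaped, which matches the same text in JavaScript.

```javascript
// Hypothetical helper around the answer's loop: strip innermost {{...}}
// templates repeatedly until a pass removes nothing.
function removeTemplatesIteratively(txt) {
  // Same pattern as the answer, with the braces escaped.
  const innermost = /\{\{[^{}]*(?:\{(?!\{)[^{}]*|\}(?!\})[^{}]*)*\}\}/g;
  let cnt;
  do {
    cnt = 0;
    txt = txt.replace(innermost, () => {
      cnt++; // count removals so we know whether another pass is needed
      return '';
    });
  } while (cnt > 0);
  return txt;
}

const sample = 'A {{Infobox | born = {{Birth date|1953}} | x = y}} B';
console.log(removeTemplatesIteratively(sample)); // "A  B"
console.log(removeTemplatesIteratively('{{a}} x {{b|{{c}}}} y')); // " x  y"
```

Each pass strips every template that is currently innermost, so the number of passes grows with the deepest nesting level rather than with the number of templates.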

Pattern details:

{{
[^{}]*             # anything that is not a brace
(?:                # this group is only needed if you want to allow single braces
    {(?!{)[^{}]*   # an opening brace not followed by another opening brace
    |              # OR
    }(?!})[^{}]*   # the same for closing braces
)*
}}
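A quick check of the pattern on its own (the sample strings are made up for illustration) shows that a single pass only matches templates with no nested `{{`, which is why the loop is needed, while single braces inside a template are still accepted by the optional group:

```javascript
// Same pattern as above, with the braces escaped (equivalent in JavaScript).
const innermost = /\{\{[^{}]*(?:\{(?!\{)[^{}]*|\}(?!\})[^{}]*)*\}\}/g;

// Only the inner template matches; the outer one contains "{{" and is skipped.
const nested = '{{outer|{{inner}}}}'.match(innermost);
console.log(nested); // [ '{{inner}}' ]

// A lone "{" or "}" inside a template is allowed by the (?:...)* group.
const singleBraces = '{{t|a{b}c}}'.match(innermost);
console.log(singleBraces); // [ '{{t|a{b}c}}' ]
```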

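If you don't want to process the string multiple times, you can also read it character by character, incrementing and decrementing a depth counter whenever you find an opening or closing brace pair.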
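A minimal sketch of that single-pass idea, assuming `{{` and `}}` are reasonably well paired (the function name `removeTemplatesByScan` is hypothetical): the depth goes up on `{{` and down on `}}`, and characters are copied only while the depth is 0.

```javascript
// Hypothetical single-pass remover: track template depth while scanning.
function removeTemplatesByScan(txt) {
  let depth = 0;
  let out = '';
  for (let i = 0; i < txt.length; i++) {
    if (txt.startsWith('{{', i)) {
      depth++; // entering a template
      i++;     // skip the second "{"
    } else if (depth > 0 && txt.startsWith('}}', i)) {
      depth--; // leaving a template
      i++;     // skip the second "}"
    } else if (depth === 0) {
      out += txt[i]; // outside any template: keep the character
    }
  }
  return out;
}

const sample = 'A {{Infobox | born = {{Birth date|1953}} | x = y}} B';
console.log(removeTemplatesByScan(sample)); // "A  B"
```

With this sketch an unclosed `{{` drops everything after it, and a stray `}}` at depth 0 is kept as ordinary text.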

Another way, using split and Array.prototype.reduce:

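var stk = 0; // current nesting depth of {{ ... }}
var result = txt.split(/({{|}})/).reduce(function (c, v) {
    if (v === '{{') { stk++; return c; }                   // entering a template
    if (v === '}}') { stk = stk ? stk - 1 : 0; return c; } // leaving one (ignore a stray '}}')
    return stk ? c : c + v;                                // keep text only at depth 0
}, '');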
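One caveat with the snippet above is that `stk` lives outside the callback, so if one input has an unclosed `{{`, the next run starts at a nonzero depth. A sketch that keeps the counter local (the name `removeTemplatesBySplit` and the sample strings are hypothetical) and passes `''` as the initial value of `reduce`:

```javascript
// Hypothetical wrapper around the split/reduce approach.
function removeTemplatesBySplit(txt) {
  let depth = 0; // local, so every call starts at depth 0
  // The capturing group keeps "{{" and "}}" as separate array elements.
  return txt.split(/(\{\{|\}\})/).reduce((out, part) => {
    if (part === '{{') { depth++; return out; }
    if (part === '}}') { depth = Math.max(depth - 1, 0); return out; }
    return depth === 0 ? out + part : out; // keep text only outside templates
  }, '');
}

const sample = 'A {{Infobox | born = {{Birth date|1953}} | x = y}} B';
console.log(removeTemplatesBySplit(sample)); // "A  B"
```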