Regex that matches the content of specific XML tags, but not the tags themselves

I've been banging my head against this regex all day.

The task looks simple: I have a few XML tag names, and I must replace (mask) the content of those tags.

For example,

<Exony_Credit_Card_ID>242394798</Exony_Credit_Card_ID>

must become

<Exony_Credit_Card_ID>filtered</Exony_Credit_Card_ID>

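There are several such tags, each with a different name.

How can I match the content of several such tags anywhere in a text, without matching the tags themselves?

Edit: I should clarify further. Capturing the tags in groups and re-inserting those groups in the replacement does not work for me, because once I add other tags to the expression, the group numbers are different for the later alternatives. For example: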

"(<Exony_Credit_Card_ID>).+(</Exony_Credit_Card_ID>)|(<Billing_Postcode>).+(</Billing_Postcode>)"

Calling replaceAll on this with the replacement string "$1filtered$2" does not work, because when the regex matches Billing_Postcode its groups are 3 and 4, not 1 and 2.
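
The problem described above can be reproduced with a small sketch. The class name GroupNumberDemo and the sample postcode are made up for illustration; the pattern and the replacement string are the ones quoted in the question. When the second alternative matches, groups 1 and 2 take no part in the match, so the replacement cannot restore the Billing_Postcode tags.

```java
import java.util.regex.Pattern;

class GroupNumberDemo {
    // Groups 1-2 belong to the first alternative, groups 3-4 to the second.
    static final Pattern BOTH = Pattern.compile(
        "(<Exony_Credit_Card_ID>).+(</Exony_Credit_Card_ID>)"
        + "|(<Billing_Postcode>).+(</Billing_Postcode>)");

    // The replacement only refers to groups 1 and 2.
    static String mask(String text) {
        return BOTH.matcher(text).replaceAll("$1filtered$2");
    }

    public static void main(String[] args) {
        // First alternative: the tags are restored around "filtered".
        System.out.println(mask("<Exony_Credit_Card_ID>242394798</Exony_Credit_Card_ID>"));
        // Second alternative: groups 1 and 2 did not participate, so the tags are lost.
        System.out.println(mask("<Billing_Postcode>AB1 2CD</Billing_Postcode>"));
    }
}
```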


2011-02-11 Boris Hamanov

Can't you simply use an XML parser? – 2011-02-11 10:48:43

No, the XML is mixed in with other text; it is a log file. – 2011-02-11 10:51:33

String resultString = subjectString.replaceAll(
    "(?x) # (multiline regex): Match...\n" + 
    "<(Exony_Credit_Card_ID|Billing_Postcode)> # one of these opening tags\n" + 
    "[^<>]* # Match whatever is contained within\n" + 
    "</\\1> # Match corresponding closing tag", 
    "<$1>filtered</$1>");


2011-02-11 11:00:25
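
As a rough usage sketch of the backreference approach in the answer above, here is the same pattern applied to a line of mixed log text. The class name MaskTagsDemo, the sample log line, and the Status tag are assumptions made up for this example; the comment-mode formatting is dropped only to keep the sketch short.

```java
import java.util.regex.Pattern;

class MaskTagsDemo {
    // Capture the tag name once, then require the same name in the closing tag via \1.
    static final Pattern SENSITIVE = Pattern.compile(
        "<(Exony_Credit_Card_ID|Billing_Postcode)>[^<>]*</\\1>");

    // Rebuild both tags from group 1, so the group number never depends on which tag matched.
    static String mask(String line) {
        return SENSITIVE.matcher(line).replaceAll("<$1>filtered</$1>");
    }

    public static void main(String[] args) {
        String line = "10:48 INFO payment <Exony_Credit_Card_ID>242394798</Exony_Credit_Card_ID>"
            + " <Billing_Postcode>AB1 2CD</Billing_Postcode> <Status>ok</Status>";
        // Only the two listed tags are masked; <Status> and the plain text are left alone.
        System.out.println(mask(line));
    }
}
```

Because the alternation sits inside a single capturing group, adding more tag names does not shift any group numbers.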

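I haven't debugged this code, but you should use something like this:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

// The closing tag needs its slash: </tag>
Pattern p = Pattern.compile("<\\w+>([^<]*)</\\w+>");
Matcher m = p.matcher(str);
if (m.find()) {
    String tagContent = m.group(1); // text between the tags
}

I hope this is a good start.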


2011-02-11 10:54:44 AlexR

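I suggest you check the code snippet you posted first: your code doesn't even compile. – 2011-02-11 10:59:06

Sorry, I don't have an IDE right now. I'm at home. Thanks for fixing my code. As I understand it, you added the missing backslashes. – AlexR 2011-02-11 11:00:58

In your case, I would use something like this: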

(?<=<(Exony_Credit_Card_ID|tag1|tag2)>)(\\d+)(?=</(Exony_Credit_Card_ID|tag1|tag2)>)

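Then replace the match with filtered; the tags are excluded from the returned match, so they stay in place. Since your goal is to hide sensitive data, it is better to be safe and use an "aggressive" match that catches as much potentially sensitive data as possible, even if it sometimes masks data that is not sensitive.

If the data contains other characters such as spaces, slashes, dashes, etc., you may need to adjust the tag content matcher (\\d+).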
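
A minimal sketch of the lookaround idea, under a few assumptions: the class name LookaroundMaskDemo and the sample values are hypothetical, Billing_Postcode stands in for tag1/tag2, and the content matcher is widened from \\d+ to [^<]+ so that values containing spaces or letters are masked too.

```java
import java.util.regex.Pattern;

class LookaroundMaskDemo {
    // Hypothetical list of sensitive tag names.
    static final String TAGS = "Exony_Credit_Card_ID|Billing_Postcode";

    // The lookbehind and lookahead assert the tags without consuming them,
    // so the match itself is only the text between the tags.
    static final Pattern CONTENT = Pattern.compile(
        "(?<=<(?:" + TAGS + ")>)[^<]+(?=</(?:" + TAGS + ")>)");

    static String mask(String text) {
        return CONTENT.matcher(text).replaceAll("filtered");
    }

    public static void main(String[] args) {
        System.out.println(mask("id=<Exony_Credit_Card_ID>242394798</Exony_Credit_Card_ID>"));
        System.out.println(mask("zip=<Billing_Postcode>AB1 2CD</Billing_Postcode>"));
    }
}
```

Note that the opening and closing names are not tied together here, so content between mismatched tags from the list would also be masked, which fits the "aggressive" stance described above.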


2011-02-11 11:03:10 mdrg

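I would use something like this:

import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

private static final Pattern PAT = Pattern.compile("<(\\w+)>([^<]*)</\\1>"); // [^<]* limits a match to one innermost element

private static String replace(String s, Set<String> toReplace) {
    Matcher m = PAT.matcher(s);
    StringBuffer out = new StringBuffer();
    while (m.find()) { // find() scans the whole text; matches() only accepts a string that is exactly one element
        String tag = m.group(1);
        String repl = toReplace.contains(tag) ? "<" + tag + ">filtered</" + tag + ">" : m.group(); // mask listed tags only
        m.appendReplacement(out, Matcher.quoteReplacement(repl));
    }
    m.appendTail(out);
    return out.toString();
}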


2011-02-11 11:07:53 proactif

I know you said that relying on group numbers doesn't work in your case... but I really can't see why. Couldn't you use something like this:

xmlString.replaceAll("<(Exony_Credit_Card_ID|tag2|tag3)>([^<]+)</(\\1)>", "<$1>filtered</$1>");

This works on the basic sample I used as a test.

Edit: just to break it down:

"<(Exony_Credit_Card_ID|tag2|tag3)>" + // matches the opening tag
"([^<]+)" + // then anything between the opening and closing tags
"</(\\1)>" // and finally the end tag corresponding to what we matched as the first group (Exony_Credit_Card_ID, tag2 or tag3)

"<$1>" + // Replace using the first captured group (tag name) 
"filtered" + // the "filtered" text 
"</$1>" // and the closing tag corresponding to the first captured group


2011-02-11 11:56:26 Kellindil
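
Since the question mentions several tag names, the alternation in the answer above could also be built from a list instead of being typed by hand. This is only a sketch: the TagMasker class and its method names are hypothetical, and it assumes, as the answer does, that tag content never contains a '<'.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class TagMasker {
    private final Pattern pattern;

    TagMasker(List<String> tagNames) {
        // Pattern.quote keeps any regex metacharacters in a tag name literal.
        String alternation = tagNames.stream()
            .map(Pattern::quote)
            .collect(Collectors.joining("|"));
        // One capturing group for the name, reused for the closing tag via \1.
        this.pattern = Pattern.compile("<(" + alternation + ")>[^<]+</\\1>");
    }

    String mask(String text) {
        return pattern.matcher(text).replaceAll("<$1>filtered</$1>");
    }
}
```

A caller would create the masker once, for example with `new TagMasker(Arrays.asList("Exony_Credit_Card_ID", "Billing_Postcode"))`, and reuse it for every log line so the pattern is compiled only once.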
