Regex to remove special characters while retaining a valid email format

I'm using this in C#. I start with an email-like string in this format:

employee[any characters]@company[any characters].com

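I want to strip the non-alphanumeric characters from the [any characters] parts.

For example, I want this "[email protected] r&a*[email protected]@company98 ';99..com"

to become this "[email protected]"

This expression simply strips out all the special characters, but I want to keep a single @ before company and a single . before com. So I need the expression to ignore or mask out the employee, @company, and .com pieces... I'm just not sure how to do that.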

var regex = new Regex("[^0-9a-zA-Z]"); //whitelist the acceptables, remove all else.

Source

2014-10-29 chrismat

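The regex itself does what you need (except that it also removes '@' and '.'); it just depends on how you use it in your programming language. From 'var' I assume JavaScript? – dognose 2014-10-29 20:53:02

Why should it become "[email protected]" rather than "[email protected].m32company9899.com"? – Oriol 2014-10-29 20:56:18

@Oriol The second and third lines of the OP should answer your question... that is always the starting format, and that is what we want to do with it. "@company" is always the start of the email domain. – chrismat 2014-10-29 21:00:39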

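You can use the following regex:

(?:\W)(?!company|com)

It matches any special character unless that character is followed by company (so @company remains) or by com (so .com remains):

[email protected] r&a*[email protected]@company98 ';99..com

becomes

[email protected]

See: http://regex101.com/r/fY8jD7/2

Note that you need the g modifier to replace all occurrences of the unwanted characters. That is the default in C#, so you can just use a simple Regex.Replace():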

https://dotnetfiddle.net/iTeZ4F
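
As a rough illustration of the pattern above, the sketch below wraps it in a small helper. The class name LookaheadSanitizer, the method name Clean, and the sample input are hypothetical and chosen only for this example; the non-capturing group is dropped because it does not change what is matched.

```csharp
using System.Text.RegularExpressions;

public static class LookaheadSanitizer
{
    // One non-word character that is NOT immediately followed by "company" or "com".
    private static readonly Regex Unwanted = new Regex(@"\W(?!company|com)");

    public static string Clean(string input)
    {
        // Regex.Replace substitutes every match, so no "global" flag is needed.
        return Unwanted.Replace(input, string.Empty);
    }
}
```

Calling LookaheadSanitizer.Clean("employee1#2@company9 8..com") returns "employee12@company98.com". Underscores are word characters, so this pattern leaves them in place.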

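Update:

Of course, the regex (?:\W)(?!com) would be sufficient, but it would still leave parts like #com or ~companion in place, because they satisfy the lookahead too. So there is still no guarantee that the input (or rather, the transformed result) is 100% valid. You should consider simply throwing a validation error instead of trying to sanitize the input to fit your needs.

Even if you manage to handle that case as well, what do you do if @company or .com appears twice?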
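
If validation is preferred over sanitizing, one possible check is an anchored pattern that accepts only the target shape. This is a minimal sketch, assuming the shape is exactly employee + alphanumerics + @company + alphanumerics + .com; the names EmailFormatCheck and IsClean are hypothetical.

```csharp
using System.Text.RegularExpressions;

public static class EmailFormatCheck
{
    // Anchored at both ends: "employee", alphanumerics, "@company", alphanumerics, ".com".
    // \z is the absolute end of the string, so a trailing newline is rejected too.
    private static readonly Regex Strict =
        new Regex(@"^employee[0-9a-zA-Z]*@company[0-9a-zA-Z]*\.com\z");

    public static bool IsClean(string candidate)
    {
        return Strict.IsMatch(candidate);
    }
}
```

A leftover such as "employee1#com@company.com", or a string in which @company appears twice, fails this check, so the caller can raise a validation error instead of guessing at a repair.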

Source

2014-10-29 21:47:59 dognose

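Thanks dognose... at this point I know that "@company" and .com won't appear twice... if the source data ever gets to that point, we'll be having a talk with those people (= Thanks a bunch! – chrismat 2014-10-29 22:31:48

You can simplify your regex: \w matches all letters, digits, and the underscore, and \W is the negated version of \w, so you can replace your pattern with

tmp = Regex.Replace(n, @"\W+", "");

In general, it is better to create a whitelist of allowed characters than to try to predict every disallowed symbol.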
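
To make the difference between \W and an explicit whitelist concrete, here is a small sketch; the class and method names are hypothetical. Note that both variants also remove '@' and '.', so neither keeps the email separators on its own.

```csharp
using System.Text.RegularExpressions;

public static class WordClassDemo
{
    // \W+ removes runs of non-word characters; the underscore counts as a word character and stays.
    public static string StripNonWord(string input)
    {
        return Regex.Replace(input, @"\W+", string.Empty);
    }

    // An explicit whitelist removes everything except ASCII letters and digits.
    public static string StripNonAlphanumeric(string input)
    {
        return Regex.Replace(input, "[^0-9a-zA-Z]+", string.Empty);
    }
}
```

For the input "a_b@c.d", StripNonWord returns "a_bcd" while StripNonAlphanumeric returns "abcd".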

Source

2014-10-29 21:26:18

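Is there a way to make the regex ignore the strings "employee", "@company", and ".com"? – chrismat 2014-10-29 21:40:57

I would probably write something like this:

(This ignores case; leave a comment if you need it to be case-sensitive.)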

DotNetFiddle Example

using System;
using System.Linq;

public class Program
{
    public static void Main()
    {
        var email = "[email protected] r&a*[email protected]@company98 ';99..com";

        var result = GetValidEmail(email);

        Console.WriteLine(result);
    }

    public static string GetValidEmail(string email)
    {
        // Work on a lower-cased copy so that the checks below ignore case.
        var result = email.ToLower();

        // Does it contain everything we need?
        if (result.StartsWith("employee")
            && result.EndsWith(".com")
            && result.Contains("@company"))
        {
            // Remove the leading "employee" (8 chars) and the trailing ".com" (4 chars).
            result = result.Substring(8, result.Length - 12);

            // Split around "@company"; empty entries are kept so an empty part is still accepted.
            var split = result.Split(new string[] { "@company" },
                StringSplitOptions.None);

            // Validate that "@company" occurs exactly once (you may not need this).
            if (split.Length != 2)
            {
                throw new ArgumentException("Invalid Email.");
            }

            // Recreate the valid email from the cleaned parts.
            result = "employee"
                + new string(split[0].Where(c => char.IsLetterOrDigit(c)).ToArray())
                + "@company"
                + new string(split[1].Where(c => char.IsLetterOrDigit(c)).ToArray())
                + ".com";
        }
        else
        {
            throw new ArgumentException("Invalid Email.");
        }

        return result;
    }
}

Result

[email protected]

Source

2014-10-29 21:34:04

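I was hoping to avoid something like this, but if regex can't handle this pattern, I guess it will have to do. thx – chrismat 2014-10-29 21:50:39

Just an option. –  2014-10-29 21:56:50

@dognose posted a great regex solution. I'll leave my answer here for reference, but I would go with his, since it is shorter/cleaner.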

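using System.Text.RegularExpressions;

var companyName = "company";
var extension = "com";
var email = "[email protected] r&a*[email protected]@company98 ';99..com";

// Strip every non-word character, including the '@' and '.' separators.
var tempEmail = Regex.Replace(email, @"\W+", "");

// Locate the fixed pieces in the stripped string.
var companyIndex = tempEmail.IndexOf(companyName);
var extIndex = tempEmail.LastIndexOf(extension);

// Everything before "company" is the employee part; the rest, up to the final "com", is the company part.
var fullEmployeeName = tempEmail.Substring(0, companyIndex);
var fullCompanyName = tempEmail.Substring(companyIndex, extIndex - companyIndex);

// Put the separators back.
var validEmail = fullEmployeeName + "@" + fullCompanyName + "." + extension;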

Source

2014-10-29 21:37:57

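Not really... we know for certain that the given format will be employee[any characters]@company[any characters].com. We don't know what is in [any characters]; we just need to remove the non-alphanumeric characters from the [any characters] parts. – chrismat 2014-10-29 21:48:36

I think I have fixed it to do what you are asking for now. –  2014-10-29 21:54:34

What you are trying to do, although possible, is somewhat complicated with a single regex pattern. You can break the problem down into smaller steps. One way is to extract the Username and Domain groups (basically what you described as [any characters]), "fix" each group, and put the fixed version in place of the original. Something like this: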

using System;
using System.Text.RegularExpressions;

// Original input to transform.
string input = @"[email protected] r&a*[email protected]@company98 ';99..com";

// Regular expression to find and extract the "Username" and "Domain" groups, if any.
// The dot before "com" is escaped so that it only matches a literal '.'.
var match = Regex.Match(input, @"^employee(?<UsernameGroup>.*)@company(?<DomainGroup>.*)\.com$");

string validInput = input;

if (match.Success)
{
    // Replace non-alphanumeric values in each group with an empty string.
    string validUsername = Regex.Replace(match.Groups["UsernameGroup"].Value, "[^a-zA-Z0-9]", string.Empty);
    string validDomain = Regex.Replace(match.Groups["DomainGroup"].Value, "[^a-zA-Z0-9]", string.Empty);

    // Rebuild the string from the fixed pieces instead of calling string.Replace,
    // which would also rewrite identical text elsewhere in the input (for example the '.' in ".com").
    validInput = "employee" + validUsername + "@company" + validDomain + ".com";
}

Console.WriteLine(validInput);

This will output [email protected].
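
For comparison, a single-pattern variant is sketched below. It is an alternative technique rather than the approach described in this answer: a named group (called keep here) captures the two separators worth preserving, every other non-alphanumeric character matches the second alternative, and a match evaluator decides what to emit. The names SinglePassSanitizer, Clean, and keep are hypothetical, and the sketch assumes the input already has the employee...@company....com shape, since it does no validation.

```csharp
using System.Text.RegularExpressions;

public static class SinglePassSanitizer
{
    // keep: the last "@company" in the string, or ".com" at the very end.
    // Anything else that is not an ASCII letter or digit matches the second alternative.
    private static readonly Regex Pattern =
        new Regex(@"(?<keep>@company(?!.*@company)|\.com\z)|[^0-9a-zA-Z]");

    public static string Clean(string input)
    {
        // For a junk character the keep group did not participate, so its Value is empty
        // and the character is dropped; otherwise the separator is written back unchanged.
        return Pattern.Replace(input, m => m.Groups["keep"].Value);
    }
}
```

With this sketch, SinglePassSanitizer.Clean("employee1#2@company9 8..com") returns "employee12@company98.com".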

Source

2014-10-29 21:53:36 PoweredByOrange
