Split a string into three columns using a regular expression

I have a string like the one below:

rta_geo5: 09/24/14 15:10:38 - Reset_count = 6 
rta_geo5: 09/24/14 15:10:38 - restarting 
rta_geo5: 09/24/14 15:10:38 - memory allocation: 3500 lines

My goal is to split this string into three columns so that I can insert it into a database table:

----------------------------------------------------------------
| COL1     | COL2              | COL3                          |
----------------------------------------------------------------
| rta_geo5 | 09/24/14 15:10:38 | Reset_count = 6               |
----------------------------------------------------------------
| rta_geo5 | 09/24/14 15:10:38 | restarting                    |
----------------------------------------------------------------
| rta_geo5 | 09/24/14 15:10:38 | memory allocation: 3500 lines |
----------------------------------------------------------------

Is this possible with the following statement?

string[] substrings = Regex.Split(input, pattern);

I just need the right regular expression.

Source

2014-09-25 ironcurtain

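Did you try to build the pattern yourself? How did it go? – Utkanos 2014-09-25 11:19:16

How do you want to distinguish between 'rta_geo5:' and 'allocation:'? What strict rule do you want to split on? – 2014-09-25 11:30:52

This looks like it might be fixed width. If so, I would personally just pull out the required substrings. – juharr 2014-09-25 11:47:37

Instead of splitting, you can use named groups in the regex.

Pattern: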

Regex ptrn = new Regex(@"^(?<col1>[^:]+):\s+(?<col2>\d{2}/\d{2}/\d{2} \d{2}:\d{2}:\d{2})\s+-\s+(?<col3>[^\r\n]+?)\s*$", 
    RegexOptions.ExplicitCapture|RegexOptions.IgnoreCase|RegexOptions.Multiline);

Usage:

string s = @"rta_geo5: 09/24/14 15:10:38 - Reset_count = 6 
rta_geo5: 09/24/14 15:10:38 - restarting 
rta_geo5: 09/24/14 15:10:38 - memory allocation: 3500 lines"; 

var matches = ptrn.Matches(s);

Access (requires using System.Linq):

matches.OfType<Match>()
    .Select(match => new string[]
    {
        match.Groups["col1"].Value,
        match.Groups["col2"].Value,
        match.Groups["col3"].Value
    })
    .ToList().ForEach(a => System.Console.WriteLine(string.Join("\t|\t", a)));

Or:

foreach (Match match in matches)
{
    string col1 = match.Groups["col1"].Value;
    string col2 = match.Groups["col2"].Value;
    string col3 = match.Groups["col3"].Value;
    System.Console.WriteLine(col1 + "\t|\t" + col2 + "\t|\t" + col3);
}

Output:

rta_geo5 | 09/24/14 15:10:38 | Reset_count = 6 
rta_geo5 | 09/24/14 15:10:38 | restarting 
rta_geo5 | 09/24/14 15:10:38 | memory allocation: 3500 lines

Source

2014-09-25 11:56:40 Arie

This worked for me. Thanks! – ironcurtain 2014-09-25 12:40:29

Split on this:

(?:(?<=geo5):\s|(?<=\d{2}:\d{2}:\d{2})\s-\s)

Demo here:

http://regex101.com/r/xF7iD7/1
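
As a rough illustration of how that split pattern could be plugged into Regex.Split, here is a minimal sketch. The class name SplitSketch and the helper SplitLine are hypothetical names chosen for this example, and the line is trimmed first on the assumption that trailing whitespace is not wanted in the third column. Note that the pattern only recognises the first delimiter when the identifier ends in "geo5".

```csharp
using System;
using System.Text.RegularExpressions;

public static class SplitSketch
{
    // The split pattern from this answer, unchanged. It has no capturing
    // groups, so Regex.Split returns only the text between the delimiters.
    private static readonly Regex Delimiter = new Regex(
        @"(?:(?<=geo5):\s|(?<=\d{2}:\d{2}:\d{2})\s-\s)");

    // Hypothetical helper: splits one log line into its columns.
    public static string[] SplitLine(string line)
    {
        // Trim so that trailing spaces do not end up in the last column.
        return Delimiter.Split(line.Trim());
    }

    public static void Main()
    {
        string[] columns = SplitLine(
            "rta_geo5: 09/24/14 15:10:38 - memory allocation: 3500 lines");
        Console.WriteLine(string.Join(" | ", columns));
    }
}
```

Because the lookbehinds tie each delimiter to what precedes it, the ": " inside "memory allocation: 3500 lines" is not treated as a split point.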

Source

2014-09-25 11:21:00 aelor

I wouldn't use a regex (or String.Split) for this, but a loop in which you parse each line. I would also use a custom class that maps to the database table, to improve readability and reusability.

Class (simplified):

public class Data
{
    public string Token1 { get; set; } // use a meaningful name
    public string Token2 { get; set; } // use a meaningful name
    public DateTime Date { get; set; } // use a meaningful name

    public override string ToString()
    {
        return string.Format("Token1:[{0}] Date:[{1}] Token2:[{2}]",
            Token1,
            Date.ToString("MM/dd/yy HH:mm:ss", CultureInfo.InvariantCulture),
            Token2);
    }
}

Your sample string:

string data = @"rta_geo5: 09/24/14 15:10:38 - Reset_count = 6 
rta_geo5: 09/24/14 15:10:38 - restarting 
rta_geo5: 09/24/14 15:10:38 - memory allocation: 3500 lines";

Now you can use plain string methods to parse the text into a List<Data> with this loop:

string[] lines = data.Split(new[] { Environment.NewLine }, StringSplitOptions.RemoveEmptyEntries);
List<Data> allData = new List<Data>();
foreach (string line in lines)
{
    string token1 = null, token2 = null;
    DateTime dt;
    int firstColonIndex = line.IndexOf(": ");
    if (firstColonIndex >= 0)
    {
        token1 = line.Remove(firstColonIndex);
        firstColonIndex += 2; // start next search after first token to find DateTime
        int indexOfMinus = line.IndexOf(" - ", firstColonIndex);
        if (indexOfMinus >= 0)
        {
            string datePart = line.Substring(firstColonIndex, indexOfMinus - firstColonIndex);
            if (DateTime.TryParseExact(datePart, "MM/dd/yy HH:mm:ss", CultureInfo.InvariantCulture, DateTimeStyles.None, out dt))
            {
                indexOfMinus += 3; // start next search after DateTime to get last token
                token2 = line.Substring(indexOfMinus);
                Data d = new Data { Token1 = token1, Token2 = token2, Date = dt };
                allData.Add(d);
            }
        }
    }
}

Test:

foreach (Data d in allData) 
    Console.WriteLine(d.ToString()); 

Token1:[rta_geo5] Date:[09/24/14 15:10:38] Token2:[Reset_count = 6] 
Token1:[rta_geo5] Date:[09/24/14 15:10:38] Token2:[restarting] 
Token1:[rta_geo5] Date:[09/24/14 15:10:38] Token2:[memory allocation: 3500 lines]

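This approach is more verbose than the others but more efficient and maintainable. It also lets you log exceptions or fall back to another method of parsing.

Source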

2014-09-25 11:49:24

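Not sure what is wrong, but on my PC the output looks like this: row1: Token1:[data1] Date:[date] Token2:[data2 row2: data3 date data3] – ironcurtain 2014-09-25 12:43:26

@ironcurtain: I don't know. Did you use this sample data ('string data = @...')? I tested the code again and it correctly shows the result above. What does your string[] 'lines' contain? Did you copy and paste the line breaks? – 2014-09-25 12:45:33

I think there was a problem because the string was retrieved from a UNIX system; when I checked, some lines had no line breaks. I decided to copy the file to my local machine and then split the columns. I haven't tested your solution, but I think it will work. – ironcurtain 2014-10-03 08:54:48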
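
One plausible explanation for the merged rows described in these comments is a line-ending mismatch: on Windows, Environment.NewLine is "\r\n", so text that uses bare "\n" (as UNIX files typically do) is not split at all. The sketch below shows one way to split on either convention. The class name LineSplitSketch and the helper SplitLines are hypothetical, and this is only an assumption about the cause, not a confirmed diagnosis.

```csharp
using System;

public static class LineSplitSketch
{
    // Hypothetical helper: splits text on Windows ("\r\n"), UNIX ("\n")
    // and old Mac ("\r") line endings, dropping empty lines.
    public static string[] SplitLines(string text)
    {
        // "\r\n" is listed first so a Windows line ending is consumed
        // as a single separator rather than as "\r" followed by "\n".
        return text.Split(
            new[] { "\r\n", "\n", "\r" },
            StringSplitOptions.RemoveEmptyEntries);
    }

    public static void Main()
    {
        // Mixed line endings, as might happen with a file copied between systems.
        string data = "rta_geo5: 09/24/14 15:10:38 - Reset_count = 6\n" +
                      "rta_geo5: 09/24/14 15:10:38 - restarting\r\n" +
                      "rta_geo5: 09/24/14 15:10:38 - memory allocation: 3500 lines";

        foreach (string line in SplitLines(data))
            Console.WriteLine("[" + line + "]");
    }
}
```

The resulting array could then be fed to the parsing loop from the answer in place of the Environment.NewLine split.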

OK, I had a think about this. Not sure it's 100%, but try:

(rta_geo5): (.*?) - (.*)

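This should split it into 3 groups as required. However, it assumes the leading identifier is always (rta_geo5).

[Edit] I noticed that one of the answers references an online regex service, so you can try my regex there: http://regex101.com/r/xF7iD7/1 (sorry, I don't have an account there yet, but will create one shortly). Also, regarding the rta_geo5 block, you could of course go fully generic with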

(.*): (.*) - (.*)

and see how it works either way.
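
To show how the generic three-group pattern might be used from C#, here is a minimal sketch. It deviates from the answer in one respect, which is an assumption on my part: the first two groups are made lazy (.*?) and the pattern is anchored, because with greedy groups a message that itself contains ": " followed later by " - " could shift the column boundaries. The class name GroupSketch and the helper ParseLine are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class GroupSketch
{
    // Variant of the generic pattern from this answer with lazy groups,
    // so the FIRST ": " and the FIRST " - " after it act as delimiters.
    private static readonly Regex LinePattern =
        new Regex(@"^(.*?): (.*?) - (.*)$");

    // Hypothetical helper: returns the three columns, or null if the
    // line does not have the expected shape.
    public static string[] ParseLine(string line)
    {
        Match m = LinePattern.Match(line.Trim());
        if (!m.Success)
            return null;

        return new[]
        {
            m.Groups[1].Value,
            m.Groups[2].Value,
            m.Groups[3].Value
        };
    }

    public static void Main()
    {
        string[] columns = ParseLine(
            "rta_geo5: 09/24/14 15:10:38 - memory allocation: 3500 lines");
        Console.WriteLine(string.Join(" | ", columns));
    }
}
```

Unlike the first pattern in this answer, this variant does not depend on the identifier being rta_geo5, but it does rely on the identifier itself never containing ": ".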

Source

2014-09-25 11:55:58
