2011-11-01 154 views
8

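I want to convert a string containing a recursively nested array of strings into an array of depth one. Parsing a string that contains an array.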

Example:

StringToArray("[a, b, [c, [d, e]], f, [g, h], i]") == ["a", "b", "[c, [d, e]]", "f", "[g, h]", "i"] 

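It seems fairly simple. However, I come from a functional background and am not very familiar with the .NET Framework standard library, so every time (I have started from scratch about 3 times) I end up with plain ugly code. My latest implementation is here. As you can see, it is ugly.

So, what is the C# way to do this?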

+1

+1 A challenging question. However, I think this generally belongs on codereview: codereview.stackexchange.com/faq#questions. –

Answers

5

@ojlovecd has a good answer using regular expressions.
However, his answer is overly complicated, so here is my similar, simpler answer.

// Requires: using System.Linq; using System.Text.RegularExpressions;
public string[] StringToArray(string input) {
    var pattern = new Regex(@" 
     \[ 
      (?: 
      \s* 
       (?<results>(?: 
       (?(open) [^\[\]]+ | [^\[\],]+ ) 
       |(?<open>\[) 
       |(?<-open>\]) 
       )+) 
       (?(open)(?!)) 
      ,? 
      )* 
     \] 
    ", RegexOptions.IgnorePatternWhitespace); 

    // Find the first match: 
    var result = pattern.Match(input); 
    if (result.Success) { 
     // Extract the captured values: 
     var captures = result.Groups["results"].Captures.Cast<Capture>().Select(c => c.Value).ToArray(); 
     return captures; 
    } 
    // Not a match 
    return null; 
} 

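With this code, you will see that StringToArray("[a, b, [c, [d, e]], f, [g, h], i]") returns the following array: ["a", "b", "[c, [d, e]]", "f", "[g, h]", "i"]

For more information on the balancing groups used to match balanced brackets, see Microsoft's documentation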
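To see the balancing-group mechanism on its own, here is a minimal sketch that only checks whether the square brackets in a string are balanced. The names `BalancedDemo` and `IsBalanced` are made up for this illustration and are not part of the answer's code.

```csharp
using System;
using System.Text.RegularExpressions;

public static class BalancedDemo
{
    // Matches a whole string whose square brackets are balanced.
    private static readonly Regex Balanced = new Regex(@"
        ^
        (?:
            [^\[\]]        # any non-bracket character
          | (?<open>\[)    # an opening bracket pushes a capture onto the 'open' stack
          | (?<-open>\])   # a closing bracket pops one, and cannot match if the stack is empty
        )*
        (?(open)(?!))      # fail if any opening bracket is still unmatched
        $
    ", RegexOptions.IgnorePatternWhitespace);

    public static bool IsBalanced(string s)
    {
        return Balanced.IsMatch(s);
    }

    public static void Main()
    {
        Console.WriteLine(IsBalanced("[a, [b, c]]")); // True
        Console.WriteLine(IsBalanced("[a, [b, c]"));  // False
        Console.WriteLine(IsBalanced("a]"));          // False
    }
}
```

The pattern in the answer uses the same three pieces (push on an opening bracket, pop on a closing bracket, and a final conditional that rejects leftover pushes), with a capture group wrapped around each top-level item.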

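Update
As per the comments, if you also want to balance quotes, here is a possible modification. (Note that inside a C# verbatim string, " is escaped as "".) I have also added a description of the pattern to help clarify: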

var pattern = new Regex(@" 
     \[ 
      (?: 
      \s* 
       (?<results>(?:    # Capture everything into 'results' 
        (?(open)    # If 'open' Then 
         [^\[\]]+   # Capture everything but brackets 
         |     # Else (not open): 
         (?:     # Capture either: 
          [^\[\],'""]+ #  Unimportant characters 
          |    # Or 
          ['""][^'""]*?['""] # Anything between quotes 
         ) 
        )      # End If 
        |(?<open>\[)   # Open bracket 
        |(?<-open>\])   # Close bracket 
       )+) 
       (?(open)(?!))    # Fail while there's an unbalanced 'open' 
      ,? 
      )* 
     \] 
    ", RegexOptions.IgnorePatternWhitespace); 
+0

This is a fantastic solution. :) – ojlovecd

+0

Thanks, I hope I didn't steal your thunder :) –

+0

Of course not. Just discussion and improvement. :) – ojlovecd

0

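Honestly, I would just write this method in an F# assembly, since it would probably be much easier. If you look at the JavaScriptSerializer implementation in C# (using a decompiler such as dotPeek or Reflector), you can see how messy the array-parsing code is for a similar array in JSON. Granted, it has to handle many more kinds of tokens, but you get the idea.

Here is their implementation of DeserializeList. It is uglier than it normally would be, because this is dotPeek's decompiled version rather than the original source, but you get the idea. DeserializeInternal recurses into sub-lists.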

private IList DeserializeList(int depth) 
{ 
    IList list = (IList) new ArrayList(); 
    char? nullable1 = this._s.MoveNext(); 
    if (((int) nullable1.GetValueOrDefault() != 91 ? 1 : (!nullable1.HasValue ? 1 : 0)) != 0) 
    throw new ArgumentException(this._s.GetDebugString(AtlasWeb.JSON_InvalidArrayStart)); 
    bool flag = false; 
    char? nextNonEmptyChar; 
    char? nullable2; 
    do 
    { 
    char? nullable3 = nextNonEmptyChar = this._s.GetNextNonEmptyChar(); 
    if ((nullable3.HasValue ? new int?((int) nullable3.GetValueOrDefault()) : new int?()).HasValue) 
    { 
     char? nullable4 = nextNonEmptyChar; 
     if (((int) nullable4.GetValueOrDefault() != 93 ? 1 : (!nullable4.HasValue ? 1 : 0)) != 0) 
     { 
     this._s.MovePrev(); 
     object obj = this.DeserializeInternal(depth); 
     list.Add(obj); 
     flag = false; 
     nextNonEmptyChar = this._s.GetNextNonEmptyChar(); 
     char? nullable5 = nextNonEmptyChar; 
     if (((int) nullable5.GetValueOrDefault() != 93 ? 0 : (nullable5.HasValue ? 1 : 0)) == 0) 
     { 
      flag = true; 
      nullable2 = nextNonEmptyChar; 
     } 
     else 
      goto label_8; 
     } 
     else 
     goto label_8; 
    } 
    else 
     goto label_8; 
    } 
    while (((int) nullable2.GetValueOrDefault() != 44 ? 1 : (!nullable2.HasValue ? 1 : 0)) == 0); 
    throw new ArgumentException(this._s.GetDebugString(AtlasWeb.JSON_InvalidArrayExpectComma)); 
label_8: 
    if (flag) 
    throw new ArgumentException(this._s.GetDebugString(AtlasWeb.JSON_InvalidArrayExtraComma)); 
    char? nullable6 = nextNonEmptyChar; 
    if (((int) nullable6.GetValueOrDefault() != 93 ? 1 : (!nullable6.HasValue ? 1 : 0)) != 0) 
    throw new ArgumentException(this._s.GetDebugString(AtlasWeb.JSON_InvalidArrayEnd)); 
    else 
    return list; 
} 

That said, recursive parsing is not handled as gracefully in C# as it is in F#.

0

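There is no real "standard" way of doing this. Note that the implementation can get quite messy if you want to account for every possibility. I would recommend something recursive like this: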

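// Requires: using System.Collections.Generic; using System.Linq; using System.Text;
private static IEnumerable<object> StringToArray2(string input)
{
    // string.GetEnumerator() returns a CharEnumerator, which implements IEnumerator<char>.
    IEnumerator<char> characters = input.GetEnumerator();
    // The result holds one element per top-level bracketed list in the input.
    return InternalStringToArray2(characters);
}

private static IEnumerable<object> InternalStringToArray2(IEnumerator<char> characters)
{
    StringBuilder valueBuilder = new StringBuilder();
    string value;

    while (characters.MoveNext())
    {
        char current = characters.Current;

        switch (current)
        {
            case '[':
                // Materialize the nested list right away: the enumerator is shared,
                // so a lazily evaluated child would not consume its characters in time.
                yield return InternalStringToArray2(characters).ToList();
                break;
            case ']':
                // Emit the pending value (if any) and close this level.
                value = valueBuilder.ToString().Trim();
                if (value.Length > 0) yield return value;
                yield break;
            case ',':
                // Skip the empty value that follows a nested list.
                value = valueBuilder.ToString().Trim();
                if (value.Length > 0) yield return value;
                valueBuilder.Clear();
                break;
            default:
                valueBuilder.Append(current);
                break;
        }
    }
}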

You are not limited to recursion, though; you can always fall back to a single method such as:

private static IEnumerable<object> StringToArray1(string input)
{
    Stack<List<object>> levelEntries = new Stack<List<object>>();
    List<object> current = null;
    StringBuilder currentLineBuilder = new StringBuilder();
    string value;

    foreach (char nextChar in input)
    {
        switch (nextChar)
        {
            case '[':
                // Remember the enclosing list (null at the top level) and start a new one.
                levelEntries.Push(current);
                current = new List<object>();
                currentLineBuilder.Clear();
                break;
            case ']':
                // Add the pending value, if any, then attach this list to its parent.
                value = currentLineBuilder.ToString().Trim();
                if (value.Length > 0) current.Add(value);
                currentLineBuilder.Clear();
                var last = current;
                if (levelEntries.Peek() != null)
                {
                    current = levelEntries.Pop();
                    current.Add(last);
                }
                break;
            case ',':
                // Skip the empty value that follows a nested list.
                value = currentLineBuilder.ToString().Trim();
                if (value.Length > 0) current.Add(value);
                currentLineBuilder.Clear();
                break;
            default:
                currentLineBuilder.Append(nextChar);
                break;
        }
    }

    return current;
}

Use whichever suits your taste.
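If what you need is the flat, depth-one string[] from the question rather than nested lists, a depth counter is enough. This is only a minimal sketch: `TopLevelSplitter` and `SplitTopLevel` are hypothetical names, and the code assumes well-formed input that starts with '[' and ends with ']'.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class TopLevelSplitter
{
    // Hypothetical helper: splits "[a, [b, c], d]" into "a", "[b, c]", "d".
    // Assumes well-formed input; no validation is performed.
    public static string[] SplitTopLevel(string input)
    {
        var items = new List<string>();
        var item = new StringBuilder();
        int depth = 0;

        foreach (char c in input)
        {
            if (c == '[')
            {
                depth++;
                if (depth == 1) continue;   // skip the outermost '['
            }
            else if (c == ']')
            {
                depth--;
                if (depth == 0) break;      // the outermost ']' ends the list
            }
            else if (c == ',' && depth == 1)
            {
                // A comma at depth 1 separates two top-level items.
                items.Add(item.ToString().Trim());
                item.Clear();
                continue;
            }
            item.Append(c);                 // everything else belongs to the current item
        }

        // Add the final item, unless the list was empty ("[]").
        string last = item.ToString().Trim();
        if (last.Length > 0) items.Add(last);
        return items.ToArray();
    }

    public static void Main()
    {
        string[] parts = SplitTopLevel("[a, b, [c, [d, e]], f, [g, h], i]");
        Console.WriteLine(string.Join(" | ", parts));
        // a | b | [c, [d, e]] | f | [g, h] | i
    }
}
```

Nested brackets are copied through verbatim because commas are treated as separators only at depth 1.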

2

Using regular expressions, this can solve your problem:

static string[] StringToArray(string str) 
{ 
    Regex reg = new Regex(@"^\[(.*)\]$"); 
    Match match = reg.Match(str); 
    if (!match.Success) 
     return null; 
    str = match.Groups[1].Value; 
    List<string> list = new List<string>(); 
    reg = new Regex(@"\[[^\[\]]*(((?'Open'\[)[^\[\]]*)+((?'-Open'\])[^\[\]]*)+)*(?(Open)(?!))\]"); 
    Dictionary<string, string> dic = new Dictionary<string, string>(); 
    int index = 0; 
    str = reg.Replace(str, m => 
    { 
     string temp = "ojlovecd" + (index++).ToString(); 
     dic.Add(temp, m.Value); 
     return temp; 
    }); 
    string[] result = str.Split(','); 
    for (int i = 0; i < result.Length; i++) 
    { 
     string s = result[i].Trim(); 
     if (dic.ContainsKey(s)) 
      result[i] = dic[s].Trim(); 
     else 
      result[i] = s; 
    } 
    return result; 
} 
+0

I also thought regular expressions were the way to go, but this doesn't work, because you need to capture "balanced" brackets. –

+0

@ScottRippey Hi Scott, I have modified my code; please try it. – ojlovecd

+0

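Looks good. It needs some cleanup, but I'll assume it works :) For anyone interested in these "balancing groups", especially for matching balanced brackets, you should take a look at [Microsoft's documentation on "Balancing Group Definitions".](http://msdn.microsoft.com/en-us/library/bs2twtah.aspx#balancing_group_definition) –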

0
using System; 
using System.Text; 
using System.Text.RegularExpressions; 
using Microsoft.VisualBasic.FileIO; //Microsoft.VisualBasic.dll 
using System.IO; 

public class Sample { 
    static void Main(){ 
     string data = "[a, b, [c, [d, e]], f, [g, h], i]"; 
     string[] fields = StringToArray(data); 
     //check print 
     foreach(var item in fields){ 
      Console.WriteLine("\"{0}\"",item); 
     } 
    } 
    static string[] StringToArray(string data){ 
     string[] fields = null; 
     Regex innerPat = new Regex(@"\[\s*(.+)\s*\]"); 
     string innerStr = innerPat.Matches(data)[0].Groups[1].Value; 
     StringBuilder wk = new StringBuilder(); 
     var balance = 0; 
     for(var i = 0;i<innerStr.Length;++i){ 
      char ch = innerStr[i]; 
      switch(ch){ 
      case '[': 
       if(balance == 0){ 
        wk.Append('"'); 
       } 
       wk.Append(ch); 
       ++balance; 
       continue; 
      case ']': 
       wk.Append(ch); 
       --balance; 
       if(balance == 0){ 
        wk.Append('"'); 
       } 
       continue; 
      default: 
       wk.Append(ch); 
       break; 
      } 
     } 
     var reader = new StringReader(wk.ToString()); 
     using(var csvReader = new TextFieldParser(reader)){ 
      csvReader.SetDelimiters(new string[] {","}); 
      csvReader.HasFieldsEnclosedInQuotes = true; 
      fields = csvReader.ReadFields(); 
     } 
     return fields; 
    } 
}