2011-11-22 64 views
0

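Better way to convert structured text into business entities

I am trying to find a better solution for converting plain text (with a predefined length for each field) into business entities. For example, the input text could be "Testuser new york 10018", where the first 11 characters represent the username, the next 12 characters the city, and the following 5 characters the zip code. The input text can be long, around 1000 characters, and represents many properties of a single entity.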

Any help is appreciated. Thanks.

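I tried the following approach:

  1. Define an XML structure that can be deserialized into the business entity.

  2. Use XSLT to navigate to each node and populate the XML element values by applying the substring function to the input text.

  3. Once the XML is populated, deserialize it into the entity.

But I feel this approach may not scale, because a separate XSLT file has to be loaded for each kind of input to convert it into its corresponding XML.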

+0

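In your example, the first 8 characters are the username, the next 8 characters are the city, and the next 5 characters are the zip code... (No, that's not just whitespace normalization.) Is that the actual format? – Ryan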

+1

Define 'better'. Better than what? What have you tried? What about what you tried doesn't work? –

+0

@Chris, I have updated the question with my current approach – testuser

Answers

2

A nice and elegant way might be to use regular expressions, from the System.Text.RegularExpressions namespace, so something like this:

static Regex inputParser = new Regex("(.{11})(.{12})(.{5})", RegexOptions.Compiled);

foreach (Match m in inputParser.Matches(yourInput)) {
    BusinessEntity e = new BusinessEntity();
    e.Username = m.Groups[1].Value.TrimEnd(); // Remove spaces from the end; I take it that's what they'll be padded with
    e.City = m.Groups[2].Value.TrimEnd();
    e.ZipCode = m.Groups[3].Value;
    myListOfBusinessEntities.Add(e);
}
+0

This assumes that all the fields are strings, of course... –

+0

@ChrisShain Well, yes. If there are fields of other types, it should be easy to modify. – Ryan
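
As a rough illustration of that modification, the sketch below keeps the 11/12/5 layout from the answer but converts the last field to an integer. The class name TypedRecordParser, the named groups, and the choice to treat the zip code as a number are assumptions made for the example only; a real zip code is usually better kept as a string so that leading zeros survive.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the original answer.
public static class TypedRecordParser
{
    // Same 11/12/5 layout as the answer, with named groups for readability.
    private static readonly Regex RecordPattern = new Regex(
        "(?<username>.{11})(?<city>.{12})(?<zip>.{5})", RegexOptions.Compiled);

    // Returns username and city as trimmed strings and the zip code as an int.
    public static Tuple<string, string, int> Parse(string record)
    {
        Match m = RecordPattern.Match(record);
        if (!m.Success)
            throw new FormatException("Record does not match the 28-character layout.");

        // NumberStyles.None accepts digits only, so padded or signed values are rejected.
        int zip = int.Parse(m.Groups["zip"].Value, NumberStyles.None, CultureInfo.InvariantCulture);

        return Tuple.Create(
            m.Groups["username"].Value.TrimEnd(),
            m.Groups["city"].Value.TrimEnd(),
            zip);
    }
}
```

Other field types would follow the same pattern: capture the raw text with the regex, then convert each group with the parser appropriate to its type.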

0

If you are facing just a single case, you could simply write a small class with a method which receives the line of text and returns a new entity.
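
A minimal sketch of such a class, assuming the 11/12/5 layout from the question and the BusinessEntity property names used in the regex answer above. The class name BusinessEntityParser and the error handling are hypothetical choices for illustration.

```csharp
using System;

// Hypothetical entity; property names follow the regex answer above.
public class BusinessEntity
{
    public string Username { get; set; }
    public string City { get; set; }
    public string ZipCode { get; set; }
}

public static class BusinessEntityParser
{
    // Field widths taken from the question: 11 + 12 + 5 characters.
    private const int UsernameLength = 11;
    private const int CityLength = 12;
    private const int ZipCodeLength = 5;

    // Receives one fixed-width line and returns a new entity.
    public static BusinessEntity Parse(string line)
    {
        if (line == null) throw new ArgumentNullException("line");
        if (line.Length < UsernameLength + CityLength + ZipCodeLength)
            throw new FormatException("Line is shorter than the expected 28 characters.");

        return new BusinessEntity
        {
            // TrimEnd assumes the fields are right-padded with spaces.
            Username = line.Substring(0, UsernameLength).TrimEnd(),
            City = line.Substring(UsernameLength, CityLength).TrimEnd(),
            ZipCode = line.Substring(UsernameLength + CityLength, ZipCodeLength)
        };
    }
}
```

For example, BusinessEntityParser.Parse("Testuser".PadRight(11) + "new york".PadRight(12) + "10018") would yield an entity with Username "Testuser", City "new york" and ZipCode "10018".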

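If your lines are padded with blanks and have a fixed length, a binary reader combined with the System.Text.Encoding class and its GetString method could yield a faster solution.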
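
One way this could look is sketched below. It assumes a single-byte encoding such as ASCII, so that byte counts and character counts coincide; the helper name FixedWidthByteReader is hypothetical, and whether this is actually faster than the string-based approaches would need to be measured.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper, not part of the original answer.
public static class FixedWidthByteReader
{
    // Reads one field of byteLength bytes and trims the trailing space padding.
    // Assumes ASCII, where one byte corresponds to one character.
    public static string ReadField(BinaryReader reader, int byteLength)
    {
        byte[] buffer = reader.ReadBytes(byteLength);
        if (buffer.Length < byteLength)
            throw new EndOfStreamException("Record ended before the field was complete.");
        return Encoding.ASCII.GetString(buffer).TrimEnd(' ');
    }
}
```

A usage example with the layout from the question:

```csharp
byte[] record = Encoding.ASCII.GetBytes(
    "Testuser".PadRight(11) + "new york".PadRight(12) + "10018");

using (var reader = new BinaryReader(new MemoryStream(record)))
{
    string username = FixedWidthByteReader.ReadField(reader, 11);
    string city = FixedWidthByteReader.ReadField(reader, 12);
    string zipCode = FixedWidthByteReader.ReadField(reader, 5);
}
```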

0

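Based on the refinement of the question, I infer that you have multiple different formats for different inputs. Here is an implementation of IFormatter that should help you get there. Note that this is hacky, broken in several different ways, and comes with no warranty of any kind: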

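// Namespaces required by the code below
using System;
using System.IO;
using System.Linq;
using System.Reflection;
using System.Runtime.Serialization;
using System.Text;

void Test()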
{ 
    var serializer = new FixedWidthSerializer<MyClass>(); 
    var ms = new MemoryStream(); 
    serializer.Serialize(ms, new MyClass { Age = 30, FirstName = "John", LastName = "Doe"}); 
    ms.Position = 0; 
    var newMyClass = (MyClass)serializer.Deserialize(ms); 
} 

[Serializable] 
private class MyClass 
{ 
    public String FirstName { get; set; } 
    public String LastName; 
    public Int32 Age { get; set; } 
} 

public class FixedWidthSerializer<T> : IFormatter 
{ 
    private readonly FixedWidthFieldDefinition[] _fieldDefinition; 

    public FixedWidthSerializer() 
     : 
     this(FormatterServices.GetSerializableMembers(typeof(T)).Select(sm=>new FixedWidthFieldDefinition(sm.Name, 100)).ToArray()) 
    { } 

    public FixedWidthSerializer(FixedWidthFieldDefinition[] fieldDefinition) 
    { 
     if (fieldDefinition == null) throw new ArgumentNullException("fieldDefinition"); 
     _fieldDefinition = fieldDefinition; 
     Context = new StreamingContext(StreamingContextStates.All);    
    } 

    public class FixedWidthFieldDefinition 
    { 
     public String FieldName { get; protected set; } 
     public Int32 CharLength { get; protected set; } 

     public FixedWidthFieldDefinition(String fieldName, Int32 charLength) 
     { 
      FieldName = fieldName; 
      CharLength = charLength; 
     } 
    } 

    public object Deserialize(Stream serializationStream) 
    { 
     var streamReader = new StreamReader(serializationStream); 
     var textLine = streamReader.ReadLine(); 

     if (textLine == null) 
      throw new SerializationException("Ran out of text!"); 

     var obj = FormatterServices.GetUninitializedObject(typeof (T)); 
     var memberDictionary = FormatterServices.GetSerializableMembers(obj.GetType(), Context).ToDictionary(mi => mi.Name); 

     var offset = 0; 
     foreach (var fieldDef in _fieldDefinition) 
     { 
      if (offset + fieldDef.CharLength > textLine.Length) 
       throw new SerializationException("Line was too short!"); 

      // Read the current field and increase the offset 
      var fieldStringValue = textLine.Substring(offset, fieldDef.CharLength); 
      offset += fieldDef.CharLength; 

      MemberInfo memberInfo; 

      if (!memberDictionary.TryGetValue(fieldDef.FieldName, out memberInfo)) 
       throw new SerializationException("You asked for the member '" + fieldDef.FieldName + "', but it doesn't exist on type '" + typeof (T) + "'"); 

      var memberAsField = memberInfo as FieldInfo; 

      if (memberAsField != null) 
       memberAsField.SetValue(obj, Convert.ChangeType(fieldStringValue.TrimEnd(), memberAsField.FieldType)); 
      else 
       throw new SerializationException("I don't know what to make of the property '" + fieldDef.FieldName + "'"); 
     } 
     return obj; 
    } 

    public void Serialize(Stream serializationStream, object graph) 
    { 
     var serializableMembers = FormatterServices.GetSerializableMembers(graph.GetType()); 
     var membersToSerialize = _fieldDefinition.Select(fd => serializableMembers.First(sm => sm.Name == fd.FieldName)).ToArray(); 
     var objectData = FormatterServices.GetObjectData(graph, membersToSerialize); 
     var sb = new StringBuilder(_fieldDefinition.Sum(fd => fd.CharLength)); 
     for (var i = 0; i < _fieldDefinition.Length; i++) 
      sb.Append(((String) Convert.ChangeType(objectData[i], typeof (String))).PadRight(_fieldDefinition[i].CharLength), 0, _fieldDefinition[i].CharLength); 
     var sw = new StreamWriter(serializationStream); 
     sw.WriteLine(sb.ToString()); 
     sw.Flush(); 
    } 

    public ISurrogateSelector SurrogateSelector { get; set; } 

    public SerializationBinder Binder { get; set; } 

    public StreamingContext Context { get; set; } 
} 
+0

Thanks for your help. I will try the approach above and see how it goes – testuser
