2016-08-18

Okay, I'll explain this as best I can. I wrote an application that uses a SQL table to define the structure of a fixed-width datasource (header, start index, field length, and so on). When my application runs, it queries this table and builds a DataTable object (called finalDT) out of DataColumn objects, keeping ColumnName = header. I then append to that table a set of DataColumn objects that exist in every datasource we use (I tend to call these derived columns). I also create a primary-key field, which is an auto-incrementing integer. Originally I rolled my own solution for reading fixed-width files, but I'm trying to convert it to use FileHelpers. Mainly, I'm looking at integrating it so I get access to the other file types FileHelpers can parse (CSV, Excel, etc.). In short: using FileHelpers.Dynamic, read a fixed-width file and upload it to SQL.

Now, my problem. Using FileHelpers.Dynamic, I was able to create a FileHelperEngine object using the following method:

private static FileHelperEngine GetFixedWidthFileClass(bool ignore)
{
    singletonArguments sArgs = singletonArguments.sArgs;
    singletonSQL sSQL = singletonSQL.sSQL;

    FixedLengthClassBuilder flcb = new FixedLengthClassBuilder(sSQL.FixedDataDefinition.DataTableName);
    flcb.IgnoreFirstLines = 1;
    flcb.IgnoreLastLines = 1;
    flcb.IgnoreEmptyLines = true;

    foreach (var dcs in sSQL.FixedDataDefinition.Columns)
    {
        flcb.AddField(dcs.header, Convert.ToInt32(dcs.length), "String");

        if (ignore && dcs.ignore)
        {
            // If we want to ignore a column, this is how to do it. Would like to incorporate this.
            flcb.LastField.FieldValueDiscarded = true;
            flcb.LastField.Visibility = NetVisibility.Protected;
        }
        else
        {
            flcb.LastField.TrimMode = TrimMode.Both;
            flcb.LastField.FieldNullValue = string.Empty;
        }
    }

    return new FileHelperEngine(flcb.CreateRecordClass());
}

sSQL.FixedDataDefinition.Columns is how I store the field definitions for the fixed-width datasource file. I then generate the DataTable by executing:

DataTable dt = engine.ReadFileAsDT(file); 

where file is the full path to the fixed-width file and engine holds the result of the GetFixedWidthFileClass() method shown above. Okay, so now I have a DataTable with no primary key and none of the derived columns. Additionally, every field in dt is marked ReadOnly = true. This is where things get messy.

I need dt to populate finalDT, and it's fine if dt comes over without any primary-key information. If that can happen, I can use finalDT to upload my data to my SQL table. If it can't, then I need a way for finalDT to lack a primary key but still upload to my SQL table. Would SqlBulkCopy allow that? Is there another way?
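For what it's worth, SqlBulkCopy does accept a DataTable directly, and explicit column mappings let you leave a local key column out of the upload so a destination IDENTITY column can assign its own values. A minimal sketch, assuming a hypothetical destination table "MyFixedWidthTable" and a hypothetical local key column "pk" (neither name is from the question):

```csharp
using System.Data;
using System.Data.SqlClient;

static void Upload(DataTable finalDT, string connectionString)
{
    using (SqlBulkCopy bulk = new SqlBulkCopy(connectionString))
    {
        bulk.DestinationTableName = "MyFixedWidthTable"; // hypothetical table name

        // Map only the data columns; skipping the local key column lets the
        // destination IDENTITY column generate its own values.
        foreach (DataColumn column in finalDT.Columns)
            if (column.ColumnName != "pk") // hypothetical key column name
                bulk.ColumnMappings.Add(column.ColumnName, column.ColumnName);

        bulk.WriteToServer(finalDT);
    }
}
```

Note that once any mapping is added, SqlBulkCopy uploads only the mapped columns, so the unmapped key column is simply ignored rather than causing an error.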

At this point, I'm willing to start over from scratch, as long as I can use FileHelpers to parse the fixed-width file and store the results in my SQL table. I just don't see a path to get there.

Answer


I figured it out. It isn't pretty, but here's how it works. Basically, the way I set up my code in my original post still applies, since I changed nothing in the GetFixedWidthFileClass() method. I then had to add two methods to get finalDT set up correctly:

/// <summary>
///  For a given datasource file, add all rows to the DataSet and collect Hexdump data
/// </summary>
/// <param name="ds">
///  The <see cref="System.Data.DataSet" /> to add to
/// </param>
/// <param name="engine">
///  The <see cref="FileHelperEngine" /> used to parse the datasource file
/// </param>
/// <param name="mktgidSpecs">
///  Column specs for the field used to match file versions to configured jobs
/// </param>
/// <param name="file">
///  The datasource file to process
/// </param>
internal static void GenerateDatasource(ref DataSet ds, ref FileHelperEngine engine, DataSourceColumnSpecs mktgidSpecs, string file)
{
    // Some singleton class instances to hold program data I will need.
    singletonSQL sSQL = singletonSQL.sSQL;
    singletonArguments sArgs = singletonArguments.sArgs;

    try
    {
        // Load a DataTable with contents of datasource file.
        DataTable dt = engine.ReadFileAsDT(file);

        // Clean up the DataTable by removing columns that should be ignored.
        DataTableCleanUp(ref dt, ref engine);

        // ReadFileAsDT() makes all of the columns ReadOnly. Fix that.
        foreach (DataColumn column in dt.Columns)
            column.ReadOnly = false;

        // Okay, now get a Primary Key and add in the derived columns.
        GenerateDatasourceSchema(ref dt);

        // Parse all of the rows and columns to do data clean up and assign some custom
        // values. Add custom values for jobID and serial columns to each row in the DataTable.
        for (int row = 0; row < dt.Rows.Count; row++)
        {
            string version = string.Empty; // The file version
            bool found = false; // Used to get out of foreach loops once the required condition is found.

            // Iterate all configured jobs and add the jobID and serial number to each row
            // based upon match.
            foreach (JobSetupDetails job in sSQL.VznJobDescriptions.JobDetails)
            {
                // Version must match id in order to update the row. Break out once we find
                // the match to save time.
                version = dt.Rows[row][dt.Columns[mktgidSpecs.header]].ToString().Trim().Split(new char[] { '_' })[0];
                foreach (string id in job.ids)
                {
                    if (version.Equals(id))
                    {
                        dt.Rows[row][dt.Columns["jobid"]] = job.jobID;

                        lock (locklist)
                            dt.Rows[row][dt.Columns["serial"]] = job.serial++;

                        found = true;
                        break;
                    }
                }
                if (found)
                    break;
            }

            // Parse all columns to do data clean up.
            for (int column = 0; column < dt.Columns.Count; column++)
            {
                // This tab character keeps showing up in the data. It should not be there,
                // but the customer won't fix it, so we have to.
                if (dt.Rows[row][column].GetType() == typeof(string))
                    dt.Rows[row][column] = dt.Rows[row][column].ToString().Replace('\t', ' ');
            }
        }

        dt.AcceptChanges();

        // DataTable is cleaned up and modified. Time to push it into the DataSet.
        lock (locklist)
        {
            // If dt is writing back to the DataSet for the first time, Rows.Count will be
            // zero. Since the DataTable in the DataSet does not have the table schema and
            // since dt.Copy() is not an option (ds is referenced, so Copy() won't work), use
            // Merge() with the option MissingSchemaAction.Add to create the schema.
            if (ds.Tables[sSQL.FixedDataDefinition.DataTableName].Rows.Count == 0)
                ds.Tables[sSQL.FixedDataDefinition.DataTableName].Merge(dt, false, MissingSchemaAction.Add);
            else
            {
                // If this is not the first write to the DataSet, remove the PrimaryKey
                // column to avoid duplicate key values. Use ImportRow() rather than Merge()
                // since, for whatever reason, Merge() is overwriting ds each time it is
                // called and ImportRow() is actually appending the row. Ugly, but can't
                // figure out another way to make this work.
                dt.PrimaryKey = null;
                dt.Columns.Remove(dt.Columns[0]);
                foreach (DataRow dr in dt.Rows)
                    ds.Tables[sSQL.FixedDataDefinition.DataTableName].ImportRow(dr);
            }

            // Accept all the changes made to the DataSet.
            ds.Tables[sSQL.FixedDataDefinition.DataTableName].AcceptChanges();
        }

        // Clean up memory.
        dt.Clear();

        // Log my progress.
        log.GenerateLog("0038", log.Info
            , engine.TotalRecords.ToString() + " DataRows successfully added for file:\r\n\t"
            + file + "\r\nto DataTable "
            + sSQL.FixedDataDefinition.DataTableName);
    }
    catch (Exception e)
    {
        // Something bad happened here.
        log.GenerateLog("0038", log.Error, "Failed to add DataRows to DataTable "
            + sSQL.FixedDataDefinition.DataTableName
            + " for file\r\n\t"
            + file, e);
    }
    finally
    {
        // Successful or not, get rid of the datasource file to prevent other issues.
        File.Delete(file);
    }
}

And this method:

/// <summary>
///  Deletes columns that are not needed from a given <see cref="System.Data.DataTable" /> reference.
/// </summary>
/// <param name="dt">
///  The <see cref="System.Data.DataTable" /> to delete columns from.
/// </param>
/// <param name="engine">
///  The <see cref="FileHelperEngine" /> object containing data field usability information.
/// </param>
private static void DataTableCleanUp(ref DataTable dt, ref FileHelperEngine engine)
{
    // Tracks DataColumns I need to remove from my temp DataTable, dt.
    List<DataColumn> removeColumns = new List<DataColumn>();

    // If a field is Discarded, then the data was not imported because we don't need this
    // column. In that case, mark the column for deletion by adding it to removeColumns.
    for (int i = 0; i < engine.Options.Fields.Count; i++)
        if (engine.Options.Fields[i].Discarded)
            removeColumns.Add(dt.Columns[i]);

    // Reverse the List so changes to dt don't generate schema errors.
    removeColumns.Reverse();

    // Do the deletion.
    foreach (DataColumn column in removeColumns)
        dt.Columns.Remove(column);

    // Clean up memory.
    removeColumns.Clear();
}

Basically, since ds (the DataSet where finalDT lives) is passed by reference into the GenerateDatasource method, I could not use dt.Copy() to push data into it. I had to use Merge() to do that. Then, on later passes, where I wanted to keep using Merge(), I had to switch to a foreach loop with ImportRow() because Merge() kept overwriting finalDT.
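The Merge()-overwrites / ImportRow()-appends behavior can be shown in isolation. A self-contained sketch (not the poster's code; a single int key column stands in for the real schema):

```csharp
using System;
using System.Data;

class MergeVsImportRow
{
    static void Main()
    {
        // Target table with a primary key and one existing row.
        DataTable target = new DataTable("t");
        DataColumn pk = target.Columns.Add("id", typeof(int));
        target.Columns.Add("value", typeof(string));
        target.PrimaryKey = new[] { pk };
        target.Rows.Add(1, "first");

        // Incoming table with the same schema and a colliding key value.
        DataTable incoming = target.Clone(); // Clone() copies schema and primary key
        incoming.Rows.Add(1, "second");

        // Merge() matches on "id" and overwrites the existing row in place.
        DataTable merged = target.Copy();
        merged.Merge(incoming);
        Console.WriteLine(merged.Rows.Count);        // 1 row, value now "second"

        // ImportRow() just appends -- but with the primary key still set, the
        // duplicate "id" would throw a ConstraintException. Drop the key first,
        // as the answer above does, and the row appends cleanly.
        DataTable imported = target.Copy();
        imported.PrimaryKey = null;
        imported.ImportRow(incoming.Rows[0]);
        Console.WriteLine(imported.Rows.Count);      // 2 rows
    }
}
```

This is consistent with the poster's observation: Merge() treats matching key values as the same logical row, while ImportRow() copies a row verbatim with no matching at all.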

Other issues I had to work around were:

  1. When I use ImportRow(), I also need to remove the PrimaryKey from dt, otherwise I get errors about duplicate keys.
  2. FileHelperEngine and FileHelpers.Dynamic.FixedLengthClassBuilder have problems skipping columns I want to ignore. Either the builder refuses to acknowledge them at all, which throws off my column offsets and the accuracy of how data is read from the datasource file (using the FieldHidden option), or it reads them and creates the columns anyway but doesn't load the data (using the FieldValueDiscarded and Visibility.Private or Visibility.Protected options). What this meant for me is that I had to iterate over dt after calling engine.ReadFileAsDT(file) and delete the columns marked Discarded.
  3. Since FileHelpers knows nothing about my PrimaryKey column, or the other derived columns added to every datasource during processing, I had to pass dt to a method (GenerateDatasourceSchema()) to sort that out. That method basically just adds those columns and makes sure the PrimaryKey is the first column.
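The GenerateDatasourceSchema() method is not shown in the answer; a guess at its shape, based only on the description in item 3 (it adds an auto-increment primary key as the first column plus the derived columns; the column names "pk", "jobid", and "serial" are inferred from the row-population code above, and "pk" in particular is hypothetical):

```csharp
using System.Data;

static void GenerateDatasourceSchema(ref DataTable dt)
{
    // Auto-increment primary key, moved to ordinal 0 so it is the first column.
    // Adding an AutoIncrement column to a table that already has rows fills the
    // existing rows with generated values.
    DataColumn pk = new DataColumn("pk", typeof(int)) // hypothetical column name
    {
        AutoIncrement = true,
        AutoIncrementSeed = 1,
        AutoIncrementStep = 1
    };
    dt.Columns.Add(pk);
    pk.SetOrdinal(0);
    dt.PrimaryKey = new[] { pk };

    // Derived columns every datasource gets; names taken from the code above.
    dt.Columns.Add("jobid", typeof(int));
    dt.Columns.Add("serial", typeof(int));
}
```

Whatever the real method looks like, the key detail from the answer is that the primary key ends up at ordinal 0, which is why the later clean-up can remove it with dt.Columns.Remove(dt.Columns[0]).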

The rest of the code is fixes I need to make to the columns and rows. In some cases I set the value of a column for every row, and in others I clean up errors in the original data (as it comes from my customer).

It isn't pretty, and I hope to find a better path. If anyone has suggestions about how I did this, I'd love to hear them.