2016-08-18

Okay, I'll explain this as best I can. I wrote an application that uses a SQL table to define the structure of a fixed-width datasource (header, start index, field length, and so on). When my application runs, it queries this table and builds a DataTable object (called finalDT) out of DataColumn objects, keeping ColumnName = header. I then append to that table a set of DataColumn objects that exist in every datasource we use (I tend to call these derived columns). I also create a primary-key field, which is an auto-incrementing integer. Originally I rolled my own solution for reading fixed-width files, but I'm trying to convert it to use FileHelpers. Mainly, I'm looking at integrating it so I get access to the other file types FileHelpers can parse (CSV, Excel, etc.). In short: using FileHelpers.Dynamic, read a fixed-width file and upload it to SQL.

Now, my problem. Using FileHelpers.Dynamic, I was able to create a FileHelperEngine object using the following method:

private static FileHelperEngine GetFixedWidthFileClass(bool ignore)
{
    singletonArguments sArgs = singletonArguments.sArgs;
    singletonSQL sSQL = singletonSQL.sSQL;

    FixedLengthClassBuilder flcb = new FixedLengthClassBuilder(sSQL.FixedDataDefinition.DataTableName);
    flcb.IgnoreFirstLines = 1;
    flcb.IgnoreLastLines = 1;
    flcb.IgnoreEmptyLines = true;

    foreach (var dcs in sSQL.FixedDataDefinition.Columns)
    {
        flcb.AddField(dcs.header, Convert.ToInt32(dcs.length), "String");

        if (ignore && dcs.ignore)
        {
            // If we want to ignore a column, this is how to do it. Would like to incorporate this.
            flcb.LastField.FieldValueDiscarded = true;
            flcb.LastField.Visibility = NetVisibility.Protected;
        }
        else
        {
            flcb.LastField.TrimMode = TrimMode.Both;
            flcb.LastField.FieldNullValue = string.Empty;
        }
    }

    return new FileHelperEngine(flcb.CreateRecordClass());
}

sSQL.FixedDataDefinition.Columns is how I store the field definitions for the fixed-width datasource file. I then generate the DataTable by executing:

DataTable dt = engine.ReadFileAsDT(file); 

where file is the full path to the fixed-width file and engine holds the result of the GetFixedWidthFileClass() method shown above. Okay, so now I have a DataTable with no primary key and none of the derived columns. Additionally, every field in dt is marked ReadOnly = true. This is where things get messy.

I need dt to populate finalDT, and it's fine if dt comes over without any primary-key information. If that can happen, I can use finalDT to upload my data to my SQL table. If it can't, then I need a way for finalDT to lack a primary key but still upload to my SQL table. Would SqlBulkCopy allow that? Is there another way?
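For what it's worth, SqlBulkCopy does accept a DataTable directly, and explicit column mappings let you leave a local key column out of the upload so a destination IDENTITY column can assign its own values. A minimal sketch, assuming a hypothetical destination table "MyFixedWidthTable" and a hypothetical local key column "pk" (neither name is from the question):

```csharp
using System.Data;
using System.Data.SqlClient;

static void Upload(DataTable finalDT, string connectionString)
{
    using (SqlBulkCopy bulk = new SqlBulkCopy(connectionString))
    {
        bulk.DestinationTableName = "MyFixedWidthTable"; // hypothetical table name

        // Map only the data columns; skipping the local key column lets the
        // destination IDENTITY column generate its own values.
        foreach (DataColumn column in finalDT.Columns)
            if (column.ColumnName != "pk") // hypothetical key column name
                bulk.ColumnMappings.Add(column.ColumnName, column.ColumnName);

        bulk.WriteToServer(finalDT);
    }
}
```

Note that once any mapping is added, SqlBulkCopy uploads only the mapped columns, so the unmapped key column is simply ignored rather than causing an error.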

At this point, I'm willing to start over from scratch, as long as I can use FileHelpers to parse the fixed-width file and store the results in my SQL table. I just don't see a path to get there.

Answer


I figured it out. It isn't pretty, but here's how it works. Basically, the way I set up my code in my original post still applies, since I changed nothing in the GetFixedWidthFileClass() method. I then had to add two methods to get finalDT set up correctly:

/// <summary>
///  For a given datasource file, add all rows to the DataSet and collect Hexdump data
/// </summary>
/// <param name="ds">
///  The <see cref="System.Data.DataSet" /> to add to
/// </param>
/// <param name="engine">
///  The <see cref="FileHelperEngine" /> used to parse the datasource file
/// </param>
/// <param name="mktgidSpecs">
///  Column specs for the field used to match file versions to configured jobs
/// </param>
/// <param name="file">
///  The datasource file to process
/// </param>
internal static void GenerateDatasource(ref DataSet ds, ref FileHelperEngine engine, DataSourceColumnSpecs mktgidSpecs, string file)
{
    // Some singleton class instances to hold program data I will need.
    singletonSQL sSQL = singletonSQL.sSQL;
    singletonArguments sArgs = singletonArguments.sArgs;

    try
    {
        // Load a DataTable with contents of datasource file.
        DataTable dt = engine.ReadFileAsDT(file);

        // Clean up the DataTable by removing columns that should be ignored.
        DataTableCleanUp(ref dt, ref engine);

        // ReadFileAsDT() makes all of the columns ReadOnly. Fix that.
        foreach (DataColumn column in dt.Columns)
            column.ReadOnly = false;

        // Okay, now get a Primary Key and add in the derived columns.
        GenerateDatasourceSchema(ref dt);

        // Parse all of the rows and columns to do data clean up and assign some custom
        // values. Add custom values for jobID and serial columns to each row in the DataTable.
        for (int row = 0; row < dt.Rows.Count; row++)
        {
            string version = string.Empty; // The file version
            bool found = false; // Used to get out of foreach loops once the required condition is found.

            // Iterate all configured jobs and add the jobID and serial number to each row
            // based upon match.
            foreach (JobSetupDetails job in sSQL.VznJobDescriptions.JobDetails)
            {
                // Version must match id in order to update the row. Break out once we find
                // the match to save time.
                version = dt.Rows[row][dt.Columns[mktgidSpecs.header]].ToString().Trim().Split(new char[] { '_' })[0];
                foreach (string id in job.ids)
                {
                    if (version.Equals(id))
                    {
                        dt.Rows[row][dt.Columns["jobid"]] = job.jobID;

                        lock (locklist)
                            dt.Rows[row][dt.Columns["serial"]] = job.serial++;

                        found = true;
                        break;
                    }
                }
                if (found)
                    break;
            }

            // Parse all columns to do data clean up.
            for (int column = 0; column < dt.Columns.Count; column++)
            {
                // This tab character keeps showing up in the data. It should not be there,
                // but the customer won't fix it, so we have to.
                if (dt.Rows[row][column].GetType() == typeof(string))
                    dt.Rows[row][column] = dt.Rows[row][column].ToString().Replace('\t', ' ');
            }
        }

        dt.AcceptChanges();

        // DataTable is cleaned up and modified. Time to push it into the DataSet.
        lock (locklist)
        {
            // If dt is writing back to the DataSet for the first time, Rows.Count will be
            // zero. Since the DataTable in the DataSet does not have the table schema and
            // since dt.Copy() is not an option (ds is referenced, so Copy() won't work), use
            // Merge() with the option MissingSchemaAction.Add to create the schema.
            if (ds.Tables[sSQL.FixedDataDefinition.DataTableName].Rows.Count == 0)
                ds.Tables[sSQL.FixedDataDefinition.DataTableName].Merge(dt, false, MissingSchemaAction.Add);
            else
            {
                // If this is not the first write to the DataSet, remove the PrimaryKey
                // column to avoid duplicate key values. Use ImportRow() rather than Merge()
                // since, for whatever reason, Merge() is overwriting ds each time it is
                // called and ImportRow() is actually appending the row. Ugly, but can't
                // figure out another way to make this work.
                dt.PrimaryKey = null;
                dt.Columns.Remove(dt.Columns[0]);
                foreach (DataRow dr in dt.Rows)
                    ds.Tables[sSQL.FixedDataDefinition.DataTableName].ImportRow(dr);
            }

            // Accept all the changes made to the DataSet.
            ds.Tables[sSQL.FixedDataDefinition.DataTableName].AcceptChanges();
        }

        // Clean up memory.
        dt.Clear();

        // Log my progress.
        log.GenerateLog("0038", log.Info
            , engine.TotalRecords.ToString() + " DataRows successfully added for file:\r\n\t"
            + file + "\r\nto DataTable "
            + sSQL.FixedDataDefinition.DataTableName);
    }
    catch (Exception e)
    {
        // Something bad happened here.
        log.GenerateLog("0038", log.Error, "Failed to add DataRows to DataTable "
            + sSQL.FixedDataDefinition.DataTableName
            + " for file\r\n\t"
            + file, e);
    }
    finally
    {
        // Successful or not, get rid of the datasource file to prevent other issues.
        File.Delete(file);
    }
}

And this method:

/// <summary>
///  Deletes columns that are not needed from a given <see cref="System.Data.DataTable" /> reference.
/// </summary>
/// <param name="dt">
///  The <see cref="System.Data.DataTable" /> to delete columns from.
/// </param>
/// <param name="engine">
///  The <see cref="FileHelperEngine" /> object containing data field usability information.
/// </param>
private static void DataTableCleanUp(ref DataTable dt, ref FileHelperEngine engine)
{
    // Tracks DataColumns I need to remove from my temp DataTable, dt.
    List<DataColumn> removeColumns = new List<DataColumn>();

    // If a field is Discarded, then the data was not imported because we don't need this
    // column. In that case, mark the column for deletion by adding it to removeColumns.
    for (int i = 0; i < engine.Options.Fields.Count; i++)
        if (engine.Options.Fields[i].Discarded)
            removeColumns.Add(dt.Columns[i]);

    // Reverse the List so changes to dt don't generate schema errors.
    removeColumns.Reverse();

    // Do the deletion.
    foreach (DataColumn column in removeColumns)
        dt.Columns.Remove(column);

    // Clean up memory.
    removeColumns.Clear();
}

Basically, since ds (the DataSet where finalDT lives) is passed by reference into the GenerateDatasource method, I could not use dt.Copy() to push data into it. I had to use Merge() to do that. Then, on later passes, where I wanted to keep using Merge(), I had to switch to a foreach loop with ImportRow() because Merge() kept overwriting finalDT.
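The Merge()-overwrites / ImportRow()-appends behavior can be shown in isolation. A self-contained sketch (not the poster's code; a single int key column stands in for the real schema):

```csharp
using System;
using System.Data;

class MergeVsImportRow
{
    static void Main()
    {
        // Target table with a primary key and one existing row.
        DataTable target = new DataTable("t");
        DataColumn pk = target.Columns.Add("id", typeof(int));
        target.Columns.Add("value", typeof(string));
        target.PrimaryKey = new[] { pk };
        target.Rows.Add(1, "first");

        // Incoming table with the same schema and a colliding key value.
        DataTable incoming = target.Clone(); // Clone() copies schema and primary key
        incoming.Rows.Add(1, "second");

        // Merge() matches on "id" and overwrites the existing row in place.
        DataTable merged = target.Copy();
        merged.Merge(incoming);
        Console.WriteLine(merged.Rows.Count);        // 1 row, value now "second"

        // ImportRow() just appends -- but with the primary key still set, the
        // duplicate "id" would throw a ConstraintException. Drop the key first,
        // as the answer above does, and the row appends cleanly.
        DataTable imported = target.Copy();
        imported.PrimaryKey = null;
        imported.ImportRow(incoming.Rows[0]);
        Console.WriteLine(imported.Rows.Count);      // 2 rows
    }
}
```

This is consistent with the poster's observation: Merge() treats matching key values as the same logical row, while ImportRow() copies a row verbatim with no matching at all.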

Other issues I had to work around were:

  1. When I use ImportRow(), I also need to remove the PrimaryKey from dt, otherwise I get errors about duplicate keys.
  2. FileHelperEngine and FileHelpers.Dynamic.FixedLengthClassBuilder have problems skipping columns I want to ignore. Either the builder refuses to acknowledge them at all, which throws off my column offsets and the accuracy of how data is read from the datasource file (using the FieldHidden option), or it reads them and creates the columns anyway but doesn't load the data (using the FieldValueDiscarded and Visibility.Private or Visibility.Protected options). What this meant for me is that I had to iterate over dt after calling engine.ReadFileAsDT(file) and delete the columns marked Discarded.
  3. Since FileHelpers knows nothing about my PrimaryKey column, or the other derived columns added to every datasource during processing, I had to pass dt to a method (GenerateDatasourceSchema()) to sort that out. That method basically just adds those columns and makes sure the PrimaryKey is the first column.
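The GenerateDatasourceSchema() method is not shown in the answer; a guess at its shape, based only on the description in item 3 (it adds an auto-increment primary key as the first column plus the derived columns; the column names "pk", "jobid", and "serial" are inferred from the row-population code above, and "pk" in particular is hypothetical):

```csharp
using System.Data;

static void GenerateDatasourceSchema(ref DataTable dt)
{
    // Auto-increment primary key, moved to ordinal 0 so it is the first column.
    // Adding an AutoIncrement column to a table that already has rows fills the
    // existing rows with generated values.
    DataColumn pk = new DataColumn("pk", typeof(int)) // hypothetical column name
    {
        AutoIncrement = true,
        AutoIncrementSeed = 1,
        AutoIncrementStep = 1
    };
    dt.Columns.Add(pk);
    pk.SetOrdinal(0);
    dt.PrimaryKey = new[] { pk };

    // Derived columns every datasource gets; names taken from the code above.
    dt.Columns.Add("jobid", typeof(int));
    dt.Columns.Add("serial", typeof(int));
}
```

Whatever the real method looks like, the key detail from the answer is that the primary key ends up at ordinal 0, which is why the later clean-up can remove it with dt.Columns.Remove(dt.Columns[0]).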

The rest of the code is fixes I need to make to the columns and rows. In some cases I set the value of a column for every row, and in others I clean up errors in the original data (as it comes from my customer).

It isn't pretty, and I hope to find a better path. If anyone has suggestions about how I did this, I'd love to hear them.