2009-04-23 63 views
32
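SQL Server Bulk insert of CSV file with inconsistent quotes

Is it possible to BULK INSERT (SQL Server) a CSV file in which the fields are only OCCASIONALLY surrounded by quotes? Specifically, quotes only surround those fields that contain a ",".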

In other words, I have data that looks like this (the first row contains headers):

id, company, rep, employees 
729216,INGRAM MICRO INC.,"Stuart, Becky",523 
729235,"GREAT PLAINS ENERGY, INC.","Nelson, Beena",114 
721177,GEORGE WESTON BAKERIES INC,"Hogan, Meg",253 

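Because the quotes aren't consistent, I can't use '","' as a delimiter, and I don't know how to create a format file that accounts for this.

I tried using ',' as a delimiter and loading it into a temporary table where every column is a varchar, then using some kludgy processing to strip out the quotes, but that doesn't work either, because the fields that contain ',' are split into multiple columns.

Unfortunately, I don't have the ability to manipulate the CSV file beforehand.

Is this hopeless?

Many thanks in advance for any advice.

By the way, I saw this post, SQL bulk import from csv, but in that case EVERY field was consistently wrapped in quotes. So, in that case, he could use ',' as a delimiter and then strip out the quotes afterwards.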

Answers

17

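You are going to need to preprocess the file, period.

If you really, really need to do this, here is the code. I wrote this because I absolutely had no choice. It is utility code and I'm not proud of it, but it works. The approach is not to get SQL to understand quoted fields, but instead to manipulate the file to use an entirely different delimiter.

EDIT: Here is the code in a GitHub repo. It's been improved and now comes with unit tests! https://github.com/chrisclark/Redelim-it

This function takes an input file and replaces all field-delimiting commas (NOT commas inside quoted text fields, just the actual delimiting ones) with a new delimiter. You can then tell SQL Server to use the new field delimiter instead of a comma. In the version of the function here, the placeholder is <*TMP*> (I feel confident this will not appear in the original CSV; if it does, brace for explosions).

Therefore, after running this function, you import into SQL by doing something like: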

BULK INSERT MyTable 
FROM 'C:\FileCreatedFromThisFunction.csv' 
WITH 
(
FIELDTERMINATOR = '<*TMP*>', 
ROWTERMINATOR = '\n' 
) 
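
For comparison, the same re-delimiting idea can be sketched in Python with the standard `csv` module, which already understands optionally quoted fields. This is only a minimal sketch and not the answer's VB code: `redelimit` and the in-memory sample are hypothetical names used for illustration, and unlike the VB function it also strips the enclosing quotes.

```python
import csv
import io

PLACEHOLDER = "<*TMP*>"  # same placeholder string the answer uses

def redelimit(src, dst, placeholder=PLACEHOLDER):
    """Copy CSV rows from src to dst, joining the fields with placeholder.

    src and dst are text file objects. Returns the number of rows written.
    """
    count = 0
    for row in csv.reader(src):
        # Refuse to continue if the placeholder already occurs in the data.
        if any(placeholder in field for field in row):
            raise ValueError(f"placeholder found in data on row {count + 1}")
        dst.write(placeholder.join(row) + "\n")
        count += 1
    return count

# Two rows taken from the question's sample data.
sample = io.StringIO(
    '729216,INGRAM MICRO INC.,"Stuart, Becky",523\n'
    '729235,"GREAT PLAINS ENERGY, INC.","Nelson, Beena",114\n'
)
out = io.StringIO()
rows_written = redelimit(sample, out)
print(out.getvalue())
```

With real files, open them with `newline=""` as the `csv` documentation recommends, then point BULK INSERT at the output with the placeholder as FIELDTERMINATOR.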

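And now, without further ado, the terrible, awful function that I apologize in advance for inflicting on you (edit: I've posted a working program that does this, instead of just the function, on my blog here):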

Private Function CsvToOtherDelimiter(ByVal InputFile As String, ByVal OutputFile As String) As Integer 

     Dim PH1 As String = "<*TMP*>" 

     Dim objReader As StreamReader = Nothing 
     Dim count As Integer = 0 'This will also serve as a primary key' 
     Dim sb As New System.Text.StringBuilder 

     Try 
      objReader = New StreamReader(File.OpenRead(InputFile), System.Text.Encoding.Default) 
     Catch ex As Exception 
      UpdateStatus(ex.Message) 
     End Try 

     If objReader Is Nothing Then 
      UpdateStatus("Invalid file: " & InputFile) 
      Return -1 'Exit Function alone would return 0, not -1 
     End If 

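     'grab the first line 
     Dim line = objReader.ReadLine() 
     'and advance to the next line b/c the first line is column headings 
     'hasHeaders is not defined in the original snippet; it is assumed here to be True 
     Dim hasHeaders As Boolean = True 
     If hasHeaders Then 
      line = Trim(objReader.ReadLine) 
     End If 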

    While Not String.IsNullOrEmpty(line) 'loop through each line 

     count += 1 

     'Replace commas with our custom-made delimiter 
     line = line.Replace(",", ph1) 

     'Find a quoted part of the line, which could legitimately contain commas. 
     'In that case we will need to identify the quoted section and swap commas back in for our custom placeholder. 
     Dim starti = line.IndexOf(ph1 & """", 0) 
     If line.IndexOf("""",0) = 0 then starti=0 

     While starti > -1 'loop through quoted fields 

      Dim FieldTerminatorFound As Boolean = False 

      'Find end quote token (originally a ",) 
      Dim endi As Integer = line.IndexOf("""" & ph1, starti) 

      If endi < 0 Then 
       FieldTerminatorFound = True 
       If endi < 0 Then endi = line.Length - 1 
      End If 

      While Not FieldTerminatorFound 

       'Find any more quotes that are part of that sequence, if any 
       Dim backChar As String = """" 'thats one quote 
       Dim quoteCount = 0 
       While backChar = """" 
        quoteCount += 1 
        backChar = line.Chars(endi - quoteCount) 
       End While 

       If quoteCount Mod 2 = 1 Then 'odd number of quotes. real field terminator 
        FieldTerminatorFound = True 
       Else 'keep looking 
        endi = line.IndexOf("""" & ph1, endi + 1) 
       End If 
      End While 

      'Grab the quoted field from the line, now that we have the start and ending indices 
      Dim source = line.Substring(starti + ph1.Length, endi - starti - ph1.Length + 1) 

      'And swap the commas back in 
      line = line.Replace(source, source.Replace(ph1, ",")) 

      'Find the next quoted field 
      '    If endi >= line.Length - 1 Then endi = line.Length 'During the swap, the length of line shrinks so an endi value at the end of the line will fail 
      starti = line.IndexOf(ph1 & """", starti + ph1.Length) 

     End While 

     'Collect the converted line so it ends up in the output file 
     sb.AppendLine(line) 

     line = objReader.ReadLine 

     End While 

     objReader.Close() 

     SaveTextToFile(sb.ToString, OutputFile) 

     Return count 

    End Function 
1

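You should be able to specify not only the field separator, which should be [,], but also the text qualifier, which in this case would be ["]. I'm using [] to enclose them so there's no confusion with ".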

+0

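Kibbee - Thanks for your answer. But I can't use any of the DB tools, just T-SQL, because this is automated. In essence, this application has a feature that lets users upload a CSV file, and the app then loads it into a DB table. And I don't know how to set the text qualifier with the "BULK INSERT" command. Can you expand? – mattstuehler 2009-04-23 16:26:43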

19

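It isn't possible to do a bulk insert for this file. From MSDN:

To be usable as a data file for bulk import, a CSV file must comply with the following restrictions:

  • Data fields never contain the field terminator.
  • Either none or all of the values in a data field are enclosed in quotation marks ("").

http://msdn.microsoft.com/en-us/library/ms188609.aspx

Some simple text processing should be all that's required to get the file ready for import. Alternatively, your users could be required either to format the file according to these guidelines or to use something other than a comma as a delimiter (e.g. |).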

+1

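Macros - Thanks for this. That seems definitive. I thought about preprocessing the file, e.g. changing all the commas to pipes, but I can't figure out how to distinguish the commas that separate values from the commas within a field. Is there a simple way to do that? – mattstuehler 2009-04-23 17:38:16

+0

Regular expressions may help, although I'm not sure how they would cope with conditions such as multiple commas within quotes and multi-line quoted strings. Algorithmically, you could parse each string, replacing each comma with a pipe until you reach a ", at which point replacement is switched off until you reach a closing quote. This may not be the most efficient, though! – Macros 2009-04-23 18:08:37

+1

I had the same thought: parsing line by line, field by field, or even character by character. With a little elbow grease it would work, but I doubt it would be very efficient. I was hoping there would be a simple answer. There should be, since this seems like it must come up often: this is how Excel appears to format data when you save a spreadsheet as a CSV file. Oh well. – mattstuehler 2009-04-23 18:51:07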
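
The character-by-character idea from these comments can be sketched as a small state machine in Python. This is an illustrative sketch only: `commas_to_pipes` is a hypothetical name, it works on one line at a time, and it therefore does not handle quoted strings that span multiple lines.

```python
def commas_to_pipes(line, delimiter=",", replacement="|", quote='"'):
    """Replace delimiters that fall outside quoted sections of one line."""
    out = []
    in_quotes = False
    for ch in line:
        if ch == quote:
            # Toggle on every quote; an escaped quote ("") toggles twice,
            # so the state is unchanged once both characters are consumed.
            in_quotes = not in_quotes
            out.append(ch)
        elif ch == delimiter and not in_quotes:
            out.append(replacement)
        else:
            out.append(ch)
    return "".join(out)

converted = commas_to_pipes('729235,"GREAT PLAINS ENERGY, INC.","Nelson, Beena",114')
print(converted)
```

It is a single pass over the line, so the cost is linear in the file size. The quotes are left in place and would still need to be stripped after loading.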

2

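This may be more complicated or involved than what you're willing to use, but...

If you can implement the logic for parsing the lines into fields in VB or C#, you can do this using a CLR table-valued function (TVF).

A CLR TVF can be a well-performing way to read data in from an external source when you want some C# or VB code to separate the data into columns and/or adjust the values.

You have to be willing to add a CLR assembly to your database (one that allows external or unsafe operations, so it can open files). This can get a bit complicated or involved, but might be worth it for the flexibility you get.

I had some large files that needed to be regularly loaded into tables, but certain code translations needed to be performed on some columns, and special handling was needed to load values that would otherwise have caused data type errors with a plain bulk insert.

In short, a CLR TVF lets you run C# or VB code against each line of the file with bulk-insert-like performance (although you may need to worry about logging). The example in the SQL Server documentation lets you create a TVF to read from the event log, which you could use as a starting point.

Note that the code in the CLR TVF can only access the database in an init stage before the first row is processed (e.g. no lookups for each row; you use a normal TVF on top of this to do such things). You don't appear to need this based on your question.

Also note that each CLR TVF must have its output columns explicitly specified, so you can't write a generic one that is reusable for each different CSV file you might have.

You could write one CLR TVF to read whole lines from the file, returning a one-column result set, then use normal TVFs to read from that for each type of file. This requires the code that parses each line to be written in T-SQL, but avoids having to write many CLR TVFs.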

+0

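Brett, sorry, I was on vacation and just saw this response. CLR TVFs aren't something I'm familiar with, but I'll definitely look into them. Thanks very much for this very interesting suggestion! – mattstuehler 2009-10-08 21:50:51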

1

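Chris, thanks a bunch for this! You saved my biscuits! I could not believe that the bulk loader wouldn't handle this case when XL does such a nice job. Don't these guys see each other in the halls? Anyway... I needed a ConsoleApplication version, so here is what I hacked together. It's down and dirty, but it works like a champ! I hardcoded the delimiter and commented out the header handling, as they were not needed for my app.

I wish I could also paste a nice big beer in here for you too.

Geeze, I have no idea why the End Module and Public Class are outside the code block... srry!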

Module Module1 

    Sub Main() 

     Dim arrArgs() As String = Command.Split(",") 
     Dim i As Integer 
     Dim obj As New ReDelimIt() 

     Console.Write(vbNewLine & vbNewLine) 

     If arrArgs(0) <> Nothing Then 
      For i = LBound(arrArgs) To UBound(arrArgs) 
       Console.Write("Parameter " & i & " is " & arrArgs(i) & vbNewLine) 
      Next 


      obj.ProcessFile(arrArgs(0), arrArgs(1)) 

     Else 
      Console.Write("Usage Test1 <inputfile>,<outputfile>") 
     End If 

     Console.Write(vbNewLine & vbNewLine) 
    End Sub 

End Module 

Public Class ReDelimIt 

    Public Function ProcessFile(ByVal InputFile As String, ByVal OutputFile As String) As Integer 

     Dim ph1 As String = "|" 

     Dim objReader As System.IO.StreamReader = Nothing 
     Dim count As Integer = 0 'This will also serve as a primary key 
     Dim sb As New System.Text.StringBuilder 

     Try 
      objReader = New System.IO.StreamReader(System.IO.File.OpenRead(InputFile), System.Text.Encoding.Default) 
     Catch ex As Exception 
      MsgBox(ex.Message) 
     End Try 

     If objReader Is Nothing Then 
      MsgBox("Invalid file: " & InputFile) 
      Return -1 'Exit Function alone would return 0, not -1 
     End If 

     'grab the first line 
     Dim line = objReader.ReadLine() 
     'and advance to the next line b/c the first line is column headings 
     'Removed Check Headers can put in if needed. 
     'If chkHeaders.Checked Then 
     'line = objReader.ReadLine 
     'End If 

     While Not String.IsNullOrEmpty(line) 'loop through each line 

      count += 1 

      'Replace commas with our custom-made delimiter 
      line = line.Replace(",", ph1) 

      'Find a quoted part of the line, which could legitimately contain commas. 
      'In that case we will need to identify the quoted section and swap commas back in for our custom placeholder. 
      Dim starti = line.IndexOf(ph1 & """", 0) 

      While starti > -1 'loop through quoted fields 

       'Find end quote token (originally a ",) 
       Dim endi = line.IndexOf("""" & ph1, starti) 

       'The end quote token could be a false positive because there could occur a ", sequence. 
       'It would be double-quoted ("",) so check for that here 
       Dim check1 = line.IndexOf("""""" & ph1, starti) 

       'A """, sequence can occur if a quoted field ends in a quote. 
       'In this case, the above check matches, but we actually SHOULD process this as an end quote token 
       Dim check2 = line.IndexOf("""""""" & ph1, starti) 

       'If we are in the check1 ("",) situation, keep searching for an end quote token 
       'The +1 and +2 accounts for the extra length of the checked sequences 
       While (endi = check1 + 1 AndAlso endi <> check2 + 2) 'loop through "false" tokens in the quoted fields 
        endi = line.IndexOf("""" & ph1, endi + 1) 
        check1 = line.IndexOf("""""" & ph1, check1 + 1) 
        check2 = line.IndexOf("""""""" & ph1, check2 + 1) 
       End While 

       'We have searched for an end token (",) but can't find one, so that means the line ends in a " 
       If endi < 0 Then endi = line.Length - 1 

       'Grab the quoted field from the line, now that we have the start and ending indices 
       Dim source = line.Substring(starti + ph1.Length, endi - starti - ph1.Length + 1) 

       'And swap the commas back in 
       line = line.Replace(source, source.Replace(ph1, ",")) 

       'Find the next quoted field 
       If endi >= line.Length - 1 Then endi = line.Length 'During the swap, the length of line shrinks so an endi value at the end of the line will fail 
       starti = line.IndexOf(ph1 & """", starti + ph1.Length) 

      End While 

      'Add our primary key to the line 
      ' Removed for now 
      'If chkAddKey.Checked Then 
      'line = String.Concat(count.ToString, ph1, line) 
      ' End If 

      sb.AppendLine(line) 

      line = objReader.ReadLine 

     End While 

     objReader.Close() 

     SaveTextToFile(sb.ToString, OutputFile) 

     Return count 

    End Function 

    Public Function SaveTextToFile(ByVal strData As String, ByVal FullPath As String) As Boolean 
     Dim bAns As Boolean = False 
     Dim objReader As System.IO.StreamWriter 
     Try 
      objReader = New System.IO.StreamWriter(FullPath, False, System.Text.Encoding.Default) 
      objReader.Write(strData) 
      objReader.Close() 
      bAns = True 
     Catch Ex As Exception 
      Throw 'rethrow and keep the original stack trace 
     End Try 
     Return bAns 
    End Function 

End Class 
5

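I also created a function to convert a CSV into a format usable for Bulk Insert. I used the answer posted by Chris Clark as a starting point to create the following C# function.

I ended up using a regular expression to find the fields. I then recreated the file line by line, writing it to a new file as I went, thus avoiding having the entire file loaded into memory.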

private void CsvToOtherDelimiter(string CSVFile, System.Data.Linq.Mapping.MetaTable tbl) 
{ 
    char PH1 = '|'; 
    StringBuilder ln; 

    //Confirm file exists. Else, throw exception 
    if (File.Exists(CSVFile)) 
    { 
     using (TextReader tr = new StreamReader(CSVFile)) 
     { 
      //Use a temp file to store our conversion 
      using (TextWriter tw = new StreamWriter(CSVFile + ".tmp")) 
      { 
       string line = tr.ReadLine(); 
       //If we have already converted, no need to reconvert. 
       //NOTE: We make the assumption here that the input header file 
       //  doesn't have a PH1 value unless it's already been converted. 
       if (line.IndexOf(PH1) >= 0) 
       { 
        tw.Close(); 
        tr.Close(); 
        File.Delete(CSVFile + ".tmp"); 
        return; 
       } 
       //Loop through input file 
       while (!string.IsNullOrEmpty(line)) 
       { 
        ln = new StringBuilder(); 

        //1. Use Regex expression to find comma separated values 
        //using quotes as optional text qualifiers 
        //(what MS EXCEL does when you import a csv file) 
        //2. Remove text qualifier quotes from data 
        //3. Replace any values of PH1 found in column data 
        //with an equivalent character 
        //Regex: \A[^,]*(?=,)|(?:[^",]*"[^"]*"[^",]*)+|[^",]*"[^"]*\Z|(?<=,)[^,]*(?=,)|(?<=,)[^,]*\Z|\A[^,]*\Z 
        List<string> fieldList = Regex.Matches(line, @"\A[^,]*(?=,)|(?:[^"",]*""[^""]*""[^"",]*)+|[^"",]*""[^""]*\Z|(?<=,)[^,]*(?=,)|(?<=,)[^,]*\Z|\A[^,]*\Z") 
          .Cast<Match>() 
          .Select(m => RemoveCSVQuotes(m.Value).Replace(PH1, '¦')) 
          .ToList<string>(); 

        //Add the list of fields to ln, separated by PH1 
        fieldList.ToList().ForEach(m => ln.Append(m + PH1)); 

        //Write to file. Don't include trailing PH1 value. 
        tw.WriteLine(ln.ToString().Substring(0, ln.ToString().LastIndexOf(PH1))); 

        line = tr.ReadLine(); 
       } 


       tw.Close(); 
      } 
      tr.Close(); 

      //Optional: replace input file with output file 
      File.Delete(CSVFile); 
      File.Move(CSVFile + ".tmp", CSVFile); 
     } 
    } 
    else 
    { 
     throw new ArgumentException(string.Format("Source file {0} not found", CSVFile)); 
    } 
} 
//The output file no longer needs quotes as a text qualifier, so remove them 
private string RemoveCSVQuotes(string value) 
{ 
    //if is empty string, then remove double quotes 
    if (value == @"""""") value = ""; 
    //remove any double quotes, then any quotes on ends 
    value = value.Replace(@"""""", @""""); 
    if (value.Length >= 2) 
     if (value.Substring(0, 1) == @"""") 
      value = value.Substring(1, value.Length - 2); 
    return value; 
} 
+1

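Works great, but it replaces the file and doesn't account for accented characters, so be sure to include the encoding in the StreamReader. Other than that, thanks! =] – Oak 2013-08-06 20:00:55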
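
The streaming rewrite, together with the encoding concern raised in the comment, might look like this in Python. This is a hedged sketch rather than a port of the C# function: `convert_in_place`, the `.tmp` suffix, and the `utf-8` default are assumptions for illustration, and it relies on the standard `csv` module instead of the regular expression. As in the C# code, any occurrence of the new delimiter inside the data is swapped for `¦`.

```python
import csv
import os
import tempfile

def convert_in_place(path, new_delimiter="|", substitute="¦", encoding="utf-8"):
    """Rewrite the CSV at path using new_delimiter, one row at a time."""
    tmp_path = path + ".tmp"
    with open(path, newline="", encoding=encoding) as src, \
         open(tmp_path, "w", newline="", encoding=encoding) as dst:
        for row in csv.reader(src):
            # Quotes are already removed by csv.reader; protect the new delimiter.
            fields = [field.replace(new_delimiter, substitute) for field in row]
            dst.write(new_delimiter.join(fields) + "\n")
    # Replace the input only after the whole file converted successfully.
    os.replace(tmp_path, path)

# Small demonstration on a throwaway file containing an accented character.
with tempfile.TemporaryDirectory() as folder:
    demo = os.path.join(folder, "demo.csv")
    with open(demo, "w", newline="", encoding="utf-8") as f:
        f.write('1,"Café, Inc.",a|b\n')
    convert_in_place(demo)
    with open(demo, encoding="utf-8") as f:
        result = f.read()
print(result)
```

Pass the file's real encoding (for example `cp1252` for many Excel exports) so that accented characters survive the round trip.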

7

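I found the answer by Chris very helpful, but I wanted to run it from within SQL Server using T-SQL (and not using CLR), so I converted his code to T-SQL. But then I took it one step further by wrapping everything up in a stored procedure that does the following:

  1. Uses bulk insert to initially import the CSV file
  2. Cleans up the lines using Chris's code
  3. Returns the results in a table format

For my needs, I further cleaned up the lines by removing the quotes around values and converting two double quotes to one double quote (I think that's the correct method).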
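
The quote clean-up described in the last paragraph can be illustrated with a few lines of Python. This is a sketch of the rule only, not a translation of the procedure: `unquote_field` is a hypothetical helper, and it removes the outer quotes only when both are present, which is slightly stricter than the T-SQL below.

```python
def unquote_field(value, quote='"'):
    """Strip one pair of enclosing quotes, then collapse doubled quotes."""
    if len(value) >= 2 and value.startswith(quote) and value.endswith(quote):
        value = value[1:-1]
    # Inside a quoted CSV field, a literal quote is written as two quotes.
    return value.replace(quote * 2, quote)

print(unquote_field('"Nelson, Beena"'))
print(unquote_field('"He said ""hi"""'))
```

The order matters: stripping the outer pair first means an empty quoted field (`""`) becomes an empty string rather than a single quote.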

CREATE PROCEDURE SSP_CSVToTable 

-- Add the parameters for the stored procedure here 
@InputFile nvarchar(4000) 
, @FirstLine int 

AS 

BEGIN 

-- SET NOCOUNT ON added to prevent extra result sets from 
-- interfering with SELECT statements. 
SET NOCOUNT ON; 

--convert the CSV file to a table 
--clean up the lines so that commas are handles correctly 

DECLARE @sql nvarchar(4000) 
DECLARE @PH1 nvarchar(50) 
DECLARE @LINECOUNT int -- This will also serve as a primary key 
DECLARE @CURLINE int 
DECLARE @Line nvarchar(4000) 
DECLARE @starti int 
DECLARE @endi int 
DECLARE @FieldTerminatorFound bit 
DECLARE @backChar nvarchar(4000) 
DECLARE @quoteCount int 
DECLARE @source nvarchar(4000) 
DECLARE @COLCOUNT int 
DECLARE @CURCOL int 
DECLARE @ColVal nvarchar(4000) 

-- new delimiter 
SET @PH1 = '†' 

-- create single column table to hold each line of file 
CREATE TABLE [#CSVLine]([line] nvarchar(4000)) 

-- bulk insert into temp table 
-- cannot use variable path with bulk insert 
-- so we must run using dynamic sql 
SET @Sql = 'BULK INSERT #CSVLine 
FROM ''' + @InputFile + ''' 
WITH 
(
FIRSTROW=' + CAST(@FirstLine as varchar) + ', 
FIELDTERMINATOR = ''\n'', 
ROWTERMINATOR = ''\n'' 
)' 

-- run dynamic statement to populate temp table 
EXEC(@sql) 

-- get number of lines in table 
SET @LINECOUNT = @@ROWCOUNT 

-- add identity column to table so that we can loop through it 
ALTER TABLE [#CSVLine] ADD [RowId] [int] IDENTITY(1,1) NOT NULL 

IF @LINECOUNT > 0 
BEGIN 
    -- cycle through each line, cleaning each line 
    SET @CURLINE = 1 
    WHILE @CURLINE <= @LINECOUNT 
    BEGIN 
     -- get current line 
     SELECT @line = line 
      FROM #CSVLine 
     WHERE [RowId] = @CURLINE 

     -- Replace commas with our custom-made delimiter 
     SET @Line = REPLACE(@Line, ',', @PH1) 

     -- Find a quoted part of the line, which could legitimately contain commas. 
     -- In that case we will need to identify the quoted section and swap commas back in for our custom placeholder. 
     SET @starti = CHARINDEX(@PH1 + '"' ,@Line, 0) 
     IF CHARINDEX('"', @Line) = 1 SET @starti = 1 -- line starts with a quoted field (CHARINDEX is 1-based) 

     -- loop through quoted fields 
     WHILE @starti > 0 
     BEGIN 
      SET @FieldTerminatorFound = 0 

      -- Find end quote token (originally a ",) 
      SET @endi = CHARINDEX('"' + @PH1, @Line, @starti) -- sLine.IndexOf("""" & PH1, starti) 

      IF @endi < 1 
      BEGIN 
       SET @FieldTerminatorFound = 1 
       If @endi < 1 SET @endi = LEN(@Line) - 1 
      END 

      WHILE @FieldTerminatorFound = 0 
      BEGIN 
       -- Find any more quotes that are part of that sequence, if any 
       SET @backChar = '"' -- thats one quote 
       SET @quoteCount = 0 

       WHILE @backChar = '"' 
       BEGIN 
        SET @quoteCount = @quoteCount + 1 
        SET @backChar = SUBSTRING(@Line, @endi - @quoteCount, 1) -- sLine.Chars(endi - quoteCount) 
       END 

       IF (@quoteCount % 2) = 1 
       BEGIN 
        -- odd number of quotes. real field terminator 
        SET @FieldTerminatorFound = 1 
       END 
       ELSE 
       BEGIN 
        -- keep looking 
        SET @endi = CHARINDEX('"' + @PH1, @Line, @endi + 1) -- sLine.IndexOf("""" & PH1, endi + 1) 
       END 

      END 

      -- Grab the quoted field from the line, now that we have the start and ending indices 
      SET @source = SUBSTRING(@Line, @starti + LEN(@PH1), @endi - @starti - LEN(@PH1) + 1) 
      -- sLine.Substring(starti + PH1.Length, endi - starti - PH1.Length + 1) 

      -- And swap the commas back in 
      SET @Line = REPLACE(@Line, @source, REPLACE(@source, @PH1, ',')) 
      --sLine.Replace(source, source.Replace(PH1, ",")) 

      -- Find the next quoted field 
      -- If endi >= line.Length - 1 Then endi = line.Length 'During the swap, the length of line shrinks so an endi value at the end of the line will fail 
      SET @starti = CHARINDEX(@PH1 + '"', @Line, @starti + LEN(@PH1)) 
      --sLine.IndexOf(PH1 & """", starti + PH1.Length) 

     END 

     -- get table based on current line 
     IF OBJECT_ID('tempdb..#Line') IS NOT NULL 
      DROP TABLE #Line 

     -- converts a delimited list into a table 
     SELECT * 
     INTO #Line 
     FROM dbo.iter_charlist_to_table(@Line,@PH1) 

     -- get number of columns in line 
     SET @COLCOUNT = @@ROWCOUNT 

     -- dynamically create CSV temp table to hold CSV columns and lines 
     -- only need to create once 
     IF OBJECT_ID('tempdb..#CSV') IS NULL 
     BEGIN 
      -- create initial structure of CSV table 
      CREATE TABLE [#CSV]([Col1] nvarchar(100)) 

      -- dynamically add a column for each column found in the first line 
      SET @CURCOL = 1 
      WHILE @CURCOL <= @COLCOUNT 
      BEGIN 
       -- first column already exists, don't need to add 
       IF @CURCOL > 1 
       BEGIN 
        -- add field 
        SET @sql = 'ALTER TABLE [#CSV] ADD [Col' + Cast(@CURCOL as varchar) + '] nvarchar(100)' 

        --print @sql 

        -- this adds the fields to the temp table 
        EXEC(@sql) 
       END 

       -- go to next column 
       SET @CURCOL = @CURCOL + 1 
      END 
     END 

     -- build dynamic sql to insert current line into CSV table 
     SET @sql = 'INSERT INTO [#CSV] VALUES(' 

     -- loop through line table, dynamically adding each column value 
     SET @CURCOL = 1 
     WHILE @CURCOL <= @COLCOUNT 
     BEGIN 
      -- get current column 
      Select @ColVal = str 
       From #Line 
      Where listpos = @CURCOL 

      IF LEN(@ColVal) > 0 
      BEGIN 
       -- remove quotes from beginning if exist 
       IF LEFT(@ColVal,1) = '"' 
        SET @ColVal = RIGHT(@ColVal, LEN(@ColVal) - 1) 

       -- remove quotes from end if exist 
       IF RIGHT(@ColVal,1) = '"' 
        SET @ColVal = LEFT(@ColVal, LEN(@ColVal) - 1) 
      END 

      -- write column value 
      -- make value sql safe by replacing single quotes with two single quotes 
      -- also, replace two double quotes with a single double quote 
      SET @sql = @sql + '''' + REPLACE(REPLACE(@ColVal, '''',''''''), '""', '"') + '''' 

      -- add comma separater except for the last record 
      IF @CURCOL <> @COLCOUNT 
       SET @sql = @sql + ',' 

      -- go to next column 
      SET @CURCOL = @CURCOL + 1 
     END 

     -- close sql statement 
     SET @sql = @sql + ')' 

     --print @sql 

     -- run sql to add line to table 
     EXEC(@sql) 

     -- move to next line 
     SET @CURLINE = @CURLINE + 1 

    END 

END 

-- return CSV table 
SELECT * FROM [#CSV] 

END 

GO 

The stored procedure uses this helper function that parses a string into a table (thanks Erland Sommarskog!):

CREATE FUNCTION [dbo].[iter_charlist_to_table] 
       (@list  ntext, 
       @delimiter nchar(1) = N',') 
    RETURNS @tbl TABLE (listpos int IDENTITY(1, 1) NOT NULL, 
         str  varchar(4000), 
         nstr nvarchar(2000)) AS 

BEGIN 
    DECLARE @pos  int, 
      @textpos int, 
      @chunklen smallint, 
      @tmpstr nvarchar(4000), 
      @leftover nvarchar(4000), 
      @tmpval nvarchar(4000) 

    SET @textpos = 1 
    SET @leftover = '' 
    WHILE @textpos <= datalength(@list)/2 
    BEGIN 
    SET @chunklen = 4000 - datalength(@leftover)/2 
    SET @tmpstr = @leftover + substring(@list, @textpos, @chunklen) 
    SET @textpos = @textpos + @chunklen 

    SET @pos = charindex(@delimiter, @tmpstr) 

    WHILE @pos > 0 
    BEGIN 
     SET @tmpval = ltrim(rtrim(left(@tmpstr, @pos - 1))) 
     INSERT @tbl (str, nstr) VALUES(@tmpval, @tmpval) 
     SET @tmpstr = substring(@tmpstr, @pos + 1, len(@tmpstr)) 
     SET @pos = charindex(@delimiter, @tmpstr) 
    END 

    SET @leftover = @tmpstr 
    END 

    INSERT @tbl(str, nstr) VALUES (ltrim(rtrim(@leftover)), ltrim(rtrim(@leftover))) 

RETURN 

END 

Here is how I call it from T-SQL. In this case, I'm inserting the results into a temp table, so I create the temp table first:

-- create temp table for file import 
CREATE TABLE #temp 
(
    CustomerCode nvarchar(100) NULL, 
    Name nvarchar(100) NULL, 
    [Address] nvarchar(100) NULL, 
    City nvarchar(100) NULL, 
    [State] nvarchar(100) NULL, 
    Zip nvarchar(100) NULL, 
    OrderNumber nvarchar(100) NULL, 
    TimeWindow nvarchar(100) NULL, 
    OrderType nvarchar(100) NULL, 
    Duration nvarchar(100) NULL, 
    [Weight] nvarchar(100) NULL, 
    Volume nvarchar(100) NULL 
) 

-- convert the CSV file into a table 
INSERT #temp 
EXEC [dbo].[SSP_CSVToTable] 
    @InputFile = @FileLocation 
    ,@FirstLine = @FirstImportRow 

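I haven't tested the performance much, but it works well for what I need: importing CSV files with fewer than 1000 rows. However, it might choke on really large files.

Hopefully someone else also finds it useful.

Cheers!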

2

Another approach, assuming you don't have a load of fields or expect a quote to appear in the data itself, would be to use the REPLACE function.

UPDATE dbo.tablename 
     SET dbo.tablename.target_field = REPLACE(t.importedValue, '"', '') 
FROM #tempTable t 
WHERE dbo.tablename.target_id = t.importedID; 

I have used it. I can't make any claims regarding performance. It is simply a quick and dirty way to get around the problem.

-1

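Create a VB.NET program to convert to a new delimiter using the .NET Framework 4.5 TextFieldParser. This will automatically handle text-qualified fields.

The code above, modified to use the built-in TextFieldParser: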

Module Module1

Sub Main() 

    Dim arrArgs() As String = Command.Split(",") 
    Dim i As Integer 
    Dim obj As New ReDelimIt() 
    Dim InputFile As String = "" 
    Dim OutPutFile As String = "" 
    Dim NewDelimiter As String = "" 

    Console.Write(vbNewLine & vbNewLine) 

    If Not IsNothing(arrArgs(0)) Then 
     For i = LBound(arrArgs) To UBound(arrArgs) 
      Console.Write("Parameter " & i & " is " & arrArgs(i) & vbNewLine) 
     Next 
     InputFile = arrArgs(0) 
     If Not IsNothing(arrArgs(1)) Then 
      If Not String.IsNullOrEmpty(arrArgs(1)) Then 
       OutPutFile = arrArgs(1) 
      Else 
       OutPutFile = InputFile.Replace("csv", "pipe") 
      End If 
     Else 
      OutPutFile = InputFile.Replace("csv", "pipe") 
     End If 
     If Not IsNothing(arrArgs(2)) Then 
      If Not String.IsNullOrEmpty(arrArgs(2)) Then 
       NewDelimiter = arrArgs(2) 
      Else 
       NewDelimiter = "|" 
      End If 
     Else 
      NewDelimiter = "|" 
     End If 
     obj.ConvertCSVFile(InputFile,OutPutFile,NewDelimiter) 

    Else 
     Console.Write("Usage ChangeFileDelimiter <inputfile>,<outputfile>,<NewDelimiter>") 
    End If 
    obj = Nothing 
    Console.Write(vbNewLine & vbNewLine) 
    'Console.ReadLine() 

End Sub 

End Module

Public Class ReDelimIt

Public Function ConvertCSVFile(ByVal InputFile As String, ByVal OutputFile As String, Optional ByVal NewDelimiter As String = "|") As Integer 
    Using MyReader As New Microsoft.VisualBasic.FileIO.TextFieldParser(InputFile) 
     MyReader.TextFieldType = FileIO.FieldType.Delimited 
     MyReader.SetDelimiters(",") 
     Dim sb As New System.Text.StringBuilder 
     Dim strLine As String = "" 
     Dim currentRow As String() 
     While Not MyReader.EndOfData 
      Try 
       currentRow = MyReader.ReadFields() 
       Dim currentField As String 
       strLine = "" 
       For Each currentField In currentRow 
        'MsgBox(currentField) 
        If strLine = "" Then 
         strLine = strLine & currentField 
        Else 
         strLine = strLine & NewDelimiter & currentField 
        End If 
       Next 
       sb.AppendLine(strLine) 
      Catch ex As Microsoft.VisualBasic.FileIO.MalformedLineException 
       'MsgBox("Line " & ex.Message & "is not valid and will be skipped.") 
       Console.WriteLine("Line " & ex.Message & "is not valid and will be skipped.") 
      End Try 
     End While 
     SaveTextToFile(sb.ToString, OutputFile) 
    End Using 

    Return Err.Number 

End Function 

Public Function SaveTextToFile(ByVal strData As String, ByVal FullPath As String) As Boolean 
    Dim bAns As Boolean = False 
    Dim objReader As System.IO.StreamWriter 
    Try 
     If FileIO.FileSystem.FileExists(FullPath) Then 
      Kill(FullPath) 
     End If 
     objReader = New System.IO.StreamWriter(FullPath, False, System.Text.Encoding.Default) 
     objReader.Write(strData) 
     objReader.Close() 
     bAns = True 
    Catch Ex As Exception 
     Throw 'rethrow and keep the original stack trace 
    End Try 
    Return bAns 
End Function 

End Class

3

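Typically this issue is caused by users exporting an Excel file to CSV.

There are two ways around this problem:

  1. Export from Excel using a macro, as per Microsoft's suggestion
  2. Or the really easy way:
    • Open the CSV in Excel.
    • Save as an Excel file (.xls or .xlsx).
    • Import that file into SQL Server as an Excel file.
    • Chuckle to yourself because you didn't have to code anything like the solutions above.... muhahahaha

Here's some SQL if you really want to script it (after saving the CSV as Excel):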

select * 
into SQLServerTable FROM OPENROWSET('Microsoft.Jet.OLEDB.4.0', 
    'Excel 8.0;Database=D:\testing.xls;HDR=YES', 
    'SELECT * FROM [Sheet1$]') 
0

This code is working:

public bool CSVFileRead(string fullPathWithFileName, string fileNameModified, string tableName) 
    { 
     SqlConnection con = new SqlConnection(ConfigurationSettings.AppSettings["dbConnectionString"]); 
     string filepath = fullPathWithFileName; 
     StreamReader sr = new StreamReader(filepath); 
     string line = sr.ReadLine(); 
     string[] value = line.Split(','); 
     DataTable dt = new DataTable(); 
     DataRow row; 
     foreach (string dc in value) 
     { 
      dt.Columns.Add(new DataColumn(dc)); 
     } 
     while (!sr.EndOfStream) 
     { 
      //string[] stud = sr.ReadLine().Split(','); 
      //for (int i = 0; i < stud.Length; i++) 
      //{ 
      // stud[i] = stud[i].Replace("\"", ""); 
      //} 
      //value = stud; 
      value = sr.ReadLine().Split(','); 
      if (value.Length == dt.Columns.Count) 
      { 
       row = dt.NewRow(); 
       row.ItemArray = value; 
       dt.Rows.Add(row); 
      } 
     } 
     SqlBulkCopy bc = new SqlBulkCopy(con.ConnectionString, SqlBulkCopyOptions.TableLock); 
     bc.DestinationTableName = tableName; 
     bc.BatchSize = dt.Rows.Count; 
     con.Open(); 
     bc.WriteToServer(dt); 
     bc.Close(); 
     con.Close(); 

     return true; 
    } 
0

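I put together the below to solve my case. I needed to preprocess very large files and sort out the inconsistent quoting. Just paste it into a blank C# application, set the consts to your requirements, and away you go. This worked on very large CSVs of over 10 GB.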

namespace CsvFixer 
{ 
    using System.IO; 
    using System.Text; 

    public class Program 
    { 
     private const string delimiter = ","; 
     private const string quote = "\""; 
     private const string inputFile = "C:\\temp\\input.csv"; 
     private const string fixedFile = "C:\\temp\\fixed.csv"; 

     /// <summary> 
     /// This application fixes inconsistently quoted csv (or delimited) files with support for very large file sizes. 
     /// For example : 1223,5235234,8674,"Houston","London, UK",3425,Other text,stuff 
     /// Must become : "1223","5235234","8674","Houston","London, UK","3425","Other text","stuff" 
     /// </summary> 
     /// <param name="args"></param> 
     static void Main(string[] args) 
     { 
      // Use streaming to allow for large files. 
      using (StreamWriter outfile = new StreamWriter(fixedFile)) 
      { 
       using (FileStream fs = File.Open(inputFile, FileMode.Open, FileAccess.Read, FileShare.ReadWrite)) 
       using (BufferedStream bs = new BufferedStream(fs)) 
       using (StreamReader sr = new StreamReader(bs)) 
       { 
        string currentLine; 

        // Read each input line in and write each fixed line out 
        while ((currentLine = sr.ReadLine()) != null) 
        { 
         outfile.WriteLine(FixLine(currentLine, delimiter, quote)); 
        } 
       } 
      } 
     } 

     /// <summary> 
     /// Fully quote a partially quoted line 
     /// </summary> 
     /// <param name="line">The partially quoted line</param> 
     /// <returns>The fully quoted line</returns> 
     private static string FixLine(string line, string delimiter, string quote) 
     { 
      StringBuilder fixedLine = new StringBuilder(); 

      // Split all on the delimiter, accepting that some quoted fields 
      // that contain the delimiter will be split into many pieces. 
      string[] fieldParts = line.Split(delimiter.ToCharArray()); 

      // Loop through the fields (or parts of fields) 
      for (int i = 0; i < fieldParts.Length; i++) 
      { 
       string currentFieldPart = fieldParts[i]; 

       // If the current field part starts and ends with a quote it is a field, so write it to the result 
       if (currentFieldPart.StartsWith(quote) && currentFieldPart.EndsWith(quote)) 
       { 
        fixedLine.Append(string.Format("{0}{1}", currentFieldPart, delimiter)); 
       } 
       // else if it starts with a quote but doesn't end with one, it is part of a longer field. 
       else if (currentFieldPart.StartsWith(quote)) 
       { 
        // Add the start of the field 
        fixedLine.Append(string.Format("{0}{1}", currentFieldPart, delimiter)); 

        // Append any additional field parts (we will only hit the end of the field when 
        // the last field part finishes with a quote. 
        while (!fieldParts[++i].EndsWith(quote)) 
        { 
         fixedLine.Append(string.Format("{0}{1}", fieldParts[i], delimiter)); 
        } 

        // Append the last field part - i.e. the part containing the closing quote 
        fixedLine.Append(string.Format("{0}{1}", fieldParts[i], delimiter)); 
       } 
       else 
       { 
        // The field has no quotes, add the field part with quotes as bookends 
        fixedLine.Append(string.Format("{0}{1}{0}{2}", quote, currentFieldPart, delimiter)); 
       } 
      } 

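      // Drop the delimiter that was appended after the last field 
      if (fixedLine.Length >= delimiter.Length) 
      { 
       fixedLine.Length -= delimiter.Length; 
      } 

      // Return the fixed string 
      return fixedLine.ToString(); 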
     } 
    } 
}
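
For readers who are not tied to C#, the same "quote every field" normalisation can be sketched with Python's standard `csv` module. This is an assumption-laden illustration, not a port of the program above: `quote_all` is a hypothetical name, and the sample line is the one from the doc comment. Rows are streamed one at a time, so memory use stays flat for large files.

```python
import csv
import io

def quote_all(src, dst):
    """Re-emit every row from src with all fields enclosed in quotes."""
    writer = csv.writer(dst, quoting=csv.QUOTE_ALL, lineterminator="\n")
    for row in csv.reader(src):
        writer.writerow(row)

source = io.StringIO('1223,5235234,8674,"Houston","London, UK",3425,Other text,stuff\n')
fixed = io.StringIO()
quote_all(source, fixed)
print(fixed.getvalue())
```

Once every field is quoted consistently, the file satisfies the MSDN restriction quoted earlier in the thread, and '","' can serve as the field terminator.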