2012-03-28 54 views
3

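Parse text into multiple columns

I have a source that populates a single text field in a table with statistics. I need to pull this data into multiple fields in another table, but the odd format makes an automated import difficult.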

The file format is plain text, as in the example below:

08:34:52 Checksum=180957248,TicketType=6,InitialUserType=G,InitialUserID=520,CommunicationType=Incoming,Date=26-03-2012,Time=08:35:00,Service=ST,Duration=00:00:14,Cost=0.12 

Effectively, it is made up of the following parts:

[timestamp] [Field1 name]=[Field1 value],[Field2 name]=[Field2 value],[Field4 name]=[Field4 value]...[CR] 

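The fields are always in the same order, but not every field is always present. The total number of columns can be anywhere from 5 to 30.

I have tried the following function to translate it. It seems to work for the most part, but it appears to skip fields at random:

Parsing the data: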

(SELECT [Data].[dbo].[GetFromTextString] ('Checksum=' ,',' ,RAWTEXT)) AS RowCheckSum, 
(SELECT [Data].[dbo].[GetFromTextString] ('TicketType=' ,',' ,RAWTEXT)) AS TicketType, 

And the function:

Can anyone suggest a better/cleaner way to strip out the data, or can anyone see why this formula skips rows?

Any help is really appreciated.

+0

Does this have to be done in SQL? It would be much cleaner to do this in another layer that supports arrays and key-value pairs. – GarethD 2012-03-28 11:10:10

+0

What flavour of SQL are you using? – MatBailie 2012-03-28 11:13:17

+0

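The only tools available to me are SQL Server 2008 and Crystal Reports 2008. I could possibly access the data through a VB.net application, but that would have to run on a client machine, and I would rather automate the process to run nightly (when the server is under no load), since there will potentially be 5000+ rows per night. – bendataclear 2012-03-28 11:15:09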

Answers

3

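Note - the first solution is rubbish. I have left it in for historical reasons, but a better solution is included below.

I am not even sure this will be faster than your current approach, but it is how I would tackle the problem (if I were forced into a SQL-only solution). The first thing needed is a table-valued function that performs the split: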

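CREATE FUNCTION dbo.Split (@TextToSplit VARCHAR(MAX), @Delimiter VARCHAR(MAX)) 
RETURNS @Values TABLE (Position INT IDENTITY(1, 1) NOT NULL, TextValues VARCHAR(MAX) NOT NULL) 
AS 
BEGIN 
    -- PEEL OFF ONE ITEM PER ITERATION WHILE A DELIMITER REMAINS IN THE STRING
    WHILE CHARINDEX(@Delimiter, @TextToSplit) > 0 
    BEGIN 
        -- STORE EVERYTHING BEFORE THE NEXT DELIMITER
        INSERT @Values (TextValues) 
        SELECT LEFT(@TextToSplit, CHARINDEX(@Delimiter, @TextToSplit) - 1) 

        -- DROP THE STORED ITEM AND ITS DELIMITER (THE + 1 ASSUMES A SINGLE-CHARACTER DELIMITER)
        SET @TextToSplit = SUBSTRING(@TextToSplit, CHARINDEX(@Delimiter, @TextToSplit) + 1, LEN(@TextToSplit)) 
    END 

    -- WHATEVER IS LEFT IS THE FINAL ITEM
    INSERT @Values (TextValues) VALUES (@TextToSplit) 
    RETURN 
END 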
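
As a quick check of what this function returns, here is a minimal sketch that calls it on a shortened version of the sample row. It assumes the dbo.Split function above has already been created in the current database; the input string is just an illustration.

```sql
-- Assumes the loop-based dbo.Split defined above exists in this database.
-- Splits a shortened sample string on commas and lists the pieces in order.
SELECT Position, TextValues
FROM dbo.Split('Checksum=180957248,TicketType=6,InitialUserType=G', ',')
ORDER BY Position;

-- Position  TextValues
-- 1         Checksum=180957248
-- 2         TicketType=6
-- 3         InitialUserType=G
```

Each returned piece is still a name=value pair, which is why the function is applied a second time with '=' as the delimiter to separate keys from values.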

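For my example I am working from a temporary table (the table variable @WorkList). You may need to adapt this to your own table, or you can simply insert the relevant data into @WorkList. I have used dummy data: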

DECLARE @WorkList TABLE (ID INT IDENTITY(1, 1) NOT NULL, TextField VARCHAR(MAX)) 
INSERT @WorkList 
SELECT '08:34:52 Checksum=180957248,TicketType=6,InitialUserType=G,InitialUserID=520,CommunicationType=Incoming,Date=26-03-2012,Time=08:35:00,Service=ST,Duration=00:00:14,Cost=0.12' 
UNION 
SELECT '08:34:52 Checksum=180957249,TicketType=5,InitialUserType=H,InitialUserID=521,CommunicationType=Outgoing,Date=27-03-2012,Time=14:27:00,Service=ST,Duration=00:15:12,Cost=0.37' 

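The main part of the query is done here. It is quite long, so I have commented it as much as I can. If further clarification is needed, I can add more comments.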

DECLARE @Output TABLE (ID INT IDENTITY(1, 1) NOT NULL, TextField VARCHAR(MAX)) 
DECLARE @KeyPairs TABLE (WorkListID INT NOT NULL, KeyField VARCHAR(MAX), ValueField VARCHAR(MAX)) 

-- STORE TIMESTAMP DATA - THIS ASSUMES THE FIRST SPACE IS THE END OF THE TIMESTAMP 
INSERT @KeyPairs 
SELECT ID, 'TimeStamp', LEFT(TextField, CHARINDEX(' ', TextField)) 
FROM @WorkList 

-- CLEAR THE TIMESTAMP FROM THE WORKLIST 
UPDATE @WorkList 
SET  TextField = SUBSTRING(TextField, CHARINDEX(' ', TextField) + 1, LEN(TextField)) 

DECLARE @ID INT = (SELECT MIN(ID) FROM @WorkList) 
WHILE @ID IS NOT NULL 
    BEGIN 
     -- SPLIT THE STRING FIRST INTO ALL THE PAIRS (e.g. Checksum=180957248) 
     INSERT @Output 
     SELECT TextValues 
     FROM dbo.Split((SELECT TextField FROM @WorkList WHERE ID = @ID), ',') 

     DECLARE @ID2 INT = (SELECT MIN(ID) FROM @Output) 

     -- FOR ALL THE PAIRS SPLIT THEM INTO A KEY AND A VALUE (USING THE POSITION OF THE SPLIT FUNCTION) 
     WHILE @ID2 IS NOT NULL 
      BEGIN 
       INSERT @KeyPairs 
       SELECT @ID, 
         MAX(CASE WHEN Position = 1 THEN TextValues ELSE '' END), 
         MAX(CASE WHEN Position = 2 THEN TextValues ELSE '' END) 
       FROM dbo.Split((SELECT TextField FROM @Output WHERE ID = @ID2), '=') 

       DELETE @Output 
       WHERE ID = @ID2 

       SET @ID2 = (SELECT MIN(ID) FROM @Output) 
      END 

     DELETE @WorkList 
     WHERE ID = @ID 

     SET @ID = (SELECT MIN(ID) FROM @WorkList) 
    END 

-- WE NOW HAVE A TABLE CONTAINING EAV MODEL STYLE DATA. THIS NEEDS TO BE PIVOTED INTO THE CORRECT FORMAT 
-- ENSURE COLUMNS ARE LISTED IN THE ORDER YOU WANT THEM TO APPEAR 
SELECT * 
FROM @KeyPairs p 
     PIVOT 
     ( MAX(ValueField) 
      FOR KeyField IN 
       ( [TimeStamp], [Checksum], [TicketType], [InitialUserType], 
        [InitialUserID], [CommunicationType], [Date], [Time], 
        [Service], [Duration], [Cost] 
       ) 
     ) AS PivotTable; 

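EDIT (4 years later)

A recent upvote brought this to my attention, and I hate myself a little for ever posting this answer in its current form.

A better split function would be: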

CREATE FUNCTION dbo.Split 
(
    @List  NVARCHAR(MAX), 
    @Delimiter NVARCHAR(255) 
) 
RETURNS TABLE 
WITH SCHEMABINDING AS 
RETURN 
( WITH N1 AS (SELECT N FROM (VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1), (1)) n (N)), 
    N2(N) AS (SELECT 1 FROM N1 a CROSS JOIN N1 b), 
    N3(N) AS (SELECT 1 FROM N2 a CROSS JOIN N2 b), 
    N4(N) AS (SELECT 1 FROM N3 a CROSS JOIN N3 b), 
    cteTally(N) AS 
    ( SELECT 0 UNION ALL 
     SELECT TOP (DATALENGTH(ISNULL(@List,1))) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) 
     FROM N4
    ), 
    cteStart(N1) AS 
    ( SELECT t.N+1 
     FROM cteTally t 
     WHERE (SUBSTRING(@List,t.N,1) = @Delimiter OR t.N = 0) 
    ) 
    SELECT Item = SUBSTRING(@List, s.N1, ISNULL(NULLIF(CHARINDEX(@Delimiter,@List,s.N1),0)-s.N1,8000)), 
      Position = s.N1, 
      ItemNumber = ROW_NUMBER() OVER(ORDER BY s.N1) 
    FROM cteStart s 
); 
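
To see the shape of the result, here is a minimal sketch that calls this version on a shortened sample string. It assumes the inline dbo.Split above is the one currently defined (it uses the same name as the earlier loop-based function, so only one of the two can exist at a time), and that the delimiter is a single character.

```sql
-- Assumes the inline, tally-based dbo.Split defined above exists.
-- Item is the piece of text, Position is its starting character offset,
-- and ItemNumber is its ordinal within the list.
SELECT ItemNumber, Position, Item
FROM dbo.Split(N'Checksum=180957248,TicketType=6', N',')
ORDER BY ItemNumber;

-- ItemNumber  Position  Item
-- 1           1         Checksum=180957248
-- 2           20        TicketType=6
```

Because this version is an inline table-valued function, it can be used with CROSS APPLY against a whole table at once rather than being called row by row.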

There is then no need for loops at all. You get a properly set-based solution by calling the split function twice to produce your EAV-style data set:

DECLARE @WorkList TABLE (ID INT IDENTITY(1, 1) NOT NULL, TextField VARCHAR(MAX)) 
INSERT @WorkList 
SELECT '08:34:52 Checksum=180957248,TicketType=6,InitialUserType=G,InitialUserID=520,CommunicationType=Incoming,Date=26-03-2012,Time=08:35:00,Service=ST,Duration=00:00:14,Cost=0.12' 
UNION 
SELECT '08:34:52 Checksum=180957249,TicketType=5,InitialUserType=H,InitialUserID=521,CommunicationType=Outgoing,Date=27-03-2012,Time=14:27:00,Service=ST,Duration=00:15:12,Cost=0.37'; 

WITH KeyPairs AS 
( SELECT w.ID, 
      [Timestamp] = LEFT(w.TextField, CHARINDEX(' ', w.TextField)), 
      KeyField = MAX(CASE WHEN v.ItemNumber = 1 THEN v.Item END), 
      ValueField = MAX(CASE WHEN v.ItemNumber = 2 THEN v.Item END) 
    FROM @WorkList AS w 
      CROSS APPLY dbo.Split(SUBSTRING(TextField, CHARINDEX(' ', TextField) + 1, LEN(TextField)), ',') AS kp 
      CROSS APPLY dbo.Split(kp.Item, '=') AS v 
    GROUP BY w.ID, kp.ItemNumber,w.TextField 
) 
SELECT * 
FROM KeyPairs AS kp 
     PIVOT 
     ( MAX(ValueField) 
      FOR KeyField IN 
       ( [Checksum], [TicketType], [InitialUserType], 
        [InitialUserID], [CommunicationType], [Date], [Time], 
        [Service], [Duration], [Cost] 
       ) 
     ) AS pvt; 
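
If the pivoted result is to be loaded into a permanent table, naming the columns in both the INSERT and the SELECT avoids relying on column order, and any key that is absent from a row arrives as NULL. The following is a minimal sketch of that behaviour only: the table variables @KeyPairs and @Target and the two-key column list are hypothetical stand-ins for the real EAV data and destination table.

```sql
-- Hypothetical EAV-style input: row 2 has no Cost key.
DECLARE @KeyPairs TABLE (ID INT NOT NULL, KeyField VARCHAR(50), ValueField VARCHAR(50));
INSERT @KeyPairs (ID, KeyField, ValueField)
VALUES (1, 'Checksum', '180957248'),
       (1, 'Cost', '0.12'),
       (2, 'Checksum', '180957249');

-- Hypothetical destination: every parsed column is nullable.
DECLARE @Target TABLE (ID INT NOT NULL, [Checksum] VARCHAR(50) NULL, Cost VARCHAR(50) NULL);

-- Explicit column lists on both sides keep values aligned with their columns.
INSERT @Target (ID, [Checksum], Cost)
SELECT pvt.ID, pvt.[Checksum], pvt.Cost
FROM @KeyPairs AS kp
     PIVOT
     ( MAX(ValueField)
       FOR KeyField IN ([Checksum], [Cost])
     ) AS pvt;

SELECT ID, [Checksum], Cost
FROM @Target
ORDER BY ID;

-- ID  Checksum   Cost
-- 1   180957248  0.12
-- 2   180957249  NULL
```

The same pattern should extend to the full set of up to 30 keys by listing each one in the PIVOT clause and in both column lists.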
+0

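This looks like a great solution. I tested it on some data and it runs quickly with a SELECT. If I want to use it for an INSERT, how will it handle null/missing values, or keys that are not present in every row? I need to set up my permanent table to contain all 30 fields, but they are all nullable. – bendataclear 2012-03-28 13:13:33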

+1

If a key has no value, it will just return NULL. – GarethD 2012-03-28 13:47:13

+0

However, if I am doing 'INSERT INTO MyTable SELECT * FROM @Keypairs ...', will it insert the right values into the right columns, or will it insert nulls for the missing columns? – bendataclear 2012-03-28 14:40:55