2016-12-28 55 views
0

I am trying to clean up data in a structure set up like the table below, selecting the longest string in each field:

dataSource 

| ID_dec | ID_base | name | field1        | field2       | field3       |
| 1.01   | 1       | AAA  | Cat           | Brown        | Domesticated |
| 1.02   | 1       | AAA  | Cat           | Brown        | Domesticated |
| 1.03   | 1       | AAA  | Feline        | NULL         | Dom.         |
| 1.04   | 1       | AAA  | Beautiful cat | NULL         | NULL         |
| 1.05   | 1       | AAA  | NULL          | Light Brown  | NULL         |
| 2.01   | 2       | BBB  | Dog           | Black        | Wild         |
| 2.02   | 2       | BBB  | Barker        | NULL         | NULL         |
| 3.01   | 3       | CCC  | Bird          | Yellow       | Domesticated |
| 4.01   | 4       | DDD  | Snake         | NULL         | NULL         |
| 4.02   | 4       | DDD  | NULL          | Green        | NULL         |
| 4.03   | 4       | DDD  | NULL          | Forest Green | NULL         |
| 4.04   | 4       | DDD  | NULL          | Green        | Wild         |
| 4.05   | 4       | DDD  | NULL          | NULL         | Wild         |

For each ID_base, I want to pull the longest string found in each field[N], like so:

result 

| ID_base | name | field1        | field2       | field3       |
| 1       | AAA  | Beautiful cat | Light Brown  | Domesticated |
| 2       | BBB  | Barker        | Black        | Wild         |
| 3       | CCC  | Bird          | Yellow       | Domesticated |
| 4       | DDD  | Snake         | Forest Green | Wild         |

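This has been asked before, but only for a single field. The SQL below gets me the result I want, but it feels inefficient when scaled up to the actual data set of 37 fields and 5665 rows (4029 distinct ID_base values, with at most 10 ID_dec values per ID_base):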

SELECT DISTINCT a.id_base, a.name, b.result, c.result, d.result 
FROM 
    dataSource a 
    LEFT JOIN 
     (
     SELECT y.id_base, max(y.field1) result 
     FROM dataSource y 
     LEFT JOIN 
      (
      SELECT id_base, max(len(field1)) leng 
      FROM dataSource 
      GROUP BY id_base 
      ) z 
      ON y.id_base = z.id_base 
     WHERE len(y.field1) = z.leng 
     GROUP BY y.id_base 
     ) b 
    ON a.id_base = b.id_base 
    LEFT JOIN 
     (
     SELECT y.id_base, max(y.field2) result 
     FROM dataSource y 
     LEFT JOIN 
      (
      SELECT id_base, max(len(field2)) leng 
      FROM dataSource 
      GROUP BY id_base 
      ) z 
      ON y.id_base = z.id_base 
     WHERE len(y.field2) = z.leng 
     GROUP BY y.id_base 
     ) c 
    ON a.id_base = c.id_base 
    LEFT JOIN 
     (
     SELECT y.id_base, max(y.field3) result 
     FROM dataSource y 
     LEFT JOIN 
      (
      SELECT id_base, max(len(field3)) leng 
      FROM dataSource 
      GROUP BY id_base 
      ) z 
      ON y.id_base = z.id_base 
     WHERE len(y.field3) = z.leng 
     GROUP BY y.id_base 
     ) d 
    ON a.id_base = d.id_base 

What is the best way to write this query?

Answers

1
WITH a AS (
    SELECT id_base, name, max(len(field1)) l1, max(len(field2)) l2, max(len(field3)) l3 
    FROM datasource 
    GROUP BY id_base, name 
) 
SELECT a.*, 
    (SELECT TOP 1 field1 FROM datasource WHERE id_base = a.id_base AND len(field1) = a.l1) AS field1, 
    (SELECT TOP 1 field2 FROM datasource WHERE id_base = a.id_base AND len(field2) = a.l2) AS field2, 
    (SELECT TOP 1 field3 FROM datasource WHERE id_base = a.id_base AND len(field3) = a.l3) AS field3 
FROM a 
0
Select coalesce(t1.ID_base, t2.ID_base, t3.ID_base) base, 
    coalesce(t1.Name, t2.Name, t3.Name) Name, 
    coalesce(t1.field1, t2.field1, t3.field1) field1, 
    coalesce(t1.field2, t2.field2, t3.field2) field2, 
    coalesce(t1.field3, t2.field3, t3.field3) field3 
from dataSource t1 
    full join dataSource t2 on t2.ID_base = t1.ID_base 
     and len(t1.field1) = (Select Max(len(field1)) from dataSource 
          where ID_base = t1.ID_base) 
     and len(t2.field2) = (Select Max(len(field2)) from dataSource 
          where ID_base = t2.ID_base) 
    full join dataSource t3 on t3.ID_base = t1.ID_base 
     and len(t3.field3) = (Select Max(len(field3)) from dataSource 
          where ID_base = t3.ID_base) 
1

Another, simpler variation:

SELECT 
     t.id_base, 
     t.name, 
     -- LEN(NULL) is NULL, which sorts last under DESC, so non-NULL values win 
     (SELECT TOP 1 field1 FROM dataSource WHERE id_base = t.id_base ORDER BY LEN(field1) DESC) AS field1, 
     (SELECT TOP 1 field2 FROM dataSource WHERE id_base = t.id_base ORDER BY LEN(field2) DESC) AS field2, 
     (SELECT TOP 1 field3 FROM dataSource WHERE id_base = t.id_base ORDER BY LEN(field3) DESC) AS field3 
FROM (SELECT DISTINCT id_base, name FROM dataSource) t
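
For anyone who wants to check this correlated-subquery approach against the sample data from the question, here is a minimal sketch using Python's built-in sqlite3 module. SQLite is only a stand-in here, which is an assumption on my part: `LEN` becomes `length` and `TOP 1` becomes `LIMIT 1`, and the function name `longest_per_group` is made up for illustration. It says nothing about how the query performs on SQL Server.

```python
import sqlite3

# Sample rows copied from the question's dataSource table (None stands for NULL).
ROWS = [
    (1.01, 1, "AAA", "Cat", "Brown", "Domesticated"),
    (1.02, 1, "AAA", "Cat", "Brown", "Domesticated"),
    (1.03, 1, "AAA", "Feline", None, "Dom."),
    (1.04, 1, "AAA", "Beautiful cat", None, None),
    (1.05, 1, "AAA", None, "Light Brown", None),
    (2.01, 2, "BBB", "Dog", "Black", "Wild"),
    (2.02, 2, "BBB", "Barker", None, None),
    (3.01, 3, "CCC", "Bird", "Yellow", "Domesticated"),
    (4.01, 4, "DDD", "Snake", None, None),
    (4.02, 4, "DDD", None, "Green", None),
    (4.03, 4, "DDD", None, "Forest Green", None),
    (4.04, 4, "DDD", None, "Green", "Wild"),
    (4.05, 4, "DDD", None, None, "Wild"),
]

# SQLite translation of the query above: length() for LEN, LIMIT 1 for TOP 1.
# NULL lengths sort last under DESC in SQLite as well, so non-NULL values win.
QUERY = """
SELECT t.id_base,
       t.name,
       (SELECT field1 FROM dataSource WHERE id_base = t.id_base
         ORDER BY length(field1) DESC LIMIT 1) AS field1,
       (SELECT field2 FROM dataSource WHERE id_base = t.id_base
         ORDER BY length(field2) DESC LIMIT 1) AS field2,
       (SELECT field3 FROM dataSource WHERE id_base = t.id_base
         ORDER BY length(field3) DESC LIMIT 1) AS field3
FROM (SELECT DISTINCT id_base, name FROM dataSource) AS t
ORDER BY t.id_base
"""


def longest_per_group():
    """Load the sample rows into an in-memory database and run the query."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE dataSource ("
        "id_dec REAL, id_base INTEGER, name TEXT, "
        "field1 TEXT, field2 TEXT, field3 TEXT)"
    )
    conn.executemany("INSERT INTO dataSource VALUES (?, ?, ?, ?, ?, ?)", ROWS)
    rows = conn.execute(QUERY).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in longest_per_group():
        print(row)
```

Note that when two values in a group tie for the longest length, which one is returned is not determined by this query; add a second sort key to the `ORDER BY` if that matters for your data.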