I'm trying to clean up data in a structure set up like the table below, selecting the longest string in each field:
dataSource
| ID_dec | ID_base | name | field1 | field2 | field3 |
| 1.01 | 1 | AAA | Cat | Brown | Domesticated |
| 1.02 | 1 | AAA | Cat | Brown | Domesticated |
| 1.03 | 1 | AAA | Feline | NULL | Dom. |
| 1.04 | 1 | AAA | Beautiful cat | NULL | NULL |
| 1.05 | 1 | AAA | NULL | Light Brown | NULL |
| 2.01 | 2 | BBB | Dog | Black | Wild |
| 2.02 | 2 | BBB | Barker | NULL | NULL |
| 3.01 | 3 | CCC | Bird | Yellow | Domesticated |
| 4.01 | 4 | DDD | Snake | NULL | NULL |
| 4.02 | 4 | DDD | NULL | Green | NULL |
| 4.03 | 4 | DDD | NULL | Forest Green | NULL |
| 4.04 | 4 | DDD | NULL | Green | Wild |
| 4.05 | 4 | DDD | NULL | NULL | Wild |
I'd like to pull the longest string for each combination of field[N] and ID_base, like so:
result
| ID_base | name | field1 | field2 | field3 |
| 1 | AAA | Beautiful cat | Light Brown | Domesticated |
| 2 | BBB | Barker | Black | Wild |
| 3 | CCC | Bird | Yellow | Domesticated |
| 4 | DDD | Snake | Forest Green | Wild |
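To make the intended rule concrete, here is a minimal sketch in Python. It is only an illustration of the selection logic, not part of the actual database setup. It assumes NULL is represented as None and that ties between equally long strings go to the larger string (Python compares by code point, which may differ from the database's collation). The names ROWS and longest_per_field are made up for this example.

```python
# Sample rows copied from the dataSource table above (ID_dec omitted);
# None stands in for NULL.
ROWS = [
    (1, "AAA", "Cat", "Brown", "Domesticated"),
    (1, "AAA", "Cat", "Brown", "Domesticated"),
    (1, "AAA", "Feline", None, "Dom."),
    (1, "AAA", "Beautiful cat", None, None),
    (1, "AAA", None, "Light Brown", None),
    (2, "BBB", "Dog", "Black", "Wild"),
    (2, "BBB", "Barker", None, None),
    (3, "CCC", "Bird", "Yellow", "Domesticated"),
    (4, "DDD", "Snake", None, None),
    (4, "DDD", None, "Green", None),
    (4, "DDD", None, "Forest Green", None),
    (4, "DDD", None, "Green", "Wild"),
    (4, "DDD", None, None, "Wild"),
]


def longest_per_field(rows):
    """Return {id_base: (name, field1, ..., fieldN)} keeping, for every
    field, the longest non-NULL string seen for that id_base."""
    best = {}
    for id_base, name, *fields in rows:
        # Slot 0 holds the name; slots 1..N hold the best value per field.
        current = best.setdefault(id_base, [name] + [None] * len(fields))
        for i, value in enumerate(fields, start=1):
            if value is None:
                continue  # NULLs never replace a real value
            # Compare by length first; equal lengths fall back to the
            # larger string (an assumed tie-break rule).
            if current[i] is None or (len(value), value) > (len(current[i]), current[i]):
                current[i] = value
    return {id_base: tuple(values) for id_base, values in best.items()}


if __name__ == "__main__":
    for id_base, row in sorted(longest_per_field(ROWS).items()):
        print(id_base, *row, sep=" | ")
```

The sketch makes a single pass over the rows and works for any number of field columns, which is the behaviour I'm hoping to get from SQL as well.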
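This has been asked before, but only for checking a single field. The SQL below gives me the result I'm after, but it feels inefficient when scaled up to the real data set of 37 fields and 5665 rows (4029 ID_bases, with at most 10 ID_decs for a single ID_base):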
SELECT DISTINCT a.id_base, a.name, b.result AS field1, c.result AS field2, d.result AS field3
FROM dataSource a
LEFT JOIN
(
    -- longest field1 per id_base; max() picks one value when several share the longest length
    SELECT y.id_base, max(y.field1) result
    FROM dataSource y
    LEFT JOIN
    (
        SELECT id_base, max(len(field1)) leng
        FROM dataSource
        GROUP BY id_base
    ) z
    ON y.id_base = z.id_base
    WHERE len(y.field1) = z.leng
    GROUP BY y.id_base
) b
ON a.id_base = b.id_base
LEFT JOIN
(
    -- longest field2 per id_base
    SELECT y.id_base, max(y.field2) result
    FROM dataSource y
    LEFT JOIN
    (
        SELECT id_base, max(len(field2)) leng
        FROM dataSource
        GROUP BY id_base
    ) z
    ON y.id_base = z.id_base
    WHERE len(y.field2) = z.leng
    GROUP BY y.id_base
) c
ON a.id_base = c.id_base
LEFT JOIN
(
    -- longest field3 per id_base
    SELECT y.id_base, max(y.field3) result
    FROM dataSource y
    LEFT JOIN
    (
        SELECT id_base, max(len(field3)) leng
        FROM dataSource
        GROUP BY id_base
    ) z
    ON y.id_base = z.id_base
    WHERE len(y.field3) = z.leng
    GROUP BY y.id_base
) d
ON a.id_base = d.id_base
What is the best way to write this query?