2013-03-22 48 views
0

Fill gaps of NULLs in a table with average values

I have a table with the fields (id, letter, date) and some data:

1 A 2012-01-01 
2 B NULL 
3 C NULL 
4 D 2012-01-15 

I want to fill the NULL values with the average of the nearest non-NULL dates, like this:

1 A 2012-01-01 
2 B 2012-01-08 
3 C 2012-01-08 
4 D 2012-01-15 

Or maybe even like this:

1 A 2012-01-01 
2 B 2012-01-08 
3 C 2012-01-11 
4 D 2012-01-15 

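Either variant would be great. Is there a simple way to do this in MySQL?

Thanks in advance.

UPD: The table is quite large, about 700,000 records, with about 50,000 gaps like the one described.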

UPD2: To make it a bit clearer, the table can look like this:

1 A 2012-01-01 
2 B NULL 
3 C NULL 
4 D 2012-01-15 
5 E NULL 
6 F 2012-01-17 
7 G NULL 
8 H NULL 
9 I 2012-01-20 

The expected result is:

1 A 2012-01-01 
2 B **2012-01-08** 
3 C **2012-01-08** 
4 D 2012-01-15 
5 E **2012-01-16** 
6 F 2012-01-17 
7 G **2012-01-18** 
8 H **2012-01-18** 
9 I 2012-01-20 

(The asterisks mark the changed values.) Thanks.
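
For reference, the expected result above can be reproduced outside MySQL with a single pass over the rows. This is only a minimal Python sketch, not MySQL code. The function name `fill_gaps` and the tuple layout are made up for illustration. It assumes the rows are ordered by id, that each gap is filled with the midpoint of its two bounding dates (truncated to whole days), and that NULLs before the first or after the last known date are left untouched.

```python
from datetime import date


def fill_gaps(rows):
    """Fill None dates with the midpoint of the surrounding known dates.

    rows: list of (id, letter, date-or-None) tuples, ordered by id.
    Returns a new list; the input is not modified.
    """
    filled = list(rows)
    prev_date = None  # last non-NULL date seen so far
    pending = []      # indexes of NULL rows still waiting for an upper bound
    for i, (_row_id, _letter, d) in enumerate(rows):
        if d is None:
            pending.append(i)
            continue
        if prev_date is not None:
            # date + timedelta ignores the sub-day part, so 1.5 days adds 1 day
            midpoint = prev_date + (d - prev_date) / 2
            for j in pending:
                filled[j] = (rows[j][0], rows[j][1], midpoint)
        pending = []
        prev_date = d
    return filled


rows = [
    (1, "A", date(2012, 1, 1)), (2, "B", None), (3, "C", None),
    (4, "D", date(2012, 1, 15)), (5, "E", None),
    (6, "F", date(2012, 1, 17)), (7, "G", None), (8, "H", None),
    (9, "I", date(2012, 1, 20)),
]
for row in fill_gaps(rows):
    print(row[0], row[1], row[2])
```

Each row is visited once and each gap is written once, so the work grows linearly with the table size.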

UPD3: Thanks, everyone. But I will do it another way, computing the date with a simple formula: needed_date = [(max(date) - min(date)) / (max(id) - min(id))] * (my_id - min(id)) + min(date)
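
To illustrate the formula, here is a small Python sketch (not MySQL). The post does not say whether min/max are taken over the whole table or over the two rows that bound a gap, so the sketch takes the bounds as explicit arguments; that choice, and the name `interpolate_date`, are assumptions made for illustration.

```python
from datetime import date


def interpolate_date(my_id, lo_id, lo_date, hi_id, hi_date):
    """Linear interpolation of a date by id between two known rows.

    Implements (hi_date - lo_date) / (hi_id - lo_id) * (my_id - lo_id) + lo_date.
    Because the inputs are dates, any fractional day in the result is dropped.
    """
    step = (hi_date - lo_date) / (hi_id - lo_id)  # timedelta per id
    return lo_date + step * (my_id - lo_id)


# Bounds taken from the first sample: id 1 -> 2012-01-01, id 4 -> 2012-01-15
for my_id in (2, 3):
    print(my_id, interpolate_date(my_id, 1, date(2012, 1, 1), 4, date(2012, 1, 15)))
```

With these bounds, ids 2 and 3 come out as 2012-01-05 and 2012-01-10, so the filled dates are spread evenly by id rather than all set to the midpoint.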

+0

Looking at the first data set, if you had additional records such as '5, E, NULL' and '6, F, 2012-01-20', what would the result be? – 2013-03-22 18:05:12

+1

Why do you want to manipulate the data? This calculation should be done when retrieving the records. – Kermit 2013-03-22 18:05:32

+0

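What is the correlation between the order of the records and the values of the field (i.e., will A always come before B in time)? – 2013-03-22 18:07:50

Answers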

1

Suppose you have a table called T like this:

CREATE TABLE T(
    id INT,
    time DATETIME
);

The following query will give you the bounds for each NULL record:

SELECT T.Id 
    , MAX(T1.Time) as MinDate 
    , MIN(T2.Time) as MaxDate  
    FROM T 
INNER JOIN T T1 ON T1.Id < T.Id 
       AND T.time IS NULL 
       AND NOT T1.time IS NULL 
INNER JOIN T T2 ON T2.id > T.id 
       AND T.time IS NULL 
       AND NOT T2.time IS NULL 
GROUP BY T.Id

The output will be:

Id MinDate  MaxDate 
2 2012-01-01 2012-01-15 
3 2012-01-01 2012-01-15 

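So the next step is an UPDATE that uses this result set, taking the average of the two bounds for instance, to fill in the NULL values: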

UPDATE T 
INNER JOIN 
(
    SELECT T.Id, MAX(T1.Time) as MinTime, MIN(T2.Time) as MaxTime 
    FROM T 
    INNER JOIN T T1 ON T1.id < T.id 
       AND T.time IS NULL 
       AND NOT T1.time IS NULL 
    INNER JOIN T T2 ON T2.id > T.id 
       AND T.time IS NULL 
       AND NOT T2.time IS NULL  
    GROUP BY T.ID) T3 
ON T3.id = T.id 
SET T.time = FROM_UNIXTIME((UNIX_TIMESTAMP(T3.MinTime) + UNIX_TIMESTAMP(T3.MaxTime))/2) 
WHERE T.time IS NULL 

Working SQLFiddle Here

+0

Thanks, it is a solution. But its `EXPLAIN` on a table of about 700,000 records is not that good :( – 2013-03-22 18:26:46

1

QUERY#1

SELECT id,letter,IFNULL(date,dt) date FROM mytable, 
(SELECT DATE(mindate + INTERVAL (secdiff/2) SECOND) dt 
FROM (SELECT mindate,UNIX_TIMESTAMP(maxdate) 
- UNIX_TIMESTAMP(mindate) secdiff 
FROM (SELECT MIN(date) mindate FROM mytable) N, 
(SELECT MAX(date) maxdate FROM mytable) X) AA) A; 

SAMPLE DATA

mysql> DROP TABLE IF EXISTS mytable; 
Query OK, 0 rows affected (0.00 sec) 

mysql> CREATE TABLE mytable 
    -> (
    -> id int not null auto_increment, 
    -> letter char(1), 
    -> `date` date, 
    -> primary key (id) 
    ->); 
Query OK, 0 rows affected (0.07 sec) 

mysql> INSERT INTO mytable (letter,date) VALUES 
    -> ('A','2012-01-01'),('B',NULL),('C',NULL),('D','2012-01-15'); 
Query OK, 4 rows affected (0.00 sec) 
Records: 4 Duplicates: 0 Warnings: 0 

mysql> SELECT * FROM mytable; 
+----+--------+------------+
| id | letter | date       |
+----+--------+------------+
|  1 | A      | 2012-01-01 |
|  2 | B      | NULL       |
|  3 | C      | NULL       |
|  4 | D      | 2012-01-15 |
+----+--------+------------+
4 rows in set (0.00 sec) 

mysql> 

QUERY#1 EXECUTED

mysql> SELECT id,letter,IFNULL(date,dt) date FROM mytable, 
    -> (SELECT DATE(mindate + INTERVAL (secdiff/2) SECOND) dt 
    -> FROM (SELECT mindate,UNIX_TIMESTAMP(maxdate) 
    -> - UNIX_TIMESTAMP(mindate) secdiff 
    -> FROM (SELECT MIN(date) mindate FROM mytable) N, 
    -> (SELECT MAX(date) maxdate FROM mytable) X) AA) A; 
+----+--------+------------+
| id | letter | date       |
+----+--------+------------+
|  1 | A      | 2012-01-01 |
|  2 | B      | 2012-01-08 |
|  3 | C      | 2012-01-08 |
|  4 | D      | 2012-01-15 |
+----+--------+------------+
4 rows in set (0.00 sec) 

mysql> 

QUERY#2 (CLEANER VERSION)

This query uses the average of the UNIX timestamps of all non-NULL dates. If all the dates are NULL, it uses today's date:

SELECT id,letter,IFNULL(date,dt) date FROM mytable, 
(
    SELECT IF(K=0,DATE(NOW()),avgdt) dt FROM 
    (SELECT DATE(FROM_UNIXTIME(AVG(UNIX_TIMESTAMP(date)))) 
    avgdt FROM mytable) AA, 
    (SELECT COUNT(date) K FROM mytable) BB 
) A; 

QUERY#2 EXECUTED

mysql> SELECT id,letter,IFNULL(date,dt) date FROM mytable, 
    -> (
    ->  SELECT IF(K=0,DATE(NOW()),avgdt) dt FROM 
    ->  (SELECT DATE(FROM_UNIXTIME(AVG(UNIX_TIMESTAMP(date)))) 
    ->  avgdt FROM mytable) AA, 
    ->  (SELECT COUNT(date) K FROM mytable) BB 
    ->) A; 
+----+--------+------------+
| id | letter | date       |
+----+--------+------------+
|  1 | A      | 2012-01-01 |
|  2 | B      | 2012-01-08 |
|  3 | C      | 2012-01-08 |
|  4 | D      | 2012-01-15 |
+----+--------+------------+
4 rows in set (0.05 sec) 

mysql> 

Give it a try!

+0

Thanks. But this will change all my NULLs to the average date between the first pair of non-NULL values. See the updated question :( – 2013-03-22 19:08:51