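Sorry, the question title is a bit vague, so here is a working example. How do I group a very large dataset into a result set and then filter that result set?
I have a table in which each user (userId) gets a value every few days. I want to find each user's last value in each month, and count how many users fall into each value range.
Here is an example table with representative data: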
CREATE TABLE `datasource` (
`id` INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY ,
`userId` INT UNSIGNED NOT NULL ,
`unixts` INT UNSIGNED NOT NULL ,
`value` INT UNSIGNED NOT NULL ,
INDEX (`userId`)
);
INSERT INTO `datasource`
(`userId`, `unixts`, `value`)
VALUES
(1, UNIX_TIMESTAMP('2010-07-01'), 500),
(1, UNIX_TIMESTAMP('2010-07-15'), 610),
(1, UNIX_TIMESTAMP('2010-08-02'), 740),
(2, UNIX_TIMESTAMP('2010-07-03'), 506),
(2, UNIX_TIMESTAMP('2010-07-18'), 640),
(2, UNIX_TIMESTAMP('2010-08-09'), 340),
(3, UNIX_TIMESTAMP('2010-07-03'), 506),
(3, UNIX_TIMESTAMP('2010-08-18'), 640)
;
Now, here is a query that gets what I'm after:
select
    month(FROM_UNIXTIME(unixts)) as month,
    sum(if(value >= 700, 1, 0)) as '700 and up',
    sum(if(value BETWEEN 600 AND 699, 1, 0)) as '600-699',
    sum(if(value BETWEEN 500 AND 599, 1, 0)) as '500-599',
    sum(if(value <= 499, 1, 0)) as '499 and below',
    count(*) as total
from
    datasource
where
    id in (
        select
            max(id)
        from
            datasource
        where
            unixts between UNIX_TIMESTAMP('2010-07-01') and UNIX_TIMESTAMP('2010-09-01')
        group by
            userId, month(from_unixtime(unixts))
    )
group by
    month(FROM_UNIXTIME(unixts));
+-------+------------+---------+---------+---------------+-------+
| month | 700 and up | 600-699 | 500-599 | 499 and below | total |
+-------+------------+---------+---------+---------------+-------+
|     7 |          0 |       2 |       1 |             0 |     3 |
|     8 |          1 |       1 |       0 |             1 |     3 |
+-------+------------+---------+---------+---------------+-------+
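This query works great on our small result set. However, once the datasource table holds 44,000,000 rows, it grinds to a halt.
Is there an optimized way to write this query that achieves what I want without tying up MySQL for days?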
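One possible direction, offered as a rough sketch rather than a tested fix: older MySQL versions can execute `id in (subquery)` as a dependent subquery that is re-evaluated for each outer row, so joining against a derived table may help. The index name `idx_unixts_user_id` and the alias `last_rows` are hypothetical, and the sketch assumes (as the original query does) that a higher `id` always means a later `unixts` for the same user. It has not been run against a 44,000,000-row table.

```sql
-- Assumption: a composite index that covers the inner query, so the
-- date-range filter and max(id) can be answered from the index alone.
-- The index name is hypothetical.
ALTER TABLE `datasource`
    ADD INDEX `idx_unixts_user_id` (`unixts`, `userId`, `id`);

select
    month(FROM_UNIXTIME(d.unixts)) as month,
    sum(if(d.value >= 700, 1, 0)) as '700 and up',
    sum(if(d.value BETWEEN 600 AND 699, 1, 0)) as '600-699',
    sum(if(d.value BETWEEN 500 AND 599, 1, 0)) as '500-599',
    sum(if(d.value <= 499, 1, 0)) as '499 and below',
    count(*) as total
from
    datasource as d
    -- The derived table is built once: one row per (user, month),
    -- holding the id of that user's last row in that month.
    inner join (
        select
            max(id) as id
        from
            datasource
        where
            -- Half-open range, unlike the original BETWEEN: a row stamped
            -- exactly 2010-09-01 00:00:00 is excluded instead of showing
            -- up as a September row.
            unixts >= UNIX_TIMESTAMP('2010-07-01')
            and unixts < UNIX_TIMESTAMP('2010-09-01')
        group by
            userId, month(from_unixtime(unixts))
    ) as last_rows on last_rows.id = d.id
group by
    month(FROM_UNIXTIME(d.unixts));
```

On the sample data above this should return the same two rows as the original query. Running `EXPLAIN` on both versions would show whether the subquery is still reported as `DEPENDENT SUBQUERY` and whether the new index is actually chosen.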