2011-01-14
2

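Improving SQL query performance

I have three tables that store the actual person data (person), the teams (team), and the entries (athlete). The schema of the three tables is: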

Database schema

Each team can have two or more athletes.

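I am trying to write a query that produces the most frequent pairs, meaning the people who play together in teams of two. I came up with the following query: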

SELECT p1.surname, p1.name, p2.surname, p2.name, COUNT(*) AS freq 
FROM person p1, athlete a1, person p2, athlete a2 
WHERE 
    p1.id = a1.person_id AND 
    p2.id = a2.person_id AND 
    a1.team_id = a2.team_id AND 
    a1.team_id IN 
      (SELECT team.id 
      FROM team, athlete 
      WHERE team.id = athlete.team_id 
      GROUP BY team.id 
      HAVING COUNT(*) = 2) 
GROUP BY p1.id 
ORDER BY freq DESC 

Obviously this is a resource-intensive query. Is there a way to improve it?

+0

Will indexing help? – Sudantha 2011-01-14 10:31:27

+0

Not really, everything is indexed properly. The problem is that the database contains several hundred thousand rows (person: 10K, team: 450K, athlete: 900K) – Anax 2011-01-14 10:37:05

+1

The subquery has no join clause. Do you need both the team and athlete tables in the subquery? – 2011-01-14 10:37:20

Answers

4
SELECT team.id
FROM team, athlete 
WHERE team.id = athlete.team_id 
GROUP BY team.id 
HAVING COUNT(*) = 2 

Performance tip 1: you only need the athlete table here.
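A minimal sketch of that tip, run against an in-memory SQLite database as a stand-in for MySQL. The sample rows are made up for illustration; the table and column names follow the question. It assumes every athlete.team_id refers to an existing team, in which case joining team adds nothing to the grouping:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE athlete (id INTEGER PRIMARY KEY, person_id INTEGER, team_id INTEGER);
    -- Hypothetical sample data: team 1 has three athletes, teams 2 and 3 have two.
    INSERT INTO athlete (person_id, team_id) VALUES
        (1, 1), (2, 1), (3, 1),
        (3, 2), (4, 2),
        (1, 3), (5, 3);
""")

# Same filter as the original subquery, without joining the team table.
two_person_teams = [row[0] for row in conn.execute("""
    SELECT team_id
    FROM athlete
    GROUP BY team_id
    HAVING COUNT(*) = 2
    ORDER BY team_id
""")]
print(two_person_teams)
```

With an index on athlete.team_id, the grouping can likely be served from the index alone, but that is worth confirming with EXPLAIN on the real data.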

2

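You might consider the following approach, which uses triggers to maintain counters in the team and person tables, so that you can easily find out which teams have 2 or more athletes and which people are in 2 or more teams.

(Note: I have removed the surrogate id key from your athlete table in favour of a composite key, which enforces data integrity better. I have also renamed athlete to team_athlete.)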

drop table if exists person; 
create table person 
(
person_id int unsigned not null auto_increment primary key, 
name varchar(255) not null, 
team_count smallint unsigned not null default 0 
) 
engine=innodb; 

drop table if exists team; 
create table team 
(
team_id int unsigned not null auto_increment primary key, 
name varchar(255) not null, 
athlete_count smallint unsigned not null default 0, 
key (athlete_count) 
) 
engine=innodb; 

drop table if exists team_athlete; 
create table team_athlete 
(
team_id int unsigned not null, 
person_id int unsigned not null, 
primary key (team_id, person_id), -- note clustered composite PK 
key person(person_id) -- added index 
) 
engine=innodb; 

delimiter # 

create trigger team_athlete_after_ins_trig after insert on team_athlete 
for each row 
begin 
    update team set athlete_count = athlete_count+1 where team_id = new.team_id; 
    update person set team_count = team_count+1 where person_id = new.person_id; 
end# 

delimiter ; 

insert into person (name) values ('p1'),('p2'),('p3'),('p4'),('p5'); 
insert into team (name) values ('t1'),('t2'),('t3'),('t4'); 

insert into team_athlete (team_id, person_id) values 
(1,1),(1,2),(1,3), 
(2,3),(2,4), 
(3,1),(3,5); 

select * from team_athlete; 
select * from person; 
select * from team; 

select * from team where athlete_count >= 2; 
select * from person where team_count >= 2; 
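The trigger above only covers inserts. If athletes can also be removed from a team, the counters would need a matching delete trigger. Here is a minimal sketch of both, using an in-memory SQLite database as a stand-in for MySQL. SQLite's trigger syntax differs slightly (no delimiter change), and the delete trigger name team_athlete_after_del_trig is hypothetical, not part of the answer:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (
        person_id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        team_count INTEGER NOT NULL DEFAULT 0
    );
    CREATE TABLE team (
        team_id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        athlete_count INTEGER NOT NULL DEFAULT 0
    );
    CREATE TABLE team_athlete (
        team_id INTEGER NOT NULL,
        person_id INTEGER NOT NULL,
        PRIMARY KEY (team_id, person_id)
    );

    -- SQLite version of the insert trigger from the answer.
    CREATE TRIGGER team_athlete_after_ins_trig AFTER INSERT ON team_athlete
    BEGIN
        UPDATE team SET athlete_count = athlete_count + 1 WHERE team_id = NEW.team_id;
        UPDATE person SET team_count = team_count + 1 WHERE person_id = NEW.person_id;
    END;

    -- Hypothetical delete trigger that keeps the counters in sync on removal.
    CREATE TRIGGER team_athlete_after_del_trig AFTER DELETE ON team_athlete
    BEGIN
        UPDATE team SET athlete_count = athlete_count - 1 WHERE team_id = OLD.team_id;
        UPDATE person SET team_count = team_count - 1 WHERE person_id = OLD.person_id;
    END;

    INSERT INTO person (name) VALUES ('p1'), ('p2'), ('p3');
    INSERT INTO team (name) VALUES ('t1');
    INSERT INTO team_athlete (team_id, person_id) VALUES (1, 1), (1, 2), (1, 3);
    DELETE FROM team_athlete WHERE team_id = 1 AND person_id = 3;
""")

athlete_count = conn.execute(
    "SELECT athlete_count FROM team WHERE team_id = 1").fetchone()[0]
team_counts = [r[0] for r in conn.execute(
    "SELECT team_count FROM person ORDER BY person_id")]
print(athlete_count, team_counts)
```

After three inserts and one delete, team 1 is back to two athletes, so it would again qualify for a filter such as athlete_count = 2.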

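EDIT

Added the following, as I initially misunderstood the question:

Create a view that includes only the teams with exactly 2 people.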

drop view if exists teams_with_2_players_view; 

create view teams_with_2_players_view as 
select 
t.team_id, 
ta.person_id, 
p.name as person_name 
from 
team t 
inner join team_athlete ta on t.team_id = ta.team_id 
inner join person p on ta.person_id = p.person_id 
where 
t.athlete_count = 2; 

Now use the view to find the most frequently occurring pairs of people.

select 
p1.person_id as p1_person_id, 
p1.person_name as p1_person_name, 
p2.person_id as p2_person_id, 
p2.person_name as p2_person_name, 
count(*) as counter 
from 
teams_with_2_players_view p1 
inner join teams_with_2_players_view p2 on 
    p2.team_id = p1.team_id and p2.person_id > p1.person_id 
group by 
p1.person_id, p2.person_id 
order by 
counter desc; 

Hope this helps :)

EDIT 2: checking performance

select count(*) as counter from person; 

+---------+ 
| counter | 
+---------+ 
|   10000 |
+---------+ 
1 row in set (0.00 sec) 

select count(*) as counter from team; 

+---------+ 
| counter | 
+---------+ 
|  450000 |
+---------+ 
1 row in set (0.08 sec) 

select count(*) as counter from team where athlete_count = 2; 

+---------+ 
| counter | 
+---------+ 
|  112644 |
+---------+ 
1 row in set (0.03 sec) 

select count(*) as counter from team_athlete; 

+---------+ 
| counter | 
+---------+ 
| 1124772 | 
+---------+ 
1 row in set (0.21 sec) 

explain 
select 
p1.person_id as p1_person_id, 
p1.person_name as p1_person_name, 
p2.person_id as p2_person_id, 
p2.person_name as p2_person_name, 
count(*) as counter 
from 
teams_with_2_players_view p1 
inner join teams_with_2_players_view p2 on 
    p2.team_id = p1.team_id and p2.person_id > p1.person_id 
group by 
p1.person_id, p2.person_id 
order by 
counter desc 
limit 10; 

+----+-------------+-------+--------+---------------------+-------------+---------+---------------------+-------+----------------------------------------------+
| id | select_type | table | type   | possible_keys       | key         | key_len | ref                 | rows  | Extra                                        |
+----+-------------+-------+--------+---------------------+-------------+---------+---------------------+-------+----------------------------------------------+
|  1 | SIMPLE      | t     | ref    | PRIMARY,t_count_idx | t_count_idx | 2       | const               | 86588 | Using index; Using temporary; Using filesort |
|  1 | SIMPLE      | t     | eq_ref | PRIMARY,t_count_idx | PRIMARY     | 4       | foo_db.t.team_id    |     1 | Using where                                  |
|  1 | SIMPLE      | ta    | ref    | PRIMARY,person      | PRIMARY     | 4       | foo_db.t.team_id    |     1 | Using index                                  |
|  1 | SIMPLE      | p     | eq_ref | PRIMARY             | PRIMARY     | 4       | foo_db.ta.person_id |     1 |                                              |
|  1 | SIMPLE      | ta    | ref    | PRIMARY,person      | PRIMARY     | 4       | foo_db.t.team_id    |     1 | Using where; Using index                     |
|  1 | SIMPLE      | p     | eq_ref | PRIMARY             | PRIMARY     | 4       | foo_db.ta.person_id |     1 |                                              |
+----+-------------+-------+--------+---------------------+-------------+---------+---------------------+-------+----------------------------------------------+

6 rows in set (0.00 sec) 

select 
p1.person_id as p1_person_id, 
p1.person_name as p1_person_name, 
p2.person_id as p2_person_id, 
p2.person_name as p2_person_name, 
count(*) as counter 
from 
teams_with_2_players_view p1 
inner join teams_with_2_players_view p2 on 
    p2.team_id = p1.team_id and p2.person_id > p1.person_id 
group by 
p1.person_id, p2.person_id 
order by 
counter desc 
limit 10; 

+--------------+----------------+--------------+----------------+---------+
| p1_person_id | p1_person_name | p2_person_id | p2_person_name | counter |
+--------------+----------------+--------------+----------------+---------+
|          221 | person 221     |          739 | person 739     |       5 |
|          129 | person 129     |          249 | person 249     |       5 |
|          874 | person 874     |          877 | person 877     |       4 |
|          717 | person 717     |          949 | person 949     |       4 |
|          395 | person 395     |          976 | person 976     |       4 |
|          415 | person 415     |          828 | person 828     |       4 |
|          287 | person 287     |          470 | person 470     |       4 |
|          455 | person 455     |          860 | person 860     |       4 |
|           13 | person 13      |           29 | person 29      |       4 |
|            1 | person 1       |          743 | person 743     |       4 |
+--------------+----------------+--------------+----------------+---------+
10 rows in set (2.02 sec) 
0

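Shouldn't there be an additional constraint a1.person_id != a2.person_id, to avoid pairing a player with himself? This may not affect the final ordering of the results, but it will affect the accuracy of the counts.

If possible, you could add a column named athlete_count (with an index) to the team table and update it whenever a player is added to or removed from a team. That avoids the subquery that has to go through the whole athlete table to find the two-player teams. Also, if I understand the original query correctly, grouping by p1.id only gives you the number of times a player has played in a two-player team, not the count for the pair itself. You probably need GROUP BY p1.id, p2.id.

0

REVISED, based on exactly two people per team

With an inner-most pre-aggregate over teams of exactly two people, I can use MIN() and MAX() to get each team's PersonA and PersonB into a single row per team. That way the person IDs are always in low-high pair order, ready to be compared across teams. Then I can COUNT by the common Mate1 and Mate2 across all teams and join directly to get their names.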

SELECT STRAIGHT_JOIN 
     p1.surname, 
     p1.name, 
     p2.surname, 
     p2.name, 
     TeamAggregates.CommonTeams 
    from 
    (select PreQueryTeams.Mate1, 
       PreQueryTeams.Mate2, 
       count(*) CommonTeams 
      from 
       (SELECT team_id, 
         min(person_id) mate1, 
         max(person_id) mate2 
        FROM 
         athlete 
        group by 
         team_id 
        having count(*) = 2) PreQueryTeams 
      group by 
       PreQueryTeams.Mate1, 
       PreQueryTeams.Mate2 ) TeamAggregates, 
     person p1, 
     person p2 
    where 
      TeamAggregates.Mate1 = p1.id
     and TeamAggregates.Mate2 = p2.id
    order by
     TeamAggregates.CommonTeams desc
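
To see the MIN()/MAX() normalisation at work, here is a small sketch on made-up data, run in an in-memory SQLite database as a stand-in for MySQL (so STRAIGHT_JOIN is left out). The sample rows and the aliases per_team, agg, and common_teams are assumptions for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (id INTEGER PRIMARY KEY, surname TEXT, name TEXT);
    CREATE TABLE athlete (id INTEGER PRIMARY KEY, person_id INTEGER, team_id INTEGER);
    -- Hypothetical sample data.
    INSERT INTO person (id, surname, name) VALUES
        (1, 'S1', 'N1'), (2, 'S2', 'N2'), (3, 'S3', 'N3');
    INSERT INTO athlete (person_id, team_id) VALUES
        (1, 10), (2, 10),          -- pair (1, 2)
        (2, 11), (1, 11),          -- pair (1, 2) again, stored in reverse order
        (1, 12), (3, 12),          -- pair (1, 3)
        (1, 13), (2, 13), (3, 13); -- three athletes: excluded by HAVING
""")

rows = conn.execute("""
    SELECT p1.surname, p2.surname, agg.common_teams
    FROM (SELECT mate1, mate2, COUNT(*) AS common_teams
          FROM (SELECT team_id,
                       MIN(person_id) AS mate1,
                       MAX(person_id) AS mate2
                FROM athlete
                GROUP BY team_id
                HAVING COUNT(*) = 2) AS per_team
          GROUP BY mate1, mate2) AS agg
    JOIN person p1 ON p1.id = agg.mate1
    JOIN person p2 ON p2.id = agg.mate2
    ORDER BY agg.common_teams DESC
""").fetchall()
print(rows)
```

Teams 10 and 11 collapse into the same (1, 2) row even though the athletes were inserted in opposite order, which is the point of taking the minimum and maximum per team.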

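ORIGINAL ANSWER, for teams with any number of teammates

I would do it as follows. The inner prequery first joins every possible combination of people within each individual team, but requiring Person1 < Person2 keeps the same person from being counted as both Person1 and Person2. It also prevents reversed pairs keyed on the higher-numbered person ID. For example: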

athlete person team 
1   1  1 
2   2  1 
3   3  1 
4   4  1 
5   1  2 
6   3  2 
7   4  2 
8   1  3 
9   4  3 

So, from team 1 you would get person pairs of 
1,2 1,3 1,4  2,3  2,4 3,4 
and NOT get reversed duplicates such as 
2,1 3,1 4,1  3,2  4,2 4,3 
nor same person 
1,1 2,2 3,3 4,4 


Then from team 2, you would have pairs of
1,3 1,4 3,4

Finally in team 3 the single pair of
1,4

thus teammates 1,4 have occurred in 3 common teams.

SELECT STRAIGHT_JOIN 
     p1.surname, 
     p1.name, 
     p2.surname, 
     p2.name, 
     PreQuery.CommonTeams 
    from 
     (select 
      a1.Person_ID Person_ID1, 
      a2.Person_ID Person_ID2, 
      count(*) CommonTeams 
     from 
      athlete a1, 
      athlete a2 
     where 
       a1.Team_ID = a2.Team_ID 
      and a1.Person_ID < a2.Person_ID 
     group by 
      1, 2 
     having CommonTeams > 1) PreQuery, 
     person p1, 
     person p2 
    where 
      PreQuery.Person_ID1 = p1.id 
     and PreQuery.Person_ID2 = p2.id 
    order by 
     PreQuery.CommonTeams desc
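
As a quick check of the reasoning, this sketch loads the nine sample rows from the worked example above into an in-memory SQLite database (a stand-in for MySQL) and runs the inner pair-counting query on its own. The aliases person_id1, person_id2, and common_teams are illustrative, and the extra ORDER BY columns are only there to make the output order deterministic:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE athlete (id INTEGER PRIMARY KEY, person_id INTEGER, team_id INTEGER);
    -- The nine sample rows from the worked example.
    INSERT INTO athlete (id, person_id, team_id) VALUES
        (1, 1, 1), (2, 2, 1), (3, 3, 1), (4, 4, 1),
        (5, 1, 2), (6, 3, 2), (7, 4, 2),
        (8, 1, 3), (9, 4, 3);
""")

# Self-join on team, keeping only low-high ordered pairs of distinct people.
pairs = conn.execute("""
    SELECT a1.person_id AS person_id1,
           a2.person_id AS person_id2,
           COUNT(*) AS common_teams
    FROM athlete a1
    JOIN athlete a2
      ON a1.team_id = a2.team_id
     AND a1.person_id < a2.person_id
    GROUP BY a1.person_id, a2.person_id
    HAVING COUNT(*) > 1
    ORDER BY common_teams DESC, person_id1, person_id2
""").fetchall()
print(pairs)
```

The pair (1, 4) comes out on top with 3 common teams, matching the count worked out by hand, followed by (1, 3) and (3, 4) with 2 each.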
0

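Here are some tips for improving the performance of SQL SELECT queries:

  • Use SET NOCOUNT ON; it helps reduce network traffic and thereby improves performance.
  • Use fully qualified procedure names (for example database.schema.objectname).
  • Use sp_executesql instead of EXECUTE for dynamic queries.
  • Do not use SELECT *; use SELECT column1, column2, ... for IF EXISTS or SELECT operations.
  • Avoid naming user stored procedures like sp_procedureName, because if a stored procedure name starts with sp_, SQL Server searches the master database first, which can reduce query performance.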