Massive databases and MySQL

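A new project we are working on involves a lot of data analysis, but we are finding it slow, and we are looking for ways to change our approach, in software and/or hardware.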

We are currently running on an Amazon EC2 instance (Linux):

DB:

High-CPU Extra Large Instance 

7 GB of memory 
20 EC2 Compute Units (8 virtual cores with 2.5 EC2 Compute Units each) 
1690 GB of instance storage 
64-bit platform 
I/O Performance: High 
API name: c1.xlarge 


processor  : 7 
vendor_id  : GenuineIntel 
cpu family  : 6 
model   : 26 
model name  : Intel(R) Xeon(R) CPU   E5506 @ 2.13GHz 
stepping  : 5 
cpu MHz   : 2133.408 
cache size  : 4096 KB 

MemTotal:  7347752 kB 
MemFree:  728860 kB 
Buffers:   40196 kB 
Cached:  2833572 kB 
SwapCached:   0 kB 
Active:  5693656 kB 
Inactive:  456904 kB 
SwapTotal:   0 kB 
SwapFree:   0 kB

Part of the data is articles and entities, joined through a link table such as this one:

mysql> DESCRIBE articles_entities;
+------------+--------------+------+-----+---------+-------+
| Field      | Type         | Null | Key | Default | Extra |
+------------+--------------+------+-----+---------+-------+
| id         | char(36)     | NO   | PRI | NULL    |       |
| article_id | char(36)     | NO   | MUL | NULL    |       |
| entity_id  | char(36)     | NO   | MUL | NULL    |       |
| created    | datetime     | YES  |     | NULL    |       |
| modified   | datetime     | YES  |     | NULL    |       |
| relevance  | decimal(5,4) | YES  | MUL | NULL    |       |
| analysers  | text         | YES  |     | NULL    |       |
| anchor     | varchar(255) | NO   |     | NULL    |       |
+------------+--------------+------+-----+---------+-------+
8 rows in set (0.00 sec)

As you can see from the count below, we have a lot of pairings, and they are growing by more than 100,000 a day:

mysql> SELECT count(*) FROM articles_entities; 
+----------+ 
| count(*) | 
+----------+ 
|  2829138 |
+----------+ 
1 row in set (0.00 sec)

A simple query like the following takes far too long (12 seconds):

mysql> SELECT count(*) FROM articles_entities WHERE relevance <= .4 AND relevance > 0; 
+----------+ 
| count(*) | 
+----------+ 
|   357190 |
+----------+ 
1 row in set (11.95 sec)

What should we be looking at to improve our lookup times? A different DB storage engine? Different hardware?

2011-01-20 Lizard

Is your table properly indexed? – 2011-01-20 12:06:30

Isn't that obvious from the table dump provided? – Lizard 2011-01-20 12:12:26

When it comes to query performance, three things matter:

Indexes. Memory. Everything else.

The first thing to do is check your indexes. Run EXPLAIN on your queries to see how MySQL is handling them.
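For the slow query in the question, that check might look like the sketch below. The statement is plain MySQL EXPLAIN; the plan that comes back depends on the indexes that actually exist on the table, so no output is shown here.

```sql
-- Ask MySQL how it plans to run the slow count from the question.
EXPLAIN
SELECT COUNT(*)
FROM articles_entities
WHERE relevance <= 0.4 AND relevance > 0;

-- Columns worth reading in the result:
--   type : "range" means an index range scan; "ALL" means a full table scan
--   key  : the index MySQL actually chose (NULL if none)
--   rows : MySQL's estimate of how many rows it has to examine
```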

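If those look reasonable, the next thing to check is memory. How big is your database in total? Memory is cheap these days, and queries served from memory run far faster than queries that have to read from disk.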
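To put a number on "how big", something like the following could be run. It is a sketch that assumes MySQL 5.0 or later (for information_schema). The two variables are the main caches for MyISAM and InnoDB respectively; which one matters depends on the storage engine the tables use, which the question does not state.

```sql
-- Approximate size of each schema, data plus indexes, in MB.
SELECT table_schema,
       ROUND(SUM(data_length + index_length) / 1024 / 1024) AS size_mb
FROM information_schema.TABLES
GROUP BY table_schema;

-- How much memory MySQL is allowed to use for caching.
SHOW VARIABLES LIKE 'key_buffer_size';          -- MyISAM index cache
SHOW VARIABLES LIKE 'innodb_buffer_pool_size';  -- InnoDB data and index cache
```

If the reported size is far larger than these caches, a good share of reads will likely have to go to disk.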

Once you have explored those, if performance is still slow, it may be time to consider other options.

2011-01-20 12:10:01

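Using char(36) for keys is not the fastest thing you can do with MySQL. Use INT-type keys if possible. If you index CHAR columns, the index will be very large compared with a (BIG)INT index (if not created 'properly').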

However, if the column values are not numeric, you are stuck with CHAR columns (still faster than VARCHAR, but they can produce large indexes).
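As an illustration of the INT-key suggestion, a link table could be laid out roughly as below. This is a hypothetical sketch, not a drop-in migration: the table name articles_entities_int and the index names are made up, and it assumes articles and entities can be given numeric ids as well.

```sql
-- Hypothetical layout using 8-byte integer keys instead of CHAR(36).
CREATE TABLE articles_entities_int (
  id         BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
  article_id BIGINT UNSIGNED NOT NULL,
  entity_id  BIGINT UNSIGNED NOT NULL,
  created    DATETIME,
  modified   DATETIME,
  relevance  DECIMAL(5,4),
  analysers  TEXT,
  anchor     VARCHAR(255) NOT NULL,
  PRIMARY KEY (id),
  KEY idx_article (article_id),
  KEY idx_entity (entity_id),
  KEY idx_relevance (relevance)
);
```

Each BIGINT key takes 8 bytes, against at least 36 bytes for a CHAR(36) value (more with a multi-byte character set), so every index entry shrinks accordingly.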

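Please post the SHOW CREATE TABLE output for the table so we can see the key/index parameters. As the previous answer said, an EXPLAIN of the problematic query would also help in giving a better answer.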

PS. Use SHOW TABLE STATUS LIKE '{table_name}' to see the table's index (and data) size.
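Put together, the diagnostics asked for here come down to the statements below, written against the table from the question. All three are standard MySQL; the comments only point at the fields of interest.

```sql
-- Full table definition, including every index and the storage engine.
SHOW CREATE TABLE articles_entities;

-- One row per indexed column; Seq_in_index gives the column's position in its index.
SHOW INDEX FROM articles_entities;

-- Data_length and Index_length are the data and index sizes in bytes.
SHOW TABLE STATUS LIKE 'articles_entities';
```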

2011-01-20 12:23:37 origo

As mrorigo asked, please post the output of SHOW CREATE TABLE articles_entities so we can see the actual indexes on your table.

From the MySQL documentation, http://dev.mysql.com/doc/refman/5.0/en/mysql-indexes.html:

If the table has a multiple-column index, any leftmost prefix of the index can be used by the optimizer to find rows. 
For example, if you have a three-column index on (col1, col2, col3), you have indexed search capabilities on (col1), (col1, col2), and (col1, col2, col3). 

MySQL cannot use an index if the columns do not form a leftmost prefix of the index

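So, one note: if relevance is part of a multiple-column index but is not the leftmost column of that index, then the index is not used for your query.

This is a common problem that is often overlooked.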
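A small sketch of the rule, on a hypothetical scratch table (prefix_demo, both index names, and the sample id value are made up for illustration). What EXPLAIN reports will depend on the MySQL version and the data, so no output is shown.

```sql
-- Hypothetical scratch table, only to illustrate the leftmost-prefix rule.
CREATE TABLE prefix_demo (
  article_id CHAR(36) NOT NULL,
  entity_id  CHAR(36) NOT NULL,
  relevance  DECIMAL(5,4),
  KEY idx_article_relevance (article_id, relevance)
);

-- article_id is the leftmost column, so this query can seek into the index.
EXPLAIN
SELECT COUNT(*) FROM prefix_demo
WHERE article_id = '00000000-0000-0000-0000-000000000000'
  AND relevance <= 0.4;

-- relevance alone is not a leftmost prefix: MySQL cannot seek to the
-- matching range in idx_article_relevance; at best it scans the whole index.
EXPLAIN
SELECT COUNT(*) FROM prefix_demo
WHERE relevance <= 0.4 AND relevance > 0;

-- A dedicated index whose leftmost column is relevance makes a range scan possible.
ALTER TABLE prefix_demo ADD INDEX idx_relevance (relevance);
```

In the question's DESCRIBE output, relevance shows MUL in the Key column, which MySQL documents as the first column of a non-unique index, so SHOW INDEX is the way to confirm what is really defined there.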

2011-01-20 13:08:17 YoGiN
