6000万个条目，选择某个月份的条目.如何优化数据库?

时间：2023-05-23

本文介绍了6000万个条目，选择某个月份的条目.如何优化数据库?的处理方法，对大家解决问题具有一定的参考价值，需要的朋友们下面随着跟版网的小编来一起学习吧！

问题描述

限时送ChatGPT账号..

我有一个包含 6000 万个条目的数据库.

I have a database with 60 million entries.

每个条目都包含:

身份证
数据源ID
一些数据
日期时间

我需要选择某个月份的条目.每个月包含大约 200 万个条目.

I need to select entries from certain month. Each month contains approximately 2 million entries.

 select * 
   from Entries 
  where time between "2010-04-01 00:00:00" and "2010-05-01 00:00:00"

(查询大约需要 1.5 分钟)

(query takes approximately 1.5 minutes)

我还想从给定的 DataSourceID 中选择某个月份的数据.(大约需要 20 秒)

I'd also like to select data from certain month from a given DataSourceID. (takes approximately 20 seconds)

大约有 50-100 个不同的 DataSourceID.

There are about 50-100 different DataSourceIDs.

有没有办法让它更快?我有哪些选择?如何优化这个数据库/查询?

Is there a way to make this faster? What are my options? How to optimize this database/query?

大约有.每秒 60-100 次插入！

There's approx. 60-100 inserts PER second!

推荐答案

利用 innodb 聚集主键索引.

Take advantage of innodb clustered primary key indexes.

http://dev.mysql.com/doc/refman/5.0/en/innodb-index-types.html

这将非常高效:

create table datasources
(
year_id smallint unsigned not null,
month_id tinyint unsigned not null,
datasource_id tinyint unsigned not null,
id int unsigned not null, -- needed for uniqueness
data int unsigned not null default 0,
primary key (year_id, month_id, datasource_id, id)
)
engine=innodb;

select * from datasources where year_id = 2011 and month_id between 1 and 3;

select * from datasources where year_id = 2011 and month_id = 4 and datasouce_id = 100;

-- etc..

编辑 2

忘记了我正在使用 3 个月的数据运行第一个测试脚本.这是一个月的结果:0.34 和 0.69 秒.

Forgot i was running the first test script with 3 months of data. Here's the results for a single month : 0.34 and 0.69 seconds.

select d.* from datasources d where d.year_id = 2010 and d.month_id = 3 and datasource_id = 100 order by d.id desc limit 10;
+---------+----------+---------------+---------+-------+
| year_id | month_id | datasource_id | id      | data  |
+---------+----------+---------------+---------+-------+
|    2010 |        3 |           100 | 3290330 | 38434 |
|    2010 |        3 |           100 | 3290329 |  9988 |
|    2010 |        3 |           100 | 3290328 | 25680 |
|    2010 |        3 |           100 | 3290327 | 17627 |
|    2010 |        3 |           100 | 3290326 | 64508 |
|    2010 |        3 |           100 | 3290325 | 14257 |
|    2010 |        3 |           100 | 3290324 | 45950 |
|    2010 |        3 |           100 | 3290323 | 49986 |
|    2010 |        3 |           100 | 3290322 |  2459 |
|    2010 |        3 |           100 | 3290321 | 52971 |
+---------+----------+---------------+---------+-------+
10 rows in set (0.34 sec)

select d.* from datasources d where d.year_id = 2010 and d.month_id = 3 order by d.id desc limit 10;
+---------+----------+---------------+---------+-------+
| year_id | month_id | datasource_id | id      | data  |
+---------+----------+---------------+---------+-------+
|    2010 |        3 |           116 | 3450346 | 42455 |
|    2010 |        3 |           116 | 3450345 | 64039 |
|    2010 |        3 |           116 | 3450344 | 27046 |
|    2010 |        3 |           116 | 3450343 | 23730 |
|    2010 |        3 |           116 | 3450342 | 52380 |
|    2010 |        3 |           116 | 3450341 | 35700 |
|    2010 |        3 |           116 | 3450340 | 20195 |
|    2010 |        3 |           116 | 3450339 | 21758 |
|    2010 |        3 |           116 | 3450338 | 51378 |
|    2010 |        3 |           116 | 3450337 | 34687 |
+---------+----------+---------------+---------+-------+
10 rows in set (0.69 sec)

编辑 1

决定用大约测试上述模式.6000 万行分布在 3 年内.每个查询都是冷运行的，即每个查询都单独运行，然后重新启动 mysql，清除任何缓冲区，并且没有查询缓存.

Decided to test the above schema with approx. 60 million rows spread over 3 years. Each query is run cold i.e. each run separately after which mysql is restarted clearing any buffers and with no query caching.

完整的测试脚本可以在这里找到:http://pastie.org/1723506 或以下...

The full test script can be found here : http://pastie.org/1723506 or below...

正如你所看到的，即使在我简陋的桌面上，它也是一个非常高性能的架构:)

As you can see it's a pretty performant schema even on my humble desktop :)

select count(*) from datasources;
+----------+
| count(*) |
+----------+
| 60306030 |
+----------+

select count(*) from datasources where year_id = 2010;
+----------+
| count(*) |
+----------+
| 16691669 |
+----------+

select
 year_id, month_id, count(*) as counter
from
 datasources
where 
 year_id = 2010
group by
 year_id, month_id;
+---------+----------+---------+
| year_id | month_id | counter |
+---------+----------+---------+
|    2010 |        1 | 1080108 |
|    2010 |        2 | 1210121 |
|    2010 |        3 | 1160116 |
|    2010 |        4 | 1300130 |
|    2010 |        5 | 1860186 |
|    2010 |        6 | 1220122 |
|    2010 |        7 | 1250125 |
|    2010 |        8 | 1460146 |
|    2010 |        9 | 1730173 |
|    2010 |       10 | 1490149 |
|    2010 |       11 | 1570157 |
|    2010 |       12 | 1360136 |
+---------+----------+---------+
12 rows in set (5.92 sec)


select 
 count(*) as counter
from 
 datasources d
where 
 d.year_id = 2010 and d.month_id between 1 and 3 and datasource_id = 100;

+---------+
| counter |
+---------+
|   30003 |
+---------+
1 row in set (1.04 sec)

explain
select 
 d.* 
from 
 datasources d
where 
 d.year_id = 2010 and d.month_id between 1 and 3 and datasource_id = 100
order by
 d.id desc limit 10;

+----+-------------+-------+-------+---------------+---------+---------+------+---------+-----------------------------+
| id | select_type | table | type  | possible_keys | key     | key_len | ref  |rows    | Extra                       |
+----+-------------+-------+-------+---------------+---------+---------+------+---------+-----------------------------+
|  1 | SIMPLE      | d     | range | PRIMARY       | PRIMARY | 4       | NULL |4451372 | Using where; Using filesort |
+----+-------------+-------+-------+---------------+---------+---------+------+---------+-----------------------------+
1 row in set (0.00 sec)


select 
 d.* 
from 
 datasources d
where 
 d.year_id = 2010 and d.month_id between 1 and 3 and datasource_id = 100
order by
 d.id desc limit 10;

+---------+----------+---------------+---------+-------+
| year_id | month_id | datasource_id | id      | data  |
+---------+----------+---------------+---------+-------+
|    2010 |        3 |           100 | 3290330 | 38434 |
|    2010 |        3 |           100 | 3290329 |  9988 |
|    2010 |        3 |           100 | 3290328 | 25680 |
|    2010 |        3 |           100 | 3290327 | 17627 |
|    2010 |        3 |           100 | 3290326 | 64508 |
|    2010 |        3 |           100 | 3290325 | 14257 |
|    2010 |        3 |           100 | 3290324 | 45950 |
|    2010 |        3 |           100 | 3290323 | 49986 |
|    2010 |        3 |           100 | 3290322 |  2459 |
|    2010 |        3 |           100 | 3290321 | 52971 |
+---------+----------+---------------+---------+-------+
10 rows in set (0.98 sec)


select 
 count(*) as counter
from 
 datasources d
where 
 d.year_id = 2010 and d.month_id between 1 and 3;

+---------+
| counter |
+---------+
| 3450345 |
+---------+
1 row in set (1.64 sec)

explain
select 
 d.* 
from 
 datasources d
where 
 d.year_id = 2010 and d.month_id between 1 and 3
order by
 d.id desc limit 10;

+----+-------------+-------+-------+---------------+---------+---------+------+---------+-----------------------------+
| id | select_type | table | type  | possible_keys | key     | key_len | ref  |rows    | Extra                       |
+----+-------------+-------+-------+---------------+---------+---------+------+---------+-----------------------------+
|  1 | SIMPLE      | d     | range | PRIMARY       | PRIMARY | 3       | NULL |6566916 | Using where; Using filesort |
+----+-------------+-------+-------+---------------+---------+---------+------+---------+-----------------------------+
1 row in set (0.00 sec)


select 
 d.* 
from 
 datasources d
where 
 d.year_id = 2010 and d.month_id between 1 and 3
order by
 d.id desc limit 10;

+---------+----------+---------------+---------+-------+
| year_id | month_id | datasource_id | id      | data  |
+---------+----------+---------------+---------+-------+
|    2010 |        3 |           116 | 3450346 | 42455 |
|    2010 |        3 |           116 | 3450345 | 64039 |
|    2010 |        3 |           116 | 3450344 | 27046 |
|    2010 |        3 |           116 | 3450343 | 23730 |
|    2010 |        3 |           116 | 3450342 | 52380 |
|    2010 |        3 |           116 | 3450341 | 35700 |
|    2010 |        3 |           116 | 3450340 | 20195 |
|    2010 |        3 |           116 | 3450339 | 21758 |
|    2010 |        3 |           116 | 3450338 | 51378 |
|    2010 |        3 |           116 | 3450337 | 34687 |
+---------+----------+---------------+---------+-------+
10 rows in set (1.98 sec)

希望这有帮助:)

这篇关于6000万个条目，选择某个月份的条目.如何优化数据库?的文章就介绍到这了，希望我们推荐的答案对大家有所帮助，也希望大家多多支持跟版网！

上一篇：是否可以使用正则表达式在 MySQL 中强制执行数据检查 下一篇：如何在 MySQL 中生成数据?

6000万个条目，选择某个月份的条目.如何优化数据库?

问题描述

推荐答案

相关文章