记一次SQL查询优化

2018-10-31 0 评论字数统计: 1.1k(字) 阅读时长: 4(分)

场景：A表数据量大概6万，B表数据量在120万左右，需要在A,B 表中联合查询出符合条件的A表中的数据，不需要B表中的内容，B表中的记录仅仅作为查询条件。

主要关联关系是B表中的object_id保存着A表中的id，表结构大致如下

create table A(
	id bigint primary key auto_increment,
    cat_id bigint not null,// 类目id
    condition1 varchar(100) not null,
    condition2 varchar(100) not null,
    .....
)
// B表中保存着A表的扩展字段，所有动态扩展的字段都保存在B表中，仅通过type_id和cat_id区分
create table B(
	id bigint primary key auto_increment,
    object_id bigint not null,
    cat_id bigint not null,// 类目的id
    attr_id bigint not null,// 属性名称id
    value varchar(100) not null,// 保存值
	....
    constraint `out_obj_id` foreign key (`object_id`) reference A(`ID`)// object_id作为外键关联A表中的id
)

A表作为一个对象的固定属性表，目前的场景是，需要为A表添加动态扩展的属性，但是不去动A表的表结构。所以便有了B表，作为动态扩展属性值保存的表，如果没有扩展的属性在需要查询结果时，只需要常规的SELECT操作，对必要的字段加索引就行。但是现在有了扩展属性，且数据不保存在A表中，也就是当需要对扩展的动态属性作为条件进行查询A表中的记录。

从一开始的想法是通过join联合查询

select A.* from A where A.condition1='A条件' and A.cat_id=$catId$ 
join B t1 on t1.attr_id=$attr1$ and A.id=t1.object_id and t1.cat_id=$catId$ and t1.value like '%条件1%'
join B t2 on t2.attr_id=$attr2$ and A.id=t2.object_id and t2.cat_id=$catId$ and
t2.value like '%条件2%'
join B t3 on t3.attr_id=$attr3$ and A.id=t3.object_id and t3.cat_id=$catId$ and
t3.value like '%条件3%'
...

这个方法在数据量较少的时候查询效率还可以，但是当测试数据量变得比较大时，效率便大打折扣，一次查询需要2-3秒之久

后来稍作修改，改成如下，但是效率还是一样没什么差别，甚至更差

select A.* from A where A.condition1='A条件' and A.cat_id=$catId$ AND A.id in(
SELECT t1.object_id from (SELECT * FROM B t on t.attr_id=$attr1$ and t1.cat_id=$catId$ and t1.value like '%条件1%') t1
join B t2 on t2.attr_id=$attr2$ and t1.object_id=t2.object_id and t2.cat_id=$catId$ and
t2.value like '%条件2%'
join B t3 on t3.attr_id=$attr3$ and t1.object_id=t3.object_id and t3.cat_id=$catId$ and
t3.value like '%条件3%'
...
)

后来仔细分析了一下上述两种方法，主要瓶颈都在join的上面了，而in本身效率也就差，对查询效率没有任何优化可言。上面的join主要涉及到的是B表join了多次，因为不同的属性必须在查询一遍，而当一次join就是加一层嵌套循环，效率可想而知的会很差。
后来通过网上查阅资料，发现自己把一个重要的操作EXISTS给忘掉了，在当前场景下，并不需要B表中的任何数据，仅仅将B表中的记录作为条件而已，那么正好可以使用exists来判断是否符合条件。
1
2
3
4
网上示例：
select * from A where id in (select id from B);
上面的in查询优化成下面的exists
select * from A where exists (select 1 from B where A.id=B.id);
- exists只返回true or false，当且仅当结果为true时才将结果集保存下来
- 而且多个exists and链接时，有一个为false那么后面的可以直接抛弃，而不用像join一样结果还需要去循环比较，也就是在这里可以大大的提升效率

最终的SQL优化如下

select A.* from A where A.condition1='A条件' and A.cat_id=$catId$ 
  and exists ( select t1.* from B t1 on t1.attr_id=$attr1$ and A.id=t1.object_id and t1.cat_id=$catId$ and t1.value like '%条件1%')
  and exists ( select t2.* from B t2 on t2.attr_id=$attr2$ and A.id=t2.object_id and t2.cat_id=$catId$ and
  t2.value like '%条件2%')
  and exists (select t3.* from B t3 on t3.attr_id=$attr3$ and A.id=t3.object_id and t3.cat_id=$catId$ and
  t3.value like '%条件3%')
  ....

在同等数据量的情况下，将2-3s的查询操作优化到了150ms左右，可以说是质变了

需要注意的是并不是所有情况下都可以使用exists来替代in操作
- IN适合于外表大而内表小的情况；
- EXISTS适合于外表小而内表大的情况。

参考

本文链接： https://tonyjiangwj.github.io/2018/10/31/%E8%AE%B0%E4%B8%80%E6%AC%A1SQL%E6%9F%A5%E8%AF%A2%E4%BC%98%E5%8C%96/

版权声明： 本博客所有文章除特别声明外，均采用 CC BY 4.0 CN协议许可协议。转载请注明出处！

TonyJiangWJBackend & Rookie Web Developer

个人简介。