背景
運維反饋說線上一個接口越來越慢,大概二三十秒才有返回。
查看接口代碼,定位問題出在sql查詢效率上。
sql意圖是將多個號碼最新的數據返回給前端使用。
單表數據量 530萬左右,id為主鍵,phone為普通索引
優化過程
- 原sql如下,通過in + 子查詢方式查詢
select * from
t_user_track
where
id in
(select max(id) as id
from t_user_track
where phone in ('xxxxx', 'xxxxx')
group by phone)
執行時間在30秒左右,
explain查看執行計劃
可以看出子查詢走上了索引,主查詢沒有走索引,全表查詢,rows=2333216
- 同事寫了一段新的sql來查詢,如下
select * from (
select
DISTINCT *
from t_user_track
where
phone in ('xxxxx', 'xxxx')
order by locate_time desc
) t
group by phone;
執行時間在4秒左右
查看執行計劃
派生表查詢走上了phone的索引,rows=157108,主查詢沒有走上索引,row=157108
- 改成如下sql,關聯表查詢
select *
from t_user_track t1,
(select max(id) as id from t_user_track
where phone in ('xxxxxx', 'xxxxx') group by phone) t2
where t1.id = t2.id;
查詢時間為0.04秒
執行計劃如下
- 改成inner join關聯表,如下sql
select *
from t_user_track t1
INNER JOIN
(select max(id) as id from t_user_track where phone in ('xxxxxx', 'xxxxx') group by phone) t2
on t1.id = t2.id
執行時間為0.041秒
執行計劃如下
結果
3、4兩種關聯表查詢效率最高
原理分析
- 3、4兩種連表查詢只有一次查詢,且通過小表驅動大表,所以查詢效率較高
- 第一種in + 子查詢的方式,需要兩次查詢,查詢效率較低
留下一個疑問in+子查詢的方式,為什么子查詢走了索引,主查詢卻沒有走索引
MySQL官方文檔
mysql手冊也提到過,具體的原文在mysql文檔的這個章節:
I.3. Restrictions on Subqueries
13.2.8. Subquery Syntax
摘抄:
1)關于使用IN的子查詢:
Subquery optimization for IN is not as effective as for the = operator or for IN(value_list) constructs.
A typical case for poor IN subquery performance is when the subquery returns a small number of rows but the outer query returns a large number of rows to be compared to the subquery result.
The problem is that, for a statement that uses an IN subquery, the optimizer rewrites it as a correlated subquery. Consider the following statement that uses an uncorrelated subquery:
SELECT ... FROM t1 WHERE t1.a IN (SELECT b FROM t2);
The optimizer rewrites the statement to a correlated subquery:
SELECT ... FROM t1 WHERE EXISTS (SELECT 1 FROM t2 WHERE t2.b = t1.a);
If the inner and outer queries return M and N rows, respectively, the execution time becomes on the order of O(M×N), rather than O(M+N) as it would be for an uncorrelated subquery.
An implication is that an IN subquery can be much slower than a query written using an IN(value_list) construct that lists the same values that the subquery would return.
2)關于把子查詢轉換成join的:
The optimizer is more mature for joins than for subqueries, so in many cases a statement that uses a subquery can be executed more efficiently if you rewrite it as a join.
An exception occurs for the case where an IN subquery can be rewritten as a SELECT DISTINCT join. Example:
SELECT col FROM t1 WHERE id_col IN (SELECT id_col2 FROM t2 WHERE condition);
That statement can be rewritten as follows:
SELECT DISTINCT col FROM t1, t2 WHERE t1.id_col = t2.id_col AND condition;
But in this case, the join requires an extra DISTINCT operation and is not more efficient than the subquery