背景

運維反饋說線上一個接口越來越慢，大概二三十秒才有返回。

查看接口代碼，定位問題出在sql查詢效率上。

sql意圖是將多個號碼最新的數據返回給前端使用。

單表數據量 530萬左右，id為主鍵，phone為普通索引

優化過程

原sql如下，通過in + 子查詢方式查詢

select * from
	t_user_track 
where
	id in 
    (select max(id) as id 
  		from t_user_track 
			where phone in ('xxxxx', 'xxxxx') 
			group by phone)

執行時間在30秒左右，

explain查看執行計劃

可以看出子查詢走上了索引，主查詢沒有走索引，全表查詢，rows=2333216

同事寫了一段新的sql來查詢，如下

select * from (
            select
            DISTINCT *
            from t_user_track
            where
            phone in ('xxxxx', 'xxxx')
            order by locate_time desc
        ) t
        group by phone;

執行時間在4秒左右

查看執行計劃

派生表查詢走上了phone的索引，rows=157108，主查詢沒有走上索引，row=157108

改成如下sql，關聯表查詢

select *
  from t_user_track t1,
    (select max(id) as id from t_user_track 
  			where phone in ('xxxxxx', 'xxxxx') group by phone) t2
where t1.id = t2.id;

查詢時間為0.04秒

執行計劃如下

改成inner join關聯表，如下sql

select * 
  from t_user_track t1 
  INNER JOIN
  (select max(id) as id from t_user_track where phone in ('xxxxxx', 'xxxxx') group by phone) t2
 on t1.id = t2.id

執行時間為0.041秒

執行計劃如下

結果

3、4兩種關聯表查詢效率最高

原理分析

3、4兩種連表查詢只有一次查詢，且通過小表驅動大表，所以查詢效率較高
第一種in + 子查詢的方式，需要兩次查詢，查詢效率較低

留下一個疑問in+子查詢的方式，為什么子查詢走了索引，主查詢卻沒有走索引

MySQL官方文檔

mysql手冊也提到過，具體的原文在mysql文檔的這個章節：

I.3. Restrictions on Subqueries

13.2.8. Subquery Syntax

摘抄：

1）關于使用IN的子查詢：

Subquery optimization for IN is not as effective as for the = operator or for IN(value_list) constructs.

A typical case for poor IN subquery performance is when the subquery returns a small number of rows but the outer query returns a large number of rows to be compared to the subquery result.

The problem is that, for a statement that uses an IN subquery, the optimizer rewrites it as a correlated subquery. Consider the following statement that uses an uncorrelated subquery:

SELECT ... FROM t1 WHERE t1.a IN (SELECT b FROM t2);

The optimizer rewrites the statement to a correlated subquery:

SELECT ... FROM t1 WHERE EXISTS (SELECT 1 FROM t2 WHERE t2.b = t1.a);

If the inner and outer queries return M and N rows, respectively, the execution time becomes on the order of O(M×N), rather than O(M+N) as it would be for an uncorrelated subquery.

An implication is that an IN subquery can be much slower than a query written using an IN(value_list) construct that lists the same values that the subquery would return.

2）關于把子查詢轉換成join的：

The optimizer is more mature for joins than for subqueries, so in many cases a statement that uses a subquery can be executed more efficiently if you rewrite it as a join.

An exception occurs for the case where an IN subquery can be rewritten as a SELECT DISTINCT join. Example:

SELECT col FROM t1 WHERE id_col IN (SELECT id_col2 FROM t2 WHERE condition);

That statement can be rewritten as follows:

SELECT DISTINCT col FROM t1, t2 WHERE t1.id_col = t2.id_col AND condition;

But in this case, the join requires an extra DISTINCT operation and is not more efficient than the subquery

日日操夜夜添-日日操影院-日日草夜夜操-日日干干-精品一区二区三区波多野结衣-精品一区二区三区高清免费不卡

600倍效率提升sql查詢優化從幾十秒到幾十毫秒

背景

優化過程

結果

原理分析

MySQL官方文檔

數獨大挑戰2018-06-03

答題星2018-06-03

全階人生考試2018-06-03

運動步數有氧達人2018-06-03

每日養生app2018-06-03

體育訓練成績評定2018-06-03