一、表连接介绍
表连接类型
表连接是关系型数据库关键特性,在关系型数据库中,表连接分为三类:循环嵌套连接(Nested Loops Join)、哈希连接(Hash Join)、合并排序连接(Merge Sort Join)。假设存在表A和表B,都存在ID列,通过id列连接。
Nested Loops Join:遍历A中所有id,依次拿id和表B中的id对比。
Hash Join:对A中的id做HASH运算,放入多个HASH Bucket中,根据B表的id列的hash值和HASH Bucket匹配。
Merge Sort Join:对A表和B表的id列进行排序,按照排序结果进行连接。
表连接分析
表的访问次数,由执行计划的starts列来表示
- NL连接驱动表被访问0次或1次,被驱动表被访问0次或N次,N的值取决于驱动表的返回行数。
- HASH连接,驱动表和被驱动表都是被访问0次或1次,大部分场景是驱动表和被驱动表都访问1次。
- MS连接,驱动表和被驱动表都是被访问0次或1次,大部分场景是驱动表和被驱动表都访问1次。
驱动顺序
- NL连接的性能与驱动顺序有关,一般小表做驱动表,性能更好。
- HASH连接的性能和驱动顺序有关,一般小表做驱动表。
- MS连接的性能和驱动顺序无关。
是否排序
- NL连接不排序、不多余消耗内存
- HASH连接不排序,但是hash area多消耗内存
- MS连接排序,消耗sort area内存
使用限制
- NL连接支持各种写法,无限制。
- HASH连接支持等值连接,不支持>、<、like、<>等。
- MS连接不支持>、<、like、<>等
适用场景
- NL连接一般适用于OLTP系统
- HASH连接、MS连接适用于OLAP系统
二、表连接测试
创建测试数据
--创建表T1和T2
CREATE TABLE t1 (
id NUMBER NOT NULL,
n NUMBER,
contents VARCHAR2(4000));CREATE TABLE t2 (
id NUMBER NOT NULL,
t1_id NUMBER NOT NULL,
n NUMBER,
contents VARCHAR2(4000));--插入测试数据
execute dbms_random.seed(0);
INSERT INTO t1
SELECT rownum, rownum, dbms_random.string('a', 50)
FROM dual
CONNECT BY level <= 100
ORDER BY dbms_random.random;
INSERT INTO t2 SELECT rownum, rownum, rownum, dbms_random.string('b', 50) FROM dual CONNECT BY level <= 100000
ORDER BY dbms_random.random;
COMMIT;--查看数据量
SQL> select count(*) from t1;COUNT(*)
----------100
SQL> select count(*) from t2;COUNT(*)
----------100000
NL连接优化实验
-- 两个表无索引执行计划
alter session set statistics_level=all;
select * from t1,t2 where t1.id=t2.t1_id and t1.n=19;
select * from table(dbms_xplan.display_cursor(null,null,'allstats last'));----------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | OMem | 1Mem | Used-Mem |
----------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 1 |00:00:00.02 | 1016 | | | |
|* 1 | HASH JOIN | | 1 | 1 | 1 |00:00:00.02 | 1016 | 1000K| 1000K| 407K (0)|
|* 2 | TABLE ACCESS FULL| T1 | 1 | 1 | 1 |00:00:00.01 | 7 | | | |
| 3 | TABLE ACCESS FULL| T2 | 1 | 103K| 100K|00:00:00.01 | 1006 | | | |
------------------------------------------------------------------------------------------------------------------驱动表创建索引create index t1_n on t1(n);select * from t1,t2 where t1.id=t2.t1_id and t1.n=19;select * from table(dbms_xplan.display_cursor(null,null,'allstats last'));
-------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers |
-------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 1 |00:00:00.05 | 1008 |
|* 1 | HASH JOIN | | 1 | 1 | 1 |00:00:00.05 | 1008 |
| 2 | TABLE ACCESS BY INDEX ROWID BATCHED| T1 | 1 | 1 | 1 |00:00:00.01 | 2 |
|* 3 | INDEX RANGE SCAN | T1_N | 1 | 1 | 1 |00:00:00.01 | 1 |
| 4 | TABLE ACCESS FULL | T2 | 1 | 103K| 100K|00:00:00.02 | 1006 |
-------------------------------------------------------------------------------------------------------
--被驱动表创建索引
CREATE INDEX t2_t1_id ON t2(t1_id);
select * from t1,t2 where t1.id=t2.t1_id and t1.n=19;
select * from table(dbms_xplan.display_cursor(null,null,'allstats last'));
-----------------------------------------------------------------------------------------------------------
| Id | Operation | Name |Starts| E-Rows|A-Rows| A-Time |Buffers|Reads |
-----------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1| | 1|00:00:00.01 | 7| 4|
| 1 | NESTED LOOPS | | 1| 1| 1|00:00:00.01 | 7| 4|
| 2 | NESTED LOOPS | | 1| 1| 1|00:00:00.01 | 6| 4|
| 3 | TABLE ACCESS BY INDEX ROWID BATCHED| T1 | 1| 1| 1|00:00:00.01 | 3| 0|
|* 4 | INDEX RANGE SCAN | T1_N | 1| 1| 1|00:00:00.01 | 2| 0|
|* 5 | INDEX RANGE SCAN | T2_T1_ID| 1| 1| 1|00:00:00.01 | 3| 4|
| 6 | TABLE ACCESS BY INDEX ROWID | T2 | 1| 1| 1|00:00:00.01 | 1| 0|
-----------------------------------------------------------------------------------------------------------
--大表驱动小表
SELECT /*+ leading(t2) use_nl(t1) */ *2 FROM t1, t23 WHERE t1.id = t2.t1_id4 AND t1.n = 19;
-----------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers |
-----------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 1 |00:00:00.11 | 1013 |
| 1 | NESTED LOOPS | | 1 | 1 | 1 |00:00:00.11 | 1013 |
| 2 | NESTED LOOPS | | 1 | 103K| 100K|00:00:00.07 | 1011 |
| 3 | TABLE ACCESS FULL | T2 | 1 | 103K| 100K|00:00:00.01 | 1006 |
|* 4 | INDEX RANGE SCAN | T1_N | 100K| 1 | 100K|00:00:00.04 | 5 |
|* 5 | TABLE ACCESS BY INDEX ROWID| T1 | 100K| 1 | 1 |00:00:00.02 | 2 |
-----------------------------------------------------------------------------------------------
--尽管两个表都建立了索引,但是如果大表驱动小表,反而消耗更大。
--总结:
--驱动表和被驱动表考虑创建索引
--确保小表驱动大表
HASH连接优化实验
--删除索引drop index t1_n;drop index t2_t1_id;select * from t1,t2 where t1.id=t2.t1_id and t1.n=19;
select * from table(dbms_xplan.display_cursor(null,null,'allstats last'));----------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | OMem | 1Mem | Used-Mem |
----------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 1 |00:00:00.02 | 1016 | | | |
|* 1 | HASH JOIN | | 1 | 1 | 1 |00:00:00.02 | 1016 | 1000K| 1000K| 407K (0)|
|* 2 | TABLE ACCESS FULL| T1 | 1 | 1 | 1 |00:00:00.01 | 7 | | | |
| 3 | TABLE ACCESS FULL| T2 | 1 | 103K| 100K|00:00:00.01 | 1006 | | | |
------------------------------------------------------------------------------------------------------------------创建索引
create index idx_t1_n on t1(n);--------------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | OMem | 1Mem | Used-Mem |
--------------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 1 |00:00:00.06 | 1008 | | | |
|* 1 | HASH JOIN | | 1 | 1 | 1 |00:00:00.06 | 1008 | 1000K| 1000K| 402K (0)|
| 2 | TABLE ACCESS BY INDEX ROWID BATCHED| T1 | 1 | 1 | 1 |00:00:00.01 | 2 | | | |
|* 3 | INDEX RANGE SCAN | IDX_T1_N | 1 | 1 | 1 |00:00:00.01 | 1 | | | |
| 4 | TABLE ACCESS FULL | T2 | 1 | 100K| 100K|00:00:00.02 | 1006 | | | |
----------------------------------------------------------------------------------------------------------------------------------------总结
--驱动表和被驱动表考虑创建索引
--确保小表驱动大表
--由于HASH操作需要额外内存区域(hash area),如果需要做hash运算的数据太多,则会用的临时表空间,涉及磁盘IO会大大降低性能,索引尽量保证hash运算在内存中完成。
MS连接优化实验
--Merge Sort Join两表限制条件皆无索引SQL> SELECT /*+ leading(t2) use_merge(t1)*/ *2 FROM t1, t23 WHERE t1.id = t2.t1_id4 and t1.n=19;---------------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | OMem | 1Mem | Used-Mem |
------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 1 |00:00:00.06 | 1007 | | | |
| 1 | MERGE JOIN | | 1 | 1 | 1 |00:00:00.06 | 1007 | | | |
| 2 | SORT JOIN | | 1 | 100K| 20 |00:00:00.06 | 1005 | 9762K| 1209K| 8677K (0)|
| 3 | TABLE ACCESS FULL | T2 | 1 | 100K| 100K|00:00:00.01 | 1005 | | | |
|* 4 | SORT JOIN | | 20 | 1 | 1 |00:00:00.01 | 2 | 2048 | 2048 | 2048 (0)|
| 5 | TABLE ACCESS BY INDEX ROWID BATCHED| T1 | 1 | 1 | 1 |00:00:00.01 | 2 | | | |
|* 6 | INDEX RANGE SCAN | IDX_T1_N | 1 | 1 | 1 |00:00:00.01 | 1 | | | |
--------------------------------------------------------------------------------------------------------------------限制条件有索引
create index idx_t1_n on t1(n);
create index idx_t2_n on t2(n);
SQL> SELECT /*+ leading(t2) use_merge(t1)*/ *2 FROM t1, t23 WHERE t1.id = t2.t1_id4 and t1.n=195 and t2.n=12;SQL> select * from table(dbms_xplan.display_cursor(null,null,'allstats last'));
---------------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | OMem | 1Mem | Used-Mem |
---------------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 0 |00:00:00.01 | 5 | | | |
| 1 | MERGE JOIN | | 1 | 1 | 0 |00:00:00.01 | 5 | | | |
| 2 | SORT JOIN | | 1 | 1 | 1 |00:00:00.01 | 3 | 2048 | 2048 | 2048 (0)|
| 3 | TABLE ACCESS BY INDEX ROWID BATCHED| T2 | 1 | 1 | 1 |00:00:00.01 | 3 | | | |
|* 4 | INDEX RANGE SCAN | IDX_T2_N | 1 | 1 | 1 |00:00:00.01 | 2 | | | |
|* 5 | SORT JOIN | | 1 | 1 | 0 |00:00:00.01 | 2 | 2048 | 2048 | 2048 (0)|
| 6 | TABLE ACCESS BY INDEX ROWID BATCHED| T1 | 1 | 1 | 1 |00:00:00.01 | 2 | | | |
|* 7 | INDEX RANGE SCAN | IDX_T1_N | 1 | 1 | 1 |00:00:00.01 | 1 | | | |
-----------------------------------------------------------------------------------------------------------------------------------------连接条件创建索引
set autotrace traceonly
SQL> SELECT /*+ leading(t1) use_merge(t2)*/ *2 FROM t1, t23 WHERE t1.id = t2.t1_id;
Statistics
----------------------------------------------------------1 recursive calls0 db block gets1012 consistent gets0 physical reads0 redo size13999 bytes sent via SQL*Net to client673 bytes received via SQL*Net from client8 SQL*Net roundtrips to/from client2 sorts (memory)0 sorts (disk)100 rows processedCREATE INDEX idx_t1_id ON t1(id);
CREATE INDEX idx_t2_t1_id ON t2(t1_id);
set autotrace traceonly
SQL> SELECT /*+ leading(t1) use_merge(t2)*/ *2 FROM t1, t23 WHERE t1.id = t2.t1_id;100 rows selected.Statistics
----------------------------------------------------------1 recursive calls0 db block gets1021 consistent gets0 physical reads0 redo size13999 bytes sent via SQL*Net to client673 bytes received via SQL*Net from client8 SQL*Net roundtrips to/from client1 sorts (memory)0 sorts (disk)100 rows processed