postgresql: разные скорости выполнения запроса при использовании JOIN
объясните пожалуйста с чем связано различие в скорости выполнения запросов (примерно в 10 раз):
быстрый запрос:
WITH selected_t AS (
SELECT
*
FROM
tbl_02
)
SELECT
COUNT(*) AS count, year AS value
FROM
selected_t
GROUP BY year;
медленный запрос:
WITH selected_t AS (
SELECT
*
FROM
tbl_01
LEFT JOIN tbl_02 USING (id)
)
SELECT
COUNT(*) AS count, year AS value
FROM
selected_t
GROUP BY year;
поле year
находится в таблице tbl_02
и индексировано
соединение таблиц происходит по одинаковому полю id
, которое индексировано в обоих таблицах
кол-во записей в таблицах одинаково, т.е. одинаковое поле id и значения одинаковые в id, а остальные поля разные
в таблицах по 40 миллионов записей
P.S.
понимаю, что задача немного экзотичная и избыточная, но это часть более сложного запроса в котором WHERE будет
P.P.S.
данные по EXPLAIN ANALYZE
:
быстрый запрос:
"Finalize GroupAggregate (cost=1000.50..367357.36 rows=4 width=12) (actual time=1027.565..1062.806 rows=4 loops=1)"
" Group Key: tbl2.year"
" -> Gather Merge (cost=1000.50..367357.24 rows=16 width=12) (actual time=1027.321..1062.769 rows=18 loops=1)"
" Workers Planned: 4"
" Workers Launched: 4"
" -> Partial GroupAggregate (cost=0.44..366355.28 rows=4 width=12) (actual time=64.494..806.830 rows=4 loops=5)"
" Group Key: tbl2.year"
" -> Parallel Index Only Scan using idx_tbl2_year on tbl2 (cost=0.44..314513.04 rows=10368440 width=4) (actual time=0.075..460.769 rows=8178446 loops=5)"
" Heap Fetches: 0"
"Planning Time: 0.167 ms"
"JIT:"
" Functions: 18"
" Options: Inlining false, Optimization false, Expressions true, Deforming true"
" Timing: Generation 3.901 ms, Inlining 0.000 ms, Optimization 1.187 ms, Emission 22.171 ms, Total 27.259 ms"
"Execution Time: 1064.528 ms"
медленный запрос:
"Finalize GroupAggregate (cost=1061734.21..1061736.25 rows=4 width=12) (actual time=8805.749..9342.205 rows=4 loops=1)"
" Group Key: tbl2.year"
" -> Gather Merge (cost=1061734.21..1061736.13 rows=16 width=12) (actual time=8805.742..9342.197 rows=20 loops=1)"
" Workers Planned: 4"
" Workers Launched: 4"
" -> Sort (cost=1060734.16..1060734.17 rows=4 width=12) (actual time=8690.508..8690.511 rows=4 loops=5)"
" Sort Key: tbl2.year"
" Sort Method: quicksort Memory: 25kB"
" Worker 0: Sort Method: quicksort Memory: 25kB"
" Worker 1: Sort Method: quicksort Memory: 25kB"
" Worker 2: Sort Method: quicksort Memory: 25kB"
" Worker 3: Sort Method: quicksort Memory: 25kB"
" -> Partial HashAggregate (cost=1060734.07..1060734.11 rows=4 width=12) (actual time=8690.449..8690.452 rows=4 loops=5)"
" Group Key: tbl2.year"
" Batches: 1 Memory Usage: 24kB"
" Worker 0: Batches: 1 Memory Usage: 24kB"
" Worker 1: Batches: 1 Memory Usage: 24kB"
" Worker 2: Batches: 1 Memory Usage: 24kB"
" Worker 3: Batches: 1 Memory Usage: 24kB"
" -> Parallel Hash Join (cost=516397.31..1009960.33 rows=10154748 width=4) (actual time=4424.311..8112.672 rows=8178446 loops=5)"
" Hash Cond: (tbl1.id = tbl2.id)"
" -> Parallel Index Only Scan using idx_tbl1_id on tbl1 (cost=0.56..315855.51 rows=10154748 width=8) (actual time=0.066..651.861 rows=8178446 loops=5)"
" Heap Fetches: 0"
" -> Parallel Hash (cost=386791.24..386791.24 rows=10368440 width=12) (actual time=4375.725..4375.726 rows=8178446 loops=5)"
" Buckets: 67108864 Batches: 1 Memory Usage: 2443072kB"
" -> Parallel Index Scan using idx_tbl2_year on tbl2 (cost=0.44..386791.24 rows=10368440 width=12) (actual time=107.607..1968.085 rows=8178446 loops=5)"
"Planning Time: 0.576 ms"
"JIT:"
" Functions: 68"
" Options: Inlining true, Optimization true, Expressions true, Deforming true"
" Timing: Generation 17.020 ms, Inlining 265.814 ms, Optimization 185.610 ms, Emission 84.950 ms, Total 553.394 ms"
"Execution Time: 9345.371 ms"
Источник: Stack Overflow на русском