Долго выполняется рекурсивный запрос в postgres
Зравствуйте, можно ли как то повысить перформанс текущего запроса? Возможно его надо разбить на несколько запросов с использованием темп таблиц.
У меня есть таблица с иерархией objectid, parentid со 150тыс-5миллионов записей и выполнение текущего запроса подвисает процесс,
WITH RECURSIVE data_selection AS (
select distinct(objectid), parentid, objectid root_id
from table_all t1
where parentid not in (select objectid from table_all)
UNION ALL
SELECT distinct(t2.objectid), t2.parentid, ds.root_id
FROM data_selection ds
JOIN table_all t2 ON ds.objectid = t2.parentid),
rootId_grouping AS (
SELECT '[' || ARRAY_TO_STRING(ARRAY_AGG(distinct(ds.objectid)), ',') || ']' as objectid, t1.parentid
FROM data_selection ds
left join table_all t1 on t1.objectid = root_id
GROUP BY ds.root_id , t1.parentid)
SELECT ARRAY_TO_STRING(ARRAY_AGG(objectid), ',') as objectid, parentid
FROM rootId_grouping group by parentid;
анализ выполнения запроса на 150тыс строк
HashAggregate (cost=293221048.40..293221051.40 rows=200 width=64) (actual time=448991.672..448992.575 rows=1081 loops=1)
Group Key: rootid_grouping.parentid
CTE data_selection
-> Recursive Union (cost=1000.44..174119782.29 rows=23923944 width=111) (actual time=531.542..447737.443 rows=9352 loops=1)
-> Unique (cost=1000.44..162028220.39 rows=11944 width=111) (actual time=531.540..446951.751 rows=1134 loops=1)
-> Gather Merge (cost=1000.44..162027921.50 rows=59778 width=111) (actual time=531.538..446928.089 rows=25727 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Parallel Index Only Scan using table_all_objectid_idx on table_all t1 (cost=0.42..162020021.61 rows=24908 width=74) (actual time=358.601..366268.250 rows=8576 loops=3)
Filter: (NOT (SubPlan 1))
Rows Removed by Filter: 31276
Heap Fetches: 0
SubPlan 1
-> Materialize (cost=0.00..6205.34 rows=119556 width=37) (actual time=0.003..6.446 rows=32894 loops=119556)
-> Seq Scan on table_all (cost=0.00..4672.56 rows=119556 width=37) (actual time=0.001..4.784 rows=32504 loops=94212)
-> Unique (cost=1116459.78..1161308.30 rows=2391200 width=106) (actual time=234.343..260.931 rows=2739 loops=3)
-> Sort (cost=1116459.78..1127671.91 rows=4484852 width=106) (actual time=233.456..254.409 rows=31276 loops=3)
Sort Key: t2.objectid, t2.parentid, ds.root_id
Sort Method: quicksort Memory: 25kB
-> Hash Join (cost=7685.01..99767.23 rows=4484852 width=106) (actual time=64.580..92.219 rows=31276 loops=3)
Hash Cond: ((ds.objectid)::text = (t2.parentid)::text)
-> WorkTable Scan on data_selection ds (cost=0.00..2388.80 rows=119440 width=64) (actual time=0.003..0.384 rows=3117 loops=3)
-> Hash (cost=4672.56..4672.56 rows=119556 width=74) (actual time=48.782..48.783 rows=119556 loops=3)
Buckets: 65536 Batches: 4 Memory Usage: 3530kB
-> Seq Scan on table_all t2 (cost=0.00..4672.56 rows=119556 width=74) (actual time=0.020..16.018 rows=119556 loops=3)
CTE rootid_grouping
-> GroupAggregate (cost=116005645.66..119085346.11 rows=636800 width=101) (actual time=448604.934..448988.259 rows=1134 loops=1)
Group Key: ds_1.root_id, t1_1.parentid
-> Sort (cost=116005645.66..116772386.77 rows=306696445 width=101) (actual time=448604.696..448759.035 rows=218440 loops=1)
Sort Key: ds_1.root_id, t1_1.parentid
Sort Method: external merge Disk: 25864kB
-> Hash Left Join (cost=7685.01..5683504.56 rows=306696445 width=101) (actual time=1139.896..447918.120 rows=218440 loops=1)
Hash Cond: ((ds_1.root_id)::text = (t1_1.objectid)::text)
-> CTE Scan on data_selection ds_1 (cost=0.00..478478.88 rows=23923944 width=64) (actual time=531.543..447742.091 rows=9352 loops=1)
-> Hash (cost=4672.56..4672.56 rows=119556 width=74) (actual time=57.091..57.092 rows=119556 loops=1)
Buckets: 65536 Batches: 4 Memory Usage: 3644kB
-> Seq Scan on table_all t1_1 (cost=0.00..4672.56 rows=119556 width=74) (actual time=0.026..18.613 rows=119556 loops=1)
-> CTE Scan on rootid_grouping (cost=0.00..12736.00 rows=636800 width=64) (actual time=448604.940..448989.470 rows=1134 loops=1)
Planning time: 2.741 ms
Execution time: 449001.698 ms
Пример данных таблицы table_all, индексы на objectid и parentid созданы
objectid | parentid | value |
---|---|---|
5d1eb711-a2fd-4ca5-beab-c8cddb37f2c8 | 3b82621b-6541-4c57-a0ff-42e0011afa95 | value |
039635e8-1831-4027-83bf-cdafb7cabe9f | 5d1eb711-a2fd-4ca5-beab-c8cddb37f2c8 | value |
55c7ef23-6a69-48f2-be80-6552ddf9e1f5 | 5d1eb711-a2fd-4ca5-beab-c8cddb37f2c8 | value |
b0e39676-73a4-45fd-a179-7140722cc9f8 | 55c7ef23-6a69-48f2-be80-6552ddf9e1f5 | value |
результат
objectid | parentid |
---|---|
[5d1eb711-a2fd-4ca5-beab-c8cddb37f2c8,039635e8-1831-4027-83bf-cdafb7cabe9f,55c7ef23-6a69-48f2-be80-6552ddf9e1f5,b0e39676-73a4-45fd-a179-7140722cc9f8] | 3b82621b-6541-4c57-a0ff-42e0011afa95 |
Возможно стоит воспользоваться дополнительным фреймворком. Пробовал spark, к сожалению не умеет он работать с рекурсивными запросами на сколько я знаю.