[mb-devel] optimizing a database query
yary hluchan
not.com at gmail.com
Thu Jan 20 23:43:32 UTC 2005
Hi, I'm new here. I like databases, and right now, I'm looking into
speeding up the "find my votes"query-
SELECT m.*, NOW() > m.expiretime AS expired, COALESCE(v.vote, ?) AS vote
FROM moderation_all m INNER JOIN vote_all v ON
v.moderation = m.id AND v.moderator = ?
AND NOT v.superseded
ORDER BY m.id DESC
Variables in order of appearance:
1. &ModDefs::VOTE_NOTVOTED == -2
2. Moderator Id
This is going to be a long post, showing my steps along the way. You
can scroll to the end for conclusions. The attachment shows query
plans, in case you want to follow along.
I started by finding moderators with a few votes. I would expect that
postgres would return their list of votes quickly if it were using
indexes properly.
musicbrainz_db=> select moderator, count(*) from vote_all group by moderator
musicbrainz_db-> having count(*) between 80 and 500;
moderator | count
-----------+-------
1 | 108
13 | 210
15 | 200
38 | 117
43 | 461
.....
80230 | 82
80258 | 294
80525 | 152
80691 | 212
81061 | 263
81265 | 101
81279 | 451
81867 | 207
82520 | 155
82641 | 224
82725 | 107
82769 | 232
83025 | 103
83068 | 84
83712 | 254
83713 | 92
84014 | 128
85474 | 95
85981 | 91
(575 rows)
Let's time how long it takes to run the query for moderator 85981.
time psql -U musicbrainz_user musicbrainz_db <<EOT
> SELECT m.*, NOW() > m.expiretime AS expired, COALESCE(v.vote, -2) AS vote
> FROM moderation_all m INNER JOIN vote_all v ON
> v.moderation = m.id AND v.moderator = 85981
> AND NOT v.superseded
> ORDER BY m.id DESC
> EOT
...
(91 rows)
real 10m7.390s
My goal is to reduce that 10m wall-clock time (on my system) to
something more like 2 seconds. Let's start by looking at the execution
plan, 1 in the attachment. The optimizer is using the moderator
indexes on the vote tables, but not on the moderation tables. It's
scanning the full moderation_open and moderation_closed tables. Ouch!
There are indexes on moderator(id), why isn't the optimizer using
them? Starting with the quick approach-
musicbrainz_db=> ANALYZE moderation_open;
ANALYZE
musicbrainz_db=> ANALYZE moderation_closed;
ANALYZE
gives a new query plan (2) which is pretty much the same as the first.
Analyzing vote_open and vote_close makes no difference. The database
was vacuumed on creation anyway, so the analyze step is probably
redundant now.
Let's check that the indexes on moderation are actually used in a simple case-
explain SELECT m.* FROM moderation_all m where m.id = 85981;
they are showing up in qp3, good
SET ENABLE_SEQSCAN TO false; doesn't force the index usage.
Is the view confusing the optimizer? Let's rewrite the query not using
the views. There's a foreign key constraint limiting vote_open's
moderations to the moderation_open table, I am assuming that the
closed votes only apply to closed moderations. Actually let's check
that assumption-
musicbrainz_db=> select count(*) from moderation_open m INNER JOIN vote_closed v
musicbrainz_db-> on v.moderation = m.id;
count
-------
0
(1 row)
Now for a rewrite- replace "_all" with "_open" and "_closed", paste
the two versions together with a "union all", and drop the "m."
qualifier from the order by clause:
SELECT m.*, NOW() > m.expiretime AS expired, COALESCE(v.vote, -2) AS vote
FROM moderation_open m INNER JOIN vote_open v ON
v.moderation = m.id AND v.moderator = 85981
AND NOT v.superseded
union all
SELECT m.*, NOW() > m.expiretime AS expired, COALESCE(v.vote, -2) AS vote
FROM moderation_closed m INNER JOIN vote_closed v ON
v.moderation = m.id AND v.moderator = 85981
AND NOT v.superseded
ORDER BY id DESC
that gives a much better query plan, QP 4. Let's time the rewrite
...
(91 rows)
real 0m1.632s
alright! I even hit my target, thought I'd have to go back and change it :-)
* Recommendation: when building a query that searches through
moderations, use the base tables and not the "moderation_all" view.
Feasible? I haven't looked through the code, and that's not in my
execution plan....
Any other queries I should look at? One user suggested:
... 'View track mods' (e.g.
http://www.musicbrainz.org/mod/search/pre/track.html?trackid=1342997)?
if someone posts the sql I can examine.
-------------- next part --------------
explain
SELECT m.*, NOW() > m.expiretime AS expired, COALESCE(v.vote, -2) AS vote
FROM moderation_all m INNER JOIN vote_all v ON
v.moderation = m.id AND v.moderator = 85981
AND NOT v.superseded
ORDER BY m.id DESC
QUery plan 1:
Sort (cost=17414434.67..17466704.34 rows=20907867 width=192)
Sort Key: m.id
-> Merge Join (cost=831341.18..1100244.95 rows=20907867 width=192)
Merge Cond: ("outer".moderation = "inner".id)
-> Sort (cost=4238.05..4244.98 rows=2772 width=23)
Sort Key: v.moderation
-> Subquery Scan v (cost=0.00..4079.51 rows=2772 width=23)
-> Append (cost=0.00..4079.51 rows=2772 width=23)
-> Subquery Scan "*SELECT* 1" (cost=0.00..78.44 rows=21 width=23)
-> Index Scan using vote_open_idx_moderator on vote_open (cost=0.00..78.44 rows=21 width=23)
Index Cond: (moderator = 85981)
Filter: (NOT superseded)
-> Subquery Scan "*SELECT* 2" (cost=0.00..4001.07 rows=2752 width=23)
-> Index Scan using vote_closed_idx_moderator on vote_closed (cost=0.00..4001.07 rows=2752 width=23)
Index Cond: (moderator = 85981)
Filter: (NOT superseded)
-> Sort (cost=2257553.13..2261323.92 rows=1508314 width=524)
Sort Key: m.id
-> Subquery Scan m (cost=0.00..92806.14 rows=1508314 width=524)
-> Append (cost=0.00..92806.14 rows=1508314 width=524)
-> Subquery Scan "*SELECT* 1" (cost=0.00..466.47 rows=6047 width=465)
-> Seq Scan on moderation_open (cost=0.00..466.47 rows=6047 width=465)
-> Subquery Scan "*SELECT* 2" (cost=0.00..92339.67 rows=1502267 width=524)
-> Seq Scan on moderation_closed (cost=0.00..92339.67 rows=1502267 width=524)
(24 rows)
Query plan 2:
Sort (cost=20116019.29..20176348.04 rows=24131499 width=192)
Sort Key: m.id
-> Merge Join (cost=949602.91..1259964.86 rows=24131499 width=192)
Merge Cond: ("outer".id = "inner".moderation)
-> Sort (cost=2545684.86..2550037.04 rows=1740870 width=509)
Sort Key: m.id
-> Subquery Scan m (cost=0.00..95131.70 rows=1740870 width=509)
-> Append (cost=0.00..95131.70 rows=1740870 width=509)
-> Subquery Scan "*SELECT* 1" (cost=0.00..473.60 rows=6760 width=466)
-> Seq Scan on moderation_open (cost=0.00..473.60 rows=6760 width=466)
-> Subquery Scan "*SELECT* 2" (cost=0.00..94658.10 rows=1734110 width=509)
-> Seq Scan on moderation_closed (cost=0.00..94658.10 rows=1734110 width=509)
-> Sort (cost=4238.05..4244.98 rows=2772 width=23)
Sort Key: v.moderation
-> Subquery Scan v (cost=0.00..4079.51 rows=2772 width=23)
-> Append (cost=0.00..4079.51 rows=2772 width=23)
-> Subquery Scan "*SELECT* 1" (cost=0.00..78.44 rows=21 width=23)
-> Index Scan using vote_open_idx_moderator on vote_open (cost=0.00..78.44 rows=21 width=23)
Index Cond: (moderator = 85981)
Filter: (NOT superseded)
-> Subquery Scan "*SELECT* 2" (cost=0.00..4001.07 rows=2752 width=23)
-> Index Scan using vote_closed_idx_moderator on vote_closed (cost=0.00..4001.07 rows=2752 width=23)
Index Cond: (moderator = 85981)
Filter: (NOT superseded)
(24 rows)
test qp 3-
musicbrainz_db=> explain SELECT m.* FROM moderation_all m where m.id = 2045296;
QUERY PLAN
Subquery Scan m (cost=0.00..9.37 rows=2 width=509)
-> Append (cost=0.00..9.37 rows=2 width=509)
-> Subquery Scan "*SELECT* 1" (cost=0.00..6.01 rows=1 width=466)
-> Index Scan using moderation_open_pkey on moderation_open (cost=0.00..6.01 rows=1 width=466)
Index Cond: (id = 2045296)
-> Subquery Scan "*SELECT* 2" (cost=0.00..3.36 rows=1 width=509)
-> Index Scan using moderation_closed_pkey on moderation_closed (cost=0.00..3.36 rows=1 width=509)
Index Cond: (id = 2045296)
(8 rows)
rewrite-
explain
SELECT m.*, NOW() > m.expiretime AS expired, COALESCE(v.vote, -2) AS vote
FROM moderation_open m INNER JOIN vote_open v ON
v.moderation = m.id AND v.moderator = 85981
AND NOT v.superseded
union all
SELECT m.*, NOW() > m.expiretime AS expired, COALESCE(v.vote, -2) AS vote
FROM moderation_closed m INNER JOIN vote_closed v ON
v.moderation = m.id AND v.moderator = 85981
AND NOT v.superseded
ORDER BY id DESC
qp 4-
Sort (cost=14214.37..14221.65 rows=2914 width=515)
Sort Key: id
-> Append (cost=0.00..13653.46 rows=2914 width=515)
-> Subquery Scan "*SELECT* 1" (cost=0.00..195.56 rows=20 width=472)
-> Nested Loop (cost=0.00..195.56 rows=20 width=472)
-> Index Scan using vote_open_idx_moderator on vote_open v (cost=0.00..76.06 rows=20 width=6)
Index Cond: (moderator = 85981)
Filter: (NOT superseded)
-> Index Scan using moderation_open_pkey on moderation_open m (cost=0.00..6.01 rows=1 width=466)
Index Cond: ("outer".moderation = m.id)
-> Subquery Scan "*SELECT* 2" (cost=0.00..13457.90 rows=2895 width=515)
-> Nested Loop (cost=0.00..13457.90 rows=2895 width=515)
-> Index Scan using vote_closed_idx_moderator on vote_closed v (cost=0.00..3688.50 rows=2895 width=6)
Index Cond: (moderator = 85981)
Filter: (NOT superseded)
-> Index Scan using moderation_closed_pkey on moderation_closed m (cost=0.00..3.36 rows=1 width=509)
Index Cond: ("outer".moderation = m.id)
(17 rows)
More information about the MusicBrainz-devel
mailing list