[mb-devel] optimizing a database query

yary hluchan not.com at gmail.com
Thu Jan 20 23:43:32 UTC 2005


Hi, I'm new here. I like databases, and right now, I'm looking into
speeding up the "find my votes"query-

SELECT m.*, NOW() > m.expiretime AS expired, COALESCE(v.vote, ?) AS vote
FROM moderation_all m INNER JOIN vote_all v ON
       v.moderation = m.id AND v.moderator = ?
       AND NOT v.superseded
ORDER BY m.id DESC

Variables in order of appearance:
       1. &ModDefs::VOTE_NOTVOTED == -2
       2. Moderator Id

This is going to be a long post, showing my steps along the way. You
can scroll to the end for conclusions. The attachment shows query
plans, in case you want to follow along.

I started by finding moderators with a few votes. I would expect that
postgres would return their list of votes quickly if it were using
indexes properly.

musicbrainz_db=> select moderator, count(*) from vote_all group by moderator
musicbrainz_db-> having count(*) between 80 and 500;

 moderator | count
-----------+-------
         1 |   108
        13 |   210
        15 |   200
        38 |   117
        43 |   461
.....
     80230 |    82
     80258 |   294
     80525 |   152
     80691 |   212
     81061 |   263
     81265 |   101
     81279 |   451
     81867 |   207
     82520 |   155
     82641 |   224
     82725 |   107
     82769 |   232
     83025 |   103
     83068 |    84
     83712 |   254
     83713 |    92
     84014 |   128
     85474 |    95
     85981 |    91
(575 rows)

Let's time how long it takes to run the query for moderator 85981.

time psql  -U musicbrainz_user musicbrainz_db <<EOT
> SELECT m.*, NOW() > m.expiretime AS expired, COALESCE(v.vote, -2) AS vote
> FROM moderation_all m INNER JOIN vote_all v ON
>        v.moderation = m.id AND v.moderator = 85981
>        AND NOT v.superseded
> ORDER BY m.id DESC
> EOT
...
(91 rows)
real    10m7.390s

My goal is to reduce that 10m wall-clock time (on my system) to
something more like 2 seconds. Let's start by looking at the execution
plan, 1 in the attachment. The optimizer is using the moderator
indexes on the vote tables, but not on the moderation tables. It's
scanning the full moderation_open and moderation_closed tables. Ouch!

There are indexes on moderator(id), why isn't the optimizer using
them? Starting with the quick approach-
musicbrainz_db=> ANALYZE moderation_open;
ANALYZE
musicbrainz_db=> ANALYZE moderation_closed;
ANALYZE

gives a new query plan (2) which is pretty much the same as the first.
Analyzing vote_open and vote_close makes no difference. The database
was vacuumed on creation anyway, so the analyze step is probably
redundant now.

Let's check that the indexes on moderation are actually used in a simple case-

explain SELECT m.* FROM moderation_all m where m.id = 85981;
they are showing up in qp3, good

SET ENABLE_SEQSCAN TO false; doesn't force the index usage.

Is the view confusing the optimizer? Let's rewrite the query not using
the views. There's a foreign key constraint limiting vote_open's
moderations to the moderation_open table, I am assuming that the
closed votes only apply to closed moderations. Actually let's check
that assumption-
musicbrainz_db=> select count(*) from moderation_open m INNER JOIN vote_closed v
musicbrainz_db->  on v.moderation = m.id;
 count
-------
     0
(1 row)

Now for a rewrite- replace "_all" with "_open" and "_closed", paste
the two versions together with a "union all", and drop the "m."
qualifier from the order by clause:
SELECT m.*, NOW() > m.expiretime AS expired, COALESCE(v.vote, -2) AS vote
FROM moderation_open m INNER JOIN vote_open v ON
       v.moderation = m.id AND v.moderator = 85981
       AND NOT v.superseded
union all
SELECT m.*, NOW() > m.expiretime AS expired, COALESCE(v.vote, -2) AS vote
FROM moderation_closed m INNER JOIN vote_closed v ON
       v.moderation = m.id AND v.moderator = 85981
       AND NOT v.superseded
ORDER BY id DESC

that gives a much better query plan, QP 4. Let's time the rewrite
...
(91 rows)
real    0m1.632s


alright! I even hit my target, thought I'd have to go back and change it :-)

* Recommendation: when building a query that searches through
moderations, use the base tables and not the "moderation_all" view.

Feasible? I haven't looked through the code, and that's not in my
execution plan....

Any other queries I should look at? One user suggested:
... 'View track mods' (e.g.
http://www.musicbrainz.org/mod/search/pre/track.html?trackid=1342997)?
if someone posts the sql I can examine.
-------------- next part --------------
explain
SELECT m.*, NOW() > m.expiretime AS expired, COALESCE(v.vote, -2) AS vote
FROM moderation_all m INNER JOIN vote_all v ON
       v.moderation = m.id AND v.moderator = 85981 
       AND NOT v.superseded
ORDER BY m.id DESC

QUery plan 1:
 Sort  (cost=17414434.67..17466704.34 rows=20907867 width=192)
   Sort Key: m.id
   ->  Merge Join  (cost=831341.18..1100244.95 rows=20907867 width=192)
         Merge Cond: ("outer".moderation = "inner".id)
         ->  Sort  (cost=4238.05..4244.98 rows=2772 width=23)
               Sort Key: v.moderation
               ->  Subquery Scan v  (cost=0.00..4079.51 rows=2772 width=23)
                     ->  Append  (cost=0.00..4079.51 rows=2772 width=23)
                           ->  Subquery Scan "*SELECT* 1"  (cost=0.00..78.44 rows=21 width=23)
                                 ->  Index Scan using vote_open_idx_moderator on vote_open  (cost=0.00..78.44 rows=21 width=23)
                                       Index Cond: (moderator = 85981)
                                       Filter: (NOT superseded)
                           ->  Subquery Scan "*SELECT* 2"  (cost=0.00..4001.07 rows=2752 width=23)
                                 ->  Index Scan using vote_closed_idx_moderator on vote_closed  (cost=0.00..4001.07 rows=2752 width=23)
                                       Index Cond: (moderator = 85981)
                                       Filter: (NOT superseded)
         ->  Sort  (cost=2257553.13..2261323.92 rows=1508314 width=524)
               Sort Key: m.id
               ->  Subquery Scan m  (cost=0.00..92806.14 rows=1508314 width=524)
                     ->  Append  (cost=0.00..92806.14 rows=1508314 width=524)
                           ->  Subquery Scan "*SELECT* 1"  (cost=0.00..466.47 rows=6047 width=465)
                                 ->  Seq Scan on moderation_open  (cost=0.00..466.47 rows=6047 width=465)
                           ->  Subquery Scan "*SELECT* 2"  (cost=0.00..92339.67 rows=1502267 width=524)
                                 ->  Seq Scan on moderation_closed  (cost=0.00..92339.67 rows=1502267 width=524)
(24 rows)



Query plan 2:
 Sort  (cost=20116019.29..20176348.04 rows=24131499 width=192)
   Sort Key: m.id
   ->  Merge Join  (cost=949602.91..1259964.86 rows=24131499 width=192)
         Merge Cond: ("outer".id = "inner".moderation)
         ->  Sort  (cost=2545684.86..2550037.04 rows=1740870 width=509)
               Sort Key: m.id
               ->  Subquery Scan m  (cost=0.00..95131.70 rows=1740870 width=509)
                     ->  Append  (cost=0.00..95131.70 rows=1740870 width=509)
                           ->  Subquery Scan "*SELECT* 1"  (cost=0.00..473.60 rows=6760 width=466)
                                 ->  Seq Scan on moderation_open  (cost=0.00..473.60 rows=6760 width=466)
                           ->  Subquery Scan "*SELECT* 2"  (cost=0.00..94658.10 rows=1734110 width=509)
                                 ->  Seq Scan on moderation_closed  (cost=0.00..94658.10 rows=1734110 width=509)
         ->  Sort  (cost=4238.05..4244.98 rows=2772 width=23)
               Sort Key: v.moderation
               ->  Subquery Scan v  (cost=0.00..4079.51 rows=2772 width=23)
                     ->  Append  (cost=0.00..4079.51 rows=2772 width=23)
                           ->  Subquery Scan "*SELECT* 1"  (cost=0.00..78.44 rows=21 width=23)
                                 ->  Index Scan using vote_open_idx_moderator on vote_open  (cost=0.00..78.44 rows=21 width=23)
                                       Index Cond: (moderator = 85981)
                                       Filter: (NOT superseded)
                           ->  Subquery Scan "*SELECT* 2"  (cost=0.00..4001.07 rows=2752 width=23)
                                 ->  Index Scan using vote_closed_idx_moderator on vote_closed  (cost=0.00..4001.07 rows=2752 width=23)
                                       Index Cond: (moderator = 85981)
                                       Filter: (NOT superseded)
(24 rows)



test qp 3-
musicbrainz_db=> explain SELECT m.* FROM moderation_all m where m.id = 2045296;
                                                     QUERY PLAN
 Subquery Scan m  (cost=0.00..9.37 rows=2 width=509)
   ->  Append  (cost=0.00..9.37 rows=2 width=509)
         ->  Subquery Scan "*SELECT* 1"  (cost=0.00..6.01 rows=1 width=466)
               ->  Index Scan using moderation_open_pkey on moderation_open  (cost=0.00..6.01 rows=1 width=466)
                     Index Cond: (id = 2045296)
         ->  Subquery Scan "*SELECT* 2"  (cost=0.00..3.36 rows=1 width=509)
               ->  Index Scan using moderation_closed_pkey on moderation_closed  (cost=0.00..3.36 rows=1 width=509)
                     Index Cond: (id = 2045296)
(8 rows)




rewrite-
explain
SELECT m.*, NOW() > m.expiretime AS expired, COALESCE(v.vote, -2) AS vote
FROM moderation_open m INNER JOIN vote_open v ON
       v.moderation = m.id AND v.moderator = 85981
       AND NOT v.superseded
union all
SELECT m.*, NOW() > m.expiretime AS expired, COALESCE(v.vote, -2) AS vote
FROM moderation_closed m INNER JOIN vote_closed v ON
       v.moderation = m.id AND v.moderator = 85981
       AND NOT v.superseded
ORDER BY id DESC

qp 4-
 Sort  (cost=14214.37..14221.65 rows=2914 width=515)
   Sort Key: id
   ->  Append  (cost=0.00..13653.46 rows=2914 width=515)
         ->  Subquery Scan "*SELECT* 1"  (cost=0.00..195.56 rows=20 width=472)
               ->  Nested Loop  (cost=0.00..195.56 rows=20 width=472)
                     ->  Index Scan using vote_open_idx_moderator on vote_open v  (cost=0.00..76.06 rows=20 width=6)
                           Index Cond: (moderator = 85981)
                           Filter: (NOT superseded)
                     ->  Index Scan using moderation_open_pkey on moderation_open m  (cost=0.00..6.01 rows=1 width=466)
                           Index Cond: ("outer".moderation = m.id)
         ->  Subquery Scan "*SELECT* 2"  (cost=0.00..13457.90 rows=2895 width=515)
               ->  Nested Loop  (cost=0.00..13457.90 rows=2895 width=515)
                     ->  Index Scan using vote_closed_idx_moderator on vote_closed v  (cost=0.00..3688.50 rows=2895 width=6)
                           Index Cond: (moderator = 85981)
                           Filter: (NOT superseded)
                     ->  Index Scan using moderation_closed_pkey on moderation_closed m  (cost=0.00..3.36 rows=1 width=509)
                           Index Cond: ("outer".moderation = m.id)
(17 rows)




More information about the MusicBrainz-devel mailing list