[mb-bugs] [jira] Commented: (SEARCH-55) Improve speed of building the Recording Index

Paul Taylor (JIRA) jira-admin at musicbrainz.org
Fri Jan 21 15:47:36 UTC 2011


    [ http://jira.musicbrainz.org/browse/SEARCH-55?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=11969#action_11969 ] 

Paul Taylor commented on SEARCH-55:
-----------------------------------

What matters most is the time it takes to build the index as a whole rather than the individual indexes. Some more ideas (some from Lucene in Action 2):
1. Whilst the temporary tables are being created, Lucene is doing nothing but waiting for the database results, so we should build these tables in the background and start building the indexes that do not rely on them.
2. Currently MergeDocuments is set very large (larger than the number of segments actually created) so that optimization is only done at the end, in the main thread. If it is set low, the optimization can be done in another thread in the background as the indexes are built, so there is less to do at the end.
3. Currently we decide when to flush docs to a directory by using setMaxBufferedDocs(). Using setRAMBufferSizeMB() is recommended instead for better performance, and we should increase the value from the default, as we can safely assume that the indexer is run with at least 512 MB and it doesn't make use of all this.
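
As a rough illustration of idea 1, here is a minimal sketch using only java.util.concurrent. It is not the indexer's actual code: createTempTables() and buildIndex() are hypothetical stand-ins for the real database and Lucene work, and the choice of "label" and "artist" as indexes that do not need the temporary tables is an assumption made for the example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CompletableFuture;

public class BackgroundTempTables {

    // Hypothetical stand-in for the slow SQL that creates the temporary tables.
    static void createTempTables(List<String> log) {
        log.add("temp tables ready");
    }

    // Hypothetical stand-in for building one Lucene index.
    static void buildIndex(String name, List<String> log) {
        log.add("built " + name);
    }

    static List<String> run() {
        // Synchronized because the background task and the main thread both write to it.
        List<String> log = Collections.synchronizedList(new ArrayList<>());

        // Start the temporary-table creation in the background.
        CompletableFuture<Void> tempTables =
                CompletableFuture.runAsync(() -> createTempTables(log));

        // Meanwhile, build indexes assumed not to depend on the temporary tables.
        buildIndex("label", log);
        buildIndex("artist", log);

        // Block only at the point where the temporary tables are really needed.
        tempTables.join();
        buildIndex("recording", log);
        return log;
    }

    public static void main(String[] args) {
        run().forEach(System.out::println);
    }
}
```

The only ordering this guarantees is that the recording index starts after the temporary tables exist; the other indexes may finish before or after the background task.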


Recording Index Specific:
1. Currently we construct each batch query so that there are no duplicates, i.e. because one recording can be used in two tracks we have a recording query and a track query, but it may be quicker to merge some of these queries and just adjust the processing accordingly.
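
One way the adjusted processing might look, as a sketch only: if a merged query returns one row per (recording, track) pair, the recording repeats across rows and the rows have to be folded back into one entry per recording before a document is built. The Row class and its fields are hypothetical and do not reflect the real schema or query.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class MergedQuerySketch {

    // One row of a hypothetical joined recording/track result set.
    static final class Row {
        final int recordingId;
        final String trackName;

        Row(int recordingId, String trackName) {
            this.recordingId = recordingId;
            this.trackName = trackName;
        }
    }

    // Collapse the duplicated recording rows: one map entry per recording,
    // holding every track name that uses it, in first-seen order.
    static Map<Integer, List<String>> groupByRecording(List<Row> rows) {
        Map<Integer, List<String>> tracksByRecording = new LinkedHashMap<>();
        for (Row row : rows) {
            tracksByRecording
                    .computeIfAbsent(row.recordingId, id -> new ArrayList<>())
                    .add(row.trackName);
        }
        return tracksByRecording;
    }
}
```

Whether this is actually quicker than separate queries depends on how much the join inflates the result set, so it would need measuring.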

> Improve speed of building the Recording Index
> ---------------------------------------------
>
>                 Key: SEARCH-55
>                 URL: http://jira.musicbrainz.org/browse/SEARCH-55
>             Project: MusicBrainz Search Server
>          Issue Type: Improvement
>    Affects Versions: NGS - RC1
>            Reporter: Paul Taylor
>            Assignee: Paul Taylor
>
> The recording table is the largest table in the database; unsurprisingly, it takes significantly longer to build this index than any other. Look at the use of temporary tables and other ideas to improve this.
