[mb-devel] [mb-users] Search Enhancement proposal - Implementation specifics - GSoC 2010

Dhiraj Lohiya lohiya.dhiraj at gmail.com
Wed Mar 31 19:24:40 UTC 2010


Please find the replies inline:

> On Wed, Mar 31, 2010 at 9:58 PM, Dhiraj Lohiya
>> > <lohiya.dhiraj at gmail.com> wrote:
>> >
>> >     I know both Hindi and Marathi well, so I could do this for both of them.
>> >
>> >     Apart from that, the following languages are well supported by
>> >     the algorithm:
>> >     1. Germanic
>> >     2. Celtic
>> >     3. Greek
>> >     4. French
>> >     5. Italian
>> >     6. Spanish
>> >     7. Chinese
>> >
>> >     So I would also like to support the aforementioned
>> >     languages, and to design this in a modular fashion so that new
>> >     languages can be easily integrated. In either case, we would have a
>> >     testing phase within our community, and only the languages that cross
>> >     the quality threshold would finally go live.
>> >
>> >     I am proposing this for the above languages because, given the
>> >     difficulties faced in Hindi, I guess similar problems might be
>> >     faced by users of other languages.
>> >     Even in the case of English, native English speakers are probably
>> >     pretty comfortable, but I and some of my friends feel
>> >     that this feature would be helpful to many non-native English
>> >     speakers. So we could still provide English as an option in the
>> >     drop-down list and let the user choose whether to select it.
>> >
>> You have still skirted round the issue of how we index this new data,
>>
>
I mentioned the Phonetix plugin and Double Metaphone and their integration
with Lucene, so I would like to know specifically what further details you
are asking for.
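
To make the indexing question concrete, here is a minimal sketch in Python of
the general idea: store a phonetic key for each term and look queries up by
that key. It is an illustration under stated assumptions, not the Phonetix
plugin or any Lucene API. Classic Soundex stands in for Double Metaphone
(which is considerably more involved), it only handles romanized text, and
the names soundex, build_index and search are hypothetical.

```python
from collections import defaultdict

# Soundex digit for each consonant; vowels, h, w and y have no digit.
_CODES = {}
for digit, letters in {"1": "bfpv", "2": "cgjkqsxz", "3": "dt",
                       "4": "l", "5": "mn", "6": "r"}.items():
    for ch in letters:
        _CODES[ch] = digit


def soundex(word):
    """Return the 4-character Soundex key of a romanized word."""
    letters = [c for c in word.lower() if c.isalpha()]
    if not letters:
        return ""
    key = letters[0].upper()
    prev = _CODES.get(letters[0], "")
    for ch in letters[1:]:
        code = _CODES.get(ch, "")
        if code and code != prev:
            key += code
        # h and w do not separate two letters that share a digit.
        if ch not in "hw":
            prev = code
    return (key + "000")[:4]


def build_index(terms):
    """Map each phonetic key to the original terms that produce it."""
    index = defaultdict(list)
    for term in terms:
        index[soundex(term)].append(term)
    return index


def search(index, query):
    """Return every indexed term that sounds like the query."""
    return index.get(soundex(query), [])


index = build_index(["Delhi", "Mumbai", "Dilli"])
print(search(index, "Dehli"))
```

In a real setup the phonetic key would presumably live in its own index field
next to the original token, so that exact and phonetic matches can be queried
and ranked separately.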


> Also, would you expect this new option to be available from the web service?
>
>

Probably. After reading the other comments as well, I feel that we should
first check the performance on the website with some test data. If this
works well, we could do the same for the web service later on, but for
now I would leave this out.


>
> >
>> >     I would like a final discussion on whether I should go ahead with
>> >     a bunch of 10 languages as mentioned above or only Hindi and
>> >     Marathi.
>> >
>> The discussion is firstly whether you should go ahead at all, but if you
>> do I definitely think you should go for quality rather than quantity,
>> hence do the work for Hindi, and if that all works okay then other
>> languages can be considered at that point. There is no point working on
>> multiple languages only to find the solution in general does not work
>> as desired.
>>
>
OK, agreed. I would initially design for Hindi and Marathi (since I am
comfortable with both) and check the performance before expanding
this to other languages.



> >
>> >
>> >     What about typos and transliteration? "Use advanced query syntax"
>> >     provides support for typos
>> >
>> You are hijacking this term: 'Advanced Query syntax' means the search is
>> taken as a Lucene syntax search rather than just a literal search.
>>
>
I mentioned it in that same context, i.e. Lucene and the present
implementation on the site (not the proposed phonetic feature),
wherein, after ticking the "Advanced Query Search" checkbox, custom
Lucene queries can be run. For example, if I wanted to search for
"Delhi" but mistyped it as "Dekhi" (l->k, since k and l are adjacent on
the keyboard), we could in fact do a fuzzy search for dekhi~ and still get
delhi as a result, as in the link below:
http://musicbrainz.org/search/textsearch.html?query=dekhi~&type=track&limit=25&adv=on&handlearguments=1
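
The reason dekhi~ can still find delhi is that the two words are a single
substitution apart. Below is a rough sketch of that idea in Python. It only
illustrates edit-distance matching and does not reproduce Lucene's own fuzzy
scoring or threshold; the function names and the max_edits parameter are
assumptions made for this example.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum insertions, deletions and
    substitutions needed to turn string a into string b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or keep
        prev = curr
    return prev[-1]


def fuzzy_matches(query, terms, max_edits=1):
    """Return the terms within max_edits of the query (case-insensitive).
    max_edits is a hypothetical knob, not a Lucene setting."""
    q = query.lower()
    return [t for t in terms if edit_distance(q, t.lower()) <= max_edits]


print(edit_distance("dekhi", "delhi"))
print(fuzzy_matches("dekhi", ["Delhi", "Mumbai", "Kolkata"]))
```

A typo such as l->k costs one edit, so it stays inside a small distance
budget, whereas unrelated terms fall outside it.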


-- 
Regards
Dhiraj Lohiya
IRC nick: Dj