[mb-users] Language

Alexander Dupuy dupuy at cs.columbia.edu
Wed Jun 1 17:35:35 UTC 2005


Orion <djkc at tds.net> writes:
> It is however confusing at times.  Especially since title language isn't
> always the same as performance language and at least for me being able to
> search for songs performed in languages I know would be more useful than
> merely searching for songs titled in languages I know ^^

Well, of course, you can't actually search for albums with a particular 
language; you can only search for the Add Album and Edit Language *moderations*.

While there are points in favor of either approach, the decision was made quite 
some time back to treat the album-level language attributes as relating to 
titles, in support of (future) internationalized display.

Björn Krombholz <fox.box at gmail.com> writes:
> In the long run I'd like to see an annotation column added to the
> track table. This would be nice for live tracks as well. I don't like
> the current proposal to put recording dates and places in the track
> name. Especially if you want to use the tagger to put this data in
> your id3 fields.

At some point, we should add track-level attributes to handle performance 
language and things like (live), (instrumental), etc.  That feature was not 
added because track-level attributes are quite a bit of additional work; 
adding language and script attributes to albums was easier since there were 
already attributes for official, live, etc. and the new attributes could be 
"piggy-backed" on the existing database fields.

> What would be the appropriate place to make such proposals/discuss such ideas?

Submit an RFE on the SourceForge tracker 
http://sourceforge.net/tracker/?func=add&group_id=19506&atid=369506 and mention 
it on this list, as you've done.

"Marco Sola" <marcosola at oksatcom.it> writes:
> So, following this, under e.g. Beethoven we will have the *same* work,
> let's say No. 5, under different languages if it's Symphony, Sinfonie,
> Sinfonia, and so on...

This is already the case for a lot of Asian artists, since there are 
"translations" (or transliterations) of the titles of albums as not everyone 
who listens to those artists can read Kanji.

In practice, I would usually use "Multiple languages" for classical releases, 
as their titles often contain some combination of German/French/Italian/English.

Just because the guesser says Italian doesn't mean it's right; it's only a 
*guess* after all...

> (*) PS: I said this once: browsing Bach, Mozart, Beethoven and others has
> become a very, very painful matter for the client and for the server,
> since they have a page that weighs over 1.5 MB.

This is actually something that should be taken into consideration by the 
people (Stefan & Matthias) implementing the ArtistPageRedesign.  Beethoven is 
probably the best stress test.  I think that some of their changes will help 
already, but perhaps there are other small improvements that would make a big 
difference for the big classical names.

David Scotson <david.scotson at gmail.com> writes:
> Surely this is the point of having a language code for titles (rather
> than lyrics). Not so you know what language the song was originally
> named in, but so that you know which language the titles are currently
> written in. This is particularly useful for classical pieces or
> soundtracks where the descriptive titles ("Wedding March" or "Main
> Theme" "blah in A minor") are more likely to be translated than more
> 'poetic' names and they are often translated into various languages.

Exactly!

> Also, the use of italian language guessing to spot classical works
> reminds me that it would probably be very easy to identify classical
> works by passing their artist, album and track info into a standard
> bayesian spam filter and training it to recognize the classical terms.
> That might be an interesting project for whoever did the language
> guessing stuff and make life easier for those trying to get a handle
> on the classical stuff in MB.

The language and script-guessing code was built using the CPAN Language::Guess 
Perl module, and my impression is that it uses character distributions (and 
maybe bigram [two-letter] distributions as well).  So it probably wouldn't 
really be practical for what you're describing.  It's an interesting idea, 
though.  I bet that the same approach would work for some other genres, e.g. 
rap's use of words like flava, jam, etc.  Using it for classical is probably 
more useful, however, since we have distinct style guidelines for it, and the 
JavaScript Guess Case code could be put into "classical" mode if we detected 
that a release was probably classical.
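
To make the word-based idea a bit more concrete, here is a minimal sketch of 
a naive Bayes classifier over title words.  It is only an illustration under 
my own assumptions: the class name, the two labels and the handful of 
training titles are all made up, and none of this is taken from 
Language::Guess or the Guess Case code.  A real filter would need to be 
trained on a large set of hand-labelled releases.

```python
import math
import re
from collections import Counter


def tokens(text):
    """Split a title into lowercase word tokens (letters only, no digits)."""
    return re.findall(r"[^\W\d_]+", text.lower())


class TitleClassifier:
    """Tiny naive Bayes classifier with add-one (Laplace) smoothing."""

    def __init__(self):
        self.word_counts = {}        # label -> Counter of token frequencies
        self.doc_counts = Counter()  # label -> number of training titles

    def train(self, label, text):
        self.word_counts.setdefault(label, Counter()).update(tokens(text))
        self.doc_counts[label] += 1

    def score(self, label, text):
        """Log of P(label) times the product of P(token | label)."""
        vocab = set().union(*self.word_counts.values())
        counts = self.word_counts[label]
        total = sum(counts.values())
        logp = math.log(self.doc_counts[label] / sum(self.doc_counts.values()))
        for tok in tokens(text):
            logp += math.log((counts[tok] + 1) / (total + len(vocab)))
        return logp

    def classify(self, text):
        return max(self.word_counts, key=lambda label: self.score(label, text))


# Hypothetical training data, invented for this example only.
clf = TitleClassifier()
for title in [
    "Symphony No. 5 in C minor, Op. 67: I. Allegro con brio",
    "Sonata in A major: II. Andante",
    "Concerto for Violin in D major: III. Rondo",
    "Sinfonia No. 3: Adagio",
]:
    clf.train("classical", title)
for title in ["Love Me Tonight", "Jam On It", "Baby Come Back", "Fresh Flava"]:
    clf.train("other", title)

print(clf.classify("Sonata No. 2 in D minor: Allegro"))
print(clf.classify("Come On Baby"))
```

Working on whole words rather than letter pairs is what lets terms like 
"Sonata" or "Allegro" carry weight; artist and album names could be fed 
through the same tokens() step if they turn out to help.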

@alex





