[mb-users] Language
Alexander Dupuy
dupuy at cs.columbia.edu
Wed Jun 1 17:35:35 UTC 2005
Orion <djkc at tds.net> writes:
> It is however confusing at times. Especially since title language isn't
> always the same as performance language and at least for me being able to
> search for songs performed in languages I know would be more useful than
> merely searching for songs titled in languages I know ^^
Well, of course, you can't actually search for albums with a particular
language; you can only search for the Add Album and Edit Language *moderations*.
While there are points in favor of either approach, the decision was made quite
some time back to treat the album-level language attributes as relating to
titles, in support of (future) internationalized display.
Björn Krombholz <fox.box at gmail.com> writes:
> In the long run I'd like to see an annotation column added to the
> track table. This would be nice for live tracks as well. I don't like
> the current proposal to put recording dates and places in the track
> name. Especially if you want to use the tagger to put this data in
> your id3 fields.
At some point, we should add track-level attributes to handle performance
language and markers such as (live) and (instrumental). That feature was not
added because track-level attributes are quite
a bit of additional work; adding language and script attributes to albums was
easier since there were already attributes for official, live, etc. and the new
attributes could be "piggy-backed" on the existing database fields.
> What would be the appropriate place to make such proposals/discuss such ideas?
Submit an RFE on the SourceForge tracker
http://sourceforge.net/tracker/?func=add&group_id=19506&atid=369506 and mention
it on this list, as you've done.
"Marco Sola" <marcosola at oksatcom.it> writes:
> So, following this, under e.g. Beethoven we will have the *same* work, let's say No. 5, under different languages if it's Symphony, Sinfonie, Sinfonia, and so on...
This is already the case for a lot of Asian artists: there are
"translations" (or transliterations) of album titles, since not everyone
who listens to those artists can read Kanji.
In practice, I would usually use "Multiple languages" for classical releases,
as their titles often contain some combination of German/French/Italian/English.
Just because the guesser says Italian doesn't mean it's right; it's only a
*guess* after all...
> (*) PS: I said this once before: browsing Bach, Mozart, Beethoven and others has become a very, very painful matter for the client and for the server, since they have a page that weighs over 1.5 MB.
This is actually something that should be taken into consideration by the
people (Stefan & Matthias) implementing the ArtistPageRedesign. Beethoven is
probably the best stress test. I think that some of their changes will help
already, but perhaps there are other small improvements that would make a big
difference for the big classical names.
David Scotson <david.scotson at gmail.com> writes:
> Surely this is the point of having a language code for titles (rather
> than lyrics). Not so you know what language the song was originally
> named in, but so that you know which language the titles are currently
> written in. This is particularly useful for classical pieces or
> soundtracks where the descriptive titles ("Wedding March" or "Main
> Theme" "blah in A minor") are more likely to be translated than more
> 'poetic' names and they are often translated into various languages.
Exactly!
> Also, the use of italian language guessing to spot classical works
> reminds me that it would probably be very easy to identify classical
> works by passing their artist, album and track info into a standard
> bayesian spam filter and training it to recognize the classical terms.
> That might be an interesting project for whoever did the language
> guessing stuff and make life easier for those trying to get a handle
> on the classical stuff in MB.
The language- and script-guessing code was built using the CPAN Language::Guess
Perl module, and my impression is that it uses character distributions (and
maybe bigram [two-letter] distributions as well). So it probably wouldn't
really be practical for what you're describing, which depends on recognizing
whole words. However, it's an interesting idea. I bet that the same approach
would work for some other genres, e.g. rap's use of words like flava and jam.
Still, using it for classical is probably more useful, since we have distinct
style guidelines for it, and the JavaScript Guess Case code could be put into
"classical" mode if we detected that a release was probably classical.
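To make the spam-filter idea a bit more concrete, here is a rough sketch of a
word-level naive Bayes classifier in Python. This is not the Language::Guess
code or anything that exists in MusicBrainz; the NaiveBayes class, the
"classical"/"other" labels, and the handful of training titles are all made up
for illustration, and a real filter would need far more training data.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into runs of letters and digits."""
    return re.findall(r"[a-z0-9]+", text.lower())


class NaiveBayes:
    """Hypothetical word-level naive Bayes classifier with add-one smoothing."""

    def __init__(self):
        self.word_counts = {}        # label -> Counter of tokens seen in training
        self.doc_counts = Counter()  # label -> number of training titles

    def train(self, text, label):
        self.word_counts.setdefault(label, Counter()).update(tokenize(text))
        self.doc_counts[label] += 1

    def classify(self, text):
        # Vocabulary is every token seen under any label.
        vocab = set()
        for counts in self.word_counts.values():
            vocab.update(counts)
        total_docs = sum(self.doc_counts.values())
        tokens = tokenize(text)

        best_label, best_score = None, float("-inf")
        for label, counts in self.word_counts.items():
            # Log prior plus the sum of smoothed log likelihoods per token.
            score = math.log(self.doc_counts[label] / total_docs)
            total = sum(counts.values())
            for tok in tokens:
                score += math.log((counts[tok] + 1) / (total + len(vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Made-up training titles, purely illustrative.
clf = NaiveBayes()
clf.train("Symphony No. 5 in C minor, Op. 67: I. Allegro con brio", "classical")
clf.train("Piano Sonata No. 14 in C-sharp minor: Adagio sostenuto", "classical")
clf.train("Concerto for Violin in D major: Allegro", "classical")
clf.train("Funky Flava Jam", "other")
clf.train("Party Jam Tonight", "other")
clf.train("Love Me Baby", "other")

print(clf.classify("Sonata in A minor: Allegro"))  # classical
print(clf.classify("Summer Jam Flava"))            # other
```

The point of working on whole words rather than letters is that terms like
"sonata", "minor" and "allegro" carry the signal here, which a
character-distribution guesser would not pick up.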
@alex