[mb-style] RFV: Dealing with translations and transliterations
Bogdan Butnaru
bogdanb at gmail.com
Wed Jul 19 14:50:11 UTC 2006
On 7/19/06, Oleg Rowaa[SR13] V. Volkov <mb.rowaasr13 at gmail.com> wrote:
> Greetings.
>
> Tuesday, July 18, 2006, 2:13:26 PM Bogdan. wrote:
> > I wanted, though, to chime in on the unicode thing: I think we should
> > make a very clear guideline against such v.r.'s: while transliteration
> > from cyrillic/kanji/&c to latin is hard and requires human attention
> > (AFAIK), Unicode to ASCII should be easy to implement in the tagger,
> > as an option.
>
> How would you auto-convert, say, "♥"? Is that "love", as in "I♥U", is that "heart" or, say, it is German release and is it "Herz". There are many cases while 1:1 conversion table wouldn't work. Actually even converting Cyrillic automatically looks easier to me than this.

You don't. You only convert what you can from the Latin part, removing
accents, splitting ligatures and stuff like that. Some of the
punctuation can be simplified too (ellipsis, dashes, quotes). The rest
you either (1) replace with "?", (2) replace with a character of the
user's choice or (3) ask the user.
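
To make that concrete, here is a rough sketch of such an approximation in Python. It is only an illustration of the idea, not anything the tagger actually does: the function name approximate_ascii, the two lookup tables and their entries are all made up for this example and are far from complete. Accents and ligatures are handled with the standard unicodedata module; everything that cannot be reduced goes to the fallback.

```python
import unicodedata

# Illustrative, non-exhaustive table (an assumption, not a standard):
# Latin letters that Unicode does not decompose into base letter + accent.
SPECIAL_LETTERS = {
    "ß": "ss", "æ": "ae", "Æ": "AE", "œ": "oe", "Œ": "OE",
    "ø": "o", "Ø": "O", "ł": "l", "Ł": "L",
}

# Illustrative punctuation simplifications (ellipsis, dashes, quotes).
PUNCTUATION = {
    "\u2026": "...",               # horizontal ellipsis
    "\u2013": "-", "\u2014": "-",  # en dash, em dash
    "\u2018": "'", "\u2019": "'",  # single curly quotes
    "\u201c": '"', "\u201d": '"',  # double curly quotes
}


def approximate_ascii(text, fallback="?"):
    """Approximate mostly-Latin text in US-ASCII.

    `fallback` is either a replacement string (options 1 and 2) or a
    callable that receives the offending character and returns its
    replacement (option 3, e.g. a function that asks the user).
    """
    out = []
    for ch in text:
        if ord(ch) < 128:
            out.append(ch)                    # already ASCII
        elif ch in PUNCTUATION:
            out.append(PUNCTUATION[ch])
        elif ch in SPECIAL_LETTERS:
            out.append(SPECIAL_LETTERS[ch])
        else:
            # NFKD splits ligatures and separates accents from base letters.
            decomposed = unicodedata.normalize("NFKD", ch)
            base = "".join(c for c in decomposed
                           if not unicodedata.combining(c))
            if all(ord(c) < 128 for c in base):
                out.append(base)              # accents removed
            else:
                out.append(fallback(ch) if callable(fallback) else fallback)
    return "".join(out)


print(approximate_ascii("Mötley Crüe"))             # Motley Crue
print(approximate_ascii("I♥U"))                     # I?U
print(approximate_ascii("I♥U", fallback="_"))       # I_U
```

Passing a function as fallback (one that prompts the user, say) covers option (3); note that Cyrillic or kanji simply come out as the fallback character, since this is not a transliteration.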

Remember, this is not a real transliteration, it's an approximation.
You just remove "details" from something mostly Latin. For
transliteration you generally need to know the language (not just the
alphabet), because there is almost never a single way to transcribe
a character; it varies with context.

We probably don't want to do this with virtual releases, because the
purpose is getting something that fits in the 8-bit charset the user
has; that differs between, say, someone on English Windows and someone
on French Windows. We'd end up having virtual releases for many
different target charsets, which sucks. Whoever wants to do this isn't
interested in "precise"; if he really needs it, he can choose the
option to be asked for weird characters.
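
A small Python sketch of why the result depends on the target charset. The helper name unencodable is invented for this example and the three charsets are just ones I picked; the point is only that the same title needs different substitutions depending on what the user's system can store.

```python
def unencodable(text, charset):
    """Return the characters of `text` that `charset` cannot represent."""
    bad = []
    for ch in text:
        try:
            ch.encode(charset)
        except UnicodeEncodeError:
            bad.append(ch)
    return bad


# The same string fits one 8-bit charset but not another.
for charset in ("cp1252", "iso8859_2", "latin-1"):
    print(charset, unencodable("Łódź", charset), unencodable("Cœur", charset))
```

Only the characters reported for the user's own charset would need approximating or asking about, which is why this seems better done in the tagger than stored as releases.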

(BTW: there _might_ be non-Latin scripts that can be automatically
transliterated to US-ASCII, but those are special cases.)

-- Bogdan Butnaru — bogdanb at gmail.com
"I think I am a fallen star, I should wish on myself." – O.