[mb-users] Re: Re: Clean vs. Explicit versions

Frederic Da Vitoria davitofrg at gmail.com
Sun Oct 28 17:54:40 UTC 2007


2007/10/28, Wendell Hicken <whicken at gmail.com>:
>
> On 10/28/07, Frederic Da Vitoria <davitofrg at gmail.com> wrote:
> >
> > It seems all of us agree, except for the possibility of the problem.
> > Actually, if Chad is right, we should maybe remove the "U" from PUID, since
> > it would mean the IDs are not Unique!
> >
> > Brian, do you actually KNOW it is impossible, or do you only believe it
> > cannot happen? AFAIK, the PUIDs could be calculated by taking samples at
> > regular intervals (say, every 5 seconds), which could sometimes miss the
> > bleeps. (I did not say they are calculated this way, just that I never
> > saw any indication that this is not the case.) Here
> > http://blog.musicbrainz.org/archives/2006/03/finger_fingerpr.html , it
> > is explicitly said: "It should have a lot fewer duplicates and collisions
> > than TRM." "Fewer" at least means collisions can happen :-(
> >
> > Chad (or someone else), do you have or can you put your hands on such an
> > occurrence (same track lengths, same PUIDs but different audible content)?
> > So that we stop using "if"s! And we can start discussing what we should do
> > WHEN it happens.
> >
>
> Just some quick notes on the fingerprinting in general - fingerprinting is
> necessarily a fuzzy process. Otherwise you don't get the desirable effect of
> being insensitive to encoding issues and assorted "minor" differences in the
> acoustics.  What two different people consider "minor" is of course open to
> considerable interpretation.
>
> In terms of collisions (false positives, where two completely different
> songs are given the same puid) - I do not as yet have a concrete example
> where two completely different songs have the same puid, although I suspect
> there are some out there (a necessary theoretical consequence of fuzzy
> matching, if you will).
>
> However, in the specific case here of clean vs. explicit recordings, I
> would expect that some of these would end up with the same puid - if you
> merely drop out a single word, the difference could easily fall below
> the threshold of fuzzy matching.
>
> (Side note on an earlier question - if you aren't getting a puid back
> within 24 hours, something went wrong. The servers have been running
> smoothly, and the average turnaround should be closer to 6 hours.  If you
> have this issue, you can email me directly with info.)
>

That makes sense. So what do we do when such a collision occurs? With the
same track lengths and the same PUIDs but different content, do we keep the
two separate? That is more exact, but the owner of the clean version has a
50% chance of getting the "explicit" tags, and the owner of the explicit
version of course has a 50% chance of getting the "clean" tags. Or do we
find some way of saying that one MB Release corresponds to both releases?
Once again, the track listing and the track times would be the same.
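
To make the scenario concrete, here is a toy sketch of how threshold-based
fuzzy matching could give a clean and an explicit version the same ID. This
is NOT how PUIDs are actually computed; the windowed fingerprint, the
tolerance, the 0.9 threshold and all function names are assumptions made up
purely for illustration.

```python
# Toy illustration only: this is NOT the real PUID algorithm.
# All names, window sizes and thresholds below are hypothetical.

def toy_fingerprint(samples, window=4):
    """Average the absolute amplitude over fixed-size windows."""
    return [
        sum(abs(s) for s in samples[i:i + window]) / window
        for i in range(0, len(samples) - window + 1, window)
    ]

def similarity(fp_a, fp_b, tolerance=0.05):
    """Fraction of windows whose values differ by at most `tolerance`."""
    if len(fp_a) != len(fp_b) or not fp_a:
        return 0.0
    close = sum(1 for a, b in zip(fp_a, fp_b) if abs(a - b) <= tolerance)
    return close / len(fp_a)

def same_id(fp_a, fp_b, threshold=0.9):
    """Treat two fingerprints as 'the same track' above the threshold."""
    return similarity(fp_a, fp_b) >= threshold

# "Explicit" track: 80 samples of made-up audio.
explicit = [((i * 7) % 11) / 10 for i in range(80)]

# "Clean" track: identical, except one short word (4 samples) is silenced.
clean = list(explicit)
clean[40:44] = [0.0] * 4

fp_explicit = toy_fingerprint(explicit)
fp_clean = toy_fingerprint(clean)

# Only 1 of the 20 windows differs, so the pair clears the 0.9 threshold.
print(similarity(fp_explicit, fp_clean))
print(same_id(fp_explicit, fp_clean))
```

In this toy model the two recordings have the same length and the same ID,
yet audibly different content, which is exactly the case we need a policy
for.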

-- 
Frederic Da Vitoria