[mb-users] Re: Re: Clean vs. Explicit versions

Wendell Hicken whicken at gmail.com
Sun Oct 28 17:40:12 UTC 2007


On 10/28/07, Frederic Da Vitoria <davitofrg at gmail.com> wrote:
>
> It seems all of us agree, except on whether the problem can actually occur.
> Actually, if Chad is right, we should maybe remove the "U" from PUID, since
> it would mean the IDs are not Unique!
>
> Brian, do you actually KNOW it is impossible, or do you only believe it
> cannot happen? AFAIK, the PUIDs could be calculated by taking samples at
> regular intervals (say, at 5-second intervals), which could sometimes miss
> the bleeps. (I did not say they are calculated this way, just that I never
> saw any indication that this was not the case.) At
> http://blog.musicbrainz.org/archives/2006/03/finger_fingerpr.html it is
> explicitly stated: "It should have a lot fewer duplicates and collisions than
> TRM." "Fewer" at least means collisions can happen :-(
>
> Chad (or someone else), do you have, or can you find, such an occurrence
> (same track lengths, same PUIDs, but different audible content)? That way
> we can stop using "if"s and start discussing what we should do WHEN it
> happens.
>

Just some quick notes on fingerprinting in general: fingerprinting is
necessarily a fuzzy process. Otherwise you don't get the desirable effect of
being insensitive to encoding issues and assorted "minor" differences in the
acoustics. What two different people consider "minor" is, of course, open to
considerable interpretation.

As for collisions (false positives, where two completely different songs are
given the same puid): I do not yet have a concrete example of this, although
I suspect there are some out there (a necessary theoretical consequence of
fuzzy matching, if you will).

However, in the specific case here of clean vs. explicit recordings, I would
expect some of these to end up with similar puids: if you merely drop a
single word, the difference could easily fall below the threshold of fuzzy
matching.
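The threshold effect can be made concrete with a toy model. This is a minimal sketch under stated assumptions, not the MusicDNS/PUID algorithm, which the post does not describe: the contour-style fingerprint (one bit per pair of adjacent segments, set when loudness rises), the bit-error-rate comparison, the 0.1 threshold, and all function names are hypothetical. It illustrates both effects discussed above: muting one segment (standing in for a dropped word) stays under the threshold, and because the fingerprint discards detail, different inputs can collide.

```python
"""Toy fuzzy fingerprint matching. Hypothetical; NOT the real PUID/MusicDNS algorithm."""


def toy_fingerprint(energies: list[float]) -> list[int]:
    """One bit per adjacent pair of segments: 1 if the later segment is louder."""
    return [1 if b > a else 0 for a, b in zip(energies, energies[1:])]


def bit_error_rate(fp_a: list[int], fp_b: list[int]) -> float:
    """Fraction of positions where two equal-length fingerprints differ."""
    if len(fp_a) != len(fp_b):
        raise ValueError("fingerprints must have the same length")
    return sum(x != y for x, y in zip(fp_a, fp_b)) / len(fp_a)


def same_id(fp_a: list[int], fp_b: list[int], threshold: float = 0.1) -> bool:
    """Hypothetical matching rule: one recording if few enough bits differ."""
    return bit_error_rate(fp_a, fp_b) <= threshold


def sequence_with_contour(bits: list[int], start: float = 100.0) -> list[float]:
    """Build a different loudness sequence that rises/falls exactly where `bits` says."""
    seq = [start]
    for bit in bits:
        seq.append(seq[-1] + (1.0 if bit else -1.0))
    return seq


# 41 made-up per-segment loudness values for the "explicit" version -> 40-bit fingerprint.
explicit = [float((7 * i) % 11) for i in range(41)]

# "Clean" edit: mute one segment (a dropped word); only the two bits touching it can flip.
clean = list(explicit)
clean[20] = 0.0

# An unrelated toy song with a different loudness pattern.
other_song = [float((3 * i) % 7) for i in range(41)]

fp_explicit = toy_fingerprint(explicit)
fp_clean = toy_fingerprint(clean)
fp_other = toy_fingerprint(other_song)
# Different loudness values, identical rise/fall pattern: a collision by construction.
fp_collision = toy_fingerprint(sequence_with_contour(fp_explicit))

print(bit_error_rate(fp_explicit, fp_clean), same_id(fp_explicit, fp_clean))
print(bit_error_rate(fp_explicit, fp_other), same_id(fp_explicit, fp_other))
print(bit_error_rate(fp_explicit, fp_collision), same_id(fp_explicit, fp_collision))
```

In this toy, the clean edit differs in 2 of 40 bits and matches, the unrelated song is rejected, and the constructed sequence matches exactly despite different values. Loosening the threshold would merge more clean/explicit pairs but also more unrelated songs, which is the "minor differences" trade-off described above.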

(Side note on an earlier question: if you aren't getting a puid back within
24 hours, something has gone wrong. The servers have been running smoothly,
and the average turnaround should be closer to 6 hours. If you have this
issue, you can email me directly with details.)

Wendell


More information about the MusicBrainz-users mailing list