[mb-devel] Collaborative Filtering: Artist - Artist Relationships (Summer of Code Proposal)

Frederic Da Vitoria davitofrg at gmail.com
Wed Mar 28 10:03:10 UTC 2007


2007/3/28, sharon myrtle <sharon.myrtle at gmail.com>:
> On 3/27/07, Frederic Da Vitoria <davitofrg at gmail.com> wrote:
> >
> > 2007/3/27, sharon myrtle <sharon.myrtle at gmail.com>:
> > > By the way, this is the proposal I've submitted -
> > > http://www.sharonmyrtle.com/Projects/Google Summer of
> > > Code/MusicBrainz proposal.html
> >
> > About the rating system: should it be nominative (each vote tied to
> > a user account) or not?
> >
> > I'd like the system to be nominative, for two reasons. First, I fear
> > that an anonymous system would allow zealots to vote dozens of times
> > for their favourite items. Second, only a nominative system lets
> > users change their mind: if I once gave a release an 8 and after a few
> > months decide that it isn't so good after all and deserves only a 5,
> > I don't want my old 8 to be used any more in the statistics.
> > Of course, this implies MB must keep track of the votes of each user
> > in the database, and it makes the calculations slightly more complex.
> >
> > --
> > Frederic Da Vitoria
> >
>
> Yes, a nominative rating system would be ideal. But I have a few
> reservations about the second point.
>
> We could apply a Collaborative Filtering algorithm this way: form an array
> with rows denoting the artists and columns representing the attribute
> values from the data sources. If one of those attributes is the rating,
> we have to calculate the mean of all user ratings for that artist. If a
> user then changes his rating for an artist, we have to recalculate both
> the mean rating for that artist and the artist's correlation vector.
> That is the cost when one user changes the rating for one artist. If
> many users change ratings for many artists, the computation becomes
> burdensome.
>
> I have an alternative way around this problem. We could collect
> ratings and stop once a required number has been collected, then
> calculate the correlation vector from these values. After some time
> has passed (say, a few months), we can recalculate the mean ratings
> from the users' current rating values. This works because a person
> usually won't change his/her opinion about an artist very quickly.
>
> This method will help us build a nominative rating system and also reduce
> the amount of rework necessary to form new relationships.
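
On the cost of the mean: if MB stores each user's current vote, a
changed rating does not have to mean re-averaging every vote. Here is a
minimal sketch in Python, assuming one running total per artist. The
class and method names are made up for illustration; none of this is
existing MB code.

```python
class ArtistRatings:
    """Nominative (per-user) ratings for one artist, with a running mean.

    Hypothetical helper for illustration only.
    """

    def __init__(self):
        self.votes = {}  # user_id -> that user's current rating
        self.total = 0   # sum of all current ratings

    def rate(self, user_id, rating):
        # If this user already voted, take the old vote out of the total
        # so it is no longer used in the statistics.
        old = self.votes.get(user_id)
        if old is not None:
            self.total -= old
        self.votes[user_id] = rating
        self.total += rating

    def mean(self):
        # None when nobody has rated the artist yet.
        if not self.votes:
            return None
        return self.total / len(self.votes)


ratings = ArtistRatings()
ratings.rate("fred", 8)
ratings.rate("sharon", 6)
print(ratings.mean())   # 7.0
ratings.rate("fred", 5)  # fred changes his mind; the old 8 is dropped
print(ratings.mean())   # 5.5
```

This only covers the mean. The correlation vector would still have to
be recomputed some other way.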

What about this method:
keep two copies of the user rating count: the one used for the last
vector computation, and the current count. The first would be updated
only when the vector is computed again. The second would be updated
each time a user changes his vote. Then trigger the computation only
when the difference between the two counts is large enough to justify a
new computation. This could be done immediately (when the user changes
his vote) or by periodically browsing the counts. Of course this
assumes you can decide on the minimum difference which should trigger
a computation.
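
To make the idea concrete, here is a rough sketch in Python. It assumes
the "count" is a per-artist counter that goes up by one each time a vote
is added or changed, and that the threshold is a fixed number. The names
(RecomputeTracker, record_vote_change, sweep) and the recompute callback
are hypothetical, not existing MB code.

```python
class RecomputeTracker:
    """Trigger a vector computation only when enough votes have changed."""

    def __init__(self, threshold, recompute):
        self.threshold = threshold  # assumed: minimum difference that justifies a run
        self.recompute = recompute  # assumed callback taking an artist id
        self.current = {}           # artist_id -> count as of now
        self.last_computed = {}     # artist_id -> count at the last computation

    def record_vote_change(self, artist_id, immediate=True):
        # The current count is updated on every vote change.
        self.current[artist_id] = self.current.get(artist_id, 0) + 1
        if immediate:
            self._maybe_recompute(artist_id)

    def sweep(self):
        # Periodic variant: browse all counts and recompute where needed.
        for artist_id in list(self.current):
            self._maybe_recompute(artist_id)

    def _maybe_recompute(self, artist_id):
        drift = self.current[artist_id] - self.last_computed.get(artist_id, 0)
        if drift >= self.threshold:
            self.recompute(artist_id)
            # The first copy is updated only when the vector is computed again.
            self.last_computed[artist_id] = self.current[artist_id]
            return True
        return False


recomputed = []
tracker = RecomputeTracker(threshold=3, recompute=recomputed.append)
for _ in range(3):
    tracker.record_vote_change("artist-a")                 # immediate check
for _ in range(3):
    tracker.record_vote_change("artist-b", immediate=False)
print(recomputed)   # ['artist-a']
tracker.sweep()                                            # periodic check
print(recomputed)   # ['artist-a', 'artist-b']
```

A plain count of changes ignores how big each change was; a difference
based on the rating values themselves might be a better trigger, but
that is a separate decision.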

-- 
Frederic Da Vitoria


