[mb-devel] Collaborative Filtering: Artist - Artist Relationships (Summer of Code Proposal)

david scotson david.scotson at gmail.com
Mon Mar 26 10:17:11 UTC 2007


On 3/24/07, sharon myrtle <sharon.myrtle at gmail.com> wrote:

> I've been reading the discussion on the custom tagging system. I think that
> Collaborative Filtering algorithm would benefit from it as this would help
> in analysis. However, for the proposal, I agree that implementing a rating
> system will be conducive to accumulate valuable data to feed into the CF
> algorithm (apart from the Artist Albums, Search Logs and Artist
> Subscriptions).

This sounds like a great project.

Can I suggest artist - recording label connections (as present on the
test server) as a potential similarity vector? A shared label won't
always mean much, but for some (e.g. Motown, Stax, Blue Note, SubPop,
Philles) I think it's a very important factor. As well as intra-label
connections, you could then make second-order connections via labels,
e.g. people signed to Motown are 'similar' to people signed to Stax or
Philles.
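
To make that a bit more concrete, here is a rough Python sketch of how
a label-based score might look. Everything in it is an assumption for
illustration: the function name, the two weights, the idea of a
hand-maintained set of "related" label pairs, and the toy data. None
of it reflects the actual MusicBrainz schema.

```python
def label_similarity(a, b, artist_labels, related_labels,
                     shared_weight=1.0, related_weight=0.5):
    """Score two artists by their label connections (hypothetical scheme).

    artist_labels:  dict mapping artist name -> set of label names
    related_labels: set of frozenset({label_x, label_y}) pairs that are
                    considered similar to each other
    """
    labels_a = artist_labels.get(a, set())
    labels_b = artist_labels.get(b, set())

    # First-order connection: both artists were signed to the same label.
    if labels_a & labels_b:
        return shared_weight

    # Second-order connection: their labels are themselves related.
    for label_a in labels_a:
        for label_b in labels_b:
            if frozenset((label_a, label_b)) in related_labels:
                return related_weight

    return 0.0


# Toy data; the artist names are placeholders, not real catalogue entries.
artist_labels = {
    "Artist A": {"Motown"},
    "Artist B": {"Motown"},
    "Artist C": {"Stax"},
    "Artist D": {"SubPop"},
}
related_labels = {frozenset(("Motown", "Stax"))}

same_label = label_similarity("Artist A", "Artist B", artist_labels, related_labels)
via_labels = label_similarity("Artist A", "Artist C", artist_labels, related_labels)
unrelated = label_similarity("Artist A", "Artist D", artist_labels, related_labels)
```

The fixed weights are only placeholders; in practice this score would
presumably be one input among several to the CF algorithm rather than
a similarity measure on its own.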

Also, Wikipedia has a bunch of relation data in its page link data
(e.g. the Lennon article will link to Beatles and the other members)
and category link data (e.g. the Beatles are categorised as an
English, Liverpudlian, 1960s, Parlophone-signed, Beat group), which can
be downloaded separately and analysed. Not only would this be useful
to examine in its own right, but by comparing Wikipedia clusters with
MusicBrainz clusters you can be more sure that MB links to the correct
page in Wikipedia (e.g. Pavement (band) vs. Pavement).
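
One possible way to do that cluster comparison, sketched in Python:
take the set of things MusicBrainz relates to an artist, take the set
of things each candidate Wikipedia page links to, and prefer the page
with the largest overlap. Jaccard overlap is my own choice of measure
here, and the function names and sample sets are made up for the
example rather than taken from either dataset.

```python
def jaccard(x, y):
    """Jaccard overlap of two sets: |x & y| / |x | y| (0.0 if both are empty)."""
    union = x | y
    return len(x & y) / len(union) if union else 0.0


def best_wikipedia_page(mb_neighbours, candidate_pages):
    """Pick the candidate page whose links overlap most with the MB cluster.

    mb_neighbours:   set of names related to the artist in MusicBrainz
    candidate_pages: dict mapping page title -> set of names that page links to
    """
    return max(candidate_pages,
               key=lambda title: jaccard(mb_neighbours, candidate_pages[title]))


# Illustrative data only, not extracted from a real dump.
mb_neighbours = {"Stephen Malkmus", "Silver Jews", "Matador Records"}
candidate_pages = {
    "Pavement (band)": {"Stephen Malkmus", "Matador Records", "Indie rock"},
    "Pavement": {"Asphalt", "Road", "Concrete"},
}

chosen = best_wikipedia_page(mb_neighbours, candidate_pages)
```

A low best score could also be used to flag artists whose Wikipedia
link deserves a manual check.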

On the tags and ratings issue: I have some artist/album/song tags in
last.fm, and some track ratings both in iTunes/iPod (at home) and the
Linux player of the week (at the office) but I can't really be
bothered investing time in building up these data stores until I can
share them between apps, back them up properly and be sure to be able
to take them with me as I change operating system and music player
software. Having them hosted permanently and remotely by MusicBrainz
would be very nice for me, especially if done in an open format so I
could optionally also host them on my own server if I really wanted.

Also, though slightly off-topic, PicardQt is rocking my world right now.

regards,

dave


