[Playlist] why we shouldn't include derived data

Lucas Gonze lgonze at panix.com
Mon Mar 22 00:29:35 UTC 2004

As you can tell from my blizzard of mail today, I have been working
through the metadata spec.  I think that we are mixing two kinds of
data, and would be better off if we separated them.  We should move
all derived data into a new document that I'm going to call a catalog.

Our first reason for being here is to define a portable format.  We
want all software to be able to use it, and we want it to move easily
from machine to machine.  Some metadata travels well and some doesn't.
Derived metadata doesn't travel well, so it belongs in a document that
isn't intended to be portable: the catalog.  Metadata that does travel
well, such as the URL of a playlist author, is useless as part of a cache.

A catalog is a cache of data about songs you have locally.  It holds
paths to files, derived info like bit rate, general cruft like a play
count.  All this data is personal.  You have high trust in this
metadata because you calculated or recorded it yourself, but other
people don't.
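A minimal sketch of a catalog as a private, mutable cache, assuming a plain dict keyed by the song's location URI. The field names (`path`, `bitrate_kbps`, `play_count`) are hypothetical stand-ins for the kinds of data listed above, not a proposed schema.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class CatalogEntry:
    """Private, locally derived data about one song (hypothetical fields)."""
    path: str                            # where the file lives on this machine
    bitrate_kbps: Optional[int] = None   # derived by reading the file itself
    play_count: int = 0                  # personal usage history


class Catalog:
    """A private cache keyed by the song's location URI; freely editable."""

    def __init__(self) -> None:
        self._entries: Dict[str, CatalogEntry] = {}

    def record(self, location: str, entry: CatalogEntry) -> None:
        self._entries[location] = entry

    def get(self, location: str) -> Optional[CatalogEntry]:
        return self._entries.get(location)

    def note_play(self, location: str) -> None:
        # Catalog data changes constantly, e.g. every time a song is played.
        entry = self._entries.get(location)
        if entry is not None:
            entry.play_count += 1
```

Nothing in this structure is meant to leave the machine it was built on.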

A playlist is a document which will float around the internet in about
the same way as an HTML file or a tarball.  It tells you enough about
each song to fetch a copy, plus the order of the songs and other info
relevant to publishing and consuming lists of songs.

A catalog is private, a playlist is public.  A catalog is for derived
info, a playlist is for canonical info.  A playlist is a write-once
object.  A catalog is constantly being updated.  A playlist is
portable but not editable; a catalog is editable but not portable.
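One way to picture the split, as a minimal sketch: route each metadata field either to the public playlist or to the private catalog, defaulting to the catalog when a field isn't known to be canonical, and make playlist entries immutable. The field names in `CANONICAL_FIELDS` and the `Track` type are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

# Hypothetical canonical fields; the real list would come out of the
# SupportedMetadata discussion, not from this sketch.
CANONICAL_FIELDS = frozenset({"location", "title", "creator", "annotation"})


def split_metadata(fields: Dict[str, str]) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Return (playlist_fields, catalog_fields).

    Anything not known to be canonical goes to the catalog, so derived,
    ill-defined, or private data stays out of the public document.
    """
    playlist = {k: v for k, v in fields.items() if k in CANONICAL_FIELDS}
    catalog = {k: v for k, v in fields.items() if k not in CANONICAL_FIELDS}
    return playlist, catalog


@dataclass(frozen=True)
class Track:
    """A write-once playlist entry: assigning to a field raises an error."""
    location: str
    title: str = ""
```

Defaulting unknown fields to the catalog errs on the side of keeping the portable document small.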

So I propose that we declare catalogs and playlists different projects
and stick to finishing the playlist project.  If you look at how
applications use playlists already, this is pretty much how it's done.
There's a cache of metadata which is used to avoid re-reading tags,
and playlists are just pointers into the cache.  That's what iTunes
XML does.  I'm sure Winamp does the same.
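A minimal sketch of the "playlists are pointers into the cache" pattern, assuming the application already keeps a dict from location to tag data. This is not how iTunes or Winamp actually store their libraries; `TagCache` and `lookup_playlist` are hypothetical names that only illustrate the lookup.

```python
from typing import Dict, List, Optional

# Hypothetical cache: location -> metadata already read from the file's tags.
TagCache = Dict[str, Dict[str, str]]


def lookup_playlist(locations: List[str], cache: TagCache) -> List[Optional[Dict[str, str]]]:
    """Treat a playlist as a list of pointers into the cache.

    Tags are only read from the cache; a None result marks a song the
    application would still have to scan (or fetch) before playing it.
    """
    return [cache.get(loc) for loc in locations]
```

The playlist itself carries only the locations; everything else comes from the local cache.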

Practical implications:

* The catalog belongs to the content resolver (CR).  It has one big
   catalog, which it keeps private, and it knows how to merge a bunch
   of different small playlists with the catalog.  Let's say the CR has
   a playlist pointing to http://foo.com/bar.mp3.  It looks in the
   catalog for an entry for that URL.  If it finds one, it returns the
   local file reference.  If it doesn't, it downloads the file to
   cache, saves the URL/pathname pair in the catalog, and returns the
   pathname.

* We should take everything out of the SupportedMetadata page that
   belongs to the catalog.  That means everything derived, not well
   defined, or private.

* For trivial playlists of songs on your hard drive, it's extra hassle
   to have to calculate a SHA1 or whatnot just to be able to define a
   playlist, so I think it's reasonable to allow file URIs in
   playlists even though they're private.
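A minimal sketch of the resolver behaviour described in the first bullet, assuming the catalog is just a URL-to-pathname dict and that downloading is delegated to a caller-supplied function (the `download` hook and `ContentResolver` class are hypothetical, not part of any spec). It also passes file URIs straight through, per the last bullet, and assumes POSIX-style paths for them.

```python
from typing import Callable, Dict
from urllib.parse import unquote, urlparse

# Hypothetical hook: downloads `url` into the local cache directory and
# returns the resulting pathname. Injected so the sketch needs no network.
Downloader = Callable[[str], str]


class ContentResolver:
    def __init__(self, download: Downloader) -> None:
        self._catalog: Dict[str, str] = {}  # private: URL -> local pathname
        self._download = download

    def resolve(self, url: str) -> str:
        """Return a local pathname for a playlist location."""
        parsed = urlparse(url)
        if parsed.scheme == "file":
            # A file URI already names a local file (POSIX-style path assumed),
            # so no catalog lookup or content hash is needed.
            return unquote(parsed.path)
        path = self._catalog.get(url)
        if path is None:
            # Cache miss: fetch a copy and remember where it went.
            path = self._download(url)
            self._catalog[url] = path
        return path
```

Because the catalog lives inside the resolver, playlists from many sources can share one cache without any of them carrying local paths.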


A simplified version of the playlist XML with some syntax errors fixed  
and derived data removed is at

- Lucas
