[mb-devel] Query Musicbrainz.org via HTTP: a proposal
deelan
deelan at interplanet.it
Tue Dec 30 20:05:29 UTC 2003
hi all,
i wrote some thoughts on my blog about improving the current RDF API of
mb.org. let me know if you agree/disagree or if I got something
completely wrong.
Query Musicbrainz.org via HTTP: a proposal
This is a proposal to improve the current query model of Musicbrainz.org.
It aims to simplify the process of querying the Musicbrainz.org (MB)
metabase by using standard HTTP requests, thus making MB queries
somewhat more "software-agent friendly", without the need to
install/compile/use additional client libraries.
Please note: some pieces of this proposal are already in place and match
my current wishes, others should be added or refined.
My goal is to use most of the MB query features — hopefully the most
useful — with simple HTTP GET requests, using URIs to identify resources
as best explained in "Building Web Services the REST Way".
Querying MB using HTTP POSTs should be limited to the cases where
relevant query results are not guaranteed to be returned — like a "find
album" search — or when there are encoding issues and query terms cannot
be passed as normal querystring parameters.
What can be retrieved
Artist, album, track, CD-index and TRMID metadata, being already
uniquely identified, should all be retrievable via HTTP GET requests.
All of these have URIs and are thus already described in the metabase
as resources. For example:
http://www.musicbrainz.org/artist/4a4ee089-93b1-4470-af9a-6ff575d32704
Identifies the artist "The Prodigy".
http://www.musicbrainz.org/album/4cfca905-7d38-4bc7-881f-6202a0394786
Identifies the album "Music for the Jilted Generation".
http://www.musicbrainz.org/track/daa9e1bd-56aa-48bf-9c7b-6cd7cac8c223
Identifies the track "Break & enter".
http://www.musicbrainz.org/cdindex/XC87Kvf0Onwnu7g_FvE1I_im47I-
Identifies the album "Music for the Jilted Generation" using its
CD-index (the same album in different countries may have a different
CD-index value).
http://www.musicbrainz.org/trmid/dd6a5b51-a08b-409e-89af-b18392a32867
Identifies a track "Full Throttle" via its TRM Acoustic Fingerprint.
Currently, if you visit the above links with a web browser, the MB
server displays HTML tables showing the metadata of the requested
resource.
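The URI pattern in the examples above (base, resource kind, identifier)
can be sketched in Python as follows. The helper name resource_uri and
the KINDS tuple are my own names for illustration, not part of any MB
library; the five kinds are simply those listed above.

```python
# Hypothetical helper: builds resource URIs following the pattern of the
# examples above. Names here are illustrative, not an existing MB API.
BASE = "http://www.musicbrainz.org"
KINDS = ("artist", "album", "track", "cdindex", "trmid")


def resource_uri(kind, identifier):
    """Return the URI identifying a resource of the given kind."""
    if kind not in KINDS:
        raise ValueError("unknown resource kind: %r" % (kind,))
    return "%s/%s/%s" % (BASE, kind, identifier)


# The artist "The Prodigy" from the list above
print(resource_uri("artist", "4a4ee089-93b1-4470-af9a-6ff575d32704"))
```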
How RDF/XML info can be retrieved
The idea is to reuse the artist, album, track, CD-index and TRMID URIs
to serve the same data in a different, more machine-understandable
format, so that software agents can browse the RDF/XML data and extract
useful information for us, the users.
The MB server should decide which format to serve by using HTTP "format
negotiation" (content negotiation), but I suspect that it currently
just sniffs user-agent strings. Here is a sample Python client
implementation asking for album metadata in RDF/XML format using format
negotiation:
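from urllib.request import urlopen, Request

# ask for RDF/XML instead of the default HTML
h = {'Accept': 'application/rdf+xml'}
# our album URI
uri = "http://www.m.../album/4cfca905-7d38-4bc7-881f-6202a0394786"
r = urlopen(Request(uri, headers=h))  # r holds the response
# decode the body with the charset declared in the response headers
# (UTF-8 is assumed here if none is declared)
charset = r.headers.get_content_charset() or 'utf-8'
print(r.read().decode(charset))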
This performs a GET on the given album URI. The web server looks for an
Accept HTTP header, checks whether it contains application/rdf+xml and,
if so, sends back RDF/XML instead of HTML. The Content-Type header of
the response should be set to application/rdf+xml. What to do when the
Accept header is missing, or contains application/rdf+xml among other
values, still needs to be discussed.
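To make the open policy question concrete, here is a minimal sketch of
one possible server-side rule. It is only an assumption of mine, not how
the MB server behaves: serve HTML when the Accept header is missing, and
serve RDF/XML only when application/rdf+xml is listed explicitly with a
quality value at least as high as text/html. Wildcards such as */* are
deliberately ignored, so ordinary browsers keep getting HTML. The
function names are hypothetical.

```python
RDF = "application/rdf+xml"
HTML = "text/html"


def parse_accept(header):
    """Parse an Accept header into a {media_type: q} dictionary."""
    prefs = {}
    for item in header.split(","):
        parts = [p.strip() for p in item.split(";")]
        media_type = parts[0].lower()
        if not media_type:
            continue
        q = 1.0  # default quality when no q parameter is given
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed q as "not acceptable"
        prefs[media_type] = q
    return prefs


def choose_format(accept_header):
    """Pick the response Content-Type under the assumed policy."""
    if not accept_header:
        return HTML  # assumed default when Accept is missing
    prefs = parse_accept(accept_header)
    rdf_q = prefs.get(RDF, 0.0)
    html_q = prefs.get(HTML, 0.0)
    if rdf_q > 0 and rdf_q >= html_q:
        return RDF
    return HTML


print(choose_format("application/rdf+xml"))
print(choose_format("text/html,application/xhtml+xml,*/*;q=0.8"))
```

Under this rule a client that sends both types can still state a
preference through q values, e.g. "text/html;q=0.5, application/rdf+xml"
selects RDF/XML.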
The current query model and the related client library include a "depth"
value that specifies the amount of metadata returned by the MB server.
With my proposed model "depth" would default to a value of, say, 1 for
straight-URI requests and could be explicitly specified by appending a
/depth-value to the URI, for example:
http://www.m.../artist/4a4ee089-93b1-4470-af9a-6ff575d32704/3
This way agents unaware of MB internals would simply request URIs with a
depth = 1 and then crawl result sets in subsequent steps.
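The depth convention could be handled on both sides roughly like this.
It is a sketch under my own assumptions: the default depth of 1 is the
value suggested above, a resource path is taken to have exactly two
segments (kind and identifier), and the function names are made up for
illustration.

```python
from urllib.parse import urlsplit

DEFAULT_DEPTH = 1  # the default suggested in the proposal


def with_depth(uri, depth=DEFAULT_DEPTH):
    """Client side: append an explicit depth unless it is the default."""
    if depth < 1:
        raise ValueError("depth must be at least 1")
    if depth == DEFAULT_DEPTH:
        return uri
    return "%s/%d" % (uri.rstrip("/"), depth)


def split_depth(uri):
    """Server side: return (resource URI, depth) for a requested URI."""
    segments = [s for s in urlsplit(uri).path.split("/") if s]
    # kind + identifier + a trailing number means an explicit depth
    if len(segments) == 3 and segments[2].isdigit():
        return uri.rstrip("/").rsplit("/", 1)[0], int(segments[2])
    return uri, DEFAULT_DEPTH


artist = "http://www.musicbrainz.org/artist/4a4ee089-93b1-4470-af9a-6ff575d32704"
print(with_depth(artist, 3))
print(split_depth(with_depth(artist, 3)))
```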
Bootstrapping
All of the above assumes that the URI of a resource is known before the
request is made, which makes sense in the "crawling" scenario; a URI
can be found anywhere: within an e-mail message, in a playlist or
inside a web page as a hyperlink.
When the URI of a desired resource is not known in advance, the current
MB implementation already provides a complete set of query functions:
mq:FindArtist, mq:FindAlbum and mq:FindTrack, among others, use POST
requests with RDF/XML payloads.
Once again, clients making queries should add both an Accept and a
Content-Type header, to state that they can handle RDF/XML and that
they are sending the query terms as an RDF/XML payload.
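As a rough illustration, such a POST request could be prepared with
Python 3's urllib.request as shown here. The endpoint and the payload
are placeholders of mine: I am not reproducing the real query URL or
the actual RDF/XML body of an mq:FindArtist query, and the function name
is hypothetical. The request is only built, not sent.

```python
from urllib.request import Request

RDF = "application/rdf+xml"


def build_query_request(endpoint, rdf_payload):
    """Build (but do not send) a POST request carrying an RDF/XML query."""
    headers = {
        "Accept": RDF,        # we can handle RDF/XML in the response
        "Content-Type": RDF,  # the body we send is RDF/XML
    }
    data = rdf_payload.encode("utf-8")  # UTF-8 encoding is an assumption
    return Request(endpoint, data=data, headers=headers, method="POST")


# Placeholder endpoint and payload, for illustration only
req = build_query_request(
    "http://example.org/query",
    "<!-- RDF/XML query document goes here -->",
)
print(req.get_method(), req.full_url)
print(req.header_items())
```

Passing the resulting object to urlopen() would perform the actual
query.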
Cool URIs don't change
Finally, it would be cool to have a friendly and permanent URI to which
queries can be submitted, something like
"http://www.musicbrainz.org/search". Changes in the query engine should
be handled by versioning the namespaces of the MM and MQ vocabularies,
not by changing the URI of the search script.
Hope this helps.
archived on:
http://www.deelan.com/archives/2003/12/#d30