[mb-devel] Query Musicbrainz.org via HTTP: a proposal

deelan deelan at interplanet.it
Tue Dec 30 20:05:29 UTC 2003


hi all,

I wrote some thoughts on my blog about improving the current RDF API of
mb.org. Let me know if you agree/disagree or if I got something
completely wrong.


Query Musicbrainz.org via HTTP: a proposal

This is a proposal to improve the current query model of Musicbrainz.org.
It aims to simplify querying the Musicbrainz.org (MB) metabase with
standard HTTP requests, making MB queries more "software-agent
friendly" without the need to install, compile or use additional client
libraries.

Please note: some pieces of this proposal are already in place and match
my current wishes; others still need to be added or refined.

My goal is to make most of the MB query features (hopefully the most
useful ones) available through simple HTTP GET requests, using URIs to
identify resources, as explained in "Building Web Services the REST Way".

Querying MB with HTTP POST should be limited to cases where a relevant
result is not guaranteed to exist, such as a "find album" search, or
where encoding issues prevent the query terms from being passed as
normal query-string parameters.

What can be retrieved

Artists, albums, tracks, CD-index and TRMID metadata are already
uniquely identified, so they should all be retrievable via HTTP GET
requests. All of these have URIs and are thus already described as
resources in the metabase. For example:

http://www.musicbrainz.org/artist/4a4ee089-93b1-4470-af9a-6ff575d32704
     Identifies the artist "The Prodigy".

http://www.musicbrainz.org/album/4cfca905-7d38-4bc7-881f-6202a0394786
     Identifies the album "Music for the Jilted Generation".

http://www.musicbrainz.org/track/daa9e1bd-56aa-48bf-9c7b-6cd7cac8c223
     Identifies the track "Break & enter".

http://www.musicbrainz.org/cdindex/XC87Kvf0Onwnu7g_FvE1I_im47I-
     Identifies the album "Music for the Jilted Generation" using its 
CD-index (the same album in different countries may have a different 
CD-index value).

http://www.musicbrainz.org/trmid/dd6a5b51-a08b-409e-89af-b18392a32867
     Identifies a track "Full Throttle" via its TRM Acoustic Fingerprint.
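Because every URI above follows the same
http://www.musicbrainz.org/<kind>/<id> pattern, a client can build them
mechanically. The sketch below is hypothetical (resource_uri and
ID_FORMATS are made-up names) and assumes the ID formats seen in the
examples: lowercase UUIDs for artists, albums, tracks and TRM IDs, and
28-character CD-index IDs.

```python
import re

# Base URL and path segments as they appear in the example URIs above.
BASE = "http://www.musicbrainz.org"

# Assumption: artist/album/track/TRM IDs are lowercase UUIDs, and CD-index IDs
# are 28 characters drawn from letters, digits, '.', '_' and '-'.
UUID_RE = re.compile(r"[0-9a-f]{8}(?:-[0-9a-f]{4}){3}-[0-9a-f]{12}")
CDINDEX_RE = re.compile(r"[A-Za-z0-9._-]{28}")

ID_FORMATS = {
    "artist": UUID_RE,
    "album": UUID_RE,
    "track": UUID_RE,
    "trmid": UUID_RE,
    "cdindex": CDINDEX_RE,
}

def resource_uri(kind, ident):
    """Return the URI identifying an MB resource, validating the ID format."""
    pattern = ID_FORMATS.get(kind)
    if pattern is None:
        raise ValueError("unknown resource kind: %r" % kind)
    if not pattern.fullmatch(ident):
        raise ValueError("malformed %s id: %r" % (kind, ident))
    return "%s/%s/%s" % (BASE, kind, ident)
```

For example, resource_uri("artist", "4a4ee089-93b1-4470-af9a-6ff575d32704")
yields the Prodigy URI listed above.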

Currently, if you visit the links above with a web browser, the MB
server displays HTML tables showing the metadata of the requested
resource.

How RDF/XML info can be retrieved

The idea is to reuse the artist, album, track, CD-index and TRMID URIs
to serve a different, more machine-understandable representation of the
same data, so that software agents can browse the RDF/XML and extract
useful information for us, the users.

The MB server should choose which format to serve using HTTP content
negotiation, but currently I suspect it just sniffs User-Agent strings.
Here is a sample Python client asking for album metadata in RDF/XML
through content negotiation:

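from urllib.request import Request, urlopen

# ask for RDF/XML instead of HTML via content negotiation
h = {'Accept': 'application/rdf+xml'}

# our album URI
uri = "http://www.m.../album/4cfca905-7d38-4bc7-881f-6202a0394786"
r = urlopen(Request(uri, headers=h))  # r holds the response

# decode the body with the charset from Content-Type (UTF-8 if absent)
charset = r.headers.get_content_charset() or 'utf-8'
print(r.read().decode(charset))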

This performs a GET on the given album URI. The web server would look
for an Accept HTTP header, check whether it contains application/rdf+xml
and, if so, send back RDF/XML instead of HTML, with the Content-Type
response header set to application/rdf+xml. A policy for the cases where
the Accept header is missing, or lists application/rdf+xml among other
values, still needs to be discussed.
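One possible policy for those open cases, sketched purely as an
assumption for discussion: with no Accept header the server keeps
serving HTML (as browsers get today); otherwise it compares the q-values
of application/rdf+xml and text/html, honouring type/* and */*
wildcards, and serves RDF/XML only when it is strictly preferred. The
function names and the tie-breaking rule are hypothetical.

```python
def parse_accept(header):
    """Return {media_range: q} parsed from an Accept header value."""
    prefs = {}
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media = fields[0].lower()
        if not media:
            continue
        q = 1.0  # default quality when no q parameter is given
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed q as "not acceptable"
        prefs[media] = q
    return prefs

def quality(prefs, media_type):
    """q-value for a concrete type: exact match, then type/*, then */*."""
    major = media_type.split("/")[0]
    for key in (media_type, major + "/*", "*/*"):
        if key in prefs:
            return prefs[key]
    return 0.0

def choose_format(accept):
    """Hypothetical policy: 'rdf' only if RDF/XML is strictly preferred."""
    if not accept:
        return "html"  # no Accept header: assume a plain browser
    prefs = parse_accept(accept)
    rdf = quality(prefs, "application/rdf+xml")
    html = quality(prefs, "text/html")
    return "rdf" if rdf > 0 and rdf > html else "html"
```

With a typical browser header such as
"text/html,application/xhtml+xml,*/*;q=0.8", RDF/XML only matches
through */* at q=0.8, so HTML wins; a client sending just
"application/rdf+xml" gets RDF. Ties go to HTML so that existing
browsers never see a change.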

The current query model and the related client library include a
"depth" value that specifies the amount of metadata returned by the MB
server.

In my proposed model, "depth" would default to a value of, say, 1 for
plain URI requests, and could be specified explicitly by appending
/depth-value to the URI, for example:

http://www.m.../artist/4a4ee089-93b1-4470-af9a-6ff575d32704/3
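A sketch of how the server could split such a URI into resource kind,
ID and depth, assuming the /<kind>/<id>[/<depth>] layout described here
and a default depth of 1; parse_resource_path, PATH_RE and DEFAULT_DEPTH
are hypothetical names.

```python
import re
from urllib.parse import urlsplit

# Hypothetical layout from the proposal: /<kind>/<id>[/<depth>]
PATH_RE = re.compile(r"/(artist|album|track|cdindex|trmid)/([^/]+)(?:/(\d+))?/?")
DEFAULT_DEPTH = 1  # "say, 1", as suggested in the proposal

def parse_resource_path(uri):
    """Split a resource URI (or bare path) into (kind, id, depth)."""
    path = urlsplit(uri).path
    m = PATH_RE.fullmatch(path)
    if m is None:
        raise ValueError("not a resource URI: %r" % uri)
    kind, ident, depth = m.groups()
    # no explicit /depth-value suffix means the default depth
    return kind, ident, (int(depth) if depth is not None else DEFAULT_DEPTH)
```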

With this scheme, agents unaware of MB internals would simply request
plain URIs (depth 1) and then crawl the result sets in subsequent steps.
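The crawling step could look like this minimal breadth-first sketch. It
assumes that depth-1 RDF/XML responses point to related resources
through rdf:resource attributes whose values start with
http://www.musicbrainz.org/ (an assumption about the response format),
and it takes the fetching function as a parameter so it runs without
network access; crawl, linked_uris and max_resources are hypothetical.

```python
from collections import deque
import xml.etree.ElementTree as ET

RDF_RESOURCE = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}resource"
MB_PREFIX = "http://www.musicbrainz.org/"  # assumption: links to follow share this prefix

def linked_uris(rdf_xml):
    """Collect rdf:resource targets that point back into MB."""
    root = ET.fromstring(rdf_xml)
    return [el.get(RDF_RESOURCE) for el in root.iter()
            if el.get(RDF_RESOURCE, "").startswith(MB_PREFIX)]

def crawl(start_uri, fetch, max_resources=50):
    """Breadth-first crawl: fetch each URI at depth 1, then follow its links.

    `fetch(uri)` returns the RDF/XML body (bytes or str); the caller supplies
    it, e.g. an HTTP GET sending Accept: application/rdf+xml.
    """
    seen, queue, results = {start_uri}, deque([start_uri]), {}
    while queue and len(results) < max_resources:
        uri = queue.popleft()
        body = fetch(uri)
        results[uri] = body
        for link in linked_uris(body):
            if link not in seen:  # visit each resource only once
                seen.add(link)
                queue.append(link)
    return results
```

In practice fetch would wrap the urlopen call shown earlier, with the
Accept: application/rdf+xml header; max_resources keeps a naive agent
from walking the whole metabase.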

Bootstrapping

All of the above assumes that the URI of a resource is known before the
request, which makes sense in the "crawling" scenario: a URI can be
found anywhere, within an e-mail message, in a playlist or as a
hyperlink inside a web page.
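Finding such URIs in free text can be as simple as a regular
expression. This sketch assumes the URI shapes of the examples in "What
can be retrieved": lowercase UUIDs for artists, albums, tracks and TRM
IDs, and 28-character CD-index IDs; MB_URI_RE and find_mb_uris are
hypothetical names.

```python
import re

# Assumed URI shapes, taken from the example URIs earlier in this post.
MB_URI_RE = re.compile(
    r"https?://(?:www\.)?musicbrainz\.org/"
    r"(?:artist|album|track|trmid)/[0-9a-f]{8}(?:-[0-9a-f]{4}){3}-[0-9a-f]{12}"
    r"|https?://(?:www\.)?musicbrainz\.org/cdindex/[A-Za-z0-9._-]{28}"
)

def find_mb_uris(text):
    """Return MB resource URIs found in an e-mail, playlist or HTML page,
    in order of appearance (duplicates included)."""
    return MB_URI_RE.findall(text)
```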

When the URI of a desired resource is not known in advance, the current
MB implementation already provides a complete set of query functions:
mq:FindArtist, mq:FindAlbum and mq:FindTrack, among others, use POST
requests with RDF/XML payloads.

Once again, clients making queries should send both an Accept and a
Content-Type header, stating that they can handle RDF/XML and that they
are sending the query terms as an RDF/XML payload.
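As an illustration, here is how a client could build such a request in
Python. The payload layout (an rdf:RDF element wrapping mq:FindArtist
with mq:depth and mq:artistName children), the MQ namespace URI and the
endpoint http://www.musicbrainz.org/search are all assumptions, not the
current MB format; find_artist_request is a hypothetical name and the
request is built but not sent.

```python
import xml.etree.ElementTree as ET
from urllib.request import Request

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
MQ_NS = "http://musicbrainz.org/mm/mq-1.1#"        # assumption: MQ namespace/version
SEARCH_URI = "http://www.musicbrainz.org/search"   # hypothetical endpoint

ET.register_namespace("rdf", RDF_NS)
ET.register_namespace("mq", MQ_NS)

def find_artist_request(name, depth=1):
    """Build (but do not send) a POST asking mq:FindArtist for `name`."""
    rdf = ET.Element("{%s}RDF" % RDF_NS)
    query = ET.SubElement(rdf, "{%s}FindArtist" % MQ_NS)
    ET.SubElement(query, "{%s}depth" % MQ_NS).text = str(depth)
    # hypothetical element name for the search term
    ET.SubElement(query, "{%s}artistName" % MQ_NS).text = name
    payload = ET.tostring(rdf, encoding="utf-8")  # bytes, UTF-8 encoded
    return Request(
        SEARCH_URI,
        data=payload,
        headers={
            "Accept": "application/rdf+xml",                        # we read RDF/XML
            "Content-Type": "application/rdf+xml; charset=utf-8",  # we send RDF/XML
        },
        method="POST",
    )
```

Sending it would then just be urlopen(req). Building the payload with
ElementTree also takes care of escaping and of UTF-8 encoding for
non-ASCII artist names, one of the encoding issues that make POST
preferable to query-string parameters.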

Cool URIs don't change

Finally, it would be cool to have a friendly, permanent URI to submit
queries to, something like "http://www.musicbrainz.org/search". Changes
in the query engine should be handled by versioning the namespaces of
the MM and MQ vocabularies, not by changing the URI of the search
script.
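To show how a versioned namespace keeps the search URI stable, here is
a server-side sketch that routes each POSTed query by the version
embedded in its MQ namespace URI. The ".../mq-<major>.<minor>#" naming,
the HANDLERS table and handle_v1 are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Assumption: MQ namespace URIs carry their version, e.g. ".../mq-1.1#".
MQ_VERSION_RE = re.compile(r"\{http://musicbrainz\.org/mm/mq-(\d+)\.(\d+)#\}")

def handle_v1(query):
    """Hypothetical engine for MQ 1.x queries."""
    return "v1 engine: %s" % query.tag.split("}")[1]

HANDLERS = {1: handle_v1}  # hypothetical: major version -> query engine

def dispatch(payload):
    """Route a POSTed query to an engine by its MQ namespace version.

    The search URI never changes; only the vocabulary namespace does.
    """
    root = ET.fromstring(payload)
    for query in root:  # each child of rdf:RDF, e.g. mq:FindArtist
        m = MQ_VERSION_RE.match(query.tag)
        if m:
            handler = HANDLERS.get(int(m.group(1)))
            if handler is None:
                raise ValueError("unsupported MQ version %s.%s" % m.groups())
            return handler(query)
    raise ValueError("no MQ query found")
```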

Hope this helps.

archived on:
http://www.deelan.com/archives/2003/12/#d30

More information about the MusicBrainz-devel mailing list