Wednesday, January 05, 2005

Wow. Two articles published on xml.com today about Amazon's Simple Queue Service. I'm a developer on this project so it's really exciting to see people writing about it.

In short, SQS is a message queue, designed for scalability and reliability, hosted in Amazon's datacenters, and accessed using web services.

"Fun With Amazon's Simple Queue Service" is a friendly introduction to the API and includes a sample chat application written in JavaScript.

"How RESTful is Amazon's Simple Queue Service?" is an article critical of our REST API. REST stands for REpresentational State Transfer, and is "an architectural style for large-scale software design". The author's answer to his rhetorical title is of course "not very." Point well taken. I (personally) agree that we shouldn't refer to our "HTTP GET with parameters in URL" version of the API as "REST". However, I also don't think we should change it to be more RESTful according to this author's suggestions.

Keeping the service simple and accessible to new developers is one big reason. Everyone knows how to use a browser, and even the most novice developer can construct a URL to make an ad-hoc call to try out any AWS API. The browser will then nicely render the response XML, or the developer could apply a server-side stylesheet and transform it into anything at all. Far fewer people understand the various HTTP methods or could issue an ad-hoc query as quickly and easily against a "true" REST API. (A previous xml.com article suggests using telnet ...)

Levitt, in the first article linked to above, illustrates my point perfectly:

Using Amazon's REST interface, I could demonstrate the entire API without using a programming language at all. That's because Amazon chose to implement SQS using only HTTP GET requests (no PUTs, DELETEs, or POSTs). I could simply construct the appropriate URLs and put them into my favorite web browser to execute them. While some might disagree with this design about Amazon's non-RESTy choice to use only HTTP GETs for SQS, the fact is that it makes life very simple for developers.

Despite all this, I found Joe's proposed 'more RESTful reformulation' of SQS very interesting. I feel more enlightened about the REST philosophy after reading his article, but I'd like to point out a problem with his proposal. One of the article's main criticisms of SQS's REST API is the use of HTTP GET for all operations:

The Amazon Queue Service does everything through GET, and no, that's not a good thing ... There are some operations that should use GET. ListMyQueues and Read are both perfectly good GET candidates as they only return information. The problem with the other five operations is that GET is supposed to be both “safe” and “idempotent.” The terms “safe” and “idempotent” have particular meanings for HTTP. “Safe” means that GET does not have the significance of taking an action other than retrieval. “Idempotence” means that the side-effects of N identical requests are the same as for a single request.

Actually, every operation in every AWS service, including SQS, can be accessed by both GET and POST. We accept both, so go ahead and use whichever one makes you happiest! But more importantly, this author is mistaken about the semantics of Read.
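To make the GET-or-POST point concrete, here is a minimal Python sketch that builds the same call both ways without sending anything. The endpoint and the parameter names (`Operation`, `QueueName`) are hypothetical placeholders rather than the real SQS API, and the form-encoded POST body is my assumption about how the parameters would travel.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical endpoint and parameter names, for illustration only.
ENDPOINT = "https://queue.example.com/"
PARAMS = {"Operation": "Read", "QueueName": "demo"}


def as_get(endpoint, params):
    """Put the parameters in the URL query string."""
    return Request(endpoint + "?" + urlencode(params), method="GET")


def as_post(endpoint, params):
    """Put the same parameters in a form-encoded request body (assumed encoding)."""
    return Request(
        endpoint,
        data=urlencode(params).encode("ascii"),
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


get_req = as_get(ENDPOINT, PARAMS)
post_req = as_post(ENDPOINT, PARAMS)

# The requests are only constructed here, never sent.
print(get_req.get_method(), get_req.full_url)
print(post_req.get_method(), post_req.full_url, post_req.data)
```

Either request carries exactly the same parameters; only the place they sit in the HTTP message differs. The GET form is the one you can paste straight into a browser.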

SQS is designed to support many processes concurrently consuming messages from a single logical queue. If Read always returned the message at the top of the queue, as the article assumes, concurrent readers would duplicate the processing of most messages. Instead, what concurrent readers usually want is to divide up the queue traffic among them. This way you can scale up your message processing capacity simply by adding more readers. To permit this, a Read from the queue actually locks the message read for a limited time while the consumer processes it. By 'lock' I mean that a subsequent Read against the same queue within this limited time period will retrieve (and similarly lock) the next entry in the queue, even though the first entry appears first and hasn't yet been dequeued.
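The locking behaviour described above can be modelled in a few lines of Python. This is a toy in-memory sketch and not how SQS is implemented: the class name `LockingQueue`, its method names, the injectable clock, and the 30-second lock period are all made up for illustration.

```python
import time


class LockingQueue:
    """Toy model of a queue whose read temporarily locks the message it returns."""

    def __init__(self, lock_seconds=30.0, clock=time.monotonic):
        self._lock_seconds = lock_seconds
        self._clock = clock
        self._entries = []  # each entry: [message_id, body, locked_until]
        self._next_id = 0

    def enqueue(self, body):
        self._next_id += 1
        # A new message starts out unlocked.
        self._entries.append([self._next_id, body, float("-inf")])
        return self._next_id

    def read(self):
        """Return the first unlocked message and lock it; None if all are locked."""
        now = self._clock()
        for entry in self._entries:
            if entry[2] <= now:
                entry[2] = now + self._lock_seconds
                return entry[0], entry[1]
        return None

    def dequeue(self, message_id):
        """Remove a message for good once the consumer has processed it."""
        before = len(self._entries)
        self._entries = [e for e in self._entries if e[0] != message_id]
        return len(self._entries) < before


# Demo with a fake clock so the lock expiry is deterministic.
now = [0.0]
q = LockingQueue(lock_seconds=30.0, clock=lambda: now[0])
q.enqueue("a")
q.enqueue("b")

first = q.read()   # locks and returns the first message
second = q.read()  # skips the locked first message, returns the next one
third = q.read()   # everything is locked, so nothing is returned
now[0] = 31.0      # the lock period passes without a dequeue
again = q.read()   # the first message becomes visible again
```

Three identical reads in a row return three different results, and each one changes the state of the queue. That is exactly why this operation is neither "safe" nor "idempotent" in the HTTP sense.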

So while SQS's Read is exactly what concurrent queue consumers want, it is neither "safe" nor "idempotent" and therefore, according to Gregorio, should not be a GET. Since none of the other HTTP methods (HEAD, POST, PUT, DELETE) capture the semantics of SQS's locking Read, no API that Amazon could design would satisfy the REST zealots. If we had to choose one method for Read, GET would probably be the most appropriate, but my point is that not every useful operation can be expressed in the REST/HTTP world. Rather than being limited by the methods that HTTP offers, we simply use HTTP as the transport and build whatever operations we care to dream up on top of it.

P.S. Gregorio is not the first to propose a more RESTful queue API. A design and implementation in Python was published here just a few days after our initial launch.
