Passing on Message Passing

I built a small web app recently, and one particular aspect wasted my time more than I anticipated. It may just be that I'm poorly informed of current trends, but I found all of the documentation and available advice to be poor, and the solution I eventually arrived at is probably close to the best I can get.

The general idea was to have a web front-end that could start and monitor jobs. Jobs would be arbitrary commands run on separate worker machines, which would stream output back to a central location for immediate viewing on the web site.

dbserver

In several instances in the past where I wanted a web site with instant communication, I think I read the Wikipedia articles for AJAX and Comet, and then I wrote some Python scripts centered around a thing I called dbserver.py. The code is all on this site, but it's in such a poor state that I'd rather not reference it.

The general structure looked like this:

  1. The web front-end uses jQuery to issue asynchronous requests to the web server, which are handled by a Python script, api.py.
  2. api.py establishes a socket connection to an always-running background daemon, server.py.
  3. server.py forwards messages between clients. In some iterations, I had it broadcast every message to all clients (a rough sketch of that variant follows this list). In others, I registered a handful of persistent clients that received every message and could direct messages at transient clients.
  4. One of the clients was often dbserver.py, which mostly interacted with an SQLite database and notified server.py to notify clients when things in the database changed.
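
For flavor, here is a rough sketch of the broadcast variant of server.py. The original code isn't shown here, so the details (the port, plain TCP sockets, no message framing) are assumptions of mine rather than what my scripts actually did:

    import select
    import socket

    LISTEN_ADDR = ("127.0.0.1", 9999)   # placeholder; the real address isn't given

    def serve():
        listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        listener.bind(LISTEN_ADDR)
        listener.listen(5)
        clients = []
        while True:
            readable, _, _ = select.select([listener] + clients, [], [])
            for sock in readable:
                if sock is listener:
                    conn, _ = listener.accept()   # new client: api.py, dbserver.py, ...
                    clients.append(conn)
                    continue
                data = sock.recv(4096)
                if not data:                      # client hung up
                    clients.remove(sock)
                    sock.close()
                    continue
                for other in clients:             # broadcast to everyone else
                    if other is not sock:
                        other.sendall(data)

    if __name__ == "__main__":
        serve()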

I would say my biggest problems in this setup were on the database side of things. I never found a good way of querying the database from the web front-end that was descriptive, safe, and fast. And I had to not just query the database but also register for notification of certain types of updates.

I also had some problems with duplicate messages (or perhaps the same message going to two server.py clients that turned out to be different requests from the same web client). I have some ideas on how I could have simplified things and been more rigorous.

But really it was just a lot to maintain, and I couldn't believe that there wasn't something already out there for this.

Research

I have difficulty finding things when I don't know what exactly it is that I want.

It seems that the "real-time" web is the New Thing™ that matches the original problem I was trying to solve.

Many of the things I found centered around Twisted, Tornado, Orbited, and Node.js. To be honest, I didn't really give any of them a fair chance. I wasn't looking for a different web server, just something to connect my existing web app with other things. None of them really jumped out as being capable in the ways I needed.

A common theme seemed to be "message passing." Specifically, I found a number of frameworks built around RabbitMQ. When I arrived at this point, I was feeling good. I routinely dig beneath the libraries I'm given so that I can work at a more reliable, manageable lower level, and RabbitMQ seemed to be a common tool in this business.

RabbitMQ has a number of tutorials using a Python library called pika, so I started with that. The first few tutorials were pretty good, but after that I found the documentation to be pretty awful. pika's documentation deferred to RabbitMQ's, and RabbitMQ's was absent or tautological ("exchange-name exchange Name of the exchange to bind to."). I muddled around and got a test sort of working, then got fed up when things inexplicably hung or failed to clean up after themselves as I'd requested. It was never clear which API I needed to pump for things to make progress, or which calls would block.
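
For reference, the first tutorial boils down to something like the sketch below. This is written in pika's newer on_message_callback style, and argument names have shifted between pika versions, so treat it as approximate rather than the exact code I ran:

    import pika

    connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
    channel = connection.channel()
    channel.queue_declare(queue="hello")

    # Publishing is fire-and-forget through the default exchange...
    channel.basic_publish(exchange="", routing_key="hello", body="Hello, world")

    # ...but consuming blocks: start_consuming() pumps the connection forever
    # and only hands messages to the callback while it is running.
    def on_message(ch, method, properties, body):
        print(body)

    channel.basic_consume(queue="hello", on_message_callback=on_message, auto_ack=True)
    channel.start_consuming()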

Next I tried Kombu, which was admittedly more Pythonic but seemed to be unmaintained. I ran into the same issues as with pika.

I finally went with py-amqplib, which was a lot like pika but somehow easier for me to manage. Or perhaps by that point I had learned enough to manage it better than on my first attempt. Regardless, I got my simple test case basically working:

  1. I clicked a button on a web site. The web site sent a request to some script.
  2. That script set up a new "response" exchange for itself, bound a queue to it, and sent a message to a well-known RabbitMQ exchange informing it of a "job" and the exchange to which to respond.
  3. There was a script listening to a queue bound to that exchange. It pulled the job off the queue and feigned doing some work.
  4. The script sent a series of messages back to the "response" exchange.
  5. The web site hit a script which hit RabbitMQ to stream back the messages from the "response" queue, blocking when appropriate to wait for the next message (a rough code sketch of this round trip follows the list).
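
In py-amqplib terms (its Python 2-era client_0_8 module), that round trip looks roughly like the sketch below, compressed into a single process just to show the calls involved. The exchange names "jobs" and "response.<id>", the "job" routing key, and the queue names are invented for illustration; the real version split this between the web-facing scripts and the worker:

    import uuid
    from amqplib import client_0_8 as amqp

    conn = amqp.Connection(host="localhost:5672", userid="guest", password="guest")
    chan = conn.channel()

    # Requesting side: a private "response" exchange with a queue bound to it,
    # plus a message on the well-known "jobs" exchange naming that exchange.
    reply_exchange = "response.%s" % uuid.uuid4().hex
    chan.exchange_declare(exchange=reply_exchange, type="fanout", auto_delete=True)
    reply_queue, _, _ = chan.queue_declare(exclusive=True)
    chan.queue_bind(queue=reply_queue, exchange=reply_exchange)

    chan.exchange_declare(exchange="jobs", type="direct", auto_delete=False)
    job_queue, _, _ = chan.queue_declare(queue="job_requests", auto_delete=False)
    chan.queue_bind(queue=job_queue, exchange="jobs", routing_key="job")
    chan.basic_publish(amqp.Message(reply_exchange), exchange="jobs", routing_key="job")

    # Worker side: pull the job off the queue, feign work, stream output back
    # to whichever exchange the job named.
    job = chan.basic_get(queue=job_queue, no_ack=True)
    for i in range(3):
        chan.basic_publish(amqp.Message("output line %d" % i), exchange=job.body)

    # Requesting side again: drain the response queue.
    msg = chan.basic_get(queue=reply_queue, no_ack=True)
    while msg is not None:
        print(msg.body)
        msg = chan.basic_get(queue=reply_queue, no_ack=True)

    chan.close()
    conn.close()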

At this point, I appreciated that there was something of a standard for this business. There was no apparent chance of a message going to the wrong client, no needless broadcasting of messages to all clients, and no bookkeeping on my part to make sure that I processed each message only once.

But at the same time, this was super fragile. I had some of the same problems as with pika, and some new ones. Ensuring that exchanges were declared at the right time, and had queues bound to them so that a worker could start feeding messages in before the web client was ready to read them, felt as though it were as complicated as just doing what I did in dbserver. And I hadn't even done the second part of this, which was going to involve another process that listened to all exchanges and logged everything to a database so that I could view logs after the jobs were finished.

Operating Systems

My takeaway was that I didn't want queues to be centralized. At least in my use case, I wanted one authoritative copy of the streaming output, with each client simply keeping track of how much of that output it had seen. With that design I never had to worry about cleaning up anything like queues or exchanges, because the only state needed was an offset held by the client.

The solution was incredibly simple.

  1. I clicked a button on a web site. It started a "job" process immediately.
  2. The worker wrote to a file.
  3. The web site made a request to a script which read from the file, starting from an offset provided by the client.
    • If there was more data there, it returned it.
    • If there was no data and the worker was still running, it called select(), which blocked until there was more data available (a sketch of this step follows the list).
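
A minimal sketch of that read-from-offset step, with my own stand-in names (read_output, worker_is_running) and a polling sleep where the real script blocked in select():

    import time

    def read_output(path, offset, worker_is_running, chunk=64 * 1024):
        """Return (data, new_offset) for bytes appended to `path` after `offset`."""
        while True:
            with open(path, "rb") as f:
                f.seek(offset)
                data = f.read(chunk)
            if data:
                return data, offset + len(data)
            if not worker_is_running():
                return b"", offset       # job finished and nothing new: we're at EOF
            time.sleep(0.2)              # wait for the worker to write more

The client just remembers new_offset between requests, so there is nothing on the server to clean up.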

So after all of that, I had written a web-based /usr/bin/tail.

Conclusion

I'm not certain what I learned from this.

And in the end, I think I'm aborting the project that caused me to go down this path.

I expect that in the future I will find uses for all of these approaches.

Comments

1. dave.foster@… -- 9 years ago

You may have gone too low in the stack - RabbitMQ/AMQP is fantastic but it seems like what you're really looking for is a task queue, like Resque or Celery, which build on top of technologies like AMQP.

We use Pika for our work project but it is heavily wrapped/patched in places where it is deficient. We've essentially built a kombu-like layer on top for better or worse. You're correct in saying most of the libraries for Python are crap - pika is in front but still lacking quite a bit.