Posts for the month of August 2012

Passing on Message Passing

I built a small web app recently, and one particular aspect wasted my time more than I anticipated. It may just be that I'm poorly informed of current trends, but I found all of the documentation and available advice to be poor, and the solution I eventually arrived at is probably close to the best I can get.

The general idea was to have a web front-end that could start and monitor jobs. Jobs would be arbitrary commands run on separate worker machines, which would stream output back to a central location for immediate viewing on the web site.

dbserver

In several instances in the past where I wanted to have web sites with instant communication, I think I read the Wikipedia articles for AJAX and Comet, and then I wrote some Python scripts centered around a thing I called dbserver.py. The code is all on this site, but it's in such a poor state that I'd rather not reference it.

The general structure looked like this:

  1. The web front-end uses jQuery to issue asynchronous requests to the web server, which are handled by a Python script, api.py.
  2. api.py establishes a socket connection to an always-running background daemon, server.py.
  3. server.py forwards messages between clients. In some iterations, I had it always broadcast every message to all clients. In some iterations, I registered a handful of persistent clients to which all messages went and who could direct messages at transient clients.
  4. One of the clients was often dbserver.py, which mostly interacted with an sqlite database and notified server.py to notify clients when things in the database changed.
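
The two forwarding behaviors described in step 3 can be sketched in memory. This is not the original server.py: the `Relay` class, its method names, and the client ids are all hypothetical, and per-client deques stand in for the socket connections.

```python
from collections import deque


class Relay:
    """In-memory stand-in for a message-forwarding daemon (hypothetical)."""

    def __init__(self):
        self.inboxes = {}        # client id -> deque of pending (sender, message)
        self.persistent = set()  # clients that receive every message

    def connect(self, client_id, persistent=False):
        self.inboxes[client_id] = deque()
        if persistent:
            self.persistent.add(client_id)

    def broadcast(self, sender, message):
        """First variant: every message goes to every other client."""
        for client_id, inbox in self.inboxes.items():
            if client_id != sender:
                inbox.append((sender, message))

    def send(self, sender, message, to=None):
        """Second variant: persistent clients see everything, and a message
        may additionally be directed at one transient client."""
        targets = set(self.persistent)
        if to is not None:
            targets.add(to)
        targets.discard(sender)  # never echo a message back to its sender
        for client_id in targets:
            self.inboxes[client_id].append((sender, message))


relay = Relay()
relay.connect("dbserver", persistent=True)
relay.connect("web-1")
relay.connect("web-2")
relay.send("web-1", "query jobs")                   # reaches only dbserver
relay.send("dbserver", "jobs changed", to="web-1")  # directed reply
```

In the second variant, "web-2" never sees traffic meant for "web-1", which is the property the broadcast variant lacks.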

I would say my biggest problems in this setup were on the database side of things. I generally failed at having a good way of querying the database from the web front-end that was descriptive, safe, and fast. And I had to not just query the database but also register for notification of certain types of updates.

I also had some problems with duplicate messages (or perhaps the same message going to two server.py clients which turned out to be different requests from the same web client). I have some ideas on how I could have simplified things and been more rigorous.

But really it was just a lot to maintain, and I couldn't believe that there wasn't something already out there for this.

Research

I have difficulty finding things when I don't know what exactly it is that I want.

It seems that the "real-time" web is the New Thing™ that matches the original problem I was trying to solve.

Many of the things I found centered around Twisted, Tornado, Orbited, and Node.js. To be honest, I didn't really give any of them a fair chance. I'm not looking for a different web server - just a thing to connect my existing webapp with other things. None of them really jumped out as being capable in the ways I needed.

A common theme seemed to be "message passing." Specifically, I found a number of frameworks built around RabbitMQ. When I arrived at this point, I was feeling good. I routinely undermine libraries I'm given to be able to work at a more reliable, manageable lower level, and RabbitMQ seemed to be a common tool in this business.

RabbitMQ has a number of tutorials using a Python library called pika, so I started with that. The first few tutorials were pretty good, but after that I found the documentation to be pretty awful. pika deferred to RabbitMQ. RabbitMQ documentation was absent or tautological ("exchange-name exchange Name of the exchange to bind to."). I muddled around and got a test sort of working. I got fed up when things inexplicably hung or failed to clean up after themselves as I requested. It was never clear when I needed to pump what API for things to work and which things would block.

Next I tried Kombu, which was admittedly more Pythonic but seems to be unmaintained. I ran into the same issues as with pika.

I finally went with py-amqplib, which was a lot like pika, but somehow easier for me to manage. Or at this point I had learned enough that I was just able to manage it better than my first attempt. Regardless, I got my simple test case basically working.

  1. I clicked a button on a web site. The web site sent a request to some script.
  2. That script set up a new "response" exchange for itself, bound a queue to it, and sent a message to a well-known RabbitMQ exchange informing it of a "job" and the exchange to which to respond.
  3. There was a script listening to a queue bound to that exchange. It pulled the job off the queue and feigned doing some work.
  4. The script sent a series of messages back to the "response" exchange.
  5. The web site hit a script which hit RabbitMQ to stream back the messages output from the "response" queue, blocking when appropriate to wait for the next message.
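
The shape of that flow can be modeled without a broker. The sketch below does not talk to RabbitMQ or use any AMQP library: `queue.Queue` objects stand in for the exchange and queue pairs, and the names `jobs`, `worker`, `submit`, and `stream` are hypothetical.

```python
import queue
import threading

jobs = queue.Queue()  # stands in for the well-known "job" exchange and its queue


def worker():
    """Pull jobs, feign some work, and stream output to each job's reply queue."""
    while True:
        job = jobs.get()
        if job is None:       # shutdown signal used only by this sketch
            break
        command, reply_to = job
        for step in range(3):
            reply_to.put(f"{command}: step {step}")
        reply_to.put(None)    # end-of-output marker


def submit(command):
    """Create a private reply queue, then announce the job along with it."""
    reply_to = queue.Queue()  # stands in for the per-request "response" exchange
    jobs.put((command, reply_to))
    return reply_to


def stream(reply_to):
    """Yield output messages, blocking until the next one arrives."""
    while True:
        message = reply_to.get()
        if message is None:
            return
        yield message


thread = threading.Thread(target=worker)
thread.start()
output = list(stream(submit("build")))
jobs.put(None)
thread.join()
print(output)
```

Here the reply queue always exists before the job is announced, so the worker can never write to a destination that is not there yet. With a real broker that ordering has to be arranged explicitly.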

At this point, I appreciated that there was something of a standard for this business. There was no apparent chance of a message going to the wrong client, no needless broadcasting of messages to all clients, and no bookkeeping on my part to make sure that I only processed each message once.

But at the same time, this was super fragile. I had some of the same problems as with pika and some new ones. Ensuring that exchanges were declared at the right time and had queues bound to them so that a worker could start feeding messages in before the web client was ready to start reading them felt as though it were as complicated as just doing what I did in dbserver. And I hadn't even done the second part of this, which was going to involve creating another process which listened to all exchanges to log into a database, so that I could view logs after the jobs were finished.

Operating Systems

My takeaway was that I didn't want queues to be centralized. At least in my use case, I wanted to have one authoritative copy of the streaming output, and I wanted each client to simply keep track of how much of the output it had seen. I never had to worry about cleaning up anything like queues or exchanges, because the only state that was needed was an offset on the client.

The solution was incredibly simple.

  1. I clicked a button on a web site. It started a "job" process immediately.
  2. The worker wrote to a file.
  3. The web site made a request to a script which read from the file, starting from an offset provided by the client.
    • If there was more data there, it returned it.
    • If there was no data and the worker was still running, it called select(), which blocked until there was more data available.
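
A minimal sketch of the read side of those steps, under some assumptions. The names `read_new_output`, `is_running`, `poll_interval`, and `timeout` are hypothetical. The sketch also swaps the select() call for a short polling loop, because regular files generally report as always readable to select(); the timeout is an addition so that a request cannot hang forever.

```python
import os
import tempfile
import time


def read_new_output(path, offset, is_running, poll_interval=0.05, timeout=5.0):
    """Return (data, new_offset) for bytes written to path past offset.

    Waits while there is nothing new and is_running() reports a live worker.
    """
    deadline = time.monotonic() + timeout
    while True:
        # Sample the worker state before reading, so output written just
        # before the worker exits is still picked up on this pass.
        running = is_running()
        with open(path, "rb") as f:
            f.seek(offset)
            data = f.read()
        if data:
            return data, offset + len(data)
        if not running or time.monotonic() >= deadline:
            return b"", offset
        time.sleep(poll_interval)


# A pretend worker appends to a log; the client keeps nothing but an offset.
with tempfile.TemporaryDirectory() as tmp:
    log_path = os.path.join(tmp, "job.log")
    with open(log_path, "wb") as log:
        log.write(b"step 1\n")
    first, offset = read_new_output(log_path, 0, is_running=lambda: True)
    with open(log_path, "ab") as log:
        log.write(b"step 2\n")
    second, offset = read_new_output(log_path, offset, is_running=lambda: False)
    third, offset = read_new_output(log_path, offset, is_running=lambda: False)
```

An empty result with an unchanged offset tells the client that the worker has finished (or that the wait timed out), and there is nothing on the server to clean up afterwards.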

So after all of that, I had written a web-based /usr/bin/tail.

Conclusion

I'm not certain what I learned from this.

And in the end, I think I'm aborting the project that caused me to go down this path.

I expect in the future I will probably find use for all of these approaches.

Kayaking: Somerset Reservoir #3

Summer is quickly coming to an end, and so is my string of Fridays off. I decided Somerset Reservoir was the best lake, despite being out of the way, and set off there this morning.

I didn't see much wildlife this time. Just some geese, ducks, and a beaver.

I had a good paddle but hurried back, because the sky looked like it might rain. It didn't.

[Map and speed graph (x-axis: time from start, in minutes)]

  • Posted: 2012-08-24 19:46 (Updated: 2012-08-24 19:51)
  • Categories: kayak
  • Comments (0)

Kayaking: Mohawk River #6

The weather was great. I was surprised to finish with plenty of energy, even after being a bit out of practice. Time and my inability to sit for so many hours are what really limit how far I can go.

Wildlife was dominated by egrets.

I didn't see many kayaks. Motorboats sped up and down the river constantly. Plenty of people were out fishing.

[Map and speed graph (x-axis: time from start, in minutes)]

  • Posted: 2012-08-19 20:01 (Updated: 2012-08-19 20:11)
  • Categories: kayak
  • Comments (0)

Kayaking: Mohawk River #5

It was another warm sunny day, but I had some errands to run in the morning and wasn't feeling my best, so I drove ten minutes to the Mohawk River and tried to repeat one of my longer paddles there. I headed back a little earlier than I had planned.

[Map and speed graph (x-axis: time from start, in minutes)]

  • Posted: 2012-08-04 17:54 (Updated: 2012-08-04 19:12)
  • Categories: kayak
  • Comments (0)

Tablets

I have for some reason been acquiring touch screen devices like they were ukuleles.

I have been wanting to do a very brief comparison. They are listed below in the order they're depicted above. I do not actually own the iPad.

| | Apple iPad 2 | Google Nexus 7 | BlackBerry PlayBook | Motorola Droid RAZR MAXX | HTC Droid Incredible |
|---|---|---|---|---|---|
| Currently installed OS | Apple iOS 5.1.1 | Google Android 4.1.1 | BlackBerry Tablet OS 2.0.1.668 | Verizon Android 4.0.4 | Cyanogenmod Android 2.3.7 |
| Screen Resolution | 1024x768 | 1280x800 | 1024x600 | 960x540 | 800x480 |
| Acceptable battery life | x | x | x | x | x |
| Home screen rotates | x | | x | | x |
| Physical buttons | Power, Volume, Home, Rotate/Mute Switch | Power, Volume | Power, Volume, Play/Pause | Power, Volume | Power, Volume, Home |
| Official means of developing for it are free | | x | x | x | x |
| Best thing | Lauren likes DragonVale. | Running ScummVM. | Easily closing apps. | 4G LTE. | GPS data logging. |
| Worst thing | Having to scroll to the top of my inbox. | Too big to fit in my pocket. | Finicky power button. | An OS release behind. | Home button lights don't turn off, even when watching videos. |

That's about it.