I built a small web app recently, and one particular aspect wasted more of my time than I anticipated. It may just be that I'm poorly informed about current trends, but I found all of the documentation and available advice to be poor, and the solution I eventually arrived at is probably close to the best I can get.
The general idea was to have a web front-end that could start and monitor jobs. Jobs would be arbitrary commands run on separate worker machines, which would stream output back to a central location for immediate viewing on the web site.
In several instances in the past where I wanted to have web sites with instant communication, I think I read the Wikipedia articles for AJAX and Comet, and then I wrote some Python scripts centered around a thing I called dbserver.py. The code is all on this site, but it's in such a poor state that I'd rather not reference it.
The general structure looked like this:
- The web front-end uses jQuery to issue asynchronous requests to the web server, which are handled by a Python script, api.py.
- api.py establishes a socket connection to an always-running background daemon, server.py.
- server.py forwards messages between clients. In some iterations, I had it always broadcast every message to all clients. In some iterations, I registered a handful of persistent clients to which all messages went and who could direct messages at transient clients.
- One of the clients was often dbserver.py, which mostly interacted with an sqlite database and notified server.py to notify clients when things in the database changed.
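The two forwarding policies can be sketched in a few lines. This is not the original code, which used sockets between separate processes; it is an in-process toy, and the `Relay` class and its method names are made up for illustration.

```python
from collections import defaultdict


class Relay:
    """Toy stand-in for server.py: forwards messages between named clients.

    Hypothetical names throughout; the real daemon spoke over sockets.
    """

    def __init__(self):
        self.inboxes = defaultdict(list)  # client name -> delivered messages
        self.persistent = set()           # clients that see every message

    def connect(self, name, persistent=False):
        self.inboxes[name]  # touching the key creates an empty inbox
        if persistent:
            self.persistent.add(name)

    def broadcast(self, sender, message):
        # Policy 1: every message goes to every other client.
        for name in self.inboxes:
            if name != sender:
                self.inboxes[name].append((sender, message))

    def send(self, sender, message, to=None):
        # Policy 2: persistent clients see everything; a persistent client
        # may additionally direct a message at one transient client.
        targets = set(self.persistent)
        if to is not None and sender in self.persistent:
            targets.add(to)
        targets.discard(sender)
        for name in targets:
            self.inboxes[name].append((sender, message))


relay = Relay()
relay.connect("dbserver", persistent=True)
relay.connect("web-1")
relay.connect("web-2")
relay.send("web-1", "query jobs")             # only dbserver sees this
relay.send("dbserver", "3 jobs", to="web-1")  # directed reply to one client
```

Under the second policy, web-2 never sees traffic meant for web-1, which is what the broadcast-everything iterations could not guarantee.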
I would say my biggest problems in this setup were on the database side of things. I generally failed at having a good way of querying the database from the web frontend that was descriptive, safe, and fast. And I had to not just query the database but register for notification of certain types of updates.
I also had some problems with duplicate messages (or perhaps the same message going to two server.py clients which turned out to be different requests from the same web client). I have some ideas on how I could have simplified things and been more rigorous.
But really it was just a lot to maintain, and I couldn't believe that there wasn't something already out there for this.
I have difficulty finding things when I don't know what exactly it is that I want.
It seems that the "real-time" web is the New Thing™ that matches the original problem I was trying to solve.
Many of the things I found centered around Twisted, Tornado, Orbited, and Node.js. To be honest, I didn't really give any of them a fair chance. I'm not looking for a different web server - just a thing to connect my existing webapp with other things. None of them really jumped out as being capable in the ways I needed.
A common theme seemed to be "message passing." Specifically, I found a number of frameworks built around RabbitMQ. When I arrived at this point, I was feeling good. I routinely undermine libraries I'm given to be able to work at a more reliable, manageable lower level, and RabbitMQ seemed to be a common tool in this business.
RabbitMQ has a number of tutorials using a Python library called pika, so I started with that. The first few tutorials were pretty good, but after that I found the documentation to be pretty awful. pika deferred to RabbitMQ. RabbitMQ documentation was absent or tautological ("exchange-name exchange Name of the exchange to bind to."). I muddled around and got a test sort of working. I got fed up when things inexplicably hung or failed to clean up after themselves as I requested. It was never clear when I needed to pump what API for things to work and which things would block.
Next I tried Kombu, which was admittedly more Pythonic but seems to be unmaintained. I ran into the same issues as with pika.
I finally went with py-amqplib, which was a lot like pika, but somehow easier for me to manage. Or at this point I had learned enough that I was just able to manage it better than my first attempt. Regardless, I got my simple test case basically working.
- I clicked a button on a web site. The web site sent a request to some script.
- That script set up a new "response" exchange for itself, bound a queue to it, and sent a message to a well-known RabbitMQ exchange informing it of a "job" and the exchange to which to respond.
- There was a script listening to a queue bound to that exchange. It pulled the job off the queue and feigned doing some work.
- The script sent a series of messages back to the "response" exchange.
- The web site hit a script which hit RabbitMQ to stream back the messages output from the "response" queue, blocking when appropriate to wait for the next message.
At this point, I appreciated that there was something of a standard for this business. There was no apparent chance of a message going to the wrong client, no needless broadcasting of messages to all clients, and no bookkeeping on my part to make sure that I only processed each message once.
But at the same time, this was super fragile. I had some of the same problems as with pika and some new ones. Ensuring that exchanges were declared at the right time and had queues bound to them so that a worker could start feeding messages in before the web client was ready to start reading them felt as though it were as complicated as just doing what I did in dbserver. And I hadn't even done the second part of this, which was going to involve creating another process which listened to all exchanges to log into a database, so that I could view logs after the jobs were finished.
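The ordering problem is easier to see in a toy model. The sketch below is not RabbitMQ or any client library; it is plain Python imitating one AMQP rule: an exchange hands a message only to the queues bound to it at publish time, and a message that matches no binding is dropped unless the publisher asks otherwise. `ToyBroker`, its methods, and the exchange and queue names are invented for illustration.

```python
from collections import defaultdict, deque


class ToyBroker:
    """In-memory imitation of exchange-to-queue routing (fanout style)."""

    def __init__(self):
        self.bindings = defaultdict(list)  # exchange name -> bound queue names
        self.queues = {}                   # queue name -> deque of messages

    def bind(self, exchange, queue):
        self.queues.setdefault(queue, deque())
        self.bindings[exchange].append(queue)

    def publish(self, exchange, message):
        # Returns the number of queues that received the message.
        bound = self.bindings.get(exchange, [])
        for queue in bound:
            self.queues[queue].append(message)
        return len(bound)


broker = ToyBroker()

# The worker starts producing output before the web client has bound a queue.
early = broker.publish("response-42", "line 1")  # nobody is listening yet

# The web client arrives and binds its queue to the response exchange.
broker.bind("response-42", "web-client")
late = broker.publish("response-42", "line 2")

received = list(broker.queues["web-client"])
print(early, late, received)
```

The first line of output is lost. Avoiding that means the requesting script has to declare the exchange and bind its queue before the job is announced, which is the kind of sequencing that made the setup feel fragile.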
My takeaway was that I didn't want queues to be centralized. At least in my use case, I wanted to have one authoritative copy of the streaming output, and I wanted each client to simply keep track of how much of the output it had seen. I never had to worry about cleaning up anything like queues or exchanges, because the only state that was needed was an offset on the client.
The solution was incredibly simple.
- I clicked a button on a web site. It started a "job" process immediately.
- The worker wrote to a file.
- The web site made a request to a script which read from the file, starting from an offset provided by the client.
- If there was more data there, it returned it.
- If there was no data and the worker was still running, it called select(), which blocked until there was more data available.
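A minimal sketch of that script's core in Python follows. The function name, its parameters, and the `is_running` callback are assumptions for illustration, not the original code. One swap worth flagging: select() on a plain regular file generally reports it as readable right away rather than blocking (it does block usefully on a pipe or FIFO), so this sketch waits by polling with a short sleep instead.

```python
import os
import time


def read_from_offset(path, offset, is_running, timeout=30.0, poll_interval=0.25):
    """Return (data, new_offset) for bytes in `path` beyond `offset`.

    Waits up to `timeout` seconds for new data while `is_running()` is true.
    Returns (b"", offset) if nothing arrives or the worker has finished.
    """
    deadline = time.monotonic() + timeout
    while True:
        # Sample the worker state before reading, so that output written
        # just before the worker exits is still picked up by this pass.
        running = is_running()
        with open(path, "rb") as f:
            f.seek(offset)
            data = f.read()
        if data:
            return data, offset + len(data)
        if not running or time.monotonic() >= deadline:
            return b"", offset
        time.sleep(poll_interval)


if __name__ == "__main__":
    import tempfile

    # Pretend a finished worker left one line of output behind.
    with tempfile.NamedTemporaryFile(delete=False) as log:
        log.write(b"hello\n")
    # The client keeps only `offset`; the file is the single authoritative copy.
    data, offset = read_from_offset(log.name, 0, is_running=lambda: False)
    print(data, offset)
    os.unlink(log.name)
```

The client sends back the returned offset on its next request, so the server holds no per-client state and there is nothing to clean up when a client disappears.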
So after all of that, I had written a web based /usr/bin/tail.
I'm not certain what I learned from this.
And in the end, I think I'm aborting the project that caused me to go down this path.
I expect in the future I will probably find use for all of these approaches.
Summer is quickly coming to an end, and so is my string of Fridays off. I decided Somerset Reservoir was the best lake, despite being out of the way, and set off there this morning.
I didn't see much wildlife this time. Just some geese, ducks, and a beaver.
I had a good paddle but hurried back, because the sky looked like it might rain. It didn't.
|Average Speed (miles per hour)||3.84|
|Average Moving Speed (miles per hour)||3.84|
|Maximum Speed (miles per hour)||5.59|
|Average Pace (minutes per mile)||15.62|
|Average Moving Pace (minutes per mile)||15.62|
|Fastest Pace (minutes per mile)||10.73|
|Total Distance (miles)||15.37|
The weather was great. I was surprised to finish with plenty of energy, even after being a bit out of practice. Time and my inability to sit for so many hours are what really limit how far I can go.
Wildlife was dominated by egrets.
I didn't see many kayaks. Motorboats sped up and down the river constantly. Plenty of people were out fishing.
|Average Speed (miles per hour)||3.90|
|Average Moving Speed (miles per hour)||3.90|
|Maximum Speed (miles per hour)||6.90|
|Average Pace (minutes per mile)||15.39|
|Average Moving Pace (minutes per mile)||15.39|
|Fastest Pace (minutes per mile)||8.69|
|Total Distance (miles)||20.73|
It was another warm sunny day, but I had some errands to run in the morning and wasn't feeling my best, so I drove ten minutes to the Mohawk River and tried to repeat one of my longer paddles there. I headed back a little earlier than I had planned.
|Average Speed (miles per hour)||3.63|
|Average Moving Speed (miles per hour)||3.63|
|Maximum Speed (miles per hour)||5.03|
|Average Pace (minutes per mile)||16.52|
|Average Moving Pace (minutes per mile)||16.52|
|Fastest Pace (minutes per mile)||11.92|
|Total Distance (miles)||16.01|
I have for some reason been acquiring touch screen devices like they were ukuleles.
I have been wanting to do a very brief comparison. They are listed below in the order they're depicted above. I do not actually own the iPad.
|Device||Apple iPad 2||Google Nexus 7||BlackBerry PlayBook||Motorola Droid RAZR MAXX||HTC Droid Incredible|
|Currently installed OS||Apple iOS 5.1.1||Google Android 4.1.1||BlackBerry Tablet OS 22.214.171.1248||Verizon Android 4.0.4||Cyanogenmod Android 2.3.7|
|Acceptable battery life||x||x||x||x||x|
|Home screen rotates||x||x||x|
|Physical buttons||Power, Volume, Home, Rotate/Mute Switch||Power, Volume||Power, Volume, Play/Pause||Power, Volume||Power, Volume, Home|
|Official means of developing for it are free||x||x||x||x|
|Best thing||Lauren likes DragonVale.||Running ScummVM.||Easily closing apps.||4G LTE.||GPS data logging.|
|Worst thing||Having to scroll to the top of my inbox.||Too big to fit in my pocket.||Finicky power button.||An OS release behind.||Home button lights don't turn off, even when watching videos.|
That's about it.