Reading from a queue in an Azure WebJob
A few months ago, Microsoft introduced something called a WebJob in Azure. It's essentially a "thing" that can run as a background task to do "stuff." The reason this is cool has a lot to do with the way you would do this sort of thing in the pre-cloud days.
Handling some background task in the old days usually meant writing a Windows Service. It was this thing that you had to install, and it was kind of a pain. The scope of background tasks is pretty broad, ranging from image or queue processing to doing something on a regular schedule. For those of us who have focused on the Web and Web services, Windows Services are definitely a weird thing to think about.
Azure made this more interesting with worker roles (or cloud services, which also include web roles), which are essentially virtual machines that do just one thing. Those are pretty cool, but of course the cost involves spinning up an entire VM. They start at $14 a month per instance right now, which isn't much. Still, it's not like your Azure Websites are running at full utilization, so it makes sense to use that resource since you’re already paying for it.
That's where WebJobs are awesome, because they run on the VM that's already running your sites. If you have something to do that isn't going to overwork that VM, a WebJob is perfect. They run pretty much any flavor of code you can think of, but for the purpose of this post, I'm thinking C#. For added flavor, you can bind these jobs to the various forms of Azure storage, and do it without having to wire stuff up. See Scott Hanselman's intro for more info.
I just happen to have a use case where this totally makes sense. I have a project where I'm using Lucene.net, a port of the Java text search engine, to search tags and titles for various pieces of content. I'm also using the AzureDirectory Library with it, which allows me to use blob storage for the index. Updating the index happens when a user creates or edits content. Infrequent as that might be, it is time consuming, and it's a crappy user experience to make them wait. The solution then is to queue a message that says, "Hey, this content is updated, so update the index, please." Firing off a message to the queue is super fast, and the user is happy.
This is a pretty common pattern when you have to break stuff up into components, and a little latency is OK. In this case, it's not a big deal if the search index isn't updated instantly. If it doesn't happen even for a few minutes, that's probably good enough (even though it likely happens within a second or two).
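For the sending side, here is a minimal sketch of what queuing that message from the web app could look like. It assumes the classic WindowsAzure.Storage client library and Json.NET, and it assumes the WebJobs SDK fills the bound parameter from a JSON message body. The SearchIndexQueue class and the shape of ProjectSearchQueueMessage are hypothetical, since the real types aren't shown in this post.

```csharp
using System;
using Microsoft.WindowsAzure.Storage;        // NuGet: WindowsAzure.Storage (assumed client library)
using Microsoft.WindowsAzure.Storage.Queue;
using Newtonsoft.Json;                        // NuGet: Newtonsoft.Json (assumed serializer)

// Hypothetical message shape; the real ProjectSearchQueueMessage is not shown in the post.
public class ProjectSearchQueueMessage
{
    public int ProjectID { get; set; }
    public string ProjectSearchFunction { get; set; }
}

// Hypothetical helper that wraps the queue the WebJob listens to.
public class SearchIndexQueue
{
    private readonly CloudQueue _queue;

    public SearchIndexQueue(string storageConnectionString)
    {
        var account = CloudStorageAccount.Parse(storageConnectionString);
        var client = account.CreateCloudQueueClient();
        _queue = client.GetQueueReference("searchindexqueue");
        _queue.CreateIfNotExists(); // creates the queue on first use
    }

    // Pure helper: turns the message into the JSON text placed on the queue.
    public static string Serialize(ProjectSearchQueueMessage message)
    {
        return JsonConvert.SerializeObject(message);
    }

    // Fire-and-forget from the web request: one small write to storage.
    public void Enqueue(int projectID, string function)
    {
        var payload = Serialize(new ProjectSearchQueueMessage
        {
            ProjectID = projectID,
            ProjectSearchFunction = function
        });
        _queue.AddMessage(new CloudQueueMessage(payload));
    }
}
```

The web app would call something like `new SearchIndexQueue(connectionString).Enqueue(project.ProjectID, "Update")` right after saving the content, then return to the user without waiting on the index.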
As with the other examples out there, the code to set up the WebJob as a C# console app is really straightforward. In my case, I have some extra stuff in there to take care of the StructureMap plumbing, resolving dependencies between different assemblies and such.
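using Microsoft.WindowsAzure.Jobs; // WebJobs SDK preview namespace (JobHost, QueueInput); may differ in other SDK versions
using StructureMap;

internal class Program
{
	private static void Main(string[] args)
	{
		// Configure the StructureMap container before the host starts.
		ObjectFactory.Initialize(x =>
			{
				x.Scan(scan =>
					{
						scan.TheCallingAssembly();
						scan.WithDefaultConventions();
						scan.AssembliesFromApplicationBaseDirectory();
					});
				x.For<IProjectSearchIndexer>().Use<ProjectSearchIndexer>();
				x.For<IProjectSearchRepository>().Use<Search.Repositories.ProjectSearchRepository>();
				x.For<IProjectSearchIndexQueueRepository>().Use<Search.Repositories.ProjectSearchIndexQueueRepository>();
			});

		// Start the WebJobs host. RunAndBlock blocks this thread, so nothing placed after it will run.
		var host = new JobHost();
		host.RunAndBlock();
	}

	// Called by the SDK when a message lands on "searchindexqueue"; the message is bound to the parameter.
	public static void ProcessProjectSearchQueue([QueueInput("searchindexqueue")] ProjectSearchQueueMessage message)
	{
		var indexer = ObjectFactory.GetInstance<IProjectSearchIndexer>();
		indexer.Processor(message.ProjectID, message.ProjectSearchFunction);
	}
}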
The ObjectFactory stuff is the StructureMap container setup, and right after that is the WebJob magic from the SDK. I’m pretty sure what those two lines are doing is saying, “Hey Azure, you’ve gotta run this stuff, so just hang out and don’t let the app close.”
The ProcessProjectSearchQueue method is where the magic wireup to Azure storage takes place. The QueueInput attribute is looking for a queue to monitor, in this case “searchindexqueue.” As for the connection string, the other articles you can Google on Bing show you how to put the storage account string in the Azure administration portal. In the case of this code, when a message hits that queue, this function reads it from the queue and acts on it. It’s like magic.
As of the time of this writing, WebJobs are in preview, so the documentation is a little thin. On the other hand, the product itself is really robust at this point. The monitoring stuff and the ability to get a stack trace when something is broken are really awesome.
Here are the bumps I hit in implementing this:
- My calling code has to talk to SQL via Entity Framework. The app.config for my WebJob did not have the EF configuration section that specifies to use System.Data.SqlClient, so it choked until I had that in place.
- At first I had my StructureMap initialization after the RunAndBlock call, which was pretty silly because that method is pretty descriptive about what it’s doing. It blocks, so the container setup after it never ran.
- I went down an ugly dependency hole of despair at first, where the WebJob required a ton of assemblies from a core library. In this case, I just needed to pull out the SQL data access to its own project in my solution. DI containers like StructureMap help with this (duh).
- The deployment is a little ugly because there’s no tooling for it, but it’s still just a matter of zipping up the build and uploading via the Azure portal.
- You can’t run it locally. I hope they’re going to figure out a way to simulate this, because having to test with real Azure can be a little awkward when you need to share your code (and connection strings) with other developers. To compensate, I took the two lines from the ProcessProjectSearchQueue method above and put them in an MVC action, so I can call them at will by viewing the action in a browser.
- If your code fails, the queue message is gone forever. I haven’t used Azure queues in a while, but I do recall there being a mechanism that restores a message in the event you can’t process it. Normally you would have some retry logic, so I’m not sure what to do here.
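On that last point, the restore mechanism in plain Azure storage queues is the visibility timeout: reading a message only hides it for a while, and it reappears unless the reader deletes it. The sketch below shows that pattern done by hand with the classic WindowsAzure.Storage client, outside of the WebJobs binding. The QueueRetrySketch class, the MaxAttempts value, and the two-minute timeout are all hypothetical choices for illustration, and this says nothing about how the preview SDK behaves internally.

```csharp
using System;
using Microsoft.WindowsAzure.Storage.Queue;  // NuGet: WindowsAzure.Storage (assumed client library)

public static class QueueRetrySketch
{
    // Hypothetical limit; pick whatever fits your workload.
    public const int MaxAttempts = 5;

    // Pure decision helper: has this message been handed out too many times already?
    public static bool IsPoison(int dequeueCount)
    {
        return dequeueCount > MaxAttempts;
    }

    // Reads one message and runs the handler. Returns false when the queue is empty.
    public static bool ProcessNext(CloudQueue queue, Action<string> handler)
    {
        // The message stays in the queue but is hidden from other readers for two minutes.
        CloudQueueMessage message = queue.GetMessage(TimeSpan.FromMinutes(2));
        if (message == null)
        {
            return false;
        }

        if (IsPoison(message.DequeueCount))
        {
            // Give up on it. A real app might copy it to a separate queue for inspection first.
            queue.DeleteMessage(message);
            return true;
        }

        try
        {
            handler(message.AsString);
            // Only a successful run removes the message for good.
            queue.DeleteMessage(message);
        }
        catch (Exception)
        {
            // Swallowed on purpose in this sketch: because the message was not deleted,
            // it becomes visible again after the timeout and gets another attempt.
        }
        return true;
    }
}
```

The design choice here is to let the queue itself do the retry bookkeeping through DequeueCount, so the worker stays stateless and a crash mid-processing costs nothing but the timeout.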
This is a really exciting piece of technology, and I’m planning to use it next to pull out the background stuff in POP Forums, which currently runs on Timers out of an HttpModule. Ditching that ugly hack after more than a decade means finally getting the app to a multi-instance place. That makes me very happy.