June 2012 - Posts

When a system is built, no one wants to babysit it once it’s up and running. Therefore, there is a strong desire to automate everything, including error handling. But automation is not suitable for every error, and here is a good example.

An email signup service that I created uses a 3rd party service to discover city, region, and country from a city name only. Easy and intuitive for customers, headache-free to maintain (no need to keep a data source up to date). All good and nice until someone on the team reported an error: instead of the city, the system reported “junk” (see screenshot).

image

After a little debugging, the cause turned out to be simply bad data coming back from the 3rd party service (which, I have to admit, was extremely reliable and accurate for the most part). So what do you do? The initial response in the team was “let’s code it so that when a city has a comma and a space, we strip them along with the rest.” I.e., when “Calgary, Alberta” is received for a city name, we strip the “, Alberta” portion. Sounds like a great idea: it can be automated and be done with.

But wait a second, there’s a different issue as well: sometimes the system reports the Region (aka state / district / province) incorrectly (“AB Alberta” rather than “Alberta”). It is not affecting production right away. So would it be correct to apply the same “fix logic”? At the same time, it could be “City, Region, Country” returned in a field meant for City only. Does it make sense to automate the process of fixing the problem (considering that it happens rarely)? Or, perhaps, is it worth automating alerts about malformed data, but leaving the data clean-up to a person?

We decided to do the minimum required: automate alerts for data that looks odd, and leave the fixing to the person who actually deals with subscriptions.
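As a rough illustration of what “data that looks odd” could mean, here is a minimal sketch in Python. The function name, the field names and the rules are assumptions made for this example, not the checks the real service runs. It only reports what looks wrong and never modifies the data, so the clean-up stays with a person.

```python
import re


def find_oddities(city, region, country):
    """Return human-readable warnings for one lookup result (hypothetical rules)."""
    warnings = []
    # A comma in the city usually means extra parts leaked in, e.g. "Calgary, Alberta".
    if "," in city:
        warnings.append(f"city contains a comma: {city!r}")
    # A short upper-case code followed by a name, e.g. "AB Alberta".
    if re.match(r"^[A-Z]{2,3}\s+\S", region):
        warnings.append(f"region looks like a code plus a name: {region!r}")
    # Empty fields are suspicious too.
    for name, value in (("city", city), ("region", region), ("country", country)):
        if not value.strip():
            warnings.append(f"{name} is empty")
    return warnings


if __name__ == "__main__":
    for warning in find_oddities("Calgary, Alberta", "AB Alberta", "Canada"):
        print(warning)
```

Whatever comes back from a function like this would go into an alert (an email, a log entry), and the subscription itself would be left untouched until someone looks at it.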

I’m new to Windows Azure, and learning by making mistakes. There’s a lot to learn about Azure in general, and one of the interesting aspects is deployments and the cost associated with them. I’d like to take this moment to thank Yves Goeleven, Azure MVP, who has helped me a lot.

The simplest deployment is done directly from Visual Studio .NET. But it’s not automated, and requires a person to trigger it. The next option is to automate it with PowerShell scripts, leveraging the Windows Azure PowerShell Cmdlets. But you have to ask yourself, what am I deploying EVERY SINGLE TIME?

When I deployed for the first time, I was horrified: a 30MB package. Goodness, no wonder it takes forever. “Azure sucks” was my immediate diagnosis. Wait a second, does it? Hmmm… Something tells me it’s not Azure that sucks. Let’s analyze it. I have several 3rd party dependencies which contribute over 7MB in assemblies. Wow, that’s a lot. Now, for each role (and I have two, a web role and a worker role), that is 7MB x 2 = 14MB. Heavy, don’t you think?

The solution is simple: a Startup Task. Azure supports startup tasks, which is a very powerful concept. You have the option to operate on the machine instance a role is deployed to, prior to the role’s execution. This is great, because I can fetch my 3rd party dependencies just before the role instance is started, ensuring all dependencies are in place. Where from, though? Azure storage. When you deploy your package, you deploy it to Azure Storage anyway, so why not upload a zipped blob with your dependencies once, and fetch it every time? This will save you the cost of uploading them with every single deployment you do. Even better: within the same data centre, you don’t pay for moving data. So not only are your packages smaller and your deployment time (the upload part) shorter, but you also save on storage transactions, which translates into money saved.
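To make the idea concrete, here is a minimal sketch of what such a startup step does: download one zipped blob and unpack it next to the role’s binaries. It is written in Python purely for illustration, and a real startup task would more likely be a batch or PowerShell script. The blob URL and the target directory are hypothetical, and the sketch assumes the blob can be read over plain HTTP (for example via a public or pre-signed URL).

```python
import io
import urllib.request
import zipfile
from pathlib import Path


def download(url):
    """Fetch the zipped dependencies; url is a hypothetical blob address."""
    with urllib.request.urlopen(url) as response:
        return response.read()


def unpack_dependencies(zip_bytes, target_dir):
    """Extract the archive into target_dir and return the archived names, sorted."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        archive.extractall(target)
        return sorted(archive.namelist())
```

A startup step would then boil down to `unpack_dependencies(download(blob_url), bin_dir)`, where `blob_url` and `bin_dir` are placeholders for your own storage address and role directory.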

I have gone through this exercise with the dependency I had, NServiceBus, first just for the worker role, and then for the web role as well, and the results are quite impressive, as you can see: from a 30MB deployment down to 11MB.

image

Dependencies need to be packaged and uploaded either manually, or by a script that runs as part of the build process whenever a dependency version changes. Therefore I’d suggest evaluating which dependencies can follow this path and which cannot. You don’t have to stop at 3rd party assemblies only; you can apply the same to the Microsoft Azure assemblies, since those eat up space as well, and are found in every role you deploy.

image

And once you do that, well, you are down to the minimum: the artefacts your own project generates.
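The packaging step mentioned earlier, the one that runs when a dependency version changes, could be scripted along these lines. This is only a sketch under assumptions: the function names are made up, and detecting a change by hashing the dependency folder is my illustration, not something the build here actually does. The upload itself is left out.

```python
import hashlib
import zipfile
from pathlib import Path


def _files(root):
    """All files under root, in a stable order."""
    return sorted(p for p in Path(root).rglob("*") if p.is_file())


def fingerprint(dep_dir):
    """Hash file names and contents, so any changed dependency changes the digest."""
    digest = hashlib.sha256()
    for path in _files(dep_dir):
        digest.update(path.relative_to(dep_dir).as_posix().encode("utf-8"))
        digest.update(path.read_bytes())
    return digest.hexdigest()


def package(dep_dir, zip_path):
    """Zip the dependency folder; zip_path should live outside dep_dir."""
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in _files(dep_dir):
            archive.write(path, path.relative_to(dep_dir).as_posix())
    return zip_path
```

A build script could store the last fingerprint next to the zip and re-package and re-upload only when the value differs.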

I have an idea of creating a “Dependencies Start-Up Task” NuGet package that would take the boilerplate away and allow you to achieve this with less effort. Would you consider it useful? Let me know your opinion, and, perhaps, a few bits will be spared and NuGet will be less spammed.
