January 2004 - Posts

1. If Windows Installer is Microsoft's “official” installation technology, then why the heck don't they use it and support it better?  It would make our lives a lot easier if Microsoft released a merge module for the .NET Framework, for example.  It's the perfect example of where a merge module is needed.  Instead, we get silent install kludges.

2. If Windows Installer is Microsoft's “official” installation technology, why don't they package all of their own software with it?  Whatever happened to eating their own dog food?  I see as many Microsoft projects using classic InstallShield as I do Windows Installer.  And, as long as we are on the subject, why aren't Microsoft products required to pass their own Windows Logo Compliance requirements?  This would head off many of these issues, as MSI is required for logo compliance on the client.

As my friend Paul Wilson points out, sometimes technical answers are really hard to find.  As I browse the ASP.NET and C# newsgroups, weblogs, and forums, answers abound on how to bind a dataset to a control.  I can find numerous explanations of how .NET garbage collection works, best practices for the dispose pattern, and many other helpful hints.

However, what happens when you are working at the bleeding edge of technology, in an environment you can't replicate on your own PC?  Things get a little more complicated here.  Mainframe programmers are used to these limitations, but they really frustrate Windows programmers.  As I type this on my Windows Server 2003 machine (a laptop), which also easily replicates a clustered Windows 2000 environment using VMware, I remember that one of the things that made Windows a powerhouse in the server room is ease of programming.  I've also got more processing power at my fingertips than the entire world had a few decades ago.

Still, what do you do when you are banging against a problem that you are probably the only person ever to have seen?  This is often the case when you are working in a Citrix environment, or in my case, Windows Clustering Services, or any other very expensive environment.

In these cases, internet resources aren't really much help, and a developer doesn't often have direct access to the vendor.  As an open question, what is one to do?

So, I've been 'under the Radar' recently, as I've just started a new job with Eclipsys Corporation.  I work on our interface engine, eLink.  You could think of it as a very specialized BizTalk server.

 

Well, eLink is high-throughput and high-availability, and so we leverage Microsoft Windows 2000 Advanced Server with Cluster Services.  This is my first foray into a clustered environment in the Windows world, though I’ve set up a Linux cluster before.

 

Microsoft Clustering for High Availability is not terribly impressive (This is one area where Microsoft is still playing catch-up to the UNIX and Mainframe world.)  The basic concept is that two machines share a disk array, and usually act in an active / passive mode.  If the active server fails, the passive server takes control of the shared drive array, and steps up to the plate, starting the services that failed on the previous node.
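To make the active / passive idea concrete, here is a toy sketch in Python.  It is only a model of the concept described above, not how Microsoft Cluster Services is actually implemented, and every name in it (Node, Cluster, check, disk_owner) is invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One machine in a two-node cluster (hypothetical model)."""
    name: str
    healthy: bool = True
    running: set = field(default_factory=set)


class Cluster:
    """Toy active/passive pair sharing one disk array."""

    def __init__(self, active, passive, services):
        self.active = active
        self.passive = passive
        self.services = list(services)
        # Only one node owns the shared disk array at a time.
        self.disk_owner = active.name
        self.active.running = set(self.services)

    def check(self):
        """Fail over if the active node is down; return the active node's name."""
        if not self.active.healthy and self.passive.healthy:
            failed = self.active
            failed.running.clear()  # its services went down with it
            self.active, self.passive = self.passive, failed
            # The surviving node takes control of the shared array...
            self.disk_owner = self.active.name
            # ...and starts the services that failed on the previous node.
            self.active.running = set(self.services)
        return self.active.name


if __name__ == "__main__":
    a, b = Node("node-a"), Node("node-b")
    cluster = Cluster(a, b, ["msmq", "iis"])
    a.healthy = False  # simulate the active server failing
    print(cluster.check(), cluster.disk_owner, sorted(b.running))
```

Note what the sketch leaves out: anything a service kept in memory on the failed node is gone, and only what lives on the shared array carries over.  That gap is where most of the trouble described below comes from.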

 

Well, one of the problems is that this works somewhat well for cluster-aware applications, and poorly for non-cluster aware applications.  No problem - we'll just write our applications to be cluster aware.

 

Well, if only it were that simple - the problem isn't with our software, which works fine in a clustered environment.  It's Windows 2000 Advanced Server itself!  Entire portions of the operating system, and several of its tools, are not cluster aware.

 

Perfmon seems to be especially difficult to work with in a cluster, and our event viewers quickly fill up with spurious Perfmon related messages.

 

IIS isn't really cluster aware - although the process itself will fail over, you lose any config information in the metabase unless you replicated it manually beforehand.  You also lose ASP.NET session state, unless you use a session state server on a third machine or you store session state in SQL Server.  Not a big deal, but still something to pay attention to.
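The session state point is easy to see in a toy model.  The Python sketch below is not ASP.NET code; WebNode and fail_over_demo are hypothetical names, and a plain dictionary stands in for the out-of-process store (a state server on a third machine, or SQL Server).

```python
class WebNode:
    """Hypothetical web server node in a two-node cluster."""

    def __init__(self, name, session_store=None):
        self.name = name
        # With no external store, sessions live in this node's own memory.
        self.sessions = session_store if session_store is not None else {}

    def login(self, session_id, user):
        self.sessions[session_id] = user

    def whoami(self, session_id):
        return self.sessions.get(session_id)


def fail_over_demo(shared):
    """Log in on node-a, 'fail over' to node-b, and look the session up there."""
    store = {} if shared else None  # stand-in for a state server or SQL Server
    active = WebNode("node-a", store)
    passive = WebNode("node-b", store)
    active.login("abc123", "alice")
    # node-a dies here; node-b now answers the user's next request.
    return passive.whoami("abc123")


if __name__ == "__main__":
    print("in-process state after failover:", fail_over_demo(shared=False))
    print("external state after failover:", fail_over_demo(shared=True))
```

With in-process state the second node has never heard of the session; with the external store it picks up where the first node left off.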

 

The biggest problems we had were actually MSMQ related.  First, we had a non-cluster-related MSMQ issue where MSMQ would consume 80% of kernel memory and then stop, refusing to allocate any more memory.  Windows is actually designed to garbage collect kernel memory at 90%, but we never got there because MSMQ hung before reaching that point.  MSKB 811308 describes this problem and the solution.

 

A bigger problem was that MSMQ would not always successfully fail over to the backup node.  This was actually reproducible in our lab whenever we got the MSMQ storage up around 800 or 900 MB.

 

After banging my head against a brick wall for several days, I put in a support call to Microsoft, and ended up talking to Muhammed Ismail from the MSMQ team.  Let me tell you, he knew his MSMQ backwards and forwards.  Still, it wasn’t anything that we were doing wrong, so he sent us the latest version of MSMQ 2.0, we installed it in our test labs, and noticed an immediate resolution to all of our MSMQ problems.

 

Advice – if you are going to be architecting solutions that rely on Microsoft’s High Availability using Microsoft Cluster Services, be certain to research (and prototype) solutions before implementing them.  We noticed several things that simply aren’t supported in a clustered environment, and several things that were supposed to be supported that just plain don't work well. (For example, MSMQ Triggers are a cluster no-no on Windows 2000, though they are available on Windows 2003 clusters.)

 --

So, I've come out of all of this wishing I could cluster myself and become a High-Availability Developer.  Then I could get some sleep while I worked!