This is a story about leaky abstractions.  It’s not a happy story, and I don’t talk about cool things like WPF, AJAX, JSON, POX, or how much I detest the RIAA.
Instead, this is the story of a developer and the Process.Start() method.  It’s not exciting, and I don’t mention Silverlight anywhere other than this sentence.
Still, you should listen.

Leaky Abstractions are the technological equivalent of a spring poking through your couch.  Until the day that you sat on that rusty, pointy springy-spring, you knew that your couch was comfy. To tell the truth, you really didn’t know, care, or care to know why.  Joel Spolsky has his own diatribe on this subject, and it’s worth reading.  Of course, he talks about software, and not couch springs, but I’m sure you will still get the point. 

Some time ago, my team noticed that a core part of our application would fail to respond, and when we restarted it, it would fail to reclaim its remoting TCP/IP port.  Specifically, it listened on port 30123, and whenever we restarted the application, we would receive a System.Net.Sockets.SocketException claiming that the port was already in use.  From the command line, netstat -a verified this information.  The netstat output also revealed that the “dead” process still had the port locked.  We were mystified, as netstat was reporting a Process ID (PID) that was no longer in the task list.


After much head scratching, we engaged Microsoft Product Support Services.  Unfortunately, Microsoft PSS did about as much head scratching as we did.  As this problem was only intermittent, our troubleshooting was particularly difficult.  Eventually, we were able to repro the issue in our development environments, and Microsoft escalated the issue to their internal debugging teams.  We submitted several crash dumps, and discovered some really, really interesting news:

Under certain circumstances, when you launch a process using the Microsoft System.Diagnostics.Process.Start() method, all inheritable handles from the parent process are inherited by the child process.

This may not be particularly interesting, but consider the following potential chain of events:
1. Process A starts.
2. Process A opens a TCP port.
3. Process A starts Process B using System.Diagnostics.Process.Start().
4. Process A terminates unexpectedly, and does not properly close its TCP port.
5. Process B maintains a copy of the handle to the TCP port.
6. As there is still an open handle to the port, the operating system does not release it when cleaning up the dead Process A's resources.
It took several user-generated crash dumps, but Microsoft PSS finally confirmed that this was the issue.  The sketch below shows the sequence in code.
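
To make the failure chain concrete, here is a contrived repro sketch.  The child executable name and the hard-coded port are placeholders, and Environment.Exit stands in for whatever actually kills Process A:

using System;
using System.Diagnostics;
using System.Net;
using System.Net.Sockets;

class ProcessA
{
    static void Main()
    {
        // Step 2: Process A opens a TCP port (30123 in our case).
        TcpListener listener = new TcpListener(IPAddress.Any, 30123);
        listener.Start();

        // Step 3: Process A starts a long-lived child.  Under the hood,
        // the child inherits a copy of the listener's socket handle.
        Process.Start("ProcessB.exe");

        // Step 4: simulate the unexpected termination.  The listener is
        // never closed.
        Environment.Exit(-1);

        // Steps 5 and 6: as long as ProcessB.exe is alive, its inherited
        // handle keeps port 30123 bound, and a restarted ProcessA throws
        // SocketException: "Only one usage of each socket address..."
    }
}
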
How does this tie into the concept of Leaky Abstractions?  Well, if we examine the Process.Start() method using Reflector, we note the following line of code:

flag = NativeMethods.CreateProcess(  null,                 // lpApplicationName
                                     cmdLine,              // lpCommandLine
                                     null,                 // lpProcessAttributes
                                     null,                 // lpThreadAttributes
                                     true,                 // bInheritHandles
                                     creationFlags, 
                                     zero,                 // lpEnvironment
                                     workingDirectory, 
                                     lpStartupInfo, 
                                     lpProcessInformation);

Note that the fifth parameter is hardcoded to true.  If you know that the fifth parameter to CreateProcess is the boolean bInheritHandles, I think you'll understand our handle leak.


I believe that I would be hard-pressed to find a single person (besides the author of Process.Start) who actually understands that the .NET Process class hard-codes the bInheritHandles flag to true when it calls CreateProcess.  Nowhere in the Process class documentation is this mentioned, and even our interactions with Microsoft Support failed to pin the issue down until we resorted to crash dump analysis.


What’s the moral of the story?  I’m not really sure.  I know that I wouldn’t urge anyone to avoid the .NET Process class, but it would be nice if there were mechanisms for finding out this information short of WinDbg and Reflector.  If the inherited handles do bite you, there is a workaround, sketched below.
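
The trick is to call CreateProcess yourself via P/Invoke and pass bInheritHandles = false.  This is a minimal sketch rather than production code: SafeLauncher is my own name, the error handling is minimal, and you give up the conveniences of the Process class (redirected output, events, and so on):

using System;
using System.Runtime.InteropServices;
using System.Text;

[StructLayout(LayoutKind.Sequential, CharSet = CharSet.Auto)]
struct STARTUPINFO
{
    public int cb;
    public string lpReserved;
    public string lpDesktop;
    public string lpTitle;
    public int dwX, dwY, dwXSize, dwYSize;
    public int dwXCountChars, dwYCountChars;
    public int dwFillAttribute;
    public int dwFlags;
    public short wShowWindow;
    public short cbReserved2;
    public IntPtr lpReserved2;
    public IntPtr hStdInput, hStdOutput, hStdError;
}

[StructLayout(LayoutKind.Sequential)]
struct PROCESS_INFORMATION
{
    public IntPtr hProcess;
    public IntPtr hThread;
    public int dwProcessId;
    public int dwThreadId;
}

static class SafeLauncher
{
    [DllImport("kernel32.dll", CharSet = CharSet.Auto, SetLastError = true)]
    static extern bool CreateProcess(
        string lpApplicationName,
        StringBuilder lpCommandLine,   // CreateProcessW may write to this buffer
        IntPtr lpProcessAttributes,
        IntPtr lpThreadAttributes,
        bool bInheritHandles,          // the flag Process.Start() hardcodes to true
        int dwCreationFlags,
        IntPtr lpEnvironment,
        string lpCurrentDirectory,
        ref STARTUPINFO lpStartupInfo,
        out PROCESS_INFORMATION lpProcessInformation);

    [DllImport("kernel32.dll")]
    static extern bool CloseHandle(IntPtr hObject);

    public static int Start(string commandLine)
    {
        STARTUPINFO si = new STARTUPINFO();
        si.cb = Marshal.SizeOf(typeof(STARTUPINFO));
        PROCESS_INFORMATION pi;

        if (!CreateProcess(null, new StringBuilder(commandLine),
                           IntPtr.Zero, IntPtr.Zero,
                           false,      // bInheritHandles: the child gets no copies
                           0, IntPtr.Zero, null, ref si, out pi))
        {
            throw new System.ComponentModel.Win32Exception();
        }

        // We don't need the process and thread handles; close them promptly
        // so we don't leak handles of our own.
        CloseHandle(pi.hThread);
        CloseHandle(pi.hProcess);
        return pi.dwProcessId;
    }
}
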

Only 1 bottle of Mountain Dew was harmed during the creation of this article.

Earlier this week, my twelve-year-old daughter approached me and asked to learn "All that computer stuff that you do."  We had chatted previously about my job, but I had taken particular care not to pressure her into learning computers.  Such a decision is best left to natural curiosity.

Hanna and I talked about several languages.  She had wanted to learn C#, as that is what I spend most of my time with.  Instead, I suggested Ruby.  I've always wanted to learn the language, and an interpreted environment would make the edit-compile-debug loop a little bit easier.

Ruby has yet to disappoint me as a teaching language.  While I haven't found any Ruby books suitable for teaching programming to a pre-teen, I have been able to develop simple lesson plans to teach basic concepts.  Along the way, I'm also learning the language, which is a big plus for me.

In two days, she has learned console input and output, variables, string comparison, arrays, simple iterators, if / else constructs, and more.  It's really been a blast.  I'd definitely recommend Ruby as a teaching language.

Next topic:  Objects.  Wish us luck!

I feel like I just got hit by the dodgeball in elementary school gym class. In a unique game of tag, I'm supposed to tell everyone five things that they didn't know about me. Well, here goes:


1. I spent 6 years in the United States Army as a Military Policeman. I served in quite a few places overseas, including time as a NATO peacekeeper in Bosnia.

2. I once assisted the Secret Service in guarding President Clinton. My unenviable task was keeping the press corps from standing in front of a Secret Service SUV packed with every armament imaginable. There was an agent sitting behind the wheel with one foot on the brake and the other on the gas, the idea being that if there was a threat, he would gun the engine and drive forward so the rest of the agents could retrieve heavier firepower. While I'm quite sure that no one present was overly fond of the press, no one wanted the headline "Government Agent Creates Speedbump Out of CBS Cameraman. Film at 11." I met the President again shortly after this assignment.

3. Before I was old and slow, I ran track and cross-country in high school. Back in the day, I used to be able to run a sub-5 minute mile, and once ran a "short" 5k in under 16 minutes. Not world record stuff, mind you, but not bad.

4. I got into Information Technology after leaving the service. I had applied for the Cobb County police department in Georgia, and was waiting for my academy date. While waiting, I applied for an internet tech support job. When my academy date rolled around, I was already a supervisor making almost twice what I would as a police trainee. I stuck with my new career, and here I am eight years later! (I still watch Law & Order for my cops & robbers 'fix'.)

5. I would like to try my hand at writing a technical book, but I always feel that it would be a bit presumptuous of me.


Now for the "tag" part - Tag: Yang Cao, Wally McClure, Rob Mensching, DonXML, and Erik Porter.

My Continuous Integration presentation in Dallas, Texas went pretty well.  Although I'm not a practiced speaker, it seems that the audience was engaged and interested.  I've even been invited to present on the same topic again to another group.

I've been asked to publish the slides.  They are oriented around my dev group's experience with CI, so they may not be as interesting to external folks.  Also, without me to explain them, it's just not the same :-)

Here's the presentation.

A few loyal readers have complained that the "Occasional Clue" has been a little less occasional than anticipated.

There was a reason, actually.  I took an unannounced hiatus from blogging.  The reasons are relatively boring, but I certainly didn't stop for lack of writing material.

I've been to several interesting Atlanta C# Meetings, including Paul Wilson's fascinating O/R Mapper talk. I went to an interesting talk by Jeffrey Richter, after which we all went to the local wings joint and downed a few cold brews.  I even met the GridViewGirl FormerlyKnownAs DataGridGirl, and didn't even know that I was in the powerful pink presence of perspicacity.

I'm in the process of evaluating several new technologies, which I will post on shortly. 

Among other efforts, I found (and fixed) a SQL Server 2000 / dblib bug. This bug would potentially have cost my team hundreds of developer-hours. Even after several PSS cases, Microsoft could not give us a solution, so I'm particularly proud of this fix.

I'm also posting from the American Airlines Conference Center in Dallas, Texas, where my company is bringing about a hundred Developers, Managers, and Architects under one roof for a chance to rub elbows, and show our latest "leet skillz."  I'm presenting on Continuous Integration, and how implementing CI has improved life on my project team.  It's my first major presentation in front of a bunch of people I don't know, so I'll let you "occasional" readers know how it goes.

DonXML's Pre-emptive strike against the future.

This is really too funny.


One of the keynote presentations today was on Windows Compute Cluster Solution.  Now, I've been working with Windows Clusters in a High Availability environment for some time now, so I've been very interested in what Microsoft's message was going to be in this product space.  Microsoft has been getting their lunch handed to them in this area by Linux clusters for a long, long time.  While it is currently possible to build High Performance Clusters on Windows without the Compute Cluster Solution, it is certainly not straightforward.

There are a number of hurdles that Microsoft faces in this product space:

* Linux clusters are a mature solution.  Most existing clusters are built on Linux, and programmers/administrators are familiar with this environment.

* Linux is more flexible and scriptable from the command line.  While third-party tools do make Windows relatively scriptable (for example, you may install a Unix-like shell on Windows using Cygwin), these tools are merely bolt-ons for capabilities that come standard on Linux.

* Linux clusters are inherently cheaper.  If I am building a 40- or 100-node cluster, per-server operating system licensing costs become a major portion of my budget.  I'd rather spend this money on additional nodes.

* Linux clusters are easier to auto-rollout.  While Windows has RIS, and third-party imaging tools like Ghost, it is trivial to script an automated installation on Linux.  I'm mostly a Windows guy, but I was able to use Intel network cards with PXE to download a bootloader over TFTP, and then auto-image an installation of Linux onto the box.  I do understand that all of these things are possible on Windows Server, but the infrastructure overhead is a bit higher.

* Linux HPC clusters may be configured for High Availability.  Windows Compute Cluster Solution is not designed for High Availability.  More on this later.
  
--

So, I was disappointed in this morning's keynote coverage of Microsoft Compute Cluster Solution.  As I'm a bit familiar with this product space, I found the demos misleading, at best.


A quick rundown of my observations:


Bob Muglia (Senior VP, Microsoft Windows Division) was demonstrating job distribution in a Microsoft Compute Cluster environment, and seamlessly added a cluster node to an existing cluster.  This was actually pretty impressive, though one would expect this functionality in a cluster solution.  Everything worked smoothly, and the new cluster node automatically took on existing jobs.


To demonstrate that cluster nodes may be removed (fail), Bob removed the network cable to an existing node, and the head node removed that system from the list of available cluster worker nodes.

What nobody really noticed was that the failover demo failed.  I think a job got stuck, and they quickly hit the KVM switch before anyone could notice.

They then went on to demo a new feature of Excel 12, with "Excel Server".  The idea is that you may run an Excel spreadsheet on a server.  For this demo, they ran a complicated Excel spreadsheet on the Cluster.  The cool part of the demo was that you could upload the spreadsheet to the cluster, and the Scheduler would distribute this job to an available cluster node.  The node completed the work, and the spreadsheet results were returned to the client.

While this demo was cool, it totally misses the purpose of a cluster.  Running the Excel spreadsheet on a cluster demonstrates what might be called an "embarrassingly serial" problem.  The idea of running on a cluster is that you have a problem that may be split into many parallel subtasks.  Some problems, such as searching for prime numbers, are often referred to as "embarrassingly parallel": no piece of the problem set is dependent on another piece, and each node may work on its job in isolation.
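
To make the distinction concrete, here is a toy sketch of an embarrassingly parallel decomposition, with threads on a single box standing in for nodes on a cluster.  Each worker counts primes in its own slice of the range and never needs to talk to its neighbors:

using System;
using System.Threading;

class PrimeCount
{
    static void Main()
    {
        const int max = 1000000;
        const int workers = 4;          // one slice per "node"
        int[] counts = new int[workers];
        Thread[] threads = new Thread[workers];

        for (int w = 0; w < workers; w++)
        {
            int slice = w;              // stable copy for the anonymous method
            threads[w] = new Thread(delegate()
            {
                // This slice is fully independent of every other slice.
                int from = slice * (max / workers) + 1;
                int to = (slice + 1) * (max / workers);
                for (int n = from; n <= to; n++)
                    if (IsPrime(n)) counts[slice]++;
            });
            threads[w].Start();
        }

        foreach (Thread t in threads) t.Join();

        int total = 0;
        foreach (int c in counts) total += c;
        Console.WriteLine("Primes up to {0}: {1}", max, total);
    }

    static bool IsPrime(int n)
    {
        if (n < 2) return false;
        for (int i = 2; i * i <= n; i++)
            if (n % i == 0) return false;
        return true;
    }
}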

Jobs like running an Excel spreadsheet are really not the class of problems that an HPC cluster is intended to solve.  While the Excel spreadsheet logic could be rewritten in C++ using similar algorithms to solve the same problem set, using an HPC cluster to run serial jobs is not the best use of this type of resource.

There was another statement that really got to me, however.  In the demo, the phrase 'high availability' was mentioned.  Specifically, when allowing one of the cluster nodes to fail, we were told that Compute Clusters support high availability, as a failure of a compute cluster node does not bring down the entire system.

However, it is extremely important to understand that Microsoft Compute Cluster Solution is NOT suitable for high availability environments.  As currently designed, the head node (scheduler) is a single point of failure, and this service will NOT fail over to other nodes. So, if the head node goes down, your cluster is effectively down.


This was a really misleading demonstration, and I am disappointed with the 'sleight of hand' that was used here.

Don't be misled by the hype - Microsoft Compute Cluster Solution may have its place in number crunching in your organization.  But don't think that it is a way of distributing your existing applications into a clustered environment.  It's not.

Well, everyone else has been posting about the PDC - I thought that I would have written something by now, but I haven't seen anything truly exciting.  So far, most everything seems to be a refinement of things introduced at PDC 2003. 

Oh, Office 12 was mentioned.  Since I use Office as a glorified Wordpad.exe, that's really not that exciting.

I am currently sitting in the session, "High Performance Computing with Windows Server Compute Cluster Solution."  It's a slide-show fest, with very little demo or bits to see.  This session is a basic overview of clustering, and is mostly review if you are familiar with these concepts.

There's actually very little about what Microsoft is doing in this space, and more about what high-performance clustering actually is.  I'm still not impressed.  Show me, don't slide-show me.

Sometimes things that everyone just knows aren't always true.

For example, when I look at C# code, I know that local variables are stored on the stack.  Or, I thought I knew that.

However, Ian Griffiths notes that in .NET 2.0, this isn't necessarily true.  The change was made to support anonymous methods, which can capture and mutate variables in the enclosing scope; a captured local has to outlive the stack frame that declared it.
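
Here's a minimal illustration of what the C# 2.0 compiler actually does (the variable names are mine):

using System;

class Program
{
    static void Main()
    {
        // 'counter' looks like a stack local, but because the anonymous
        // method below captures it, the compiler hoists it into a
        // heap-allocated closure class so it can outlive this stack frame.
        int counter = 0;

        EventHandler handler = delegate { counter++; };

        handler(null, EventArgs.Empty);
        handler(null, EventArgs.Empty);

        // Prints 2: both invocations mutated the same heap-resident variable.
        Console.WriteLine(counter);
    }
}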

Time for me to 'unlearn' a few things.


So, over the last couple of years, I've been what I consider a 'defender' of MSDN documentation.  After all, the docs are miles ahead of what they used to be.

As a user, I've also taken to reporting doc bugs to Microsoft.  My first couple of experiences with this were pretty good.  In one case, the doc team actually rewrote an entire code example based on my feedback.  I was impressed.

Recently, however, my experience has not been as good.  Most of the doc bugs I report now get a generic response from a support specialist who is little more than a glorified secretary.  It's obvious that these support people have no idea what I am talking about.  They merely take the email and forward it on to the appropriate team, and when they get a response, they cut and paste it into an email back to me.  It's frustrating.

The other thing that I'm finding is a resistance to change.  I've found a few examples where the docs were wrong, and I could prove it through experimentation.  However, the product teams weren't really interested in correcting the issue.  For example, although the documentation on debugging Windows Installer custom actions can be shown to be wrong, I received a 'brush off' response.

So, I'm making a new promise to myself. 

Every time I find a doc bug in a Microsoft product, I'm going to post about it.  I've tried to correct issues through the appropriate channels, and it doesn't seem to work.  Instead, I'll let Google juice do the work for me.

