SQL Server Assertion: MSKB 885290

This one hit my team pretty hard.

FIX: An assertion error occurs when you insert data in the same row in a table by using multiple connections to an instance of SQL Server

We had a customer with a major production issue: the SQL Server was throwing stack traces at random intervals, and the SQL connection would die.  We traced the issue to a specific stored procedure, and from there to a specific line of T-SQL.  The problem was that the T-SQL was perfectly legal and valid.  I searched the newsgroups and http://support.microsoft.com, but had no luck.  So the next stop was Microsoft's Product Support Team for SQL Server.

It wasn't the dream experience I had hoped for.  I've dealt with the MSMQ PSS team on several occasions, and it's always been a pleasure.  However, there were several turn-offs when dealing with SQL Server PSS.

1.  I didn't deal with a Microsoft employee.  He was a contractor, as noted by the 'v-' prefix on his email address.  (Note to MS: you give away too much business information in your email to external people.)  Frankly, I don't even think that he worked directly for Microsoft.  Every conference call I had gave me the distinct impression that he was sitting in another office, let's call it Company X.

2.  It was apparent that he didn't have a direct communication channel with the SQL Server engineers.  This is a real problem, as the issue was a SQL Server bug, not a problem with our code.  T-SQL should NEVER generate a stack trace, and yet the initial approach to the issue seemed to be to analyze the T-SQL to see if we were doing something wrong.  We weren't.

3.  Troubleshooting the problem required generating two SQL Server memory dumps, which bring down the SQL Server for up to twenty minutes.  This is a major issue, as this server runs at a major hospital, and bringing it down directly impacts patient care.

4.  After generating the memory dump, PSS assured us that there was a released hotfix for the issue.  It was actually for an unrelated issue.

5.  Testing revealed that the hotfix didn't fix the issue.  We complained, and Microsoft released a new hotfix.

6.  This is the part that makes me really upset, so let me pause for a second.  

Ok, I'm ready.

If you look at the example stack trace in the KB article, you see that it is from 2001!  That means that they've known about this issue for three years!  I understand not fixing it for three years, but the support team wasn't even aware of the bug.  It still took three months for them to troubleshoot the identical issue on our systems.  That really shows me that there was a lack of communication between the PSS team and the engineering team.  This is really disappointing.

I'd go on, but I'm kind of upset.  This bug has caused me much unrest, and in the end, I am certainly glad that we have a resolution.
