www.asp.net, update part 2

Last week we updated www.asp.net and ran into some significant problems with the main site. I summarized some of this here.

We originally thought the problem, which looked like a memory leak, was due to a search component we were using. Under load the application would run the CPU at 100% and eat 2-3 MB of memory per second. This would continue until memory usage reached about 700 MB, at which point the application would attempt to recycle and start the whole cycle over again.

We're now fairly confident we found the bug. It's one that only becomes obvious under significant load, and simulating the type and pattern of load on a site as highly trafficked as www.asp.net is not as simple as it sounds. A few people questioned whether we even tested the site before we put it into production, which of course we had.

The bug was in a couple of lines of code in a custom component on www.asp.net. On a positive note, this was not Community Server code or Lucene (search code), but custom code written specifically for the CMS that drives www.asp.net. I won't go into the specifics, but the bug had to do with loading and parsing an XML document.

We're very sorry about the downtime this caused, and we want to keep everyone up-to-date with what we've found out.

Published Monday, May 21, 2007 1:45 PM by Rob Howard

Comments

# re: www.asp.net, update part 2

Monday, May 21, 2007 3:19 PM by aa

An infinite loop while parsing an XML document? LOL!!!

# re: www.asp.net, update part 2

Monday, May 21, 2007 3:26 PM by Jon Galloway

Thanks to you and your team. I think a lot of your users don't appreciate the volume of data and number of moving parts you're managing to keep the whole *.asp.net system working day after day.

I've previously suggested a separate blog for community announcements. I really think that would be more effective than e-mail, and of course the combination would be best. The more "self-serve" you can make things, the better - then when bloggers don't know about an outage, you can very politely point them at the public announcement(s) you've made. To be effective, the announcements would need to be included in the main feed and shown in a side box on the main ASP.NET page.

# re: www.asp.net, update part 2

Monday, May 21, 2007 4:02 PM by Nic Wise

oooh, can I guess? Did someone load a large XML document into an XmlDocument class?

Like, maybe one with a 10meg base64 encoded block in it?

Or is it just us who does that? (tho not anymore :) )

# re: www.asp.net, update part 2

Monday, May 21, 2007 4:28 PM by help.net

Rob, thanks for the update. By the way, I am still waiting for an answer about our last email conversation.

What do you think?

Cheers

Paschal

# re: www.asp.net, update part 2

Monday, May 21, 2007 5:07 PM by Ken Cox [MVP]

Thanks for reporting back to the community on your findings. It would be interesting to hear more on how you tracked down the memory leak.

# re: www.asp.net, update part 2

Monday, May 21, 2007 8:09 PM by Vikram

It's great to hear that the bug was resolved. But I would like to know the technical reason for the problem you faced with the XML document, so that we do not make the same mistake again.

But congrats on getting the thing done.

http://www.vikramlakhotia.com

# re: www.asp.net, update part 2

Monday, May 21, 2007 8:55 PM by Mark Wisecarver

Impressive bro, thanks.

Salute, with honors. ;-)

# Load testing updates

Monday, May 21, 2007 10:19 PM by Community Blogs

I just wanted to throw my two cents in on this subject of load testing updates to highly trafficked sites.

# re: www.asp.net, update part 2

Monday, May 21, 2007 11:54 PM by ScottW

@Vikram: This does not look like an issue with the actual XmlDocument class (or anything else in System.Xml).

Instead, there was a block of code which was constantly re-loading XML files within the same request (probably hundreds of times). At moderate load this was not an issue, but under a lot of pressure we seemed to hit a tipping point which pushed things over the edge.

Thanks,

Scott
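
The pattern Scott describes (the same XML files being re-parsed many times within one request) can be illustrated with a small sketch. This is not the www.asp.net code, which was not published here; it is written in Python rather than .NET, and the names `CachedXmlLoader`, `naive_lookup`, `cached_lookup`, `demo`, and the `settings.xml` file are all hypothetical. It only shows the general idea of parsing a file once and reusing the result until the file changes.

```python
import os
import tempfile
import xml.etree.ElementTree as ET


class CachedXmlLoader:
    """Hypothetical helper: parse each XML file once, reuse it until it changes."""

    def __init__(self):
        self._cache = {}      # path -> ((mtime_ns, size), parsed root element)
        self.parse_count = 0  # number of real parses, kept for demonstration

    def load(self, path):
        stat = os.stat(path)
        key = (stat.st_mtime_ns, stat.st_size)
        cached = self._cache.get(path)
        if cached is not None and cached[0] == key:
            return cached[1]  # file unchanged: skip the expensive parse
        root = ET.parse(path).getroot()
        self.parse_count += 1
        self._cache[path] = (key, root)
        return root


def naive_lookup(path, name):
    # The costly pattern: every call re-reads and re-parses the whole file.
    root = ET.parse(path).getroot()
    return root.find(f"./item[@name='{name}']").text


def cached_lookup(loader, path, name):
    # Same lookup, but the parse is shared across calls.
    root = loader.load(path)
    return root.find(f"./item[@name='{name}']").text


def demo():
    # Simulate one request that performs 300 lookups against one file.
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "settings.xml")
        with open(path, "w", encoding="utf-8") as f:
            f.write("<settings><item name='title'>Home</item></settings>")
        loader = CachedXmlLoader()
        values = [cached_lookup(loader, path, "title") for _ in range(300)]
        return values[0], loader.parse_count
```

In this sketch, `demo()` performs 300 lookups with a single parse, whereas calling `naive_lookup` 300 times would parse the file 300 times. The cached tree is shared and mutable, and the loader is not thread-safe, so a real web application would need to treat the tree as read-only and add locking or use the platform's own cache.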

# re: www.asp.net, update part 2

Wednesday, May 23, 2007 3:50 PM by sridhar

Thanks for sharing, but the mighty Rob Howard/Telligent seems fallible.

# Load testing updates - Rob, I feel your pain

Thursday, May 31, 2007 5:28 PM by wallym

I just wanted to throw my two cents in on this subject of load testing updates to highly trafficked sites. ...

# re: www.asp.net, update part 2

Saturday, June 02, 2007 10:14 PM by Mitch Wheat

Hi Rob, I'd be interested in hearing how you tracked this bug down. Or was it just a D'oh! moment?