Slow boot from massive registry on a Hyper-V server: Fix
At OrcsWeb we started seeing some of our Hyper-V host servers taking longer and longer to reboot for patching or other maintenance. It got to the point that one server was taking longer than an hour to start up after a reboot.
Obviously this wouldn’t do!
The Cause
From research we found that there is a known bug with Volume Shadow Copy Service (VSS) based backups (in our case from DPM) that causes the registry to grow. By the time we fixed the most problematic server, its registry had grown to about half a gigabyte. No wonder it took so long to boot.
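If you want a quick check of whether a host is heading the same way, one option is to look at the size of the SYSTEM hive file, since that is where device entries are stored. The following is only a minimal sketch: it assumes Python is available on the host, that the hive lives in its usual place under %SystemRoot%\System32\config, and that you run it from an elevated prompt. The function names are made up for illustration.

```python
import os
from pathlib import Path


def hive_size_mb(hive_path):
    """Return the size of a file (here, a registry hive) in megabytes."""
    return Path(hive_path).stat().st_size / (1024 * 1024)


def default_system_hive():
    """Usual location of the SYSTEM hive, which stores the device entries."""
    system_root = os.environ.get("SystemRoot", r"C:\Windows")
    return Path(system_root) / "System32" / "config" / "SYSTEM"


if __name__ == "__main__":
    hive = default_system_hive()
    try:
        print("{}: {:.1f} MB".format(hive, hive_size_mb(hive)))
    except OSError as err:
        # Typically means: not running on Windows, or not elevated.
        print("Could not read {}: {}".format(hive, err))
```

A hive in the hundreds of megabytes, like the one described above, is a strong hint that the host is affected.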
There are good docs from Microsoft and others, but while working through this I found some information lacking, so this blog post pulls together the key information that helped me with Windows Server 2008 RTM and 2008 R2 servers. Notice that I said Windows Server 2008 RTM too. Some articles out there suggest that it only affects R2. That’s not the case.
The symptom is Windows hanging, or sitting on the logon Welcome screen, for a long time before letting the user log on.
Ben Armstrong summarizes the issue well:
Whenever we backup a virtual machine using VSS, we momentarily connect the backup copy of the virtual machines virtual hard disks to the parent partition in order to clean them up after backup. Unfortunately with Windows Server 2008 R2 a new plug-and-play entry would get created in the Windows registry each time that we did this. Over time this would cause the registry to get larger and larger, which would in turn slow down the Windows boot process.
Since others, including Microsoft, have already written about it, I’ll first reference them. It’s worth reviewing them since I won’t repeat everything covered:
- MS KB Article 982210 on the issue
- Gary’s Gambit’s three-part series on this. Great read. It’s progressive: the first two posts cover the discovery of the issue, with conclusions based on the limited information available at the time, and Part III pulls it all together.
- A recent post from Gary Herbstman with links to 32-bit and 64-bit versions of DevNodeClean! More on this below.
The Fix
To fix it, there are 2 steps. The hotfix in Microsoft’s KB Article 982210 ONLY prevents the issue from continuing. It does not actually clean the existing orphan devices. Additionally, the hotfix is only for Windows Server 2008 R2.
Furthermore, I’m not 100% confident that it addresses every orphan device. On patched Windows Server 2008 R2 servers a “small number” of phantom devices still build up over time. Small number = maybe 50 devices rather than 28K. It’s definitely not as bad as the original issue, but you may want to run devnodeclean (more on this below) from time to time even on an R2 server.
So, my recommendation is two-fold:
- Do apply Microsoft’s hotfix on Windows Server 2008 R2 (KB Article 982210)
- Run devnodeclean on all servers with Hyper-V installed.
  - For Windows Server 2008 RTM, schedule it to run daily with Windows Task Scheduler.
Once you perform these two steps you should be set, and you’ll have quick reboots again.
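For the daily schedule on Windows Server 2008 RTM, here is a rough sketch of how the task could be created with the built-in schtasks command. It is not the exact setup used on our servers: the task name, the C:\Tools\DevNodeClean.exe path, and the 03:00 start time are placeholders, and the helper function is hypothetical. By default the script only prints the command so you can review it first.

```python
import subprocess
import sys


def build_daily_task_command(exe_path, task_name="DevNodeClean", start_time="03:00"):
    """Build a schtasks command line that runs exe_path once a day as SYSTEM."""
    return [
        "schtasks", "/Create",
        "/TN", task_name,   # task name (placeholder)
        "/TR", exe_path,    # program to run; assumes a path without spaces
        "/SC", "DAILY",     # run every day
        "/ST", start_time,  # start time in 24-hour HH:mm format
        "/RU", "SYSTEM",    # run under the local SYSTEM account
        "/F",               # overwrite an existing task with the same name
    ]


if __name__ == "__main__":
    cmd = build_daily_task_command(r"C:\Tools\DevNodeClean.exe")
    print(subprocess.list2cmdline(cmd))
    # Only create the task when explicitly asked to, and only on Windows.
    if sys.platform == "win32" and "--create" in sys.argv:
        subprocess.run(cmd, check=True)
```

Run it with --create from an elevated prompt to register the task, or just copy the printed command into a command prompt.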
Devnodeclean
Here’s the issue … where’s devnodeclean??? Sample C++ source is available but Microsoft hasn’t made a compiled version available publicly yet. Apparently you can call Microsoft product support and they will provide it. I discovered that later so I haven’t tried calling Microsoft support. It’s a known bug so you should be able to get support for free, or get a refund if you have to use a support ticket to first talk to them.
Since I originally worked on it, some copies have sprung up across the internet. Be careful, because I’m sure there are some malicious versions out there. I haven’t tested it, but this one from Gary at Byte Solutions looks promising. Edit: And since my original post Gary commented and made one adjustment to his version so that it’s completely xcopyable and a single EXE. I recommend considering his version.
As an aside, a free tool called Device Remover is mentioned in the comments of this post as a solution, although I didn’t have success with it. You’ll likely have better success with devnodeclean.
You have three choices. You can use the version from Gary. In fact, that’s likely the easiest solution. You can call Microsoft product support. Or you can compile it yourself.
Since I already fought through this a couple months ago, I’ll post the steps necessary to compile the version listed here in case you want to go that direction: http://support.microsoft.com/kb/934234
First start with KB article 934234. However, there are 3 extra steps that are required for Visual Studio 2010 that aren’t listed in that article.
- First, if you’re targeting a 64-bit machine, you must compile it as a 64-bit executable, otherwise it will fail with a useless error message. You can use Visual Studio 2010 with C++ installed. Be explicit that it’s an x64 release build. If you’re targeting a 32-bit machine, you must build a separate 32-bit executable for that environment.
- Second, when compiling you’ll get an error that you “Cannot convert from 'const char [..]' to 'LPCTSTR'”. Visual Studio 2010 projects default to the Unicode character set, where LPCTSTR is a wide-character pointer, so the sample’s narrow string literals no longer convert. Using the suggestion from this post I changed the Character Set to “Use Multi-Byte Character Set” in the project properties.
- Third, make sure that the dependencies are included with the project. If you perform this step then your version is completely xcopyable between servers as a single simple .exe file that doesn’t need an installer. This post pointed me in the right direction. From the project properties change to Release mode and then go to Configuration Properties -> C/C++ -> Code Generation and change Runtime Library from Multi-threaded DLL (/MD) to Multi-threaded (/MT).
Once you compile it you’ll have yourself a clean executable.
I prefer to run devnodeclean (aka Cleanup) from the command line. If you call it without any parameters it will find and remove all of the phantom devices. It’s very fast and you’ll immediately see feedback as it works.
Disclaimer: Microsoft hasn’t released this yet, presumably for a reason. They must not feel that it has gone through adequate testing. So only run this if you need to, and understand that there is some risk. That said, I’ve run this on about a dozen 2008 RTM and 2008 R2 servers for 2 months now and it has worked beautifully.
Summary
Let me summarize for those who don’t want to read the whole post or follow the links.
During backups VSS creates temporary registry entries for devices that aren’t properly deleted afterward. If you back up your Hyper-V system with something like DPM, set to take hourly snapshots, and you have lots of virtual machines, this can leave tens of thousands of stale device references in the registry.
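To see how quickly the numbers add up, here is a back-of-the-envelope estimate. It assumes, as a simplification, that each backup leaves one stale entry per virtual hard disk. The VM count, disk count, and time span are illustrative values, not measurements from our servers.

```python
def estimate_stale_entries(vms, disks_per_vm, backups_per_day, days):
    """Rough count of stale device entries, assuming one per disk per backup."""
    return vms * disks_per_vm * backups_per_day * days


# 20 VMs with one virtual disk each, backed up hourly for 60 days
print(estimate_stale_entries(vms=20, disks_per_vm=1, backups_per_day=24, days=60))  # 28800
```

Under those assumptions a modest host reaches tens of thousands of entries within a couple of months.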
There is a fix available for Windows Server 2008 R2, but it doesn’t clean up the phantom devices already created. So, you must:
- Apply the hotfix found here: KB Article 982210 (Windows Server 2008 R2 only)
- Then run devnodeclean to clean up existing phantom devices. Additionally, on Windows Server 2008 RTM, schedule devnodeclean with Windows Task Scheduler to run daily. You can obtain devnodeclean in one of three ways:
  - Call Microsoft product support services and reference KB Article 982210.
  - Use the version made available from Gary Herbstman at Byte Solutions.
  - If for some reason those don’t work out, you can follow the instructions in my post above on how to compile your own version.
I hope this helps. Happy fast rebooting!