Scott Forsyth's WebLog

Postings on IIS, ASP.NET, SQL Server, Webfarms and general system admin.

June 2006 - Posts

TechEd 2006 - Recap

I just arrived home from TechEd 2006 where I gave a breakout session on IIS6: Managing Effective Webfarms.  I have to say this was probably the best technical conference that I've attended.  I was scheduled to help out in the DEV and IIS7 booths and during that time I had plenty of time to discuss the inner workings with many of the IIS Program Managers and Developers that attended.  They are always enjoyable to talk to because of their intimate knowledge of the subject. 

Through the many social events during the day and in the evenings I was able to hang out with many people I already knew and many I met for the first time.  I spent a lot of valuable time with a number of people I respect in the industry.  Probably my favorite moment was on the Friday morning, when Mark Russinovich got on the elevator I was taking.  I had met Mark about 3 years ago at one of his seminars that I attended in Redmond.  I don't think he remembered me, but he started a conversation without missing a beat.  We shared a taxi to the convention center while chatting about debugging, the next tools that he will be releasing, speaking engagements, ORCS Web and other things.  For those who don't know, Mark is the brains and primary developer behind Sysinternals (www.sysinternals.com) and the person who discovered the famous Sony rootkit.  He's a Windows internals guru, very possibly the top person in that area in the industry.  He's always one of the top-rated speakers at events like this and had the top-rated session again at TechEd 2006.

Expo '06 was great with dozens of booths of vendors showing off their products and services.  I came back with an extra backpack full of giveaways, balls, pens and gadgets, and a head full of new information and ideas. 

I was pretty happy with my talk and the attendee evaluations were good.  I did three demos during my session, and by the end I had set up a redundant webfarm using 2 web nodes and a primary and backup remote content location.  I forced one server to handle both the role of the primary web server and content server.  Then I pulled the plug (virtually) on that server so that it failed.  Without any downtime, and with only about a 20 second delay, both the web and content roles switched over to their respective backup servers.  It was a fun demo and seeing a successful recovery of a simulated failure was the highlight of the talk.  Three articles and tools that I prepared for the talk can be found here:
http://www.orcsweb.com/articles/iiscnfg.aspx
http://www.orcsweb.com/articles/dfsresources.aspx
http://www.orcsweb.com/articles/aspnetmachinekey.aspx

Back to IIS7, here are a number of exciting changes. 

  • Tracing is probably the top of my list.  Being able to get detailed trace reports in IIS7 is not only more complete than the current tools, but it's quick and easy to do.  You can set a trace on a particular file or all files, and for any HTTP status codes.  The output is saved directly to an easy-to-read XML file.
  • Delegation is the next on my list.  Developers will love this one.  Now most IIS tasks, like setting the default documents or creating custom script mappings, can be done from web.config directly by the developer.  This makes XCopying of the site possible, and means there is less dependency on the server administrator for straightforward tasks like this.
  • My third favorite is how the IIS7 pipeline is handled.  In the past, ASP.NET was an ISAPI add-on.  Now it's integrated right into IIS7.  This means that forms or windows authorization will apply to static files, images and all other non-ASP.NET pages.  In the past, if you used ASP.NET authorization, it only affected files specifically handled by the aspnet_isapi.dll ISAPI extension, which included extensions like .aspx and .asmx, but not .asp, .html, .jpg or many others.
  • Next on my list of favorite features is the modularization of IIS7.  In IIS6, features like authentication, authorization and default document handling were part of the core server.  Now they are broken out into separate modules that can be disabled or even completely removed.  This allows IIS administrators to remove any functionality that they don't use, giving a smaller footprint, which is better for security and also means less memory is used.  This modular structure also makes extending or replacing existing modules easy and powerful.
  • The user interface is completely redone.  For me it means a lot of relearning since I'm comfortable with the IIS6 interface, but it's a nice looking tool that I'm sure I'll get used to before too long.
  • Programming against IIS7 is easier and more powerful than ever before.  The WMI class is much richer and cleaner and there is also a .NET namespace to directly manage IIS7.
  • There are plenty of other great changes, but those are the ones that stand out to me.

Enough said.  I had a great week and for those trying to decide which event to attend each year, I recommend TechEd as a top pick.

IISCnfg.vbs - IIS Settings Replication

Microsoft provides a tool called IISCnfg for management of the Internet Information Services (IIS) Settings. One of the features that this includes is the ability to replicate the IIS settings from one server to another. This is useful in a webfarm environment where you require all web servers to be in sync.

IISCnfg.vbs is automatically placed in the %windir%/System32 folder if IIS6 is installed and is a command line tool to be run directly or through a batch file.

I've been using IISCnfg for a few years on a number of different webfarms and have worked through a few issues with it that I will share here. I am also making a few scripts available that extend IISCnfg and help make better use of it. IISCnfg has a fair bit of flexibility, but it has a couple of shortcomings to be aware of and work around.

IISCnfg.vbs has 4 main commands: save, export, import and copy. I'll summarize them here:

Save
This simply saves a copy of the IIS metabase as if you were saving from the IIS Management tool. It will save it to the %windir%/system32/inetsrv/MetaBack folder and you will see it as an available backup to be restored from the IIS Management tool.

Export
This allows you to export the entire IIS metabase, or parts of it, to a file that you specify. If you specify a password when exporting, the machine specific information will be stripped out and you can restore this on a different server.

Import
This is the opposite of Export and allows you to import a valid file back into the Metabase.

Copy
A Copy uses a combination of Export and Import to copy the IIS settings from one server to another. It will export the metabase to disk, map a network drive to a target server, copy the file over and use Import on the other end to load the metabase into the target server.

For metabase replication, I use two terms to describe the features that I want to utilize:

Push
When I mention a metabase push, I'm referring to the /Copy feature of IISCnfg. This is easy to set up and run, but the drawback is that there is downtime on the target server while the import is run. In a production environment that is often not acceptable unless you first disable the web server from the load balancer so that no traffic goes to the target server during the push.

Merge
A Merge on the other hand is not a built-in feature of IISCnfg and needs to be handled manually. It will not do the export, map the network drive and import on the other end like the /Copy feature does. MetabaseMerge.bat in the included .zip file takes care of this for you.

The advantage of the Merge is that there is no downtime on the remote servers. This makes it a better solution for day to day changes on a webfarm. The thing to be aware of is that it only adds or updates the data and doesn't take care of deletes. So if you delete a site or an application pool or remove a folder as an application, those won't carry over to the target server using Merge.

Push / Merge Comparison

PUSH | MERGE
Replicates everything including deletes | Deletes not replicated
Downtime on target server during Push | No downtime or slowness during Merge
/copy command takes care of all the hard work | /merge is a property of import and requires you to manually take care of what /copy does automatically

Because each method has a different set of advantages and disadvantages, I use a Merge whenever possible, but if there are a lot of changes that involve deletes, I will do a staggered replication Push while coordinating with the load balancer to take nodes offline during the Push.
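To make the difference concrete, here is a toy model of the two behaviours. It is only a sketch: each "metabase" is a plain Python dictionary with made-up keys, and the functions mimic the semantics described above rather than anything IISCnfg does internally.

```python
# Toy model only: each "metabase" is a dict of path -> setting.
# The keys and values are invented for illustration.

def push(source, target):
    """Push: the target becomes an exact copy of the source, deletes included."""
    return dict(source)

def merge(source, target):
    """Merge: add or update entries from the source, never delete from the target."""
    merged = dict(target)
    merged.update(source)
    return merged

source = {"/w3svc/1": "Default Web Site", "/w3svc/3": "New Site"}
target = {"/w3svc/1": "Default Web Site (old)", "/w3svc/2": "Site deleted on source"}

pushed = push(source, target)
merged = merge(source, target)

print(sorted(pushed))  # the deleted site is gone after a Push
print(sorted(merged))  # the deleted site survives a Merge
```

The site that was deleted on the source server is still present on the target after the merge, which is exactly the case where a Push is needed.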

Metabase ACL Issue

The IIS Metabase has permission settings (ACLs) on various nodes of the metabase and you can lock down or loosen the permissions using the command line tool metaacl.vbs or using Metabase Explorer included in the IIS Resource Kit. This is similar in concept to file system permissions on an NTFS disk volume. Normally this is something that doesn't need to be changed except for very specific requirements, but the push doesn't handle the ACLs perfectly, requiring us to dig deeper. As best I can tell, this is an oversight in the export or import function built into IIS. The IIS_WPG group, which is commonly used, needs to be switched from the SID that is specific to the source server to the SID of the target server.

The copy/import works properly in the root nodes of the metabase and some other nodes that have specific permissions, but there are two sections of the metabase that aren't handled correctly. They are /w3svc/AppPools and /w3svc/Filters.

It’s easier for me to explain by showing the output of the metaacl.vbs tool. Here is an example of an untouched metabase:

C:\admin>Metaacl.vbs "IIS://localhost/w3svc/apppools"
BUILTIN\Administrators
Access: RWSUED
NLB1\IIS_WPG
Access: U
NT AUTHORITY\NETWORK SERVICE
Access: U
NT AUTHORITY\LOCAL SERVICE
Access: U

Notice the NLB1\IIS_WPG with U permissions. But now let's look at the same node after a push.

C:\admin>Metaacl.vbs "IIS://localhost/w3svc/apppools"
BUILTIN\Administrators
Access: RWSUED
S-1-5-21-2936230025-297186120-535571621-1007
Access: U
NT AUTHORITY\NETWORK SERVICE
Access: U
NT AUTHORITY\LOCAL SERVICE
Access: U

Notice where the IIS_WPG group used to exist; now there is just an invalid SID. This won't cause your sites to stop, but it will cause issues with IIS being unable to read the private memory limits of the app pool and some other similar issues. So, when doing a push, make sure to clean that up. The attached scripts take care of that as well.
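If you want to check for this after a push, the symptom is easy to spot mechanically. The following is a minimal sketch, not part of the attached scripts: it assumes you have captured the text output of metaacl.vbs (the sample is the output shown above) and simply flags trustee lines that are raw SIDs instead of account names.

```python
import re

# A raw SID that was not resolved to an account name, e.g. S-1-5-21-...-1007
SID_PATTERN = re.compile(r"^S-1-\d+(-\d+)+$")

def find_unresolved_sids(metaacl_output):
    """Return the trustee lines that are bare SIDs rather than account names."""
    unresolved = []
    for line in metaacl_output.splitlines():
        line = line.strip()
        if SID_PATTERN.match(line):
            unresolved.append(line)
    return unresolved

# The metaacl.vbs output shown above for /w3svc/apppools after a push.
after_push = """\
BUILTIN\\Administrators
Access: RWSUED
S-1-5-21-2936230025-297186120-535571621-1007
Access: U
NT AUTHORITY\\NETWORK SERVICE
Access: U
NT AUTHORITY\\LOCAL SERVICE
Access: U
"""

print(find_unresolved_sids(after_push))
```

An empty list means every trustee resolved to a name; anything else is a candidate for cleanup on that node.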

When doing a Merge, I found that the easiest way to handle this is to remove the ACL lines completely and retain the permissions from the destination server. So I have another script called RemoveAdminACLline.vbs that takes care of that. That makes the /merge even cleaner yet because it doesn't touch the ACLs on the destination server.
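The idea behind that step can be sketched in a few lines. This is not the contents of RemoveAdminACLline.vbs; it is an illustration that assumes the exported file keeps each attribute on its own line and that the ACL is stored in an attribute named AdminACL. The sample fragment and its hex value are made up.

```python
def remove_admin_acl_lines(exported_xml):
    """Drop every line that consists of an AdminACL attribute."""
    kept = [
        line for line in exported_xml.splitlines()
        if not line.strip().startswith("AdminACL=")
    ]
    return "\n".join(kept) + "\n"

# Hypothetical fragment of an exported metabase file (values are invented).
sample = (
    '<IIsApplicationPools\tLocation ="/LM/W3SVC/AppPools"\n'
    '\t\tAdminACL="49634462f0000000a4000000400b1237"\n'
    '\t\tAppPoolIdentityType="2"\n'
    '\t>\n'
    '</IIsApplicationPools>\n'
)

cleaned = remove_admin_acl_lines(sample)
print(cleaned)
```

With the ACL attribute gone from the file, the import has nothing to overwrite, so whatever permissions the destination node already has stay in place.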

Removing the Password Requirement - IISCnfg2.vbs

The other default behavior of IISCnfg that I have adjusted is the unnecessary requirement to put in the username and password of the target server. IISCnfg makes these required fields, so if you leave them off, the /copy command will not work. But if the user identity running the script has permission to the target server, it should be able to pass that through to the target server without requiring you to put the username and password in the batch file. If you need to put the username/password in the batch file, that means that you need to maintain it over time, and it also means that you have a username/password in plain text in your batch files or scripts. Fortunately it was easy to comment out that requirement and it works perfectly without it. So the attached files include IISCnfg2.vbs which has that modification. I made a comment with my name (Scott Forsyth) beside each line that I changed so that you can tell what was changed.

Included Files

With the theory behind us, let's take a look at the files included in the download file. Note that I did not try to pretty it up or put a lot of error checking in it. It's simple and easy to understand, but that also means that you may need to do some digging into the files to make any changes or to troubleshoot any issues that I didn't handle in a friendly way.

[The first two files are the ones that you will use to do a Metabase Merge or Push.]

MetabasePush.bat

Usage: MetabasePush {ServerIP} [SMTP FQDN]

You can call this from a batch file that can live somewhere else, like on the desktop of your primary web server. It will do a metabase push from the current server (localhost) to the target server specified by the ServerIP.

You can optionally add the SMTP FQDN and it will update that on the target server. We use that at ORCS Web because each server node has a different SMTP FQDN.

MetabaseMerge.bat

Usage: MetabaseMerge {ServerIP} [SMTP FQDN]

This operates the same as MetabasePush except that it does a Merge instead. The ServerIP is required and the SMTP FQDN is optional. Don't forget the differences mentioned above between a Push and a Merge. This file calls RemoveAdminACLline.vbs so that the target server retains its own security settings.

[The following files are helper files and don't need to be run directly.]

IIsCnfg2.vbs

This is a copy (confirmed up to date May 2006) of IIsCnfg.vbs but with two minor modifications to make the target username and password optional.

Metaacl.vbs

This is an untouched Microsoft script to view and update the metabase ACL permissions.

RemoveAdminACLline.vbs

This is used by MetabaseMerge.bat to remove the ACL lines so that the permissions on the target server are left untouched during a Merge.

ChangeSMTP-FQDN.vbs

This will change the fully qualified domain name of the SMTP server on the target server that you specify. It is used when there are different SMTP DNS names on each server.

readme.txt

I'm sure you can guess.

To utilize these scripts, simply extract the set of files to a folder on your server and then call MetabasePush.bat and MetabaseMerge.bat from your own batch files. For example, you could create a batch file on the desktop of your primary web server that has the following in it:

--Metabase Merge.bat--
d:\admin\ClusterUtils\MetabaseMerge.bat 192.168.2.11 web2.domain.com
d:\admin\ClusterUtils\MetabaseMerge.bat 192.168.2.12 web3.domain.com
d:\admin\ClusterUtils\MetabaseMerge.bat 192.168.2.13 web4.domain.com
d:\admin\ClusterUtils\MetabaseMerge.bat 192.168.2.14 web5.domain.com
d:\admin\ClusterUtils\MetabaseMerge.bat 192.168.2.15 web6.domain.com

If you double click on that file, it will do a Merge to the various servers on a webfarm. This assumes, of course, that you placed the files in d:\admin\ClusterUtils\.
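If the list of nodes grows, the same thing can be driven from a small script instead of repeating the line per server. Here is a rough sketch; the script path and node list are the ones from the example above, while the function names and the dry-run switch are my own illustration and not part of the download.

```python
import subprocess

MERGE_SCRIPT = r"d:\admin\ClusterUtils\MetabaseMerge.bat"

# (ServerIP, SMTP FQDN) pairs taken from the example batch file above.
NODES = [
    ("192.168.2.11", "web2.domain.com"),
    ("192.168.2.12", "web3.domain.com"),
    ("192.168.2.13", "web4.domain.com"),
    ("192.168.2.14", "web5.domain.com"),
    ("192.168.2.15", "web6.domain.com"),
]

def build_commands(script, nodes):
    """One argument list per node: script, ServerIP, SMTP FQDN."""
    return [[script, ip, fqdn] for ip, fqdn in nodes]

def run_all(commands, dry_run=True):
    """Print the commands, or run them one node at a time when dry_run is False."""
    for command in commands:
        if dry_run:
            print(" ".join(command))
        else:
            # cmd /c launches the .bat file; check=True raises on a non-zero
            # exit code (whether MetabaseMerge.bat sets one on failure is an
            # assumption, the scripts have little error checking).
            subprocess.run(["cmd", "/c"] + command, check=True)

commands = build_commands(MERGE_SCRIPT, NODES)
run_all(commands)  # dry run: only prints what would be executed
```

Running the nodes one after another, rather than in parallel, keeps the behaviour the same as the batch file.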

Download: http://www.orcsweb.com/articles/ClusterUtils.zip

That’s it! I hope that you have found this useful and are able to utilize parts or all of it in your environment.

Posted: Jun 07 2006, 08:14 AM by OWScott | with 3 comment(s) |
DFS for Webfarm Usage - Content Replication and Failover

 

Windows Distributed File System (DFS) has been around for a long time and it has always had a lot to offer. With the latest update in Windows Server 2003 R2, DFS has become quite an impressive product.

At ORCS Web, we've recently started to use DFS for some of our high availability offerings that use a central NAS (Network Attached Storage) content server. We're using DFS for handling the content server, both for replication and for automatic failover to a backup server in the event of maintenance or a server failure.

There were a number of things that I learned while researching, testing and rolling out DFS for webfarm content hosting that I'll share here. This isn't a step by step walkthrough, but rather some pointers that you will hopefully find useful.

DFS has many uses, ranging from keeping content in sync between different physical sites to giving a single easy-to-remember path that can serve up content from a variety of folders across a local or wide area network (thus the 'distributed' in DFS).

DFS in its simplest form is a way to have a single friendly UNC path on your network which can have folders distributed across multiple servers. This friendly UNC path will be permanent while the real folders that it accesses behind the scenes can be most anywhere. Subfolders can point to completely different locations on disk or to different servers on your network. This flexibility is great for our webfarm situation and allows a primary and at least one backup server to handle the content with a clean failover solution in the event that the primary server fails.

Installation

The installation is fairly straightforward once you understand the concepts. Partial DFS functionality is already installed on Windows Server 2003. The replication side of things needs to be installed separately. As long as you’ve upgraded to Windows Server 2003 R2, you can install this from Add/Remove Programs under the Distributed File System category. I recommend installing all 3 optional features, as the extra management tools are better for managing your redundant DFS system. This needs to be installed on the servers hosting the namespaces and the folder targets if you will use replication.

The extra replication features of R2 do require Active Directory changes. If you have already upgraded your domain controllers to R2, then no additional action is required. If you haven't upgraded your domain controller to R2, no worries, you aren't required to do so, but you do need to extend the schema. Here is a link on how to do that:
http://technet2.microsoft.com/WindowsServer/en/Library/84445c1b-a418-4a09-a50c-5f3258cfc5b51033.mspx?mfr=true

Like anything of this nature, make sure to have a good disaster recovery plan in place and do this at a non-peak time. But the schema installation is straightforward and doesn't cause any interruption of service in Active Directory.

Once installed, there are three hotfixes that should be installed:
http://www.microsoft.com/windowsserver2003/technologies/storage/dfs/hotfixes.mspx. One is required for the client failback feature to fail back to the primary content server when it's back online after a failure, another allows you to have multiple domain-based DFS namespaces on Windows Server 2003 Standard Edition if you desire, and the 3rd supposedly fixes a potential RPC issue with replication, although I didn't run into this issue. KB Article 898900 needs to be installed on all of the servers accessing DFS (the web nodes). The other two need to be installed on the DFS content servers.

Configuration

You have two graphical tools to use at this point; both support most features. My preference is the DFS Management tool, which is available after the Add/Remove Programs step above. You'll find this in Administrative Tools.

There are 3 terms/levels to take note of: Namespace, Folder and Folder Target. This terminology changed with R2, so don’t confuse it with the terms you used in the past.

Top Level - Namespace
A namespace is a container to hold the folder and replication settings. The path to the namespace might be something like \\Domain\Webfarm. You can have multiple namespaces per server.

Second Level - Folder
A folder is a virtual DFS folder which can have one or more target folders. The name of the folder is what is used in the UNC path. For example \\Domain\Webfarm\Site1, where Site1 is the Folder.

Third Level - Folder Target
A folder target is the real location of the content. This path is masked though and not seen in the DFS UNC path.

You can have multiple folder targets which point to different physical locations. There are various options to determine which folder target is used, but in our case we want to always point to a primary content server and only fail over to the backup content server when the primary server is unavailable.

Active Directory comes into play too with domain-based namespaces but management is still done from DFS Management.

Redundancy

Here's where it gets fun. To have everything fully redundant in the event that a server fails, every part of this needs to be mirrored. I'll discuss the various levels of redundancy here.

Namespace

The namespace server holds the metadata for the namespace. Be sure that this doesn't depend on a single server. The data stored here is often pretty small unless you have hundreds or thousands of folders in the namespace, so a dedicated server isn't necessarily required for this role as long as the namespace server can always respond quickly to any queries. The namespace servers can be the same servers as your content if you want.

To create a mirrored copy of the namespace, in the DFS Management tool, right-click on the Namespace and click on "Add Namespace Computer". Here you can point to an existing share on a different server or create a new share.

Folder Target

DFS masks which server is used for the folder target. To fully use DFS in this situation, you will need to point to multiple folder targets. In my situation, I want to have one server always used as long as it's available. I don't want to hit a random server because there could be data integrity issues. DFS replication is good, but it doesn't handle data locking or data write-through. This means that there could be a delay from when something is written on disk until it has replicated to all other servers. For that reason, I only want to fail over when absolutely necessary.

To achieve this there are a few things that are necessary.

  • The failback hotfix mentioned above needs to be installed.
  • All webfarm nodes need to be running Windows Server 2003 SP1 or later.
  • The caching duration for the folders needs to be changed. The default is 1800 seconds (30 minutes), which is too long for our situation. A long duration means that fewer requests are made to the namespace server, but it also means that the failback could take up to 30 minutes after the primary server is back online. You can update this by right-clicking on the folder in "DFS Management", going to Properties and then the Referrals tab. Make sure to do this on each new folder. You can also change the cache duration on the namespace, but the default there is already 300 seconds (5 minutes).
  • In the Referrals tab of the namespace properties, check the "Clients fail back to preferred targets" checkbox.
  • In the Referrals tab of the folder properties, check the "Clients fail back to preferred targets" checkbox.
  • On the properties of the primary folder target, in the Advanced tab, enable "Override referral ordering" and select "First among all targets"
  • On the properties of the backup folder targets, in the Advanced tab, enable "Override referral ordering" and select "Last among all targets"

Now you have a primary/backup server configuration that will always use the primary server as long as it is available.
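The effect of the two "Override referral ordering" settings can be pictured with a very small model. This is only a sketch of the behaviour I'm after, not how DFS builds referrals internally (real referrals also take things like site cost into account); the server names are placeholders.

```python
# Priority classes standing in for the "Override referral ordering" choices.
FIRST, NORMAL, LAST = 0, 1, 2  # "First among all targets", default, "Last among all targets"

def pick_target(targets, available):
    """Return the highest-priority folder target that is currently reachable.

    targets:   list of (unc_path, priority) tuples
    available: set of UNC paths that respond right now
    """
    for path, _priority in sorted(targets, key=lambda t: t[1]):
        if path in available:
            return path
    return None

targets = [
    (r"\\contentserver2\Site1", LAST),   # backup content server
    (r"\\contentserver1\Site1", FIRST),  # primary content server
]

both_up = {r"\\contentserver1\Site1", r"\\contentserver2\Site1"}
primary_down = {r"\\contentserver2\Site1"}

print(pick_target(targets, both_up))       # primary is used while it is up
print(pick_target(targets, primary_down))  # backup is used only on failure
```

The failback checkboxes are what make the client return to the first entry once it is reachable again, instead of staying on the backup.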

Active Directory

The Active Directory part of things is done automatically and apart from the steps mentioned already, doesn't need any extra configuration. Just be sure to have redundant domain controllers in your Active Directory environment.

Links and Paths

There is a growing list of links and paths that can be used for testing purposes. Let me summarize them here, assuming that the folder is called Site1 and the Folder Targets are also given the same name.

Using the DFS path directly: (DFS level)
   \\domain\webfarm\Site1

Accessing directly using the first namespace server: (namespace level)
   \\namespaceserver1\webfarm\Site1

Accessing directly using the second namespace server: (namespace level)
   \\namespaceserver2\webfarm\Site1

Accessing content directly on primary server without using DFS: (folder target level)
   \\contentserver1\Site1

Accessing content directly on second server without using DFS: (folder target level)
   \\contentserver2\Site1

Notice that it’s the DFS path (\\domain\webfarm\Site1) which will be used on the web servers and for most uses. It will always be the same, regardless of how the namespace servers or folder targets change over time. The other paths are for testing and troubleshooting and could change over time.
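When troubleshooting, it can help to generate the whole list for a given folder so that each level can be checked in turn. A small sketch follows, using the placeholder names from the list above; the function names are my own.

```python
def unc(*parts):
    """Join path parts into a UNC path such as \\\\server\\share\\folder."""
    return "\\\\" + "\\".join(parts)

def dfs_test_paths(domain, namespace, folder, namespace_servers, content_servers):
    """Return (level, path) pairs from the DFS path down to each folder target."""
    paths = [("DFS level", unc(domain, namespace, folder))]
    for server in namespace_servers:
        paths.append(("namespace level", unc(server, namespace, folder)))
    for server in content_servers:
        paths.append(("folder target level", unc(server, folder)))
    return paths

paths = dfs_test_paths(
    "domain", "webfarm", "Site1",
    namespace_servers=["namespaceserver1", "namespaceserver2"],
    content_servers=["contentserver1", "contentserver2"],
)
for level, path in paths:
    print(f"{level:20} {path}")
```

On a Windows machine in the domain, each path could then be tested with something like os.path.isdir(path) to see which level stops responding.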

Content Replication

With R2, DFS replication uses what is called Remote Differential Compression (RDC) which will only update changes to files and won't send the entire file across the wire. This is especially handy when replicating across a wide area network, but it's also good for this situation.

If you set up two or more folder targets using DFS Management, the wizard should have asked you if you want to set up replication, but if you did things in a different order, you can set it up manually after the fact. This can be done using the DFS Management tool as well.

Changes to the servers aren't immediate so DFS doesn't work well for transactional type data where both servers need to be 100% in sync within a couple seconds of each other. But for a website related situation that is mostly read intensive, DFS works great.

You have a few topology options, but in our situation we'll use the full mesh, which means that any server can replicate changes to any other server. This means that in a failure situation, the content changes made on the backup server will push back to the primary server when it is online again.

How Good Is It?

DFS failovers are pretty impressive. If the primary content server becomes unavailable, DFS will fail over to the backup content server within a few seconds. In this webfarm situation, almost every time that the primary server fails, the HTTP request is retried for a few seconds until IIS is able to serve up a successful page.

This means that there is effectively zero downtime if the primary content server fails. The only issue I ran into in testing was when a page that uses master pages or web controls was half loaded at the moment the primary server failed. ASP.NET could potentially process half of the page and fail processing the rest. But this is pretty rare and I would say that the failover is as close to perfect as can be.

A failure of the namespace server is even smoother, resulting in no noticeable downtime or slowness.

File Change Notification in ASP.NET

There is one thing to keep in mind during a failover and failback situation. ASP.NET and IIS use what is called File Change Notification (FCN) to let IIS know of any changes to files. For example, if you add a new .dll to your /bin folder, ASP.NET will recycle the AppDomain and reload and recompile some of the site. During a failure, although the switchover is smooth, it does take a few seconds, which is abrupt enough for IIS and ASP.NET to reestablish the File Change Notification handle using the different content server.

The issue comes with the failback. The failback is so smooth that the File Change Notification isn't updated back to the restored server. This means that if you make any changes to ASP.NET files on the restored content server, the changes aren't noticed by IIS and ASP.NET. Even deleting the entire /bin folder won't be recognized by ASP.NET if the site was visited and cached while running on the backup server. Static pages don't have this issue, but the caching in ASP.NET makes this a problem. At the time of this writing, I'm working with Microsoft Product Support Services (PSS) to try to find a good solution for this. In the meantime, to work around it, simply recycle the app pool of the site(s) and it will start to function normally again. So this isn't necessarily a show-stopper, but it is something to keep in mind with the failover/failback.

Caching and DFS

DFS client computers (webfarm nodes in this case) cache the DFS information for the length of time that you specify, as I mentioned already. This shouldn't be too low or you will have too much traffic to the Namespace server, but it shouldn't be too high or changes to the namespace and failbacks to a restored server will take a long time to be noticed. It is up to your environment what you want to set this at, but in every situation, it's important to know that there is some caching that takes place.
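To put rough numbers on that trade-off, here is a back-of-the-envelope sketch. It assumes, as a simplification, that each web node asks for a fresh referral once every time the cached entry expires, and it uses a hypothetical farm of 5 nodes.

```python
def referral_tradeoff(ttl_seconds, nodes):
    """Rough numbers for one DFS folder cached by every web node.

    Simplifying assumption: each node requests a fresh referral once
    per cache expiry while the folder is in constant use.
    """
    return {
        "worst_case_failback_delay_min": ttl_seconds / 60,
        "referral_requests_per_hour": nodes * 3600 / ttl_seconds,
    }

# 1800 s is the folder default and 300 s the namespace default mentioned above;
# 60 s is just an example of a more aggressive setting.
for ttl in (60, 300, 1800):
    print(ttl, referral_tradeoff(ttl, nodes=5))
```

Even at short durations the request volume in this model is small for a handful of nodes, so the delay side of the trade-off is usually the one that matters for a webfarm.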

Make sure to keep in mind that adding a new folder to your DFS namespace won't be noticed immediately. You can force the DFS client cache to be cleared by running dfsutil /PktFlush from the client server. dfsutil.exe is a tool that is available in the Windows Server 2003 /support/tools folder of the installation CD. I simply copy that file to C:\Windows\System32 and I can run dfsutil from the command prompt.

When setting up new sites, make sure to wait until the new site has been recognized by all of the webfarm nodes, or force a cache flush from all of the nodes before attempting to set up or update the site.

Backups of the Namespace

Make sure to make regular backups of your Namespace. This can be done easily using DFSUtil. Simply export to an .xml file on a regular basis and have your backup process back up that file. An example of the syntax needed is:

dfsutil /root:\\OW\webfarm /export:c:\NameSpaceBackups\DateToday\webfarmroot.xml

I did run into something when importing the namespace. I received the following error:

System error 1168 has occurred.
Element not found.

After some research and stumbling through it, I found out that I was using the domain name 'orcsweb.com' instead of NetBIOS name 'OW' in the UNC path, which the import didn't like. OW is used by DFS in this case. The export worked with either name, but the import only worked with \\OW\ which is what was in the exported XML file.
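For a scheduled backup, the export line can be generated with the current date filled in. The sketch below only builds the command string shown above; it does not run dfsutil. The dot check is a heuristic of my own for catching a DNS-style domain name in the root, so that the same root string can be reused for an import later, and the dated folder presumably needs to exist before the export is run.

```python
import datetime

def export_command(root, backup_dir, today=None):
    """Build a dfsutil export command with a dated backup folder."""
    today = today or datetime.date.today()
    # Heuristic assumption: a dot in the first UNC component means a DNS
    # domain name (orcsweb.com) was used instead of the NetBIOS name (OW).
    domain = root.lstrip("\\").split("\\")[0]
    if "." in domain:
        raise ValueError(f"use the NetBIOS domain name, not {domain!r}")
    name = root.rstrip("\\").split("\\")[-1]
    target = f"{backup_dir}\\{today:%Y-%m-%d}\\{name}root.xml"
    return f"dfsutil /root:{root} /export:{target}"

print(export_command(r"\\OW\webfarm", r"c:\NameSpaceBackups"))
```

Using an ISO-style date for the folder name keeps the backups sorted chronologically in the file system.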

Links and Resources

Here are a number of resources that I've found helpful:

Microsoft DFS Landing page
http://www.microsoft.com/windowsserver2003/technologies/storage/dfs/default.mspx

DFS hotfixes, post R2
http://www.microsoft.com/windowsserver2003/technologies/storage/dfs/hotfixes.mspx

DFSUtil Examples:
http://technet2.microsoft.com/WindowsServer/en/Library/28be5bc5-694d-49ea-981e-34bdadd1a9311033.mspx?mfr=true

Whitepaper on designing Distributed File Systems
http://technet2.microsoft.com/WindowsServer/en/Library/1aa249c0-40f3-4974-b67f-e650b602415e1033.mspx?mfr=true

There is a lot to consider with DFS and I've only scratched the surface, but I hope that this has been helpful to cover a few common configuration settings that are required for configuring DFS on Windows Server 2003 R2 in a webfarm situation.

At ORCS Web, we've recently started to use DFS for some of our high availability offerings that use a central NAS (Network Attached Storage) content server. We're using DFS for handling the content server, both for replication and for automatic failover to a backup server in the event of maintenance or a server failure.

There were a number of things that I learned while researching, testing and rolling out DFS for webfarm content hosting that I'll share here. This isn't a step-by-step walkthrough, but rather some pointers that you will hopefully find useful.

DFS has many uses, ranging from keeping content in sync between different physical sites to giving a single easy-to-remember path that can serve up content from a variety of folders across a local or wide area network (thus the 'distributed' in DFS).

DFS in its simplest form is a way to have a single friendly UNC path on your network which can have folders distributed across multiple servers. This friendly UNC path is permanent, while the real folders that it accesses behind the scenes can be almost anywhere. Subfolders can point to completely different locations on disk or to different servers on your network. This flexibility is great for our webfarm situation and allows a primary and at least one backup server to handle the content with a clean failover solution in the event that the primary server fails.

Installation

The installation is fairly straightforward once you understand the concepts. Partial DFS functionality is already installed on Windows Server 2003. The replication side of things needs to be installed separately. As long as you’ve upgraded to Windows Server 2003 R2, you can install this from Add/Remove Programs under the Distributed File System category. I recommend installing all 3 optional features, as the extra management tools are better for managing your redundant DFS system. This needs to be installed on the servers hosting the namespaces and, if you will use replication, on the servers hosting the folder targets.

The extra replication features of R2 do require Active Directory changes. If you have already upgraded your domain controllers to R2, then no additional action is required. If you haven't upgraded your domain controller to R2, no worries, you aren't required to do so, but you do need to extend the schema. Here is a link on how to do that:
http://technet2.microsoft.com/WindowsServer/en/Library/84445c1b-a418-4a09-a50c-5f3258cfc5b51033.mspx?mfr=true

Like anything of this nature, make sure to have a good disaster recovery plan in place and do this at a non-peak time. That said, the schema installation is straightforward and doesn't cause any interruption of service in Active Directory.

Once DFS is installed, there are three hotfixes that should be applied:
http://www.microsoft.com/windowsserver2003/technologies/storage/dfs/hotfixes.mspx

One is required for the client failback feature to fail back to the primary content server when it's back online after a failure, another allows you to have multiple domain-based DFS namespaces on Windows Server 2003 Standard Edition if you desire, and the third supposedly fixes a potential RPC issue with replication, although I didn't run into this issue. KB Article 898900 needs to be installed on all of the servers accessing DFS (the web nodes). The other two need to be installed on the DFS content servers.

Configuration

You have two graphical tools to use at this point; both support most features. My preference is the DFS Management tool, which is available after the Add/Remove Programs step above. You'll find this in Administrative Tools.

There are 3 terms/levels to take note of: Namespace, Folder and Folder Target. This terminology changed with R2, so don’t get confused with terms you used in the past.

Top Level - Namespace
A namespace is a container to hold the folder and replication settings. The path to the namespace might be something like \\Domain\Webfarm. You can have multiple namespaces per server.

Second Level - Folder
A folder is a virtual DFS folder which can have one or more target folders. The name of the folder is what is used in the UNC path. For example \\Domain\Webfarm\Site1, where Site1 is the Folder.

Third Level - Folder Target
A folder target is the real location of the content. This path is masked though and not seen in the DFS UNC path.

You can have multiple folder targets which point to different physical locations. There are various options to determine which folder target is used, but in our case we want to always point to a primary content server and only fail over to the backup content server when the primary server is unavailable.

Active Directory comes into play too with domain-based namespaces but management is still done from DFS Management.

Redundancy

Here's where it gets fun. To have everything fully redundant in the event that a server fails, every part of this needs to be mirrored. I'll discuss the various levels of redundancy here.

Namespace

The namespace server holds the metadata for the namespace. Be sure that this doesn't depend on a single server. The data stored here is often pretty small unless you have hundreds or thousands of folders in the namespace, so a dedicated server isn't necessarily required for this role as long as the namespace server can always respond quickly to any queries. The namespace servers can be the same servers as your content if you want.

To create a mirrored copy of the namespace, in the DFS Management tool, right-click on the Namespace and click on "Add Namespace Computer". Here you can point to an existing share on a different server or create a new share.

Folder Target

DFS masks which server is used for the folder target. To fully use DFS in this situation, you will need to point to multiple folder targets. In my situation, I want to have one server always used as long as it's available. I don't want to hit a random server because there could be data integrity issues. DFS replication is good, but it doesn't handle data locking or data write-through. This means that there could be a delay from when something is written on disk until it has replicated to all other servers. For that reason, I only want to fail over when absolutely necessary.

To achieve this there are a few things that are necessary.

  • The failback hotfix mentioned above needs to be installed.
  • All webfarm nodes need to be running Windows Server 2003 SP1 or later.
  • The caching duration for the folders needs to be changed. The default is 1800 seconds (30 minutes), which is too long for our situation. A long duration means that fewer requests are made to the namespace server, but it also means that the failback could take up to 30 minutes after the primary server is back online. You can update this by right-clicking on the folder in "DFS Management", going to Properties and then the Referrals tab. Make sure to do this on each new folder. You can also change the cache duration on the namespace, but the default there is already 300 seconds (5 minutes).
  • In the Referrals tab of the namespace properties, check the "Clients fail back to preferred targets" checkbox.
  • In the Referrals tab of the folder properties, check the "Clients fail back to preferred targets" checkbox.
  • On the properties of the primary folder target, in the Advanced tab, enable "Override referral ordering" and select "First among all targets".
  • On the properties of the backup folder targets, in the Advanced tab, enable "Override referral ordering" and select "Last among all targets".

Now you have a primary/backup server configuration that will always use the primary server as long as it is available.
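
To make the primary/backup behaviour concrete, here is a minimal Python sketch of the selection logic described above. It is only a model for reasoning about the settings, not how DFS is implemented: the FolderTarget class, the pick_target function and the "first"/"last" labels are hypothetical names standing in for the "First among all targets" and "Last among all targets" options.

```python
from dataclasses import dataclass


@dataclass
class FolderTarget:
    path: str            # real UNC path of the content
    order: str           # "first" or "last" (hypothetical labels for the referral override)
    online: bool = True  # whether the server currently answers


def pick_target(targets):
    """Return the first online target, trying 'first' targets before 'last' ones."""
    rank = {"first": 0, "last": 1}
    for target in sorted(targets, key=lambda t: rank[t.order]):
        if target.online:
            return target
    return None  # nothing is reachable


primary = FolderTarget(r"\\contentserver1\Site1", "first")
backup = FolderTarget(r"\\contentserver2\Site1", "last")
```

Marking the primary as offline makes pick_target return the backup, and marking it online again returns the primary. In a real deployment the switch back is not instant, because the web nodes keep using their cached referral until the cache duration runs out.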

Active Directory

The Active Directory part of things is done automatically and apart from the steps mentioned already, doesn't need any extra configuration. Just be sure to have redundant domain controllers in your Active Directory environment.

Links and Paths

There is a growing list of links and paths that can be used for testing purposes. Let me summarize them here, assuming that the folder is called Site1 and the Folder Targets are also given the same name.

Using the DFS path directly: (DFS level)
   \\domain\webfarm\Site1

Accessing directly using the first namespace server: (namespace level)
   \\namespaceserver1\webfarm\Site1

Accessing directly using the second namespace server: (namespace level)
   \\namespaceserver2\webfarm\Site1

Accessing content directly on primary server without using DFS: (folder target level)
   \\contentserver1\Site1

Accessing content directly on second server without using DFS: (folder target level)
   \\contentserver2\Site1

Notice that it’s the DFS path (\\domain\webfarm\Site1) which will be used on the web servers and for most usages. It will always be the same, regardless of how the namespace servers or folder targets change over time. The other paths are for testing and troubleshooting and could change over time.
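
If you test a lot of sites, it can help to generate this list of paths rather than typing it. The following is a small Python sketch under the same naming assumption as above (folder and folder targets share one name); the unc and dfs_test_paths helpers are hypothetical names made up for this illustration.

```python
def unc(*parts):
    """Join parts into a UNC path such as \\\\server\\share\\folder."""
    return "\\\\" + "\\".join(parts)


def dfs_test_paths(domain, namespace, folder, namespace_servers, content_servers):
    """Return (level, path) pairs covering every level described above."""
    paths = [("DFS level", unc(domain, namespace, folder))]
    for server in namespace_servers:
        paths.append(("namespace level", unc(server, namespace, folder)))
    for server in content_servers:
        # Assumes the share on each content server has the same name as the folder.
        paths.append(("folder target level", unc(server, folder)))
    return paths


for level, path in dfs_test_paths(
    "domain", "webfarm", "Site1",
    ["namespaceserver1", "namespaceserver2"],
    ["contentserver1", "contentserver2"],
):
    print(f"{level}: {path}")
```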

Content Replication

With R2, DFS replication uses what is called Remote Differential Compression (RDC) which will only update changes to files and won't send the entire file across the wire. This is especially handy when replicating across a wide area network, but it's also good for this situation.

If you set up two or more folder targets using DFS Management, the wizard should have asked you if you want to set up replication, but if you did things in a different order, you can set it up manually after the fact. This can be done using the DFS Management tool as well.

Changes to the servers aren't immediate, so DFS doesn't work well for transactional data where both servers need to be 100% in sync within a couple of seconds of each other. But for a website-related situation that is mostly read-intensive, DFS works great.

You have a few options, but in our situation we'll use the Full mesh topology, which means that every server replicates to every other server. This means that in a failure situation, the content changes made on the backup server will be pushed back to the primary server when it is online again.
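
As a quick illustration of what a full mesh implies, this Python sketch lists the one-way replication links between a set of servers. The full_mesh function name is hypothetical, and the sketch only counts links; it says nothing about how DFS replication schedules them.

```python
from itertools import permutations


def full_mesh(servers):
    """Return every ordered (source, destination) pair between distinct servers."""
    return list(permutations(servers, 2))


links = full_mesh(["contentserver1", "contentserver2"])
for source, destination in links:
    print(f"{source} -> {destination}")
```

With two content servers there are two links, one in each direction, which is what lets changes made on the backup flow back to the primary. With three servers the same function returns six links.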

How Good Is It?

DFS failovers are pretty impressive. If the primary content server becomes unavailable, DFS will fail over to the backup content server within a few seconds. In this webfarm situation, almost every time that the primary server fails, the HTTP protocol will retry for a few seconds until IIS is able to serve up a successful page.

This means that there is effectively zero downtime if the primary content server fails, just a delay of a few seconds. The only issue I ran into in testing was with pages using master pages or web controls that were half loaded at the moment the primary server failed. ASP.NET could potentially process half of the page and fail while processing the rest. But this is pretty rare and I would say that the failover is as close to perfect as can be.

A failure of the namespace server is even smoother, resulting in no noticeable downtime or slowness.
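
If you want to put a number on the failover delay in your own environment, one option is to poll the DFS path from a web node while you take the primary server down. The Python sketch below is one hypothetical way to do that; measure_outage and its parameters are names invented for this example, and os.path.isdir is simply assumed to be a good enough check that the content is reachable.

```python
import os
import time


def measure_outage(path, probe=os.path.isdir, interval=0.5, timeout=60.0,
                   clock=time.monotonic, sleep=time.sleep):
    """Poll path until probe(path) succeeds.

    Returns the seconds waited, or None if timeout is reached first.
    """
    start = clock()
    while True:
        if probe(path):
            return clock() - start
        if clock() - start >= timeout:
            return None
        sleep(interval)


# Example (run on a web node right after stopping the primary content server):
#   waited = measure_outage(r"\\domain\webfarm\Site1")
#   print("content reachable again after", waited, "seconds")
```

The probe, clock and sleep arguments exist only so the function can be tried out without a real outage.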

File Change Notification in ASP.NET

There is one thing to keep in mind during a failover and failback situation. ASP.NET and IIS use what is called File Change Notification (FCN) to let IIS know of any changes to files. For example, if you add a new .dll to your /bin folder, ASP.NET will recycle the AppDomain and reload and recompile some of the site. During a failure, although the switchover is smooth, it does take a few seconds, which is abrupt enough for IIS and ASP.NET to reestablish the File Change Notification handle using the different content server.

The issue comes with the failback. The failback is so smooth that the File Change Notification isn't updated back to the restored server. This means that if you make any changes to ASP.NET files on the restored content server, the changes aren't noticed by IIS and ASP.NET. Even deleting the entire /bin folder won't be recognized by ASP.NET if the site was visited and cached while running on the backup server. Static pages don't have this issue, but the caching in ASP.NET makes this a problem. At the time of this writing, I'm working with Microsoft Product Support Services (PSS) to try to find a good solution for this. To resolve it, simply recycle the app pool of the site(s) and it will start to function normally again. So, this isn't necessarily a show-stopper but it is something to keep in mind with the failover/failback.

Caching and DFS

DFS client computers (webfarm nodes in this case) cache the DFS information for the length of time that you specify, as I mentioned already. This shouldn't be too low or you will have too much traffic to the namespace server, but it shouldn't be too high or changes to the namespace and failbacks to a restored server will take a long time to be noticed. The right value depends on your environment, but in every situation it's important to know that some caching takes place.
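
To get a feel for the trade-off, here is a deliberately rough Python model. It assumes that every web node refreshes the referral for every folder once per cache period, which is an upper bound made up for this illustration; real traffic depends on which folders are actually being accessed. The referral_tradeoff function and its parameters are hypothetical.

```python
def referral_tradeoff(ttl_seconds, nodes, folders):
    """Estimate the cost and the benefit of a given cache duration."""
    refreshes_per_hour = nodes * folders * 3600 / ttl_seconds
    return {
        # A node can keep using a stale referral for up to one full cache period.
        "worst_case_failback_seconds": ttl_seconds,
        "referral_refreshes_per_hour": refreshes_per_hour,
    }


for ttl in (1800, 300):
    print(ttl, referral_tradeoff(ttl, nodes=2, folders=10))
```

Under these assumptions, dropping the folder cache duration from the 1800 second default to 300 seconds cuts the worst-case failback delay from 30 minutes to 5, at the price of six times as many referral refreshes.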

Keep in mind that adding a new folder to your DFS namespace won't be noticed immediately. You can force the DFS client cache to be cleared by running dfsutil /PktFlush on the client computer (each web node). dfsutil.exe is a tool that is available in the /support/tools folder of the Windows Server 2003 installation CD. I simply copy that file to C:\Windows\System32 so that I can run dfsutil from the command prompt.

When setting up new sites, make sure to wait until the new site has been recognized by all of the webfarm nodes, or force a cache flush from all of the nodes before attempting to set up or update the site.
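
If the flush is part of a scripted site setup, it can be wrapped in a few lines. This Python sketch only runs the dfsutil /PktFlush command mentioned above on the local machine; the flush_dfs_client_cache function is a hypothetical helper, it assumes dfsutil.exe is on the PATH, and running it on every web node is left to whatever remote execution tool you already use.

```python
import subprocess

# The command from the text; dfsutil.exe is assumed to be on the PATH.
FLUSH_COMMAND = ["dfsutil", "/PktFlush"]


def flush_dfs_client_cache(run=subprocess.run):
    """Run dfsutil /PktFlush locally and report whether it exited with code 0."""
    result = run(FLUSH_COMMAND, check=False)
    return result.returncode == 0


# Example (on a web node):
#   if not flush_dfs_client_cache():
#       print("dfsutil /PktFlush failed, wait for the cache to expire instead")
```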

Backups of the Namespace

Make sure to make regular backups of your Namespace. This can be done easily using DFSUtil. Simply export to an .xml file on a regular basis and have your backup process back up that file. An example of the syntax needed is:

dfsutil /root:\\OW\webfarm /export:c:\NameSpaceBackups\DateToday\webfarmroot.xml
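
For a scheduled backup, the dated folder in that path has to come from somewhere. As a sketch, the following Python function builds the same command line with today's date filled in; export_command is a hypothetical helper, the YYYY-MM-DD folder name is just one possible convention, and the dated folder is assumed to exist already before dfsutil is run.

```python
from datetime import date


def export_command(root, backup_dir, filename, today=None):
    """Build the dfsutil export command with a dated backup folder."""
    today = today or date.today()
    target = "\\".join([backup_dir.rstrip("\\"), today.isoformat(), filename])
    return ["dfsutil", f"/root:{root}", f"/export:{target}"]


command = export_command(r"\\OW\webfarm", r"c:\NameSpaceBackups", "webfarmroot.xml")
print(" ".join(command))
```

The list form can be handed to a process launcher as-is, and joining it with spaces gives the line to paste into a command prompt.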

I did run into a problem when importing the namespace. I received the following error:

System error 1168 has occurred.
Element not found.

After some research and stumbling through it, I found out that I was using the domain name 'orcsweb.com' instead of NetBIOS name 'OW' in the UNC path, which the import didn't like. OW is used by DFS in this case. The export worked with either name, but the import only worked with \\OW\ which is what was in the exported XML file.

Links and Resources

Here are a number of resources that I've found helpful:

Microsoft DFS Landing page
http://www.microsoft.com/windowsserver2003/technologies/storage/dfs/default.mspx

DFS hotfixes, post R2
http://www.microsoft.com/windowsserver2003/technologies/storage/dfs/hotfixes.mspx

DFSUtil Examples:
http://technet2.microsoft.com/WindowsServer/en/Library/28be5bc5-694d-49ea-981e-34bdadd1a9311033.mspx?mfr=true

Whitepaper on designing Distributed File Systems
http://technet2.microsoft.com/WindowsServer/en/Library/1aa249c0-40f3-4974-b67f-e650b602415e1033.mspx?mfr=true

There is a lot to consider with DFS and I've only scratched the surface, but I hope that this has been helpful to cover a few common configuration settings that are required for configuring DFS on Windows Server 2003 R2 in a webfarm situation.

Posted: Jun 07 2006, 07:59 AM by OWScott | with 2 comment(s)

Enabling Windows Vista Aero Glass

I've been running Vista Beta 2 (Build 5384) for a few days now, but Aero Glass wasn't enabled for me.  Aero Glass gives you Windows key + Tab 3D switching between windows, hovering over taskbar items shows a cool live screenshot of the window, and the regular Alt-Tab is enhanced.  There are other 3D and enhanced visual effects as well.

It worked when running one of the earlier Vista Beta builds and my hardware is new, so I figured I shouldn't have a problem figuring out how to turn it on.  Well, it wasn't as straightforward for me, so I figured I would post the information that I found while trying to enable it on my computer.

The performance rating score that your computer receives appears to be the determining factor on whether or not Aero Glass is enabled.  If you can get your score high enough, then it is supposed to turn on by itself:

http://blogs.technet.com/steve_lamb/archive/2006/05/26/430479.aspx

My computer was only rated at a 2 because of some boot up problems.  I viewed the information provided and the issue for me was that the Vista Search features took a while to load once, which dropped my rating.  But even when I cleared the Event Log info, rebooted and refreshed the performance score, my rating was always 2.  After some time I figured I would move on and worry about this another day.

You can override this check with the following registry key change: http://www.neowin.net/forum/lofiversion/index.php/t385504.html.  This can be applied to HKEY_LOCAL_MACHINE or HKEY_CURRENT_USER depending on whether you want it computer wide or user wide.  This seems to be the fix that helps the most people.

But even this didn't work for me, so I kept looking.  This blog post of Julie Lerman's suggested getting the vendor's latest drivers, so I went ahead and tried that: http://blog.ziffdavis.com/devlife/archive/2006/04/09/40828.aspx

No go.  Additionally, there is another skin that can be downloaded that apparently achieves the same thing but works with lower end video cards: http://www.neowin.net/forum/lofiversion/index.php/t296493.html.  Some people seem to have had good success with this.  I didn't try it because I was determined to get the native Vista glass skin working, but I almost resorted to this option, and it seems to be a good option for many people.

Here's a potential fix that requires running WinSAT Aurora: http://chris123nt.com/2006/03/25/regain-glass-in-5342.  Again, Aero Glass didn't work for me after this fix, but it may be what is required for some.

None of these worked for me, but after some poking around I finally figured out what the problem was.  I had to change the "Appearance Settings" from "Windows Vista Basic" to "Windows Vista Aero".  Very possibly I changed this to the basic appearance while poking around a few days ago, so I'll take the blame for it being wrong.  You can get to this by right-clicking on the desktop -> Personalize -> Visual Appearance.  Note though that this screen is different depending on whether you have Aero working or not.  If Aero Glass is working, you'll get a screen which lets you change your color scheme.  Otherwise you'll get the classic appearance properties.  In here, I changed the color scheme to Windows Vista Aero and everything finally worked!  (If you get the "Change your color scheme" screen instead of the classic screen, you can click "Open classic appearance properties" at the bottom to get back to the classic screen, but if you get the enhanced screen, you probably have Aero Glass working already.)

So, in my case I'm not sure which step or steps would have solved the issue since I didn't have the glass turned on in the first place. But hopefully one of these steps will be useful to you to get the Aero Glass working on your Vista install.

Posted: Jun 05 2006, 07:21 AM by OWScott | with 31 comment(s)