The importance of clock alignment between AppFabric servers

In our latest release, we use the new AppFabric Caching Service to host our session state.  AppFabric provides fault tolerance for session data within our web server farm.  However, the AppFabric cache servers were crashing roughly once an hour in our QA environment with the following exception:

AppFabric Caching service crashed.{Lease with external store expired:
Microsoft.Fabric.Federation.ExternalRingStateStoreException: Lease already expired
at Microsoft.Fabric.Data.ExternalStoreAuthority.UpdateNode(NodeInfo nodeInfo, TimeSpan timeout)
at Microsoft.Fabric.Federation.SiteNode.PerformExternalRingStateStoreOperations(Boolean& canFormRing, Boolean isInsert, Boolean isJoining)}

The environment consists of two Windows Server 2008 R2 Datacenter 64-bit VMs (4 GB RAM each) running the AppFabric Caching service.  The caching configuration is stored in a SQL Server 2008 R2 database.

The cache was created with the following PowerShell command:

          New-Cache -CacheName SessionStateCache-1 -Secondaries 1 -TimeToLive 20

After much head scratching and discussion, I finally hit upon the answer:  the clocks on the servers were out of sync.  The cache servers were 2 minutes off from the database server.  Apparently this difference is large enough to cause serious problems, presumably because the lease named in the exception ("Lease with external store expired") is tracked in the database, so a host whose clock disagrees with the database server can see its lease as already expired.  Our cache servers were pulling their time from Active Directory, while the database server had been misconfigured and was pulling its time from the VM host.  This was easily fixed by changing the time synchronization setting in VMware Tools on the database server.  Once this change was made, the crashing stopped and the cache has been running solid ever since.
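A check like this can be scripted.  Here is a minimal sketch in Python, assuming you have already collected a UTC timestamp from each machine at roughly the same instant (how you collect them is up to you).  The host names, the sample readings, and the 30-second tolerance are all made up for illustration; I don't know the actual threshold at which AppFabric starts to fail, only that 2 minutes was too much for us.

```python
from datetime import datetime, timedelta, timezone


def clock_skews(readings, reference):
    """Return each host's clock offset relative to the reference host."""
    ref_time = readings[reference]
    return {
        host: reading - ref_time
        for host, reading in readings.items()
        if host != reference
    }


def hosts_out_of_sync(readings, reference, tolerance=timedelta(seconds=30)):
    """List hosts whose clock differs from the reference by more than tolerance.

    The 30-second default is an arbitrary illustrative value, not a
    documented AppFabric limit.
    """
    skews = clock_skews(readings, reference)
    return sorted(host for host, skew in skews.items() if abs(skew) > tolerance)


# Hypothetical readings: the database server is 2 minutes behind the cache hosts.
readings = {
    "sqldb": datetime(2011, 1, 1, 12, 0, 0, tzinfo=timezone.utc),
    "cache01": datetime(2011, 1, 1, 12, 2, 0, tzinfo=timezone.utc),
    "cache02": datetime(2011, 1, 1, 12, 2, 1, tzinfo=timezone.utc),
}

print(hosts_out_of_sync(readings, reference="sqldb"))
```

With the sample data above, both cache hosts are flagged, because each is about 2 minutes ahead of the database server.  Using the database server as the reference matches our situation, where it was the one machine on a different time source.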

This is just another example of how critical clock alignment has become.  I had a related problem with my son's iPod touch.  It stopped playing videos on YouTube and Netflix.  Connectivity was fine, but somehow his clock had been set back to the beginning of time.  Once I set the clock correctly, all was well.  You can have similar problems with SSL if your system clock isn't set correctly, since a certificate is only valid between specific dates.

So, if you're having inexplicable problems, check your system clock.
