A pseudo-listener for AlwaysOn Availability Groups for SQL Server virtual machines running in Azure

I am involved in a project that is implementing SharePoint 2013 on virtual machines hosted in Azure. The back end data tier consists of two Azure VMs running SQL Server 2012, with the SharePoint databases contained in an AlwaysOn Availability Group. I used this "Tutorial: AlwaysOn Availability Groups in Windows Azure (GUI)" to help me implement this setup.

Because Azure DHCP will not assign multiple unique IP addresses to the same VM, having an AG Listener in Azure is not currently supported.  I wanted to figure out another mechanism to support a "pseudo listener" of some sort.

First, I created a CNAME (alias) record in the DNS zone with a short TTL (time to live) of 5 minutes (I may yet make this even shorter). The record represents a logical name (let's say the alias is SPSQL) of the server to connect to for the databases in the availability group (AG). When Server1 was hosting the primary replica of the AG, I would set the CNAME of SPSQL to be SERVER1. When the AG failed over to Server1, I wanted to set the CNAME to SERVER2. Seemed simple enough.

(It's important to point out that the connection strings for my SharePoint services should use the CNAME alias, and not the actual server name. This whole thing falls apart otherwise.)

To accomplish this, I created identical SQL Agent Jobs on Server1 and Server2, with two steps:

1. Step 1: Determine if this server is hosting the primary replica.

This is a TSQL step using this script:

declare @agName sysname = 'AGTest'
set nocount on

declare @primaryReplica sysname

select @primaryReplica = agState.primary_replica
from sys.dm_hadr_availability_group_states agState
  
join sys.availability_groups ag on agstate.group_id = ag.group_id
  
where ag.name = @AGname

if not exists(
  
select *
  
from sys.dm_hadr_availability_group_states agState
  
join sys.availability_groups ag on agstate.group_id = ag.group_id
  
where @@Servername = agstate.primary_replica
  
and ag.name = @AGname)

begin

   raiserror ('Primary replica of %s is not hosted on %s, it is hosted on %s',17,1,@Agname, @@Servername, @primaryReplica)

 

end

This script determines if the primary replica value of the AG group is the same as the server name, which means that our server is hosting the current AG (you should update the value of the @AgName variable to the name of your AG). If this is true, I want the DNS alias to point to this server. If the current server is not hosting the primary replica, then the script raises an error. Also, if the script can't be executed because it cannot connect to the server, that also will generate an error.

For the job step settings, I set the On Failure option to "Quit the job reporting success". The next step in the job will set the DNS alias to this server name, and I only want to do that if I know that it is the current primary replica, otherwise I don't want to do anything. I also include the step output in the job history so I can see the error message.

Job Step 2: Update the CNAME entry in DNS with this server's name.

I used a PowerShell script to accomplish this:

$cname = "SPSQL.contoso.com"
$query = "Select * from MicrosoftDNS_CNAMEType"
$dns1 = "dc01.contoso.com"
$dns2 = "dc02.contoso.com"
if ((Test-Connection -ComputerName $dns1 -Count 1 -Quiet) -eq $true)
{
    $dnsServer = $dns1
}
elseif ((Test-Connection -ComputerName $dns2 -Count 1 -Quiet) -eq $true)
{
   $dnsServer = $dns2
}
else
{
  $msg = "Unable to connect to DNS servers: " + $dns1 + ", " + $dns2
   Throw $msg
}
$record = Get-WmiObject -Namespace "root\microsoftdns" -Query $query -ComputerName $dnsServer  | ? { $_.Ownername -match $cname }
$thisServer = [System.Net.Dns]::GetHostEntry("LocalHost").HostName + "."
$currentServer = $record.RecordData
if ($currentServer -eq $thisServer )
{
    $cname + " CNAME is up to date: " + $currentServer
}
else
{
    $cname + " CNAME is being updated to " + $thisServer + ". It was " + $currentServer
    $record.RecordData = $thisServer
    $record.put()
}

This script does a few things:

  • finds a responsive domain controller (Test-Connection does a ping and returns a Boolean value if you specify the -Quiet parameter)
  • makes a WMI call to the domain controller to get the current CNAME record value (Get-WmiObject)
  • gets the FQDN of this server (GetHostEntry)
  • checks if the CNAME record is correct and updates it if necessary

(You should update the values of the variables $cname, $dns1 and $dns2 for your environment.)

Since my domain controllers are also hosted in Azure VMs, either one of them could be down at any point in time, so I need to find a DC that is responsive before attempting the DNS call. The other little thing here is that the CNAME record contains the FQDN of a machine, plus it ends with a period. So the comparison of the CNAME record has to take the trailing period into account.

When I tested this step, I was getting ACCESS DENIED responses from PowerShell for the Get-WmiObject cmdlet that does a remote lookup on the DC. This occurred because the SQL Agent service account was not a member of the Domain Admins group, so I decided to create a SQL Credential to store the credentials for a domain administrator account and use it as a PowerShell proxy (rather than give the service account Domain Admins membership).

In SQL Management Studio, right click on the Credentials node (under the server's Security node), and choose New Credential...

Then, under SQL Agent-->Proxies, right click on the PowerShell node and choose New Proxy...

Finally, in the job step properties for the PowerShell step, select the new proxy in the Run As drop down.

I created this two step Job on both nodes of the Availability Group, but if you had more than two nodes, just create the same job on all the servers. I set the schedule for the job to execute every minute.

When the server that is hosting the primary replica is running the job, the job history looks like this:

The job history on the secondary server looks like this: 

When a failover occurs, the SQL Agent job on the new primary replica will detect that the CNAME needs to be updated within a minute. Based on the TTL of the CNAME (which I said at the beginning was 5 minutes), the SharePoint servers will get the new alias within five minutes and should be able to reconnect. I may want to shorten up the TTL to reduce the time it takes for the client connections to use the new alias.

Using a DNS CNAME and a SQL Agent Job on all servers hosting AG replicas, I was able to create a pseudo-listener to automatically change the name of the server that was hosting the primary replica, for a scenario where I cannot use a regular AG listener (in this case, because the servers are all hosted in Azure).

 

 

 

 

No Comments