Exchange 2010 Database Availability Groups and You
If your company is looking to implement a new installation of Exchange 2010, one of the more important aspects of your design will be the backend servers where your mailbox databases will reside. These backend servers will be clustered into what is known as a Database Availability Group, or DAG. You may be asking yourself, “What is a DAG, and why should I care?” Essentially, a Database Availability Group is the core component of any High Availability design, and unfortunately for you, I need to explain the basic process of DAG creation and implementation.
Once you have a base Exchange 2010 install you will need to create a DAG, add Mailbox servers to it, and then replicate mailbox databases between the DAG members (more on this later). You can create the DAG with either the New Database Availability Group wizard in the Exchange Management Console or the New-DatabaseAvailabilityGroup cmdlet in the Exchange Management Shell. When creating a DAG you will be prompted for various pieces of information, including a name for the DAG, an optional witness server, and one or more IP addresses that are assigned to the DAG.
When the first server is added to the DAG, a cluster is formed for use by the Database Availability Group. DAGs make limited use of Windows Failover Clustering technology, namely the cluster heartbeat, cluster networks, and the cluster database. Any subsequent server that is added to the DAG is joined to the underlying cluster (the cluster’s quorum model is automatically adjusted by the system, as needed), and the server is added to the DAG object in Active Directory. Because DAGs rely on Windows Failover Clustering, they can only be created on Exchange 2010 Mailbox servers that are running Windows Server 2008 Enterprise Edition or Windows Server 2008 R2 Enterprise Edition.
A recap of the sequence that takes place when the first mailbox server is added to the DAG:
- The Windows Failover Clustering component is installed.
- A failover cluster is created using the name of the DAG.
- A cluster name object (CNO) is created in the default computers container.
- The name and IP address of the DAG are registered as a Host (A) record in DNS.
- The server is added to the DAG object in Active Directory.
- The cluster database is updated with information on the databases that are mounted on the added server.
When the second and any subsequent Mailbox servers are added to the DAG, the following occurs:
- The server is joined to the Windows failover cluster for the DAG.
- The quorum model is automatically adjusted.
- The witness directory and share are automatically created by Exchange when needed.
- The server is added to the DAG object in Active Directory.
- The cluster database is updated with information on the databases that are mounted on the added server.
Exchange 2010 DAGs use a majority-based quorum model. This means that more than half of your quorum votes (server votes plus, where one is used, the file share witness vote) need to be up and running at all times. The formula for the number of votes required is (n / 2) + 1, with n / 2 rounded down to a whole number, where n is the number of nodes within the Database Availability Group. If you have an odd number of nodes in the same DAG (cluster), you have an odd number of votes, so you aren’t required to have a witness server. If you have an even number of nodes, you need a file share witness in case half of your nodes go down, at which point the witness provides that extra +1 vote.
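To put some numbers behind that formula, here is a minimal Python sketch. It is only an illustration of the arithmetic described above, not anything Exchange itself runs, and the function names total_votes and votes_needed are made up for this post. It assumes a witness vote exists only when the node count is even.

```python
def total_votes(nodes: int) -> int:
    """One vote per DAG node, plus a witness vote when the node count is even.

    Assumption for this sketch: the file share witness only counts
    as a vote in DAGs with an even number of nodes.
    """
    return nodes + 1 if nodes % 2 == 0 else nodes


def votes_needed(nodes: int) -> int:
    """The (n / 2) + 1 formula, with n / 2 rounded down."""
    return nodes // 2 + 1


if __name__ == "__main__":
    for n in range(2, 7):
        print(f"{n} nodes: {total_votes(n)} votes in total, "
              f"{votes_needed(n)} needed to keep quorum")
```

For any node count, the number of votes needed always works out to more than half of the total, which is the whole point of a majority model.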
When a DAG has an even number of nodes, it uses the Node and File Share Majority quorum model. If you have dedicated Hub Transport or combined Hub Transport/Client Access servers, you can place the File Share Witness on those servers. However, the File Share Witness cannot be placed on a server that is a member of the DAG.
A File Share Witness server does not actually hold a copy of the quorum database, but it can devolve a vote (arbitrate) to a server that is online within the DAG configuration, giving that DAG server two votes. The File Share Witness server keeps track of which of the DAG nodes has the most up-to-date copy of the quorum database and will pass its vote to that server, providing the tie-breaker.
In a four-node DAG with a File Share Witness, if you lose three of the DAG servers, or two plus the File Share Witness, the cluster loses quorum. At that point the surviving members can no longer be sure they hold the most up-to-date copy of the cluster configuration, or even which server was running the relevant resources, so the cluster takes itself offline rather than risk a “Split Brain” scenario. This results in the whole DAG infrastructure staying offline until an administrator can intervene to rectify the situation.
Have your eyes glazed over yet? Essentially what the preceding wall of text is trying to tell you is that you will need to keep a minimum number of mailbox servers, or mailbox servers and a witness server, online and functional or else you will lose the DAG cluster and will be in failover/DR mode. How many servers is a function of the formula mentioned earlier and is dependent upon your overall needs.
For example, if you have a DAG cluster composed of four DAG nodes and have the File Share Witness sitting on one of your CAS/Hub servers then theoretically you would be able to suffer the loss of two of the DAG nodes and still maintain quorum. Alternatively, you could also lose one DAG node and the FSW and also remain functional whilst troubleshooting the lost servers. This also comes into play when planning scheduled Exchange server maintenance outage windows, since you will only take down the number of servers that will be tolerated within the parameters of your cluster design.
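If you want to check a maintenance plan against that example, the following Python sketch walks through the same four-node scenarios. It is a simplified model of the vote counting described in this post, under the assumption that the witness contributes exactly one vote when it is reachable. The has_quorum function is a hypothetical helper, not an Exchange or Windows API.

```python
def has_quorum(nodes_total: int, nodes_up: int, witness_up: bool) -> bool:
    """Return True if the surviving votes still form a majority.

    Simplified model: each online node is one vote, and the file share
    witness adds one vote only in DAGs with an even number of nodes.
    """
    witness_in_use = nodes_total % 2 == 0
    votes_up = nodes_up + (1 if witness_in_use and witness_up else 0)
    return votes_up >= nodes_total // 2 + 1


if __name__ == "__main__":
    # (description, nodes still online, witness online) for a four-node DAG
    scenarios = [
        ("everything online", 4, True),
        ("two nodes lost", 2, True),
        ("one node and the FSW lost", 3, False),
        ("three nodes lost", 1, True),
        ("two nodes and the FSW lost", 2, False),
    ]
    for description, nodes_up, witness_up in scenarios:
        state = "quorum kept" if has_quorum(4, nodes_up, witness_up) else "quorum lost"
        print(f"{description}: {state}")
```

The first three scenarios keep quorum and the last two lose it, which is why you would never schedule more than two of the four nodes for maintenance at the same time.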
Additionally, to a certain extent, how effective your DAG and HA are depends upon how you choose to replicate database copies amongst your mailbox servers. Each server can host multiple passive replicas of the databases that are active on the other DAG members. For example, with a four-node cluster you could host copies of node two and node three on node one. Node two would host copies of node three and node four. Node three would host copies of node four and node one, and node four would host copies of node one and node two.
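The layout in that example follows a simple rotation: each node hosts passive copies from the next two nodes in the ring. As a rough sketch, assuming one active database per node, it can be generated in Python like this. The passive_copy_layout function and its parameters are invented for illustration and do not correspond to any Exchange cmdlet.

```python
def passive_copy_layout(node_count: int = 4, copies_per_node: int = 2) -> dict:
    """Map each node number to the nodes whose databases it hosts passively.

    Each node takes copies from the next `copies_per_node` nodes,
    wrapping around to node one after the last node.
    """
    layout = {}
    for node in range(1, node_count + 1):
        layout[node] = [
            (node - 1 + offset) % node_count + 1
            for offset in range(1, copies_per_node + 1)
        ]
    return layout


if __name__ == "__main__":
    for host, sources in passive_copy_layout().items():
        print(f"node {host} hosts passive copies from nodes {sources}")
```

With four nodes and two copies per node, every active database ends up with two passive copies on two different servers, so no single server failure leaves a database without a replica.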
Since you have multiple passive copies of each of your four active databases spread across the entire cluster, this adds resiliency to the design. When you power off a server for maintenance, the fact that the node is now offline is recognized by the cluster (and FSW), and one of the passive copies of that database is made active based on the quorum vote. Essentially this is invisible, and your user base should not even notice. This is a particularly nice feature and will come in handy in the event of a hardware failure or a regularly scheduled maintenance window for the servers.
So there you have it, Database Availability Group 101. I congratulate you if you managed to read this entire blog in one sitting. To be honest, I almost put myself to sleep typing it, but it is good information to wrap your brain around prior to implementing a DAG cluster in your own environment. A wisely designed and healthy DAG cluster will make the life of any administrator less stressful, and that is a goal any of us can appreciate.