Thursday, December 4, 2014

SQL Server Hybrid High Availability and Disaster Recovery

Problem
As the cloud becomes more and more widely used and the options for SQL Server continue to evolve, more SQL Server shops are looking at using cloud based services such as Azure and Amazon as a valid option for their high availability (HA) failover clustering and offsite disaster recovery (DR) solution. It is important to understand the levels of SQL protection needed in your organization and to match those levels with the right HA/DR option. Evaluate your options in terms of: ability to meet recovery time (RTO) and recovery point objectives (RPO); hardware and software cost; and ease of use.
Once you define your protection needs, you need to decide whether you need HA, DR, or both. In an HA environment, operation of your application moves from a primary server to a standby server in the event of a failure. The failover may be within the same data center or to a remote data center for disaster recovery protection. DR can also mean replicating applications and data to a remote site and restoring it if there is a problem on the primary server or even complete site loss.
Setting up a remote failover site can be confusing because SQL Server offers so many options, including: log shipping, replication, database mirroring, AlwaysOn Failover Clustering and AlwaysOn Availability Groups.
Solution
The following sections summarize the factors to consider and provide an overview of various options for HA and DR in cloud based SQL Server deployments.

Things to consider for SQL Server cloud based High Availability and Disaster Recovery

Recovery time objective or RTO

This is the amount of time your application can be unavailable before it starts to impact business operations. The shorter the downtime window, the more robust the solution needs to be. So any solution that depends on manual intervention, such as log shipping or database mirroring, will have a longer recovery time than an automatic solution. On the other end of the spectrum, high availability clustering can automatically complete a failover and restore operations quickly, allowing you to meet a recovery time objective of less than a minute.

Recovery point objective or RPO

This is the age of the data when service is restored and measure of the amount of data loss that can be tolerated in the event of a failure. Depending on the solution and the time required to restore operation there could be loss of data. For example, if you are using log shipping you could lose any transactions that have not yet been shipped when the failure occurs. Or if you are using clustering, the SQL Server services need to stop on one server and start on the other when the servers failover. Therefore SQL Server will need to redo or undo any transactions that did not complete when the failover occurred. The recovery point for clustering will therefore be much better than the recovery point possible with log shipping.

Cost

You need to consider the costs of additional SQL Server licensing needed for your cloud-based server. If you decide that AlwaysOn Availability Groups is the right solution, you will need to purchase the Enterprise edition of SQL Server, which is much more expensive than the Standard edition. If you decide to use SQL Server 2014 in-memory tables, you need to make sure your cloud based failover server has sufficient memory to support the failover. You will also need to consider the cost of the cloud based service you are implementing and, if necessary, the cost of additional bandwidth you may need to replicate data between your on-premise and cloud servers.

Complexity of Environment

The more databases you need to protect, the more complex things become and the more there is to setup and manage. If you use AlwaysOn Availability Groups, database mirroring, replication or log shipping, then each database that is replicated requires additional SQL Server processes to manage the data replication to the other server, adding overhead to SQL Server.  In addition, if you use any of these options, system databases will not be protected, so things like SQL logins and SQL Agent jobs need to be manually maintained in multiple places.  Setup and monitoring is done on a database by database level, so servers with many databases can be a challenge to manage.
With AlwaysOn Failover Clustering all databases in the instance are protected, but you will need to use shared storage for this solution which is not natively available between an on-premise and cloud server, but this can be achieved using a SANLess cluster.

Application Compatibility

When using technologies like AlwaysOn Availability Groups, you need to know whether they are compatible with the application. For example, applications that use distributed transactions are not compatible with AlwaysOn Availability Groups. A failover cluster does not have that limitation.

Comparison of cloud based SQL Server HA and DR solutions

In the next section, we will cover pros and cons of each solution.

Log Shipping

SQL Server log shipping takes periodic transaction log backups of your database, copies the transaction log backup to another server and then does a restore on the secondary server in a NO RECOVERY state, so additional transaction log backups can be restored until you need to actually failover and recover your database. Since this process uses backups of your database, all changes, both data and structure, are applied to the secondary server.  There are very limited options when setting this up, which makes this very easy to implement but it makes you rely on various tasks that run on both the primary and secondary severs.  As you can see in the diagram below log shipping is a one-way data feed from your primary server to your secondary server.  Failback is not an option.
sql server log shipping to the cloud
Pros
  • Easy to setup using either the built-in wizard in SSMS or using your own processes.
  • Allows you to delay the restore on the secondary server, so changes are not applied immediately.
Cons
  • The database has to be in FULL recovery. This shouldn't be an issue for critical databases, but it forces you to configure all databases to use this recovery model along with transaction log backups.
  • Has to be setup database-by-database, which can take time to initially configure as well as make changes.
  • Does not replicate system level data such as SQL logs or SQL jobs, so you need to be maintain them on both servers and keep them in synch when failover occurs.
  • Requires you to build a process to determine when SQL jobs should be active on one server and in standby mode on the secondary server.
  • Requires a manual failover steps to bring the secondary database online as well as to point your applications to the new server.
  • No easy way to failback to the primary server.
  • New databases are not automatically protected.
SQL Editions
  • Works with all editions of SQL Server at no extra cost (note: no built-in features for SQL Express).
Recommendation
  • This would be used as a DR solution and could be used on top of some of the other options below.

SQL Server Replication

SQL Server replication gives you the ability to select individual objects to replicate between servers.  You have the option to select some or all objects in the database.  SQL Server offers many options for replication with the ability to replicate data one way or both ways.  To setup replication, you create publications that contain one or more articles (database objects).  These publications are created on the publisher and changes are passed to a distributor which then passes them on to the subscriber(s).
sql server replication to the cloud
Pros
  • Gives you the ability to select specific objects to replicate, but unfortunately this does not give you full coverage unless you are replicating all required objects.
  • Allows you to distribute the work load over multiple servers.
Cons
  • Adds significant extra overhead to setup and manage replication. Not a recommended option if you need a true HA or DR solution.
  • Has to be setup database-by-database, which can take a lot of time to initially configure as well as make changes.
  • Does not replicate system level data such as SQL logs or SQL jobs, so you need to maintain them on both servers and keep them in synch when failover occurs.
  • Requires you to build a process to determine when SQL jobs should be active on one server and in standby mode on the secondary server.
  • May require a manual failover step to point your applications to the new server.
  • Data issues could stop replication from working, which defeats the purpose of having a HA / DR solution.
  • New databases are not automatically protected.
  • New objects are not automatically replicated.
SQL Editions
  • Works with all editions of SQL Server at no extra cost (note: there are some limitations using SQL Express).
Recommendation
  • This is a great solution if you need to setup a distributed workload, but there are too many potential issues that can occur when configuring and changing replication options to implement this as a HA / DR solution.

Database Mirroring

Database mirroring allows you to create a complete replica of a database on a secondary server. All changes that occur in the primary database are almost immediately replicated to the secondary server. With database mirroring either server can easily become the Principal (active) server. In order to have automatic failover capabilities you would also need to have a witness server as shown in the second diagram.  One downside to this option is that Microsoft announced this is a deprecated feature and may not be available in future SQL Server releases.
The below diagram depicts a two server configuration.  This option does not allow for automatic failover.
sql server database mirroring to the cloud
The below diagram depicts a three server configuration adding a witness server.  This option allows automatic failover.

sql server database mirroring to the cloud
Pros
  • Can provide automatic failover if a witness server is used.
  • All contents of the database are replicated.
  • First database takes some configuration to setup, but adding additional databases is fairly easy.
  • Can failback easily.
Cons
  • Failover is only supported in High Safety Mode (synchronous replication)
  • Microsoft announced that this feature will be deprecated in future releases.
  • Failover can occur for just one database, so this can cause issues for multi-database applications.
  • Has to be setup database by database, which can take a lot of time to initially configure as well as make changes.
  • Does not replicate system level data such as SQL logs or SQL jobs, so this will need to be maintained on both servers as well as kept in synch when failover occurs.
  • Requires you to build a process to determine when SQL jobs should be active on one server and in standby mode on the secondary server.
  • May require a manual failover solution to point your applications to the new server.
  • New databases are not automatically protected.
SQL Editions
  • Works with Standard (synchronous only), BI and Enterprise editions.
Recommendation
  • If you have a one database application, this is a simple solution to implement, but since this will be a deprecated feature it may make sense to use a different approach.

AlwaysOn Failover Clustering

AlwaysOn Failover Clustering feature gives you the ability to have multiple servers (nodes) participate in a cluster to allow a highly available configuration for your database server.  If one of the nodes has a failure, another node can take over with limited downtime.  This uses the built-in Windows Server Failover Clustering feature to handle the failover.
sql server alwayson failover cluster to the cloud
Recommendation
  • Since this requires shared storage, this is not an option for on-premise to cloud solution

AlwaysOn Availability Groups

This feature combines the functionality of database mirroring along with the quorum and client access point features of failover clustering.  With database mirroring, one database could failover to the secondary server, but other databases could remain active on the primary server.  AlwaysOn Availability Groups eliminates that problem and allows you to group databases together so if one database fails, all databases in the group will failover.

sql server alwayson availability groups to the cloud
Pros
  • Gives a more robust failover solution compared to database mirroring.
  • Easy to failover and failback.
  • Can failover a set of databases for multi-database applications.
  • Provides the ability to have read only replicas.
Cons
  • Requires the Enterprise edition of SQL 2012 or SQL 2014, which is much more expensive.
  • Does not replicate SQL logins or SQL jobs.
  • Each additional database or availability group will consume more worker threads, which will limit the total number of databases that can be replicated before CPU resources are exhausted.
  • Will need to coordinate when jobs are active on each node.
  • New databases are not automatically protected.
  • Does not work with distributed transactions.
SQL Editions
  • Works with Enterprise edition only.
Recommendation
  • This is good solution for on-premise to cloud HA / DR, but it requires the Enterprise edition of SQL Server which will greatly increase costs.

AlwaysOn Failover Clustering without shared storage

As mentioned, native Windows Server Failover Clustering requires shared storage, but by adding SIOS DataKeeper Cluster Edition software you can make local storage look and function like shared storage. This step gives you all of the advantages of AlwaysOn Failover Clustering without the configuration limitations of shared storage.  In the diagram below you can see that storage exists on both Node 1 and Node 2.  With SIOS DataKeeper the data is synchronized on the two nodes using block level replication. You can replicate all databases, both user and system, between your on-premise and cloud servers for disaster recovery and take full advantage of all the Windows failover clustering features.
The first diagram shows having just two nodes- one node in an on-premise physical server and the second node in the cloud.
sql server alwayson failover cluster without shared storage to the cloud
You can also create a more robust solution by adding nodes.  In this example you can have two nodes on-premise with local storage (you can use your choice of industry standard storage, including high performance SSD storage) and a third node in the cloud. Each node can also have its own storage, without the need for SAN based storage.
sql server alwayson failover cluster without shared storage to the cloud
Pros
  • Does not require shared storage.
  • Can use the robust features of Windows clustering.
  • Replicates the entire instance including SQL logins and SQL jobs.
  • New databases are automatically replicated.
  • Easy to failover and failback without application changes.
  • Eliminates the SAN as a single point of failure.
  • Works with distributed transactions.
  • Can also replicate non SQL Server data that is required by the application.
  • Data replication is much faster than other options.
  • Supports high performance SSD storage in physical server and virtual server environments
Cons
  • There is an additional cost for the software.
SQL Editions
  • Works with Standard, BI and Enterprise editions.
Recommendation
  • If you need to protect your entire database server and implement a full featured HA / DR solution this is the best option.

No comments:

Post a Comment