SQL Server AlwaysOn: what to avoid and how to optimize, deploy and operate. The 11th annual edition of the largest professional IT conference in the Czech Republic! Michal Tinthofer Michal.Tinthofer@Woodler.eu Prague 2013
Overview: Introduction to AlwaysOn & Availability Groups; How to set up AlwaysOn; Managing and tuning AlwaysOn; Monitoring Availability Groups; Common issues; Questions & Answers
Introduction to AlwaysOn & Availability Groups
SQL Server Always On Solutions AlwaysOn Availability Groups AlwaysOn = (SQL Server Failover Cluster Instances + Availability Groups) Availability Groups Database Mirroring High Availability (Local HA): Availability within a data center Disaster Recovery (DR): Availability across data centers
SQL Server Always On Solutions
Availability Groups: non-shared storage solution; (group of) database-level HA; (group of) database-level DR; the DR replica can be an active secondary; databases must be in the FULL recovery model.
Failover Cluster Instance (for local HA) + Availability Groups (for DR): combined shared and non-shared storage; instance-level HA; (group of) database-level DR; the DR replica can be an active secondary; databases must be in the FULL recovery model.
The logical topology of a representative Availability Groups solution
Figure: The logical topology of a representative AlwaysOn solution. Diagram labels: a Windows Server Failover Clustering (WSFC) cluster spanning network subnets A and B, with nodes A1, A2, A3, B1 and B2, each holding the WSFC configuration; SQL Server Failover Cluster Instance 1 on shared storage and SQL Server instances 2, 3 and 4 on their own storage, each with an instance network name; an AlwaysOn Availability Group with one primary replica and three secondary replicas behind the Availability Group Listener (virtual network name); an optional remote file share as WSFC quorum witness.
How to Set Up AlwaysOn
Pre-installation planning:
1. Choose the number and location of replicas.
   1.1 Synchronous or asynchronous replication? Up to three synchronous replicas; four replicas plus the primary in total.
   1.2 Where should automatic failover happen? Two automatic failover partners (primary and secondary).
2. Choose the quorum model (based on the number of replicas).
   2.1 The decision depends on the number of nodes and votes, and it exists to prevent cluster split brain. Odd number of nodes: Node Majority. Even number of nodes: Node and File Share Majority. Decide how many nodes need to be up for the cluster to stay online, and which nodes should vote.
3. Design the Availability Groups.
   3.1 How many databases do you want in each group?
   3.2 Do you want to move groups around for load balancing?
   3.3 Do you need to separate reporting databases from the OLTP databases? Can the reporting database be a read-only replica of production OLTP?
   3.4 If just one database on an instance fails, what do you want to happen?
   3.5 How many AGs do you need, based on database priority and function? Automatic failover with a synchronous replica, or manual failover with an asynchronous replica?
4. Plan the DR strategy with SQL Server 2012 servers only. If any other servers require restores from your new 2012 instance, they should get 2012 first (think development, disaster recovery, or reporting servers).
5. If you are moving from an older SQL Server environment, further checks are necessary. Review the non-database items installed on the old servers and plan for SSIS, SSAS, SSRS, DTS packages, logins, Agent jobs, etc.
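The odd/even rule from planning step 2.1 can be sketched in a few lines of Python. This is only an illustration of the rule as stated on the slide; the function name `choose_quorum_model` is made up for this example and is not part of any Microsoft tooling.

```python
def choose_quorum_model(voting_nodes: int) -> str:
    """Return the quorum model suggested by the odd/even planning rule."""
    if voting_nodes < 1:
        raise ValueError("a cluster needs at least one voting node")
    # Odd number of voting nodes: the nodes alone can always form a strict majority.
    if voting_nodes % 2 == 1:
        return "Node Majority"
    # Even number of voting nodes: a file share witness adds the tie-breaking vote.
    return "Node and File Share Majority"


for nodes in (2, 3, 4, 5):
    print(nodes, "voting nodes ->", choose_quorum_model(nodes))
```

Only nodes that actually hold a vote should be counted, which is why the vote assignment is decided before the quorum model.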
Windows Configuration
Create a single service account for all SQL Servers. You can use different accounts per node only if Kerberos authentication will never be used for the listener.
Use the same drive letters across all nodes (no C:\ drive, please).
Use static IP addresses for all servers and subnets.
If a SQL Server Failover Cluster Instance (FCI) will be used: also reserve IP addresses for the SQL instance and for cluster management, and configure DTC if clustered.
Achieve network redundancy between nodes before the SQL installation. No separate heartbeat network is necessary. Use several NICs with separate IPs, or team them.
Where will backups be taken? Choose a preferred and a secondary server. What if a failover to the DR site occurs? Plan for both sites (DR and HA).
Use Windows Update to bring the servers up to date, but do not allow updates to download and install automatically!
Installation
A single domain should be used!
Install the Windows prerequisite hotfixes for Availability Groups: http://msdn.microsoft.com/en-us/library/ff878487.aspx#winhotfixes
Install the hotfix that allows nodes with 0 votes: http://support.microsoft.com/kb/249403 It is important to do this on all nodes.
Open Windows Firewall ports 1433, 1434 and 5022 on all nodes.
Enable Instant File Initialization.
The .NET Framework 3.5.1 feature and the Failover Clustering feature should be installed.
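A quick way to confirm that the firewall ports listed above are reachable from another node is a plain TCP probe. The sketch below uses only the Python standard library; the helper names are made up for this example. It covers the TCP ports 1433 and 5022 only, because port 1434 is used by the SQL Server Browser service over UDP and cannot be verified with a TCP connect.

```python
import socket

# TCP ports from the installation checklist (1434 is UDP and is not probed here).
REQUIRED_TCP_PORTS = (1433, 5022)


def tcp_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False


def missing_ports(host: str, ports=REQUIRED_TCP_PORTS) -> list:
    """Return the ports from `ports` that do not accept TCP connections."""
    return [port for port in ports if not tcp_port_open(host, port)]


if __name__ == "__main__":
    # Replace 127.0.0.1 with the name of each cluster node you want to check.
    print("Unreachable ports:", missing_ports("127.0.0.1"))
```

A port shows as reachable only when something is listening on it, so run the probe after SQL Server and the mirroring endpoint are up.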
Cluster Installation step by step: Install the Failover Clustering feature.
Cluster Installation step by step: Create the primary cluster via Failover Cluster Manager.
Cluster Installation step by step: Pass the validation tests. The SAN must support SCSI-3 Persistent Reservations; check this especially for iSCSI.
Cluster Installation step by step: Specify the cluster name for the cluster service (MSCS), not the SQL instance name.
Cluster Installation step by step: The Windows cluster is created with two nodes for the primary site: Sql1 and Sql2.
AlwaysOn Current progress
Cluster Installation settings: The servers need to exchange backup files and other data, so a file share is required. Create a folder with read/write access on a shared server; it will also be used for the new quorum settings. Then go to cluster management and change the quorum mode.
Cluster Installation settings: Quorum is managed by the WSFC, irrespective of the number of SQL Server instances, nodes, or availability groups. Important design goal: unavailability of the DR site (or of the node at the DR site), or loss of network connectivity between sites, should not impact the quorum of the WSFC. Two steps: (1) Node votes: first decide which nodes should have a vote. (2) Quorum model: then choose the appropriate quorum model.
Cluster Installation settings: Node and File Share Majority. Use this quorum model with a protected file share witness. The file share witness always has 1 vote. [Diagram labels: a single Windows Server Failover Cluster crossing two data centers; primary data center with the SQL Server primary, a synchronous SQL Server secondary and the file share; disaster recovery data center with an asynchronous SQL Server secondary; one Availability Group.]
Cluster Installation settings: Node Majority. Add an additional voting node to the WSFC in the primary data center, and then use this quorum model. [Diagram labels: a single Windows Server Failover Cluster crossing two data centers; primary data center with the SQL Server primary, a synchronous SQL Server secondary and an additional server for the Node Majority quorum model; disaster recovery data center with an asynchronous SQL Server secondary; one Availability Group.]
Cluster Installation settings: Pick Node and File Share Majority, because we will have an even number of nodes.
Cluster Installation settings: Confirm the new quorum settings. The important design goal is to ensure that unavailability of the DR site, or loss of network connectivity between sites, does not impact the quorum of the WSFC.
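The design goal can be checked on paper before touching the cluster. The sketch below assumes the usual majority rule (quorum holds while strictly more than half of all configured votes are online). The helper names and the two vote layouts are illustrative only and do not query a real cluster.

```python
def has_quorum(votes_online: int, total_votes: int) -> bool:
    """Quorum requires a strict majority of all configured votes."""
    return 2 * votes_online > total_votes


def survives(node_weights: dict, failed_nodes: set,
             witness_votes: int = 0, witness_online: bool = True) -> bool:
    """Return True if the cluster keeps quorum after `failed_nodes` go offline."""
    total = sum(node_weights.values()) + witness_votes
    online = sum(w for node, w in node_weights.items() if node not in failed_nodes)
    if witness_online:
        online += witness_votes
    return has_quorum(online, total)


DR_SITE = {"DRNode1", "DRNode2"}

# Layout 1 (hypothetical): every node votes, no witness. Losing DR leaves 2 of 4 votes.
all_voting = {"PrimaryNode1": 1, "PrimaryNode2": 1, "DRNode1": 1, "DRNode2": 1}
print("all nodes vote, DR site lost:", survives(all_voting, DR_SITE))

# Layout 2 (hypothetical): DR nodes have no vote, file share witness in the primary site.
dr_without_votes = {"PrimaryNode1": 1, "PrimaryNode2": 1, "DRNode1": 0, "DRNode2": 0}
print("DR votes removed, DR site lost:",
      survives(dr_without_votes, DR_SITE, witness_votes=1))
```

With all four nodes voting, a DR outage drops the cluster to exactly half the votes and quorum is lost; with the DR votes removed and a witness in place, the same outage has no effect on quorum.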
AlwaysOn Current progress
Cluster Installation settings: Disallow the SAN network for cluster communication.
Cluster Installation settings: Add the shared disk to the cluster. Supported: SCSI, iSCSI, Fibre Channel.
SQL Clustered Instance setup
Prefer a default instance over a named one.
Use a single AD service account for all instances, so that you will be able to use Kerberos authentication in the future.
Use the same collation on all instances.
Use the same drive letters on all nodes.
Apply the latest Service Pack for SQL Server 2012.
Enable AlwaysOn HADR via PowerShell or SQL Server Configuration Manager.
Prepare the SAN storage.
Confirm network routing between subnets.
Configure the SQL instance as you usually do after a new installation: memory settings, tempdb, model db, TCP/IP protocol enabled, static port, default compression, MAXDOP, Database Mail, SQL Agent alerts, data collector, scheduled backups for system databases, index fill factor, etc.
SQL Clustered Instance setup: We are going to use a clustered SQL instance.
SQL Clustered Instance setup: Pass the rule check first.
SQL Clustered Instance setup: Fix the network binding rule.
SQL Clustered Instance setup: Pick the features and names. Network name: hostname of the instance. Instance ID: depends on the instance type.
SQL Clustered Instance setup: Pick the name of the cluster resource group; the resources for the PROD cluster will be stored there. Then choose the available shared storage and the IP address on which the PROD instance will listen.
SQL Clustered Instance setup: Add accounts to the sysadmin role. Optionally, create an AD group that holds all service accounts of the AG partners and the sysadmins.
SQL Clustered Instance setup: Set the service account for the new instance. Finish the installation and then install the latest service pack.
SQL Clustered Instance setup: From SQL Server setup, add a node to the newly created clustered SQL instance (FCI).
SQL Clustered Instance setup: Choose the existing FCI, confirm the network interface used for communication, and fill in the password for the service account.
SQL Clustered Instance setup: After a successful installation you will be asked for the service pack. Patch the server to the latest SP.
AlwaysOn Current progress
Enable AlwaysOn on the PROD FCI via Configuration Manager or PowerShell. The SQL5 machine is missing in the cluster.
DR Cluster Installation: Add the DR nodes to the existing Windows cluster.
DR Cluster Installation: Some resources cannot be used on the DR nodes, such as shared drives, the PROD LAN, etc.
DR Cluster Installation: Remove the cluster votes for offsite nodes. Also remove those nodes from the possible and preferred owners of the PROD FCI resources. We need to separate the two SQL FCIs.
AlwaysOn Current progress
DR Cluster Installation: Install the second FCI; this will be our DR site. Pick the features and names. Network name: hostname of the instance. Instance ID: depends on the instance type. Pick the name of the cluster resource group; the resources for the DR cluster will be stored there.
DR Cluster Installation: Then choose the available shared storage and the IP address on which the DR instance will listen. Set the service account for the new instance and the members of the sysadmin role.
DR Cluster Installation: TEMPDB on local disk. Enables the use of local storage for TEMPDB. Solid state storage can improve the performance of TEMPDB-heavy workloads. Saves money on storage replication licensing. Reduces cross-data-center storage replication traffic. The tempdb folders should be on the same path on both nodes.
DR Cluster Installation: As on the PROD FCI, add the second node to the DR FCI. Don't forget the service pack!
DR Cluster Configuration: Enable AlwaysOn on the DR FCI. Remove the PROD nodes from the possible and preferred owners of the DR FCI resources.
FCI Configuration: If a named instance is used, set a static TCP/IP port on all nodes. This allows you to configure the firewall effectively.
FCI Configuration: Again, ensure that the correct possible and preferred owners are set on all resources in the cluster application (services, storage, network, etc.).
AlwaysOn Current progress
AlwaysOn (Availability Groups) setup: Databases must be in the FULL recovery model, and a full backup must exist. An FCI combined with an AG uses manual failover in asynchronous mode (by default).
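The two database prerequisites lend themselves to a simple pre-flight check. The sketch below works on a hand-filled inventory and does not query SQL Server; the `DatabaseState` type, the `ag_blockers` helper and the database names are all made up for this example.

```python
from dataclasses import dataclass


@dataclass
class DatabaseState:
    name: str
    recovery_model: str      # e.g. "FULL", "SIMPLE", "BULK_LOGGED"
    has_full_backup: bool    # True once at least one full backup exists


def ag_blockers(db: DatabaseState) -> list:
    """Return the reasons why `db` cannot join an availability group yet."""
    problems = []
    if db.recovery_model.upper() != "FULL":
        problems.append("recovery model is %s, not FULL" % db.recovery_model)
    if not db.has_full_backup:
        problems.append("no full backup exists")
    return problems


# Hypothetical inventory, for illustration only.
inventory = [
    DatabaseState("Sales", "FULL", True),
    DatabaseState("Reporting", "SIMPLE", True),
    DatabaseState("Staging", "FULL", False),
]

for db in inventory:
    issues = ag_blockers(db)
    print(db.name, "->", "ready" if not issues else "; ".join(issues))
```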
Special Case: Automatic Failover for DR. Use a third data center and synchronous mode. [Diagram labels: one Windows Server Failover Cluster; primary data center with the SQL Server primary; disaster recovery data center with a synchronous SQL Server secondary; third data center with the file share; one Availability Group.]
AlwaysOn (Availability Groups) setup: Ensure the correct endpoints are chosen. Pick both FCIs (PROD and DR).
AlwaysOn (Availability Groups) setup: Build your backup strategy for DR. Create an AG listener that uses both subnets, to provide centralized client management.
AlwaysOn (Availability Groups) setup: Select the initial synchronization method. Don't forget the AG prerequisites; in this case the folder structure should be the same.
AlwaysOn (Availability Groups) setup: Fixing the AG prerequisites: in this case the folder structure should be the same. Create the same folders on both FCIs to hold the data files.
AlwaysOn (Availability Groups) setup: Validation, scripting and process: logins, endpoints (Hadr_endpoint), start the AlwaysOn_health XE session, create the AG, add the listener, join the members, back up and restore the databases. Wait 5 minutes for the replicas to start communicating.
AlwaysOn Current progress
Result: FCI + AG. Three separate FCIs: Production (holds the PROD SQL service), Disaster Recovery (holds the DR SQL service), and AlwaysOn (holds the AG listener). Two separate LANs. Nodes are evenly distributed.
How to fail over AlwaysOn
Disaster = the primary site is down. A manual process is needed to bring the database service online on the DR site.
Force quorum on the secondary in the DR site:
    Start-ClusterNode -Name "DRNODE1" -FixQuorum
If the cluster service on that node is still running, stop it beforehand:
    Stop-ClusterNode -Name "DRNODE1"
Execute a forced failover that allows data loss:
    ALTER AVAILABILITY GROUP [AlwaysOnAG] FORCE_FAILOVER_ALLOW_DATA_LOSS;
Adjust the quorum model and/or node votes:
    (Get-ClusterNode "DRNode1").NodeWeight=1
    (Get-ClusterNode "DRNode2").NodeWeight=1
    (Get-ClusterNode "PrimaryNode1").NodeWeight=0
    (Get-ClusterNode "PrimaryNode2").NodeWeight=0
Reporting:
    Get-ClusterNode | fl NodeName, NodeWeight
How to fail back AlwaysOn
Set the node votes to 1 for the primary site nodes.
Resume synchronization for all paused AlwaysOn databases.
Optionally switch to synchronous mode to ensure transaction safety.
Fail back to the primary site.
Switch back to asynchronous mode.
Set the node votes to 0 for the DR site nodes.
Demo: How to fail over via GUI or T-SQL
Tuning SQL Server Failover Cluster Instance DNS Settings
RegisterAllProvidersIP setting for the AG listener
Reducing client recovery latency after failover.
For connection strings that set MultiSubnetFailover to true/yes: AlwaysOn Availability Groups sets the RegisterAllProvidersIP property to 1 in order to reduce reconnection time after a failover. Clients will try to connect to all IP addresses simultaneously. Supported providers: .NET Framework 3.5 SP1 with connectivity patch, SQL Native Client 11.0 ODBC, Microsoft JDBC Driver 4.0 for SQL Server.
For legacy connection strings: change RegisterAllProvidersIP to 0. Only the active IP address (instead of all) is then listed in the Client Access Point in the WSFC cluster, which reduces latency for legacy clients. Change HostRecordTTL to 300 (seconds), so that clients check the AG listener record in DNS every 5 minutes.
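On the client side, the difference between the two cases is a single connection string keyword. The sketch below assembles an ODBC connection string for SQL Server Native Client 11.0 without opening a connection. The listener name `aglistener.example.local`, the database name and the helper `build_connection_string` are placeholders invented for this example.

```python
def build_connection_string(listener: str, database: str,
                            multi_subnet: bool = True, port: int = 1433) -> str:
    """Build an ODBC connection string that targets an AG listener."""
    parts = {
        "Driver": "{SQL Server Native Client 11.0}",
        "Server": "tcp:%s,%d" % (listener, port),
        "Database": database,
        "Trusted_Connection": "yes",
    }
    if multi_subnet:
        # Ask the driver to try all listener IP addresses in parallel.
        parts["MultiSubnetFailover"] = "Yes"
    return ";".join("%s=%s" % (key, value) for key, value in parts.items()) + ";"


# Modern client: works together with RegisterAllProvidersIP = 1.
print(build_connection_string("aglistener.example.local", "Sales"))

# Legacy client: no MultiSubnetFailover keyword, so RegisterAllProvidersIP = 0
# and a short HostRecordTTL are the better fit on the cluster side.
print(build_connection_string("aglistener.example.local", "Sales", multi_subnet=False))
```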
RegisterAllProvidersIP Setting for AG listener
Testing MultiSubnetFailover: the same process as for ReadOnlyRouting in my previous session.
Monitoring
FCI monitoring
On SQL Server 2008, the resource DLL queries @@SERVERNAME. If SQL Server fails to respond twice, the resource is considered down and a failover follows.
On SQL Server 2012, the resource DLL runs sp_server_diagnostics. The SQL instance sends data continuously, detection is more precise, the failover policy is more configurable, and diagnostic logs are stored on the server for later analysis.
Flexible Failover Policy
The user sets the new cluster properties HealthCheckTimeout and FailureConditionLevel.
FailureConditionLevel (0 to 5):
5: Failover or restart on any qualified failure
4: Failover or restart on moderate SQL Server errors
3: Failover or restart on critical SQL Server errors
2: Failover or restart on SQL Server unresponsive
1: Failover or restart on SQL Server down
0: No automatic failover or restart
Flow: the WSFC service asks the resource DLL whether the SQL FCI is alive (IsAlive/LooksAlive). The resource DLL executes sp_server_diagnostics, and the IsAlive/LooksAlive result is based on the diagnostics and the FailureConditionLevel.
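The levels are cumulative: each level reacts to its own condition and to everything the lower levels react to. The sketch below models that rule as a lookup table. The condition labels and the function `triggers_failover` are invented for this illustration and are not names used by SQL Server or WSFC.

```python
# Lowest FailureConditionLevel at which each condition causes a failover or restart.
MIN_LEVEL_FOR_CONDITION = {
    "server_down": 1,
    "server_unresponsive": 2,
    "critical_error": 3,
    "moderate_error": 4,
    "any_qualified_failure": 5,
}


def triggers_failover(failure_condition_level: int, condition: str) -> bool:
    """Return True if `condition` causes a failover or restart at the given level."""
    if not 0 <= failure_condition_level <= 5:
        raise ValueError("FailureConditionLevel must be between 0 and 5")
    # Level 0 is below every threshold, so it never fails over automatically.
    return failure_condition_level >= MIN_LEVEL_FOR_CONDITION[condition]


for condition in MIN_LEVEL_FOR_CONDITION:
    print("level 3,", condition, "->", triggers_failover(3, condition))
```

At level 3, a down or unresponsive server and critical errors lead to a failover or restart, while moderate errors and other qualified failures do not.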
Flexible Failover Policy: The configuration can be viewed and changed from SQL Server or from the cluster resource. HealthCheckTimeout: default 60 sec, minimum 15 sec; speed of responses. FailureConditionLevel: default 3 (failover or restart on critical SQL Server errors), which should be enough; user configurable. Diagnostics are always captured: instance root LOG\%SQLDIAG%, sp_server_diagnostics [Level].
Demo: Managing SQL Server diagnostics
Q&A Thank You!! Michal.Tinthofer@Woodler.eu www.woodler.eu