Mission-Critical Availability Detect failures reliably Able to stand multiple failures Unified solution Easy to configure, manage, and monitor Reuse existing investments SAN/DAS environments Allow using HA hardware resources Fast seamless failover
AlwaysOn SQL Server HA/DR Technologies Failover Cluster Instances (for servers) Availability Groups (for groups of databases) Pre-existent Server failover Useful in consolidation scenarios Shared storage (SAN / SMB) Depends on storage redundancy Failover takes minutes Server restart Multi-node instance Passive secondary nodes New Multi-database Failover DBs that app depends on Direct attached storage Log synchronization Failover takes seconds Secondary replicas are online Multiple Secondary Replicas Active Secondary Replicas
Failover Cluster Instances Enhancements in SQL Server 2012 Multi-Site Clusters Clusters across subnets without stretch vlan TempDB on local disk Improved performance, better SAN utilization Indirect Checkpoints More predictable DB recovery (failover) times Flexible Failover Policy Sensitivity to failures for automatic failover
Availability Groups Introduced in SQL Server 2012 Multi-database Failover Multiple secondaries (4) Sync (max 2) / Async Compression & Encryption Manual/Automatic Failover Flexible Failover Policy Automatic Page Repair Integrated Seamless App Connectivity Configuration Wizard Monitoring Dashboard Diagnostics infrastructure System Center integration Full cross-feature support Contained Databases, FileStream, FileTable, Service Broker Efficient Active Secondaries Read workloads Backups PowerShell Automation Fast Failover
An Availability Group Deployment Sync Log Synchronization Async Log Synchronization
AlwaysOn SQL Server HA/DR Technologies Availability Groups (for groups of databases) Pre-existent Increased Number of Server Secondaries failover Useful in consolidation scenarios Increased Availability of Shared storage (SAN / SMB) Readable Secondaries Depends on storage redundancy Add Azure Replica Failover takes minutes Wizard Server restart Multi-node instance Passive secondary nodes E n h a n c e d Failover Cluster Instances (for servers) Support for Windows Cluster Shared Volumes New Multi-database Failover DBs that app depends on Direct attached storage Log synchronization Failover takes seconds D i a g n o s t i c s Secondary replicas are online Multiple Secondary Replicas Active Secondary Replicas
Availability Groups Increased Number of Secondaries SQL Server 2012: Customers using (max 4) readable secondaries to offload read workloads Single technology to configure / manage Higher throughput (~7x) than Replication Customers asking for more replicas Reduce query latency in geo-distributed environments (e.g. Bwin) Scale-out read workloads (e.g. Baltika) SQL Server 2014: Max 8 secondaries Max 2 sync secondaries for high availability Secondary delay depends on network latency and I/O ~1s within data center ~5s between data centers
Availability Groups Increased Number of Secondaries Minimal performance impact Commits don t wait for async secondaries Log sender threads share log pool Added transaction latency of 8 async secondaries: <1% Scoped-out: Load balancing via connection string Read_Only connections still routed to first available readable secondary Load balancing possible via DNS round-robin or specialized load balancers (e.g. NLB)
Availability Groups Increased Readable Secondaries Availability SQL Server 2012: Read workloads killed during network failures Geo-distributed environments (e.g. failure/upgrade of network equipment, ISP failures) Hybrid (on-premise to Azure) deployments SQL Server 2014: read workloads not impacted during network failures.. or primary down.. or cluster quorum loss.. Readable secondaries remain available during Resolving state Requires direct connections to readable secondaries (Read-only routing not supported yet) Replica state and last commit time available in DMV/Dashboard
Availability Groups Increased Readable Secondaries Availability Sync Log Synchronization Async Log Synchronization
Availability Groups Increased Readable Secondaries Availability The increased readable secondaries availability means our users can still find answers online and the world keeps spinning - StackOverflow http://nickcraver.com/blog/2013/11/18/running-stack-overflow-sql-2014-ctp-2/
Availability Groups Increased Readable Secondaries Availability StackOverflow can now: Offload more critical read workloads to readable secondary in main data center Network glitches even within the same DC can happen Use readable secondary in DR site while data center is down (70% reads) Simpler to change DNS than force failover and failback Doesn t result in data loss
Demo Availability Groups Increased Availability of Readable Secondaries
Availability Groups Add Azure Replica Wizard Many customers can t afford a DR site Site rent + maintenance, hardware, Ops SQL Server 2012: Started supporting replicas on Windows Azure VMs this year Offload read workloads Offload backups (policy compliance) Disaster recovery At best region West US, East US, East Asia, Southeast Asia, North Europe, West Europe Latency / political considerations
Availability Groups Add Azure Replica Wizard Sync Log Synchronization Async Log Synchronization
Low TCO Availability Groups Add Azure Replica Wizard VM and storage Free ingress traffic Case studies Lufthansa, Thomson Reuters, Buffalo Hospital Supply SQL Server 2014: Add Replica Wizard supports Windows Azure E2E: From provisioning VM to starting log synchronization Validates environment Handles failures Does cleanup
Demo Screen Shots Availability Groups Add Azure Replica Wizard
Availability Groups & Failover Cluster Instances Enhanced Diagnostics 24 Enhancements on Dashboard, Error Messages, DMVs, XEvents Simplify troubleshooting & prevent issues Based on feedback from customers & CSS
Availability Groups & Failover Cluster Instances Enhanced Diagnostics Title Show timestamps in in XEL XEL output in in UTC UTC (not (not adjusted to to client client SSMS SSMS computer) Warning about log log synchronization behavior when when primary replica replica is async is async System function IsPrimaryReplica(database_name) Add AG name (and replica name and DB name if relevant) to many more XEvents to allow better data correlation between the logs Report major HADRON Manager transitions to AlwaysOn XEvent session Add Replica name context to connection established error log entry Component XEvents Viewer Dashboard System function XEvents XEvents Error Log Dump relevant output from from sys.dm_hadr_database_replica_states to to SQL SQL error error log log when replicas change to to resolving state state Add new error message to detect AG startup failure when quorum is forced XEvents Error Log Separate error error msg msg41142 (replica (replica can't can't become primary) primary) - raised - raised for for two two importantly Error Log different importantly reasons different reasons AlwaysOn Functions/DMVs should also support FCIs where applicable DMVs Improve the CREATE AG error message AG already exists, to say It s possible that a Error Message previous DROP AG operation, executed during cluster quorum loss, didn t delete the AG from the cluster. If so, please retry the DROP operation Remove FCI setup dependency on cluster.exe (deprecated) Use Powershell Error Log
Failover Cluster Instances Support for Windows Cluster Shared Volumes (Windows Server 2012 & 2012 R2) Cluster Shared Volume (CSV) Shared disk accessible to all nodes (over SMB) One or more per physical drive Failover Cluster Instances on CSV Improves SAN utilization Removes limitation of 24 drives Increases I/O resiliency Retry read/write via other nodes Increases failover resiliency Disks don t need to be unmounted/mounted
Failover Cluster Instances Support for Windows Cluster Shared Volumes
AlwaysOn and Windows Server Windows Cluster Enhancements Windows Server 2012 Dynamic Quorum Removes votes from unavailable nodes Enables last man standing Increased network resiliency Handle more exceptions Avoid node evictions Windows Server 2012 R2 Network names without Active Directory Avoid Listeners issues: permissions, collisions
סיכום: גירסא שניה של,Always ON מוסיפה: יציבות Features דיאגנוסטיקה מומלץ עם Windows Server 2012 R2 משתלב נהדר ב Azure
assaff@microsoft.com