610 likes | 799 Views
SQL Server 2008 R2 High Availability Technologies. Agenda. Overview & Benefits Increasing Availability Enabling Data Protection Improving Manageability Architectural References (Optional). Why Do You Need High Availability?.
E N D
SQL Server 2008 R2 • High Availability Technologies
Agenda • Overview & Benefits • Increasing Availability • Enabling Data Protection • Improving Manageability • Architectural References (Optional)
Why Do You Need High Availability? • Downtime leads to lost customer revenue and decreased staff productivity Availability during planned downtime • Patching and service pack installations • Hardware and software upgrades • System reconfiguration • Database maintenance • Application upgrade Protection against unplanned downtime • Human error is the number one cause of failure • Site disasters • Hardware malfunction • Data corruption • Software crash
Customer Success Stories INCREASING AVAILABILITY “Our applications are running at the 5 nines. We’re enjoying 99.999 percent uptime, which really helps to keep our business running.” -- Mediterranean Shipping Company Case Study EVIDENCE ENABLING DATA PROTECTION “We had a mirrored copy of the 2-terabyte database with a clean version of the page, and SQL Server 2008 restored the page in a matter of minutes – actually, before we even noticed the error message.” -- CareGroup Healthcare System Case Study EVIDENCE IMPROVING MANAGEABILITY “From Enterprise Manager in SQL Server 2000 to Management Studio in SQL Server 2008, Microsoft has always provided great database management tools. The Oracle tools, many of which are Java-based, just seemed more cumbersome to use.” -- Tyler Technologies Case Study” EVIDENCE
Microsoft SQL Server® 2008 R2 High Availability Technologies
SQL Server High Availability Licensing • License only active servers • Passive servers do not require a license • If failover occurs, a license is not required for 30 days • The most cost-effective high-availability model offered by the leading database vendors Comparison of high availability features of SQL Server editions
Agenda • Overview & Benefits • Increasing Availability • Enabling Data Protection • Improving Manageability • Architectural References (Optional)
Database Mirroring • Hot standby protects against database or server failure • Automatic or manual failover • SQL Server 2008 R2 does not require a database restart after manual failover • No shared components; two separate copies of data • SQL Server 2008 R2 compresses the log stream from principal server to mirror server • Automatic recovery from page corruption • SQL Server 2008 R2 supports rolling upgrades of mirrored database configurations
Enhanced Database Mirroring High Performance Mirroring • Increase performance through asynchronous mirroring Automatic Page Repair • Automatically detects page corruption and retrieves data from the mirror • Reduces downtime and management costs • Minimizes application changes to correctly handle I/O errors Reporting from Mirror • Increase utilization of mirror server • Reduce need for reporting servers Applications & Business Logic Principal Mirror “This is a really powerful enhancement because prior to this… you would have to run DBCC CHECKDB... and that would likely mean taking downtime… With SQL Server 2008 Database Mirroring you can avoid the effort and downtime.” — Glenn Berry, Database Architect, NewsGator Technologies
Failover Clustering • Server hardware redundancy • Using a shared disk subsystem • Entire instance virtualized and fails over as a unit • Can include non-SQL Server resources • Clustering can be combined with database mirroring, log shipping, or replication
SQL Server 2008 R2 Support for Windows Server® 2008 R2 Clustering
Multi-Instance Clustering • More than one passive node is available to host instances from multiple failovers on active nodes • Having multiple failover nodes provides greater availability • Multiple instances can share the same failover node, which reduces hardware costs • Simplified setup reduces administrative costs Applications & Business Logic 110010100101110010100101 110010 110010100101 110010100101 110010 110010100101 110010100101 110010 Active Failover Active Offline Active “Because of the critical nature of the G4S application, CASON sets up the servers in a failover cluster to ensure high availability.” —CASON Case Study
Live Migration in Hyper-V™ • Move running instances of virtual machines between host servers • Virtual machines can be moved for maintenance or to balance workloads on host servers • Perform maintenance on physical machines without any downtime • Requires Windows Server 2008 R2 Hyper-V Applications & Business Logic 110010100101 110010100101 110010 110010100101 110010100101 110010 110010100101 110010100101 110010 110010100101 110010100101 110010 110010100101 110010100101 110010 110010100101 110010100101 110010 110010100101 110010100101 110010 110010100101 110010100101 110010 Virtual Servers Virtual Server Host Server Host Server “This server already runs on our cluster solution with high availability, but after we have tested live migration on the new hardware, we’ll move it over to ensure optimal performance and reliability.” —Rodrigo Immaginario, IT Manager, Universidade Vila Velha EVIDENCE
Transactional Replication • High performance; latency measured in seconds • Minimal load on the server • Cost-effective because no specialized hardware is required • Straightforward setup and administration • Two types of transactional replication: • Standard transactional replication • Easy to design, set up, and manage • Subscriber (standby) can be used for reporting • Peer-to-peer transactional replication • Multi-master model; schema is identical on all sites • Supports distributed applications with data partitioning; enables load balancing • Conflict detection to handle multiple conflicting updates to the same record
Transactional ReplicationReplication Options Transactional Replication Reporting and Redundancy Peer-to-Peer Replication Query Scale-Out and Redundancy London New York Boston Shanghai New Jersey Seattle Tokyo
Peer-to-Peer Replication • Increases reliability by replicating data to multiple servers • Provides higher availability in case of failure or to allow maintenance at any of the participating nodes • Offers improved performance for each node with geo-scale architecture • Add and remove servers easily without taking replication offline, by using the new topology wizard Applications & Business Logic 110010100101 110010100101 110010 110010100101 110010100101 110010 110010100101100101100101100101 110010100101 110010100101 110010 “[Microsoft] SQL Server 2008 replication proved to be very predictable and reliable in our testing. This helps us to create flexible and scalable replication solutions. Reliability must be at the foundation of all that we do.” — Sergey Elchinsky, Leading System Engineer, Baltika Breweries
Logreader Agent Logreader Agent Logreader Agent Dist DB Dist DB Dist DB Distribution Agent Distribution Agent Distribution Agent Peer-to-Peer Transactional ReplicationHow does it work? “West” “East” “South”
Online Operations • Allow routine maintenance without downtime • Online index operations • Online page and file restoration • Online configuration of peer-to-peer nodes • Users and applications can access data while the table, key, or index is being updated 0 5 1 1 Applications & Business Logic 2 2 3 3 4 4 0 5 110010100101 110010100101 110010 110010100101 110010100101 110010 Index Table 5 Index Table Deleted 0 4 Deleted 2 Deleted Deleted 3 7 5 0 4 6 5 3 7 6 7 “We recommend performing online index operations for business environments that operate 24 hours a day, seven days a week, in which the need for concurrent user activity during index operations is vital.” — SQL Server Books Online
Partial Database Availability • Database remains partially available whensecondary files are damaged Primary file group Additional data file groups
Dynamic Configuration • Dynamically add memory and processors to servers without incurring downtime • Requires hardware support for either physical or virtual hardware Applications & Business Logic 110010100101 110010100101 110010 110010100101 110010100101 110010 110010100101 110010100101 110010 110010100101 110010100101 110010 110010100101 110010100101 110010 110010100101 110010100101 110010 110010100101 110010100101 110010 110010100101 110010100101 110010 110010100101 110010100101 110010 110010100101 110010100101 110010 110010100101 110010100101 110010 110010100101 110010100101 110010 110010100101 110010100101 110010 110010100101 110010100101 110010 110010100101 110010100101 110010 110010100101 110010100101 110010 110010100101 110010100101 110010 110010100101 110010100101 110010 110010100101 110010100101 110010 110010100101 110010100101 110010 110010100101 110010100101 110010 110010100101 110010100101 110010 110010100101 110010100101 110010 110010100101 110010100101 110010 110010100101 110010100101 110010 110010100101 110010100101 110010 110010100101 110010100101 110010 110010100101 110010100101 110010 110010100101 110010100101 110010 110010100101 110010100101 110010 “Hot-add CPU is the ability to dynamically add CPUs to a running system. Adding CPUs can occur physically by adding new hardware, logically by online hardware partitioning, or virtually through a virtualization layer.” —SQL Server Books Online
Resource Governor • Allocate CPU and memory resources to high-priority applications • Prevent runaway queries that hold resources for extended periods • Provide a consistent user experience, which can result in fewer service calls about slow systems • Establish service level agreements (SLAs) with customers for predictable response times 110010100101 110010100101 110010 110010100101 110010100101 110010 110010100101 110010100101 110010 LIMIT 50% POOL 0 LOAD 25% LIMIT 30% POOL 1 LIMIT 20% POOL 2 LOAD 15% LOAD 45% “Resource Governor allows us to control the percent[age] of total resources any operation can consume so that they don’t adversely impact our real-time data access.” —Michael Steineke, Vice President, Information Technology, Edgenet
Table Partitioning • Manage and access subsets of data quickly and efficiently • Reduce time spent troubleshooting storage allocation issues • Speed up data load and maintenance operations • Take advantage of all of the CPUs in the server to complete operations more quickly “Enhancements in partition query dramatically reduce the effects of lock escalation on systems that have to process hundreds and thousands of transactions per second, improving availability and improv[ing] query response time.” —Randy Dyess, SQL Server Mentor, TechNet Article
Backup • Permanent copy of data • Online restore • Bring database online as soon as primary file group is restored • Access restored portion of database while remainder is restored • Backup compression inSQL Server 2008 R2 • Reduce volume significantly • Faster restore and backup • Applied per instance and per backup • Compress UCS-2 Unicode data • Cost-effective ─ no specialized hardware is required • Straightforward setup and administration
Fast Database RecoveryDuring Restart or Restore • SQL Server 2000 • Database is available after Undo completes • SQL Server 2008 R2 • Database is available when Undo begins Undo Redo Time Database is available Redo Undo Database is available
Agenda • Overview & Benefits • Increasing Availability • Enabling Data Protection • Improving Manageability • Architectural References (Optional)
Database Mirroring Synchronous, high-availability configuration • Data is mirrored synchronously as part of a transaction • Automatic recovery of corrupt data pages • Compression of log stream Client Witness Principal Mirror
Log Shipping • Use time delay for replaying transaction logs on secondary server • Protect data against user errors on the primary server
Backup Media Mirroring • Mirroring a media set increases backup reliability by reducing the impact of backup-device malfunctions • Creating checksums on backup media enables verifying subsequent restore operations • Having a mirror can resolve some restore errors quickly by substituting the mirrored media for damaged backup media 1100101001010010 1100101001010010 1100101001010010 1100101001010010 1100101001010010 1100101001010010 1100101001010010 1100101001010010 1100101001010010 “The backup media mirroring feature of SQL Server enables you to perform a mirrored backup of a database to multiple backup devices, which greatly increases the reliability of backups in case of faulty media or a lost backup device.” —High Availability White Paper
Database Snapshots • Provide a read-only static view of the database at a point in time • Revert databases to a point in time before user error • Data loss is limited to changes after the snapshot • Run reports from a snapshot created on the mirror server in a database mirroring session to better utilize resources Applications & Business Logic 110010100101 110010100101 110010 110010100101 110010100101 110010 110010100101 110010100101 110010 Snapshot Source “Database snapshots allow you to create read-only databases for reporting and can also be useful in your data recovery efforts in the event of a disaster.” —Tim Chapman, SQL Server Database Administrator
Agenda • Overview & Benefits • Increasing Availability • Enabling Data Protection • Improving Manageability • Architectural References (Optional)
Database Mirroring • Cost-effective because no specialized hardware is required • Straightforward setup and administration • Automatic, transparent client redirect Client Witness Principal Mirror
Failover Clustering Enhancements • Easy to set up • Failover cluster snap-in • PowerShell support • Read-only APIs • Network prioritization • Enhanced security • Up to 16 nodes
Enhanced Management Tools • Database Mirroring Monitor • Cluster Validation Tool • Topology Viewer • Replication Monitor • Log Shipping Reports
Resource Governor • Control resource utilization • Per-request limits • Max memory % • Max CPU time • Grant timeout • Max requests • Real-time resource configuration and monitoring Backup OLTP Activity Executive Reports Admin Tasks Ad-hoc Reports High Admin Workload OLTP Workload Report Workload Min Memory 10% Max Memory 20% Max CPU 20% Max CPU 90% Admin Pool Application Pool
PowerShell • Powerful scripting shell • Automate server administration • Supports more complex logic than Transact-SQL • Supports other Microsoft servers • Built-in cmdlets – commands to specify specific SQL Server actions
Policy-Based Management • Define configuration policies • Apply to group of servers, database, table • Enforce policies in real time Targets Categories
Agenda • Overview & Benefits • Increasing Availability • Enabling Data Protection • Improving Manageability • Architectural References (Optional)
Customer Deployments • Failover cluster for high availability and asynchronous database mirroring for disaster recovery. • Synchronous database mirroring for high availability and disaster recovery. Log shipping for additional disaster recovery. • Geographically-dispersed cluster for high availability and disaster recovery. Log shipping for additional disaster recovery. • Failover cluster for high availability and SAN-based asynchronous replication for disaster recovery. • Peer-to-peer replication for high availability and disaster recovery (and reporting).
ServiceU Corporation • ServiceU • Software as a service (SaaS) provider • Provide solutions for reserved seat ticketing, box office management, event management and online payments • Customers in 50 states and 15 countries • Payment Card Industry (PCI) level-1 service provider (credit card compliance) • High availability and disaster recovery requirements • No service = no revenue • PCI requires the same security measures at the disaster recovery site; this needs to be set up prior to an emergency to meet the same strict guidelines • Goal: eliminate all single points of failure: network, servers, data, data centers • Case studies • http://www.microsoft.com/casestudies/Case_Study_Detail.aspx?CaseStudyID=49683 • http://www.microsoft.com/casestudies/ServeFileResource.aspx?4000003277
ServiceU Infrastructure Memphis Primary Data Center Atlanta Standby Data Center WEB FARM WEB FARM DNS DNS DB Connection to Memphis for Regular Test Exercise PRINCIPAL MIRROR Preferred Asynchronous Database Mirroring Windows Server 2008 and SQL Server 2008 Windows Server 2008 and SQL Server 2008
CareGroup Healthcare System • Four hospitals located in Boston • 16,000 employees • 146 mission-critical clinical applications • 2 million patient medical records • Annual revenue : $2 Billion • High availability and disaster recovery requirements for clinical databases: • Recovery time objective (RTO) : No downtime • Recovery point objective (RPO): No data loss • More later on service level agreement (SLA) for non-clinical databases • Case study: http://www.microsoft.com/casestudies/Case_Study_Detail.aspx?casestudyid=4000001003
CareGroupDatabase Classification and SLA • 70 databases rated “AAA” • RPO = 0 and RTO = 0 • Deploy clustering and synchronous database mirroring • Use EMC Symmetrix SAN • 200 databases rated “AA” • RPO =<1 hour and RTO = 1 hour • Deploy clustering and asynchronous mirroring • Use EMC Clarion SAN • Databases rated “A” • RPO and RTO = 1 day • Disaster recovery site being enhanced to match primary site infrastructure
CareGroup Infrastructure Applications: 1- SharePoint 2- SSRS 3- BlackBerry 4- Citrix Server 5- VMware VC Cisco Global Site Selector (GSS) DNS Connect to: Green\SQL1 Alias Name = Green IP: 100.10.56.30 IP: 100.85.3.10 SQL Server Cluster SQLNetworkNameA\SQL1 IP:100.10.56.30 SQLHostNameB\SQL1 IP:100.85.3.10 Principal Server Mirroring DR Site Mirror Server
bwin Corporation • Online gaming company • Gibraltar-based, listed on the Vienna Stock Exchange • Sports betting, soft games & skill games • 1 million bets per day on more than 90 sports • Offers audio and video streaming of major sporting events • Case study • http://www.microsoft.com/casestudies/Case_Study_Detail.aspx?casestudyid=4000001470
bwin Mission & Challenge • The Mission: Failure is not an option • Budget: Unlimited • Define: Very large database (VLDB) – A database that needs attention • Not just about its size • SLA • Zero data loss • 99.99% availability twenty four hours, seven days a week • Assumed worst case scenario: full datacenter failure with complete data loss within the datacenter
bwin Solution • The Solution • Standardize everything • Work by the book • Have some clever people available if the book runs out of pages • The Environment • More than 100 TB of data • More than 850 databases • More than 100 instances • More than 450,000 SQL statements per second
bwin Infrastructure • Principal server: 32 IA64 dual-core CPUs • Mirror server: 32 IA64 single-core CPUs • >64 core testing • 64 network ports (1 Gbps) • 400 local serial attached SCSI (SAS) drives on 16 RAID controllers (for the operating system, TempDB, and log files) • 16 host bus adapters (HBAs) for 256 disk / 256GB cache SAN system