Storage Network Designs for OLTP Business Continuity

Marc Farley, President, Building Storage Networks, Inc.

Agenda • The Vendor Neutral Approach • Overview of OLTP & High Availability • I/O Redundancy Methods • Storage Network Technologies • Storage Networking for HA OLTP

Vendor Neutral Approach • Generic terms, not vendor terms • Assumed basic knowledge of SAN, NAS, RAID

And now, for something completely different…..

OLTP Environments • Mission critical business applications • Business in real-time • Expensive equipment and software • Aggressive performance objectives • Highly skilled IT staff • Hands-on computing operations

OLTP Database Software • Oracle • 8i Oracle Parallel Server (OPS) • 9i Real Application Clusters (RAC) • IBM • DB2 UDB • Informix • MS SQL Server • Sybase, MySQL, others

OLTP OS Platforms • IBM S/390 MVS • Unix Systems • Windows 2000+ • HA Linux

OLTP Requirements • 99.999% uptime • Non-degrading response time • High transaction rates • Seamless scalability • Cost relief
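
To put the 99.999% uptime target in perspective, the following minimal sketch converts an availability percentage into allowed downtime per year. The function name and the 365.25-day year are assumptions made for illustration, not part of the original slides.

```python
# Assumption: a year is averaged to 365.25 days.
MINUTES_PER_YEAR = 365.25 * 24 * 60


def downtime_minutes_per_year(availability_percent: float) -> float:
    """Return the yearly downtime (in minutes) allowed by an availability target."""
    return (1 - availability_percent / 100) * MINUTES_PER_YEAR


for target in (99.9, 99.99, 99.999):
    print(f"{target}% -> {downtime_minutes_per_year(target):.2f} min/year")
```

Under these assumptions, "five nines" leaves roughly five minutes of downtime per year, which is why every element in the I/O path needs redundancy.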

Database Storage Approaches • Raw partitions • Bypass OS I/O buffering • File system • Facilitates data management • NFS mounted • Offload DB server, NTAP + Oracle

ACID Properties of OLTP • Atomicity – No partial transactions • Consistency – All tables are in a consistent state before and after a completed transaction • Isolation – One transaction cannot contaminate other transactions • Durability – Transactions are complete only when the database updates are written to disk storage
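
Atomicity can be illustrated with a small sketch using Python's built-in `sqlite3` module. SQLite is only a stand-in here for the OLTP databases named in the slides; the `accounts` table and the `transfer` and `balances` helpers are hypothetical.

```python
import sqlite3


def transfer(conn, src, dst, amount):
    """Move funds between two accounts as one atomic transaction."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
            (balance,) = conn.execute(
                "SELECT balance FROM accounts WHERE name = ?", (src,)
            ).fetchone()
            if balance < 0:
                raise ValueError("insufficient funds")
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
        return True
    except ValueError:
        return False  # the debit above was rolled back: no partial transaction


def balances(conn):
    """Return {account name: balance} for every account."""
    return dict(conn.execute("SELECT name, balance FROM accounts"))


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("a", 100), ("b", 0)])
conn.commit()
```

A failed transfer leaves both balances untouched, which is the "no partial transactions" property from the slide.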

Challenges of OLTP • Major systems integration effort • Intricate tuning and monitoring • Little tolerance for errors • Complex data structures & relationships • Time and sequence-sensitive processes • Must be adhered to for data integrity • Shifting workloads and bottlenecks

OLTP Database Files • Data files • Database data, tablespaces • Redo log files, archive log files • Reconstruct or rollback transactions • Control files • File layout information

OLTP Table Space Storage • Use many spindles to distribute hot spots • RAID 0+1 recommended • File system recommended over raw partitions • Easier data management

Striping for Performance • RAID controller (microsecond performance) • Disk drives (millisecond performance, from rotational latency and seek time) [Diagram: one RAID controller striping across six disk drives]
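
The idea behind striping is that consecutive logical blocks land on different drives, so several slow spindles can work in parallel. A minimal sketch of a round-robin (RAID 0 style) block mapping follows; the function name, parameters, and stripe-unit size are assumptions for illustration, and real controllers use their own layouts.

```python
def locate_block(logical_block, num_disks, blocks_per_stripe_unit=1):
    """Map a logical block number to (disk index, block offset on that disk)."""
    stripe_unit = logical_block // blocks_per_stripe_unit
    disk = stripe_unit % num_disks   # stripe units rotate across the disks
    row = stripe_unit // num_disks   # how many full stripes precede this unit
    offset = row * blocks_per_stripe_unit + logical_block % blocks_per_stripe_unit
    return disk, offset


# Eight consecutive logical blocks spread over six disks.
for block in range(8):
    print(block, locate_block(block, num_disks=6))
```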

My Personal Favorite, RAID 0+1 • Mirrored pairs of striped members [Diagram: one RAID controller and ten disk drives, labeled 1 to 5]

OLTP Redo Log Storage • Raw partitions recommended • Sequential high speed writes • Separate mirror pairs per log file group • Capacity for 30 – 60 minutes of data • Goal is to limit disk contention for current and active log files
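
The 30 to 60 minute capacity guideline turns into a simple sizing calculation once a redo generation rate is known. The sketch below assumes a hypothetical sustained rate of 2 MB/s; measure the actual rate on the system being sized.

```python
def redo_capacity_mb(redo_rate_mb_per_sec, minutes):
    """Return the redo log capacity (MB) needed to hold `minutes` of redo data."""
    return redo_rate_mb_per_sec * 60 * minutes


# Hypothetical workload: 2 MB/s of redo, sized for 30 and 60 minutes.
for minutes in (30, 60):
    print(f"{minutes} min -> {redo_capacity_mb(2, minutes)} MB")
```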

OLTP Archive Log Storage • File system or NFS mounting is required • NFS mounting is recommended • Mirroring or RAID • Goal is to have easy access in case they are needed for reconstruction

High Availability • The ability for a system or application to immediately continue its mission after loss or damage to system components, systems, facilities and data

Availability Threats • Expected • Scaling limitations • Processor • Storage capacity • Network • Consolidations • Product life cycles • Unexpected • Failures • Bugs • Viruses • Operator errors • Disasters

HA Engages All Elements • Systems • Application • Network connections • Network services • Storage and I/O subsystems

Scoping the Risks

Managing the Risks • Local copies of data • Immediate availability • (Remote) Nearby • Immediate availability to several hours • Remote Far away • One to several days availability

Disaster/Availability Radii • Local • Remote Nearby • Remote Far Away

Nobody Expects….. • Weird things to happen to them • Disintegration of media • Underground flooding through tunnels • Fires in Telco switching centers

High Availability for OLTP • Duplication of functions • Without degrading performance • Without risking data integrity • Brute force techniques • Automation and efficiency • Cost is always an issue • And high availability DOES cost

A Long Time Ago in a Job Not So Far Away… • Jedi Jim Gast and Marc Skyfaller Farley • "You must learn to be a master of redundancy if you are going to be a storage geek." • "Remember Marc, there is only one concept: REDUNDANCY! Redundancy. Again!" • "Got it Jim. Let’s eat!" • "Whatever."

Eventually, I Learned to Appreciate His Teachings… • REDUNDANCY • NSPoF (No Single Point of Failure) • Don’t get the giant spicy Polish for lunch – it’s too much for the digestion

OLTP HA Requires Complete Redundancy Protection • Client network • Server systems and components • Application modules • I/O Channels and Networks • Storage subsystems and components • Data
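
Why redundancy at every layer matters can be sketched with basic availability arithmetic: components in series multiply their availabilities, while redundant copies fail only if every copy fails. The sketch assumes independent failures, and the 0.99 figures are made-up illustrations rather than measurements.

```python
from math import prod


def series(*availabilities):
    """Availability of a chain in which every component must work."""
    return prod(availabilities)


def redundant(availability, copies=2):
    """Availability of `copies` independent components where one is enough."""
    return 1 - (1 - availability) ** copies


# Hypothetical chain: server, I/O path, storage subsystem, each 99% available.
single = series(0.99, 0.99, 0.99)
duplicated = series(redundant(0.99), redundant(0.99), redundant(0.99))
print(f"no redundancy:   {single:.4f}")
print(f"full redundancy: {duplicated:.6f}")
```

Leaving any one layer without a duplicate caps the whole chain at that layer's availability, which is the practical meaning of "no single point of failure".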

A Quick Look At Clustered Storage • Shared Everything • Both servers share control of a common storage address space • Shared Nothing • Each server controls its own storage address space

Examples of OLTP Clusters • Microsoft SQL Server • Data is exchanged between servers • Failover paths only • Oracle 9i RAC • Data is accessed directly from storage

One more time, with subsystems… • Microsoft SQL Server • Same subsystem but different address spaces • Oracle 9i RAC • All storage is shared by all cluster nodes

I/O Redundancy • Host to subsystem • Mirroring: Host to independent targets • Multi-pathing: Host to a single target • Subsystem to subsystem • Store and forward: • Local • Remote

Disk Mirroring: Redundant Storage Targets • Independent, identically sized storage address spaces • One controller or two controllers

Disk Mirroring: I/Os to 2 Targets • “Brute force” redundancy: fast and simple • Both read and write I/Os • Overlapped reads for performance • Local connections • Limited capacity* • I/O Bottlenecks* for random I/O activity • * if targets are disk drives
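
The mirroring behavior described above (every write goes to both targets, reads can be overlapped across them) can be sketched as a toy in-memory model. The `MirroredVolume` class is hypothetical and ignores error handling and resynchronization.

```python
class MirroredVolume:
    """Toy host-based mirror over two identically sized address spaces."""

    def __init__(self, size_blocks):
        self.targets = [[None] * size_blocks, [None] * size_blocks]
        self._next_read = 0

    def write(self, block, data):
        # "Brute force" redundancy: the same write is issued to both targets.
        for target in self.targets:
            target[block] = data

    def read(self, block):
        # Alternate reads between targets so they can overlap.
        index = self._next_read
        self._next_read = (self._next_read + 1) % len(self.targets)
        return index, self.targets[index][block]


volume = MirroredVolume(size_blocks=4)
volume.write(2, "redo-record")
print(volume.read(2), volume.read(2))
```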

Disk Mirroring for Redo Log Files • Log files are a common bottleneck • Use raw partitions • Redundancy is required • Mirroring is adequate • Use highest RPM with lowest seek times • Put on a separate channel from database I/O • Use separate mirrored pairs per group

Mirroring to Storage Subsystems • Independent, identically sized storage address spaces • Two controllers [Diagram: host mirroring to two storage subsystems]

Mirroring to Subsystems • Targets are subsystems, not disks • Separate address spaces • Capacity scales to subsystem max • Double level redundancy • Mirroring plus RAID • Multiple disk spindles reduce I/O bottlenecks

Disk Mirroring Datafiles from Host to Storage Subsystems • Disk mirroring + subsystem RAID • Excellent capacity scaling • Adjacent and across campus/town • One subsystem outside site radius • Requires longer distance cabling • Reads and writes both transmitted

Multi-Pathing: Redundant Paths Between a Host & Subsystem • Pathing software detects that a transmission error has occurred and switches to a redundant path [Diagram: host with two paths to an application data volume; one path marked failed]

Multi-pathing vs Mirroring • Mirroring assumes independent, but similar storage targets • Multi-pathing assumes multiple paths to the exact same target • Mirroring can use a single HBA, multi-pathing needs two HBAs

Path Failures • 1. HBA problem • 2. Link, switch or network problem • 3. Subsystem controller problem [Diagram: numbered failure points between the host and the application data volume]

I/O sent to storage • No ack received • Transmission failures are recognized after SCSI timeouts are exceeded • The I/O is retried and eventually an error is passed back to the process that issued the I/O

Path Failover for OLTP I/O • Redundant path resources take over activities for a failed path to sustain operations without disrupting service or risking data integrity
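
Path failover can be sketched as "try the next path when the current one reports a transmission error". The `Path`, `PathError`, and `send_with_failover` names are hypothetical, and the failure is raised immediately here, whereas a real driver would first wait for the SCSI timeout described earlier.

```python
class PathError(Exception):
    """Raised when an I/O cannot be delivered over a path."""


class Path:
    def __init__(self, name, healthy=True):
        self.name = name
        self.healthy = healthy

    def send(self, io):
        if not self.healthy:
            raise PathError(f"{self.name}: no ack received")
        return f"{io} acknowledged via {self.name}"


def send_with_failover(paths, io):
    """Send `io` over the first working path; fail only if all paths fail."""
    for path in paths:
        try:
            return path.send(io)
        except PathError:
            continue  # switch to the next redundant path
    raise PathError("all paths to the target failed")


paths = [Path("hba0", healthy=False), Path("hba1")]
print(send_with_failover(paths, "write block 42"))
```

The application only sees an error when every path to the same target has failed, which is what distinguishes multi-pathing from mirroring to independent targets.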

Store and Forward • Independent, identically sized storage address spaces [Diagram: host writing to subsystem A, which forwards to subsystem B]

Store & Forward: One Host I/O and Two Copies of Data • Only real option for remote copies • Does not forward read I/Os • Proprietary protocols and methods • Standards are emerging, e.g. FC/IP • First step to storage snapshots

Store and Forward: Acknowledgements • Synchronous • Asynchronous [Diagram: I/O, forward, and ACK flows between subsystems A and B for each mode]

Trade-offs with Acknowledgement Handling • Synchronous • Always preferred • Slowest performance • State of copy is precise • Asynchronous • Fastest performance • Least precise knowledge of copy status
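
The trade-off can be sketched with a toy model in which subsystem A stores each write and forwards it to subsystem B. In synchronous mode the host acknowledgement waits for the forward to complete; in asynchronous mode the host is acknowledged first and the forward is queued. The class and method names are hypothetical.

```python
from collections import deque


class StoreAndForward:
    """Toy model of subsystem A forwarding writes to subsystem B."""

    def __init__(self, synchronous):
        self.synchronous = synchronous
        self.primary = {}       # subsystem A
        self.remote = {}        # subsystem B
        self.pending = deque()  # writes acknowledged but not yet forwarded

    def write(self, block, data):
        self.primary[block] = data
        if self.synchronous:
            self.remote[block] = data           # forward before acknowledging
        else:
            self.pending.append((block, data))  # acknowledge now, forward later
        return "ACK"

    def drain(self):
        """Forward all queued writes (asynchronous mode only)."""
        while self.pending:
            block, data = self.pending.popleft()
            self.remote[block] = data

    def remote_is_current(self):
        return self.remote == self.primary


sync_copy = StoreAndForward(synchronous=True)
async_copy = StoreAndForward(synchronous=False)
for copy in (sync_copy, async_copy):
    copy.write(7, "txn-1")
print(sync_copy.remote_is_current(), async_copy.remote_is_current())
```

Until the queue drains, the asynchronous copy lags the primary, which is the "least precise knowledge of copy status" noted above.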

Store & Forward: Local and Remote Copies • Local & nearby copy techniques • Synchronous • Fiber optic cabling, optical/DWDM services • Remote-far away copy techniques • Asynchronous • ATM gateways, OC-12 or less, FC/IP

Mirroring vs Synchronous Store and Forward for Local & Nearby Copies • Mirroring • Async I/O • Reads and writes • No snapshot tie-in • Uses more host slots • Least costly • Store and Forward • Async or Sync I/O • Writes only • Snapshot ready • May conserve host I/O slots • Most costly

Combining Mirroring with Store and Forward • Mirroring radius • Store and forward radius [Diagram: radii drawn over the Local, Nearby, and Remote Far Away zones]
