Modern Distributed Systems Design – Security and High Availability. Measuring Availability. Highly Available Data Management. Redundant System Design. How are resiliency and high availability interconnected? Define downtime and what causes it.
Modern Distributed Systems Design – Security and High Availability • Measuring Availability • Highly Available Data Management • Redundant System Design
Measuring Availability • How are resiliency and high availability interconnected? • Define downtime and what causes it. • How to measure availability?
Define Downtime • Downtime can be defined as follows: “If a user cannot get his job done on time, the system is down.”
What causes downtime? • Planned – the easiest kind to reduce; it includes scheduled system maintenance, swapping hot-swappable hard drives, cluster upgrades and even failovers. Usually 30% of all downtime; • People, or the human factor – simple mistakes, plus increasingly complex IT equipment, software and protocols that demand greater knowledge from engineers. Usually 15% of all downtime; • Software failures – due to software bugs and viruses. Usually 40% of all downtime.
How to measure availability? $\text{Availability} = \dfrac{\text{MTBF}}{\text{MTBF} + \text{MTTR}}$, where MTBF is the “mean time between failures” and MTTR is the “mean time to repair”.
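As a rough illustration of the formula above, the following Python sketch computes availability from MTBF and MTTR and converts it into expected downtime per year. The function names and the sample figures (1,000 hours between failures, 2 hours to repair) are assumptions chosen for illustration, not values from the slides.

```python
HOURS_PER_YEAR = 365 * 24  # 8760, ignoring leap years


def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Return availability as a fraction: MTBF / (MTBF + MTTR)."""
    if mtbf_hours <= 0 or mttr_hours < 0:
        raise ValueError("MTBF must be positive and MTTR non-negative")
    return mtbf_hours / (mtbf_hours + mttr_hours)


def downtime_hours_per_year(avail: float) -> float:
    """Expected downtime per year for a given availability fraction."""
    return (1.0 - avail) * HOURS_PER_YEAR


if __name__ == "__main__":
    # Hypothetical figures, for illustration only.
    a = availability(mtbf_hours=1000.0, mttr_hours=2.0)
    print(f"availability: {a:.4%}")
    print(f"downtime per year: {downtime_hours_per_year(a):.1f} hours")
```

Since MTTR appears only in the denominator, shortening repair time raises availability just as lengthening the time between failures does.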
What can go wrong? • Hardware • Environmental and Physical Failures • Network Failures • Database System Failures • Web Server Failures • File and Print Server Failures
Levels of Availability: • Regular Availability • Increased Availability • High Availability • Disaster Recovery • Fault-Tolerant System
Highly Available Data Management • Data management is the most sensitive area of modern distributed systems. • Quick overview of existing data topologies
Redundant System Design • Redundant storage (RAID, multi-hosting, multipathing, disk arrays, JBOD, etc.) • Failover configurations and management • Introduction to SAN and the Fibre Channel protocol • Security aspects of data management in Storage Area Networks
Failover Configurations and Management • Failover must meet the following requirements: • Transparent to the client; • Quick (no more than 5 min, ideally 0-2 min); • Minimal manual intervention and guaranteed data access.
Failover components: • Two servers: one primary, the other a takeover server; • Two network connections; a third is highly recommended; • All disks on a failover pair should have some sort of redundancy; • Application portability; • No single point of failure.
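The requirements and components above can be read as a checklist. The Python sketch below encodes that checklist for a single failover pair; the `FailoverPair` type, its field names and the helper functions are hypothetical and exist only to illustrate the idea, not to describe any particular clustering product.

```python
from dataclasses import dataclass
from typing import List

MAX_FAILOVER_SECONDS = 5 * 60  # "no more than 5 min" from the requirements


@dataclass
class FailoverPair:
    servers: int                # primary plus takeover
    network_connections: int    # links between the two servers
    disks_redundant: bool       # e.g. mirrored or RAID-protected
    application_portable: bool  # application can run on either server


def check_failover_pair(pair: FailoverPair) -> List[str]:
    """Return a list of problems; an empty list means the checklist passes."""
    problems = []
    if pair.servers < 2:
        problems.append("need two servers: one primary, one takeover")
    if pair.network_connections < 2:
        problems.append("need at least two network connections")
    if not pair.disks_redundant:
        problems.append("disks on the pair need some form of redundancy")
    if not pair.application_portable:
        problems.append("application must be portable between servers")
    return problems


def failover_time_ok(seconds: float) -> bool:
    """True if a measured failover fits within the 5-minute limit."""
    return 0 <= seconds <= MAX_FAILOVER_SECONDS


if __name__ == "__main__":
    pair = FailoverPair(servers=2, network_connections=3,
                        disks_redundant=True, application_portable=True)
    print(check_failover_pair(pair))
    print(failover_time_ok(120))
```

A check like this covers only the items that can be counted; "no single point of failure" still needs a review of the whole design, including power, switches and storage paths.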
Security in IP Storage Networks • Security in Fibre Channel SANs • Security Options for IP Storage Networks
Fibre Channel SAN Security • Port or hard zoning • WWN Zoning • LUN Masking
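LUN masking can be pictured as a lookup table from an initiator's WWN to the set of LUNs it is allowed to see. The Python sketch below is a conceptual model only: real storage arrays and switches enforce this in firmware with vendor-specific configuration, and the WWNs, LUN numbers and function names here are made up for illustration.

```python
from typing import Dict, Set

# Hypothetical masking table: initiator WWN -> LUNs it is allowed to see.
LUN_MASKS: Dict[str, Set[int]] = {
    "10:00:00:00:c9:00:00:01": {0, 1},
    "10:00:00:00:c9:00:00:02": {2},
}


def visible_luns(initiator_wwn: str) -> Set[int]:
    """Return the LUNs exposed to an initiator; unknown initiators see none."""
    return set(LUN_MASKS.get(initiator_wwn.lower(), set()))


def can_access(initiator_wwn: str, lun: int) -> bool:
    """True if the masking table exposes this LUN to the initiator."""
    return lun in visible_luns(initiator_wwn)


if __name__ == "__main__":
    print(can_access("10:00:00:00:C9:00:00:01", 1))
    print(can_access("10:00:00:00:c9:00:00:02", 0))
    print(visible_luns("10:00:00:00:c9:00:00:99"))
```

The default-deny behaviour for unknown WWNs is a design choice in this sketch: a host that is not listed sees no storage at all.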
Security Options for IP Storage Networks • iSNS (Internet Storage Name Service) • LUN masking (as in Fibre Channel) and VLAN tagging • IP Security (IPsec) • Access control lists (ACLs)