An Examination of Cloud Storage Architectures for Scalable Internet and Cloud Computing Applications Joel Christner (jec2160) COMS-E6125 Spring 2010 Web Enhanced Information Management Professor Kaiser
Agenda • What are Cloud Computing and Cloud Storage? • Why Cloud Storage? • How Cloud Storage Works • Comparison of Cloud Storage Architectures • Summary
Terms and Definitions • Cloud computing: “General term for anything that involves delivering hosted services over the Internet. These services are broadly divided into three categories: Infrastructure-as-a-Service (IaaS), Platform-as-a-Service (PaaS) and Software-as-a-Service (SaaS).” • Source: http://www.searchcloudcomputing.com • This is the generally accepted “public cloud” definition • Cloud storage is a component of cloud computing: an elastic, on-demand, scalable platform for data storage and retrieval
Architectures • Three forms of clouds (architectures): • Public cloud – infrastructure owned by an external entity, usage-based billing • Private cloud – infrastructure owned by the company itself, chargeback • Hybrid cloud – public cloud with isolated resources and secure connectivity (virtual data center extension)
Attributes • Attributes of clouds: • Virtualized – abstract logical from physical resources to enable mobility (VM migration), abstract logical from physical access and integration points • Elasticity – virtualization enables elasticity (add or remove processors, memory, disk capacity) • Scalability – virtualization as abstraction via management and access middleware enables simplicity in dynamically adding or removing resources • Pay-as-you-Grow – built using commodity components (low cost) that can be added or removed as capacity needs change (virtualization and abstraction)
Computing Evolution • Computing has shifted over the last three decades: centralized -> distributed -> centralized • Mainframes with dumb terminals • Distributed computing and the workgroup • Centralized and consolidated data centers
Storage Evolution • Storage has followed this shift as well but remains the most costly element of any enterprise • Mainframe – costly, single-vendor, but completely monolithic yielding easiest data management and protection (single point) • Personal computers – cheap, modular, fully distributed, difficult to manage and protect data • Workgroup servers – cheap, modular, still distributed data, difficult to manage and protect • Centralized servers and storage networks – expensive, modular, simpler data management and protection
Storage Network Fabrics • Organizations are still struggling to consolidate their data (data is still distributed) but the following storage network fabrics are in use today: • Storage area network (SAN) – block volume access over a shared network (Fibre Channel, Internet SCSI, Fibre Channel over Ethernet) • Network attached storage (NAS) – filesystem protocol access over a shared network (Common Internet File System, Network File System, both of which use IP and generally Ethernet) • Contrast with DAS (direct-attached storage)
Storage Capital Cost Elements • Capital cost elements • Disk (in the workstation, in the server, in shared storage arrays) • Over-provisioned capacity (idle until used) • Storage array controllers (providing volume management and value-added capabilities over shared disk) • Storage network infrastructure (FC/Ethernet switches, HBAs/NICs, multipathing, failover software) • Data protection hardware and software (backup application, servers, tape libraries and automation, tapes) • Software licenses (snapshots, replication) • Vendors = high profit margin = high cost
Storage Operational Cost Elements • Operational cost elements • Real estate (storage systems are large) • Facilities (power, space, cooling) • Failed component replacement (tapes, drives) • Off-site storage (Iron Mountain) • Volume provisioning, allocation, resizing, data migration, and ongoing management • People (salaries, benefits)
Benefits of Cloud Storage? • Uses commodity components, eliminating the most costly elements of traditional storage capital costs • Virtualization and abstraction eliminate the most costly and time-consuming elements of traditional storage operational costs • Eliminates scalability issues associated with existing storage arrays (max drives, max capacity) • Public cloud storage enables pay-as-you-grow capacity • Private cloud storage enables chargeback • Hybrid cloud storage enables near public cloud storage cost with private cloud performance and security • Store virtually anything (user information, image files, documents, dynamic page structure, binaries, code files, anything) flexibly and at the lowest cost
Cloud Storage System Components • Access software • Integrated or binary (emulating SCSI) • Access via HTTP RESTful APIs or SOAP • Control servers • Core of the system with databases (or NoSQL) • Hold consumer authentication credentials • Manage registration/removal of metadata/storage servers • Management interface • Metadata servers • Stateless, caching to scale control servers • Consumer authentication, session key management • Object location management • Read/write request routing amongst storage servers • Storage servers • Handle read/write requests (GET/PUT/POST/DELETE)
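To make the storage server role concrete, here is a minimal sketch of how one might dispatch the four verbs listed above. The `StorageServer` class, its in-memory dictionary, and the chosen status codes are illustrative assumptions, not part of the original deck; a real server would persist objects to disk and sit behind an HTTP listener.

```python
from typing import Dict, Optional, Tuple


class StorageServer:
    """Toy in-memory storage server (hypothetical; real servers persist to disk)."""

    def __init__(self) -> None:
        self._objects: Dict[str, bytes] = {}

    def handle(self, method: str, key: str, body: Optional[bytes] = None) -> Tuple[int, bytes]:
        """Dispatch one request and return (HTTP-style status code, response body)."""
        if method == "GET":
            if key not in self._objects:
                return 404, b""
            return 200, self._objects[key]
        if method in ("PUT", "POST"):
            if body is None:
                return 400, b""
            created = key not in self._objects
            self._objects[key] = body
            # 201 for a new object, 200 when an existing object is overwritten.
            return (201 if created else 200), b""
        if method == "DELETE":
            if self._objects.pop(key, None) is None:
                return 404, b""
            return 204, b""
        return 405, b""  # any other verb is not supported in this sketch


if __name__ == "__main__":
    server = StorageServer()
    print(server.handle("PUT", "photos/cat.jpg", b"binary image data"))
    print(server.handle("GET", "photos/cat.jpg"))
    print(server.handle("DELETE", "photos/cat.jpg"))
```

Treating PUT and POST identically is a simplification made here for brevity; an actual API would likely distinguish them.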
Cloud Storage System Architecture (diagram) • Application servers • Server load balancing • 1..n metadata servers: scale-out and stateless, IO optimized for metadata • N+N control servers: HA, no scale-out • 1..n storage servers: scale-out, capacity optimized for data storage • Flows: authentication and session keys; locate object; read/write request routing; read/write via HTTP RESTful API or SOAP API
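The request flow in this architecture (authenticate, obtain a session key, locate the object, route the read or write) can be sketched as a small in-memory simulation. Everything named below (`ControlServer`, `MetadataServer`, the least-loaded placement rule, storage servers modelled as dictionaries) is a hypothetical illustration under simplifying assumptions, not the design of any particular product.

```python
import secrets
from typing import Dict, List, Optional, Set


class ControlServer:
    """Authoritative tier: credentials, sessions, and object locations (assumed layout)."""

    def __init__(self, credentials: Dict[str, str]) -> None:
        self.credentials = dict(credentials)
        self.sessions: Set[str] = set()
        self.locations: Dict[str, int] = {}  # object key -> storage server index
        self.requests = 0                    # how often the control tier is consulted

    def open_session(self, user: str, password: str) -> Optional[str]:
        self.requests += 1
        if self.credentials.get(user) != password:
            return None
        session_key = secrets.token_hex(16)
        self.sessions.add(session_key)
        return session_key

    def session_valid(self, session_key: str) -> bool:
        self.requests += 1
        return session_key in self.sessions

    def locate(self, key: str) -> Optional[int]:
        self.requests += 1
        return self.locations.get(key)

    def record(self, key: str, index: int) -> None:
        self.requests += 1
        self.locations[key] = index


class MetadataServer:
    """Front-end tier: holds only caches, so any instance can serve any request."""

    def __init__(self, control: ControlServer, storage: List[Dict[str, bytes]]) -> None:
        self.control = control
        self.storage = storage               # each storage server is modelled as a dict
        self.session_cache: Set[str] = set()
        self.location_cache: Dict[str, int] = {}

    def login(self, user: str, password: str) -> Optional[str]:
        session_key = self.control.open_session(user, password)
        if session_key is not None:
            self.session_cache.add(session_key)
        return session_key

    def _authorized(self, session_key: str) -> bool:
        if session_key in self.session_cache:
            return True
        if self.control.session_valid(session_key):  # cache miss: ask the control tier
            self.session_cache.add(session_key)
            return True
        return False

    def _locate(self, key: str) -> Optional[int]:
        index = self.location_cache.get(key)
        if index is None:
            index = self.control.locate(key)
            if index is not None:
                self.location_cache[key] = index
        return index

    def put(self, session_key: str, key: str, data: bytes) -> bool:
        if not self._authorized(session_key):
            return False
        index = self._locate(key)
        if index is None:
            # Placement rule (an assumption): the server holding the fewest objects.
            index = min(range(len(self.storage)), key=lambda i: len(self.storage[i]))
            self.control.record(key, index)
            self.location_cache[key] = index
        self.storage[index][key] = data
        return True

    def get(self, session_key: str, key: str) -> Optional[bytes]:
        if not self._authorized(session_key):
            return None
        index = self._locate(key)
        return None if index is None else self.storage[index].get(key)
```

Because the metadata servers keep only caches, a session opened through one instance remains usable through another, and repeated reads of the same object stop reaching the control tier once the caches are warm.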
Why More Scalable? • Metadata servers scale the control servers through caching where appropriate • Brick-based approach to adding IO or storage capacity – simply add more metadata servers or storage servers • SLB provides load-balancing, scale, and HA for metadata servers • Metadata servers provide load-balancing for storage servers • Storage servers may have a replication policy for data high availability (copy objects across storage servers)
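As a rough illustration of the brick-based approach and a replication policy, the sketch below copies each object to a fixed number of storage servers and lets capacity grow by appending servers. The `ReplicatedStore` class, the default of two replicas, and the least-loaded placement rule are assumptions made for the example; the slides do not specify a policy.

```python
from typing import Dict, List, Optional, Set


class ReplicatedStore:
    """Hypothetical replication policy: each object lives on `replicas` servers."""

    def __init__(self, server_count: int, replicas: int = 2) -> None:
        if not 1 <= replicas <= server_count:
            raise ValueError("replicas must be between 1 and server_count")
        self.servers: List[Dict[str, bytes]] = [{} for _ in range(server_count)]
        self.replicas = replicas
        self.placement: Dict[str, List[int]] = {}  # object key -> server indexes
        self.failed: Set[int] = set()              # indexes of servers marked down

    def add_server(self) -> int:
        """Add one more brick; new objects start landing on it immediately."""
        self.servers.append({})
        return len(self.servers) - 1

    def put(self, key: str, data: bytes) -> List[int]:
        targets = self.placement.get(key)
        if targets is None:
            live = [i for i in range(len(self.servers)) if i not in self.failed]
            if len(live) < self.replicas:
                raise RuntimeError("not enough live storage servers")
            # Least-loaded first; sorted() is stable, so ties keep index order.
            targets = sorted(live, key=lambda i: len(self.servers[i]))[: self.replicas]
            self.placement[key] = targets
        for i in targets:
            if i not in self.failed:
                self.servers[i][key] = data
        return targets

    def get(self, key: str) -> Optional[bytes]:
        """Read from the first replica that is still available."""
        for i in self.placement.get(key, []):
            if i not in self.failed:
                return self.servers[i].get(key)
        return None


if __name__ == "__main__":
    store = ReplicatedStore(server_count=3, replicas=2)
    store.put("report.pdf", b"quarterly numbers")
    store.failed.add(0)                 # simulate losing one storage server
    print(store.get("report.pdf"))      # still readable from the surviving replica
```

This sketch does not re-replicate objects after a failure or rebalance existing objects onto a new brick; both would be needed in practice.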
Not Infinitely Scalable, But… • Traditional enterprise storage systems have a host of scalability limitations: • Number of trays behind the controller • Number of disks behind the controller • Number of connected hosts • Number of network interfaces • Number of configurable volumes, snapshots • Cloud storage system scalability is limited by: • Number of IOPS for the control server and offload percentage via metadata servers • Control server database capacity for metadata, object location • Number of metadata servers behind an SLB • IOPS capacity per metadata server • Storage capacity per storage server • In general, cloud storage is considered multiple orders of magnitude more scalable than traditional enterprise storage
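One way to reason about the first and third limits above is a simple back-of-the-envelope model: if the metadata tier answers a fraction of requests from cache, only the remainder reaches the control servers. The function below is a sketch under that assumption (uniform request cost, a fixed offload fraction); the parameter names and the sample figures are hypothetical.

```python
def max_request_rate(
    control_iops: float,
    offload: float,
    metadata_servers: int,
    iops_per_metadata_server: float,
) -> float:
    """Estimate the front-end request rate the system can sustain.

    offload is the fraction of requests the metadata tier serves without
    touching the control servers (0.0 = none, 1.0 = all).
    """
    if not 0.0 <= offload <= 1.0:
        raise ValueError("offload must be between 0 and 1")
    # The metadata tier sees every request, so its aggregate IOPS is a hard cap.
    metadata_limit = metadata_servers * iops_per_metadata_server
    if offload == 1.0:
        return metadata_limit
    # Only (1 - offload) of requests reach the control tier.
    control_limit = control_iops / (1.0 - offload)
    return min(control_limit, metadata_limit)


if __name__ == "__main__":
    # Hypothetical figures: 10,000 control IOPS, 90% offload, 8 metadata servers.
    print(max_request_rate(10_000, 0.9, 8, 20_000))
```

Under this model, raising the offload fraction helps only until the metadata tier itself becomes the bottleneck, at which point adding metadata servers behind the SLB is the remaining lever.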
Raw $/GB Comparison • Traditional midrange enterprise storage (such as EMC’s Clariion) averages approximately $8/GB in capital costs alone • Scales to hundreds of TBs
Comparable Capacity using Commodity Components • http://blog.backblaze.com/2009/09/01/petabytes-on-a-budget-how-to-build-cheap-cloud-storage/ • $8K – 67TB, or roughly $0.12/GB (about 1/67 the raw cost of $8/GB) • Need more capacity? Add more bricks!
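The raw $/GB figure can be checked directly from the numbers quoted on these two slides ($8K for 67TB against roughly $8/GB for midrange enterprise storage). The short sketch below assumes decimal units (1 TB = 1000 GB) and compares hardware cost only; the function name is illustrative.

```python
def cost_per_gb(total_cost_usd: float, capacity_tb: float, gb_per_tb: int = 1000) -> float:
    """Raw capital cost per gigabyte, assuming decimal terabytes."""
    return total_cost_usd / (capacity_tb * gb_per_tb)


commodity = cost_per_gb(8_000, 67)   # commodity brick: $8K for 67TB
enterprise = 8.0                     # midrange enterprise figure quoted above, $/GB
ratio = enterprise / commodity       # how many times cheaper the commodity brick is

if __name__ == "__main__":
    print(f"commodity: ${commodity:.2f}/GB, enterprise: ${enterprise:.2f}/GB, ratio: {ratio:.0f}x")
```

With these inputs the commodity figure comes out near $0.12/GB. The comparison excludes redundancy overhead, power, and staffing, so it bounds the hardware gap rather than total cost of ownership.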
Challenges with Cloud Storage • Access methods • Enterprise applications expect SCSI access to underlying disk infrastructure and overlay block devices with their own file systems • Cloud storage systems expose capacity via programmatic APIs (RESTful, SOAP), requiring translation • Not an issue for home-grown applications • Security • Cloud storage systems, particularly in public clouds, typically do not encrypt data • Even if the cloud provider encrypts data, the data remains vulnerable due to chain of custody when the cloud provider owns the key material • Others • Cloud storage performance lags raw block device access, particularly in public and virtual private cloud scenarios, due to WAN bandwidth, latency, and packet loss • Cloud storage systems use replication for high availability but provide no snapshots for enterprise backup systems
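The translation mentioned under access methods can be pictured as a thin layer that maps fixed-size block reads and writes onto object GETs and PUTs. The sketch below is one hypothetical mapping (one object per block, keys derived from the logical block address); the class name, key format, and the dictionary standing in for the object store are assumptions, and real translation appliances would add caching and batching to cope with WAN latency.

```python
from typing import Dict


class BlockToObjectTranslator:
    """Hypothetical shim presenting block semantics on top of an object store."""

    def __init__(self, object_store: Dict[str, bytes], volume: str, block_size: int = 4096) -> None:
        if block_size <= 0:
            raise ValueError("block_size must be positive")
        self.object_store = object_store   # stands in for PUT/GET against a REST API
        self.volume = volume
        self.block_size = block_size

    def _key(self, lba: int) -> str:
        # One object per logical block address, zero-padded so keys sort in order.
        return f"{self.volume}/block-{lba:012d}"

    def write_block(self, lba: int, data: bytes) -> None:
        if lba < 0:
            raise ValueError("lba must be non-negative")
        if len(data) != self.block_size:
            raise ValueError("data must be exactly one block long")
        self.object_store[self._key(lba)] = bytes(data)   # equivalent to an object PUT

    def read_block(self, lba: int) -> bytes:
        if lba < 0:
            raise ValueError("lba must be non-negative")
        # Blocks never written read back as zeros, as on a freshly provisioned volume.
        return self.object_store.get(self._key(lba), bytes(self.block_size))


if __name__ == "__main__":
    store: Dict[str, bytes] = {}
    volume = BlockToObjectTranslator(store, "vol0", block_size=8)
    volume.write_block(3, b"ABCDEFGH")
    print(volume.read_block(3), volume.read_block(4))
```

Mapping each block to its own object keeps the sketch simple but turns every small write into a full round trip, which is the performance concern raised in the last group of bullets.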
Summary • Cloud storage architectures decrease the capital and operational expenses of today’s enterprise and Internet businesses • Cloud storage eliminates the majority of complexity and limitations associated with traditional storage (capacity limits, data migration, volume management) • Cloud storage virtually eliminates the system-level scalability limitations associated with traditional storage • Cloud storage has a series of challenges that limit its applicability in existing application environments, but remains a good fit in homegrown application environments • Innovation in the cloud storage space will improve usability (translation appliances and software), security, and performance