OpenStack Swift 101 Technology & Architecture. Albert Chen – Systems Engineer, Eric Rife – Senior Systems Engineer. April 25, 2016. Who Are We? Why Are We Here? New Breed of Object Storage. Deflation in the storage market. Huge growth of AWS S3 storage.
OpenStack Swift 101 Technology & Architecture Albert Chen – Systems Engineer Eric Rife – Senior Systems Engineer April 25, 2016
Who Are We? • Why Are We Here? • New Breed of Object Storage • Deflation in the storage market • Huge growth of AWS S3 storage • Software-Defined & Commoditization
Agenda
• What Is an Object?
• Why Use Swift?
• What’s Under the Hood?
• Use Cases: Backup, Big Data / MapReduce, Media & Entertainment, Life Science
• Hands-on Lab
Object eh? • What is an object?
What Is an Object?
File: 15x6x3_10x00x41741x64561_21x58x511_n.jpg
Metadata:
• Image size: 1516x2048
• Date taken: 2013-12-27 13:19
• Tags: Asia, Japan, Bamboo forest, Arashiyama
• GPS: 35.016520, 135.670436
Object = File + Metadata
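In Swift's HTTP API, custom object metadata travels as request headers carrying the `X-Object-Meta-` prefix. A minimal sketch of how the photo above could be described that way; the helper name `metadata_headers` and the metadata key names are illustrative assumptions, not taken from the slides:

```python
def metadata_headers(metadata):
    """Turn a dict of custom metadata into Swift-style object metadata headers."""
    return {f"X-Object-Meta-{key}": str(value) for key, value in metadata.items()}


# Values come from the slide; the key names are assumptions made for this sketch.
photo_metadata = {
    "Image-Size": "1516x2048",
    "Date-Taken": "2013-12-27 13:19",
    "Tags": "Asia, Japan, Bamboo forest, Arashiyama",
    "Gps": "35.016520, 135.670436",
}

headers = metadata_headers(photo_metadata)
for name, value in headers.items():
    print(f"{name}: {value}")
```

These headers would accompany the upload of the file itself, so the file and its metadata are stored as one object.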
Object eh? • What can I do with it?
What’s for Dinner? Metadata search: Taiwanese Beefy Noodle. Metadata is stored in two places: with the object, and in a searchable index. The index is what makes metadata search possible.
But Why Swift? • I like objects but what does Swift have to do with it?
Why Do You Need Object Storage?
• Traditional: file-based
• Object storage: HTTP namespace
• Applications shown: mobile devices, large file / big data, software-as-a-service apps
Right Tool for the Right Job • Application-scale • Site-scale • Web-scale
Examples of Object Storage
Object storage services:
• Amazon S3
• Rackspace Cloud Files
• HP Cloud Object Storage
• IBM SoftLayer Object Storage
Object storage systems:
• OpenStack Object Storage System (“Swift”)
Why Choose OpenStack Swift & the Swift API
Swift API accessible via HTTP:
• Applications can consume storage from anywhere
• Larger range of functionality
• Puts the developer in control
Swift object storage compared with traditional storage:
• Designed to scale from TB to PB to EB
• Replication is automatic across nodes, zones, and regions
• Supports both replicas and erasure coding
• No single point of failure: the cluster tolerates the loss of nodes, racks, or data centers
• Data enters and leaves through all proxies; there are no masters
• Balances heterogeneous commodity hardware
What’s Under the Hood? • Take a closer look at the characteristics and different components of Swift
Access Method: RESTful HTTP API
• Clients reach the Object Storage nodes over HTTP, through a load balancer
• The Object Storage API defines operations for objects
• There are also API operations for accounts and containers
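The operation table from this slide did not survive extraction. As a rough guide, the Swift API applies standard HTTP verbs to an object's URL: PUT to upload, GET to download, HEAD to read metadata, POST to update metadata, and DELETE to remove. The sketch below only builds the requests and never sends them; the URL and the token value are placeholders, and `object_request` is a hypothetical helper.

```python
import urllib.request

# Placeholder URL and token; substitute values from a real cluster.
OBJECT_URL = "http://example.com/v1/account/container/object"
TOKEN = "example-token"


def object_request(method, data=None, extra_headers=None):
    """Build (but do not send) an HTTP request for a single object."""
    headers = {"X-Auth-Token": TOKEN}
    headers.update(extra_headers or {})
    return urllib.request.Request(
        OBJECT_URL, data=data, headers=headers, method=method
    )


examples = {
    "upload": object_request("PUT", data=b"file contents"),
    "download": object_request("GET"),
    "read metadata": object_request("HEAD"),
    "update metadata": object_request(
        "POST", extra_headers={"X-Object-Meta-Tags": "Japan"}
    ),
    "delete": object_request("DELETE"),
}

for action, request in examples.items():
    print(f"{action:16} {request.get_method():7} {request.full_url}")
```

Passing any of these to `urllib.request.urlopen` would send it; that step is left out so the sketch runs without a cluster.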
Every Object Has a URL
http://example.com/v1/account/container/object
(Base URL: http://example.com; API version: v1)
• Account: each account has its own URL; Swift is multi-tenant
• Container: a namespace used to group objects within an account; the number of containers is unlimited; like folders, but they can’t be nested
• Object: each object is addressed as a URL; users name the object
• Objects are not organized in a hierarchy. Instead, object names may contain “/”, so pseudo-nested directories are possible.
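A small sketch of how such URLs can be assembled, and of how a "/" inside object names gives the appearance of directories. The function names are hypothetical; the `prefix` and `delimiter` query parameters are the ones Swift's container listing accepts.

```python
from urllib.parse import quote, urlencode


def container_url(base_url, account, container, api_version="v1"):
    """Join base URL, API version, account, and container into one URL."""
    return "/".join([
        base_url.rstrip("/"),
        api_version,
        quote(account, safe=""),
        quote(container, safe=""),
    ])


def object_url(base_url, account, container, object_name, api_version="v1"):
    """Object names may contain '/', so it is left unescaped in that part only."""
    prefix = container_url(base_url, account, container, api_version)
    return f"{prefix}/{quote(object_name, safe='/')}"


def pseudo_directory_listing_url(base_url, account, container, directory):
    """Container listing restricted to one pseudo-directory level."""
    query = urlencode({"prefix": directory, "delimiter": "/"})
    return f"{container_url(base_url, account, container)}?{query}"


print(object_url("http://example.com", "account", "container",
                 "photos/2013/bamboo forest.jpg"))
print(pseudo_directory_listing_url("http://example.com", "account",
                                   "container", "photos/2013/"))
```

The container itself stays flat; only the listing query makes `photos/2013/` behave like a folder.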
Swift High-Level Architecture (layers)
• Load Balancing / Authentication
• Proxy
• Account / Container / Object
• Replication and Consistency
• Standard Hardware
Swift Architecture: Load Balancing | SSL | Authentication
• Load Balancing: requests are load balanced across all nodes running proxy server processes
• SSL: optionally, SSL termination can be enabled to encrypt data in flight
• Authentication: OpenStack Swift has a pluggable authentication system; modules include API/UI-driven, LDAP, AD, and Keystone
Swift Architecture: Proxy
• The only part of the cluster that “talks” to external clients, primarily through the HTTP RESTful Swift API
• Routes requests from clients to disk
• Three replicas are written simultaneously; a quorum is required
• Uses a single replica for reads
• Routes around failures
• Enforces ACLs set by the user
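The quorum rule for writes can be illustrated with a few lines of Python. This sketch assumes a simple majority of replicas must acknowledge the write; the slides do not state the exact rule, so treat `quorum_size` as an assumption rather than Swift's own implementation.

```python
def quorum_size(replica_count):
    """Simple-majority quorum (an assumption made for this sketch)."""
    return replica_count // 2 + 1


def write_succeeded(backend_statuses, replica_count=3):
    """A write counts as successful once a quorum of backends returned 2xx."""
    successes = sum(1 for status in backend_statuses if 200 <= status < 300)
    return successes >= quorum_size(replica_count)


# Two of three object servers accepted the write, one was unavailable.
print(write_succeeded([201, 201, 503]))
# Only one of three accepted the write.
print(write_succeeded([201, 503, 503]))
```

With three replicas, one failed backend does not fail the client request; the missing copy is left for background replication to repair.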
Swift Architecture: Account | Container | Object
• Account / Container: accounts keep records of containers; containers keep records of objects
• Object: the object servers store the data on disk; metadata is stored with the data; uses a standard filesystem (XFS)
Swift Architecture: Replication | Consistency
• Replication: constantly checks the status of replicas; it only pushes updates to other replica sites and does not pull in newer versions of objects
• Consistency: constantly “scrubs” data to check for bad data
• Note: these processes run in the background on nodes where account, container, or object server processes are running
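The "scrubbing" idea can be sketched as recomputing a checksum and comparing it with the one recorded at write time. Swift records an MD5 checksum (the ETag) for each object; the in-memory dictionary and the `audit` function below are simplifications invented for this example, not the auditor's actual code.

```python
import hashlib


def audit(stored_objects):
    """Return the names of objects whose bytes no longer match their checksum.

    stored_objects maps an object name to (data, checksum_recorded_at_write).
    """
    return [
        name
        for name, (data, recorded_checksum) in stored_objects.items()
        if hashlib.md5(data).hexdigest() != recorded_checksum
    ]


original = b"sequencer run 42"
checksum = hashlib.md5(original).hexdigest()

stored_objects = {
    "healthy": (original, checksum),
    "bit-rotted": (b"sequencer run 43", checksum),  # data changed on disk
}

corrupted = audit(stored_objects)
print(corrupted)
```

An object flagged this way would be set aside, and replication would then restore a good copy from another replica.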
Swift Architecture: Standard Servers with Disks
• Runs on standard server hardware
• SATA or SAS disks
• No RAID: gives Swift visibility into the hardware beneath; Swift already provides data redundancy; reduces cost
Hardware
General Purpose
• HP SL 4540 4U 60-bay (3 server nodes)
• SuperMicro STX-CL XE36-2460 36-bay
• Cisco UCS C3260 62-bay (2 server nodes)
Backup & Archive (dense)
• HP SL 4540 4U 60-bay (1 server node)
• Cisco UCS C3160 62-bay (1 server node)
• Dell DSS7000 4U 90-bay (2 server nodes)
• SuperMicro CSE-946ED-R2kJBOD 90-bay (enclosure)
• Seagate 5U 84-bay (enclosure)
Swift Architecture Summary
• Native HTTP API
• Scales linearly
• No single point of failure
• Standard servers and Linux
• Extremely durable
• Resilient to hardware failures
• Routes around network failures
• Consistency model enables multi-data-center deployments
Multi-Region and Global Clusters • Storage in every corner of the world with a single pane of control
Regions and Zones
Regions
• Physically separate, often defined by geographical boundaries
• Minimum: 1 region
Zones
• Regions contain one or more zones
• A zone designates a group of nodes sharing a set of physical hardware
• Also referred to as failure domains
Multi-Region Clusters
• Tolerate failures across regions and zones
• Policies control placement across regions
• Send requests to the “closest” region
Diagram: Global DNS in front of a load balancer in each of Region 1 and Region 2; each region has proxy nodes plus object nodes and account/container nodes in Zone 1 and Zone 2, linked by a replication network.
Data Placement: “As Unique as Possible”
• Single-node cluster: disks are “as-unique-as-possible”
• Small cluster: storage nodes are “as-unique-as-possible”
• Large cluster: storage racks are “as-unique-as-possible”
• Multi-region: distributed data centers are “as-unique-as-possible”
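One way to picture "as unique as possible" is a greedy choice: each new replica goes to the device that shares the fewest failure-domain tiers with the replicas already placed. This is an illustrative sketch only, not the algorithm Swift's ring builder uses; the `Device` type and both functions are hypothetical.

```python
from typing import NamedTuple


class Device(NamedTuple):
    region: int
    zone: int
    node: str
    disk: str


def shared_tiers(a, b):
    """Count how many leading tiers (region, zone, node, disk) two devices share."""
    count = 0
    for left, right in zip(a, b):
        if left != right:
            break
        count += 1
    return count


def place_replicas(devices, replicas=3):
    """Greedily pick devices that overlap least with those already chosen."""
    chosen = []
    candidates = list(devices)
    for _ in range(min(replicas, len(candidates))):
        best = min(
            candidates,
            key=lambda d: max((shared_tiers(d, c) for c in chosen), default=0),
        )
        chosen.append(best)
        candidates.remove(best)
    return chosen


# Single node: the best available separation is across disks.
single_node = [Device(1, 1, "node1", f"disk{i}") for i in range(4)]
print(place_replicas(single_node))

# Two regions: replicas spread across regions and zones before reusing a node.
multi_region = [
    Device(1, 1, "node1", "disk0"),
    Device(1, 1, "node1", "disk1"),
    Device(1, 2, "node2", "disk0"),
    Device(2, 1, "node3", "disk0"),
]
placement = place_replicas(multi_region)
print(placement)
```

The same function covers every row of the slide: whatever the cluster size, it falls back to the widest separation the hardware allows.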
Storage Policies
Benefits:
• Optimizes storage for applications and users
• Consolidates storage tiers under one system
• Simplifies management
• Lowers storage infrastructure TCO
Policies encompass:
• Storage media
• Number of replicas
• Erasure codes / replicas
Common policy groupings:
• According to performance
• According to geography or political boundary
• According to data protection scheme, e.g., number of replicas or erasure coding
Diagram labels: Region 1 Policy, Region 2 Policy (Zone 1 and Zone 2 in each)
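The choice between replicas and erasure coding is largely a trade-off in raw capacity. A quick worked comparison, assuming a hypothetical erasure-coding scheme of 10 data fragments plus 4 parity fragments (the slides do not name a scheme):

```python
def raw_bytes_replicated(object_bytes, replicas):
    """Raw capacity consumed when every replica is a full copy."""
    return object_bytes * replicas


def raw_bytes_erasure_coded(object_bytes, data_fragments, parity_fragments):
    """Raw capacity consumed when the object is split into data + parity fragments."""
    return object_bytes * (data_fragments + parity_fragments) / data_fragments


object_bytes = 100  # think of this as 100 TB of user data

replicated = raw_bytes_replicated(object_bytes, replicas=3)
erasure_coded = raw_bytes_erasure_coded(
    object_bytes, data_fragments=10, parity_fragments=4  # assumed scheme
)

print(f"3 replicas:       {replicated} raw")
print(f"10+4 erasure code: {erasure_coded} raw")
```

Under these assumptions the replicated policy needs 3x the user data in raw capacity and the erasure-coded policy 1.4x, which is why dense backup and archive tiers often lean toward erasure coding while latency-sensitive tiers keep replicas.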
MRC (Multi-Region Cluster): Proxy Write Affinity
• By default, objects are written to all locations simultaneously.
• With write affinity, the proxy writes all copies locally, then transfers them asynchronously to the other regions.
Diagram: Region 1, Region 2, Region 3
MRC: Proxy Read Affinity
• Prioritizes “nearby” storage: by zone/region (where am I?) and by latency to the storage node
• DNS routes the user: each proxy pool has its own hostname, and the user is routed to the closest region
Diagram: Region 1, Region 2, Global DNS
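Read affinity amounts to sorting the candidate storage nodes by how close they are to the proxy before trying them. A minimal sketch, assuming three priority levels (same zone, same region, anywhere else); the node dictionaries and function names are made up for illustration.

```python
def affinity_priority(node, proxy_region, proxy_zone):
    """Lower numbers are tried first."""
    if node["region"] == proxy_region and node["zone"] == proxy_zone:
        return 0  # same zone as the proxy
    if node["region"] == proxy_region:
        return 1  # same region, different zone
    return 2      # remote region


def sort_by_read_affinity(nodes, proxy_region, proxy_zone):
    """Order replica locations so that nearby nodes are read from first."""
    return sorted(
        nodes, key=lambda node: affinity_priority(node, proxy_region, proxy_zone)
    )


replica_nodes = [
    {"name": "r2z1-node", "region": 2, "zone": 1},
    {"name": "r1z2-node", "region": 1, "zone": 2},
    {"name": "r1z1-node", "region": 1, "zone": 1},
]

read_order = [
    node["name"]
    for node in sort_by_read_affinity(replica_nodes, proxy_region=1, proxy_zone=1)
]
print(read_order)
```

A proxy in Region 1, Zone 1 reads from its own zone first and only crosses to Region 2 if the nearer replicas fail.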
Cool, what can I do with it? • What are some situations where I really should be using Swift?
Use Case – Backup
Backup
• Enterprise: workstations
• MSP: cloud backup service
Behavior
• Write optimized
• High throughput
• Low concurrency
Configuration
• PACO nodes (proxy, account, container, and object services on one node), using replicas or erasure coding
• High density: Seagate 5U 84-bay enclosure; Dell DSS7000 4U 90-bay server; SuperMicro SC946ED 90-bay server
Partners
• NetBackup
• CommVault
Use Case – Big Data / MapReduce (Hadoop / Spark)
Philosophy: let HDFS do what it’s best at, serving data where you want it when you need it. Use SwiftStack for warm and cold data storage.
Why?
• SwiftStack durability and reliability guarantees
• SwiftStack managed capacity: grow into the petabytes and beyond
• Easier integration with various data input sources
• Share results using a common storage platform
How?
• Run the MapReduce job: read input data from SwiftStack, write transient results to HDFS, write the result to SwiftStack
• Hadoop-OpenStack (formerly SwiftFS): https://hadoop.apache.org/docs/current/hadoop-openstack/
Use Case – Media & Entertainment
Media encoding / ingest
• Streaming data translation
• Middleware: Storlets (IBM), https://github.com/openstack/storlets
Content streaming
• Default behavior in SwiftStack
Large objects
• Divide media files into 15- or 30-second segments
• Static Large Object
• Add custom content / commercials
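A Static Large Object is described by a JSON manifest that lists its segments in playback order, each with a `path`, `etag` (MD5), and `size_bytes`. The sketch below builds such a manifest locally. It splits by byte count rather than by 15- or 30-second time periods, and the container name, object name, and segment size are all assumptions chosen for the example.

```python
import hashlib
import json


def split_into_segments(data, segment_size):
    """Cut a byte string into fixed-size segments (the last may be shorter)."""
    return [data[i:i + segment_size] for i in range(0, len(data), segment_size)]


def build_slo_manifest(container, object_prefix, segments):
    """Describe each segment the way a Static Large Object manifest expects."""
    return [
        {
            "path": f"/{container}/{object_prefix}/{index:08d}",
            "etag": hashlib.md5(segment).hexdigest(),
            "size_bytes": len(segment),
        }
        for index, segment in enumerate(segments)
    ]


media = bytes(range(256)) * 40  # 10,240 bytes standing in for a media file
segments = split_into_segments(media, segment_size=4096)
manifest = build_slo_manifest("media_segments", "movie.mp4", segments)

print(json.dumps(manifest, indent=2))
```

Each segment is uploaded as an ordinary object, and the manifest is then uploaded with a PUT that carries the `multipart-manifest=put` query parameter. Because the manifest is just an ordered list, inserting custom content or commercials comes down to adding entries that point at other segment objects.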
Use Case – Life Science
Content repository
High throughput
• Many laboratory instruments writing at once
• Gene sequencers, mass spectrometers, etc.
• Up to 1 GB/min per instrument
High capacity (starting at 1 PB+)
Durability
• Auditor ensures data remains consistent and uncorrupted
Availability
• CDN-like data distribution
Auditing
• Object versioning and tagging
Workshop: OpenStack Swift • Let’s try this out
LAB: Installing SwiftStack • https://files.swiftstack.com/workshoplabmanual.pdf • https://files.swiftstack.com/workshopfullmanual.pdf
Using Swift • I got it installed, let’s use it
SwiftStack Web Console: SwiftStack provides the Web Console for performing simple operations and for browsing containers and objects graphically.
SwiftStack Web Console In the current version of the Web Console you can perform the following actions: • Connect to other accounts • Create containers • Upload objects • Download objects • Copy objects • Move objects • Delete objects
Other Interesting Sessions
• Ancestry.com in Production with OpenStack and Kubernetes. Wednesday, April 27, 11:00am-11:40am. Jordan Nielsen. https://www.openstack.org/summit/austin-2016/summit-schedule/events/8826
• Doubling performance in Swift with no code changes. Wednesday, April 27, 2:40pm-3:20pm. David Stewart • John Dickinson. https://www.openstack.org/summit/austin-2016/summit-schedule/events/8312
• Monitoring Swift ++ (incl. Nagios, Elasticsearch, Zabbix, & more). Thursday, April 28, 9:50am-10:30am. Martin Lanner • Adam Takvam. https://www.openstack.org/summit/austin-2016/summit-schedule/events/8818
• Swift 102: Beyond CRUD - More real demos. Thursday, April 28, 11:50am-12:30pm. Doug Soltesz • Clay Gerrard. https://www.openstack.org/summit/austin-2016/summit-schedule/events/8792
• How Burton Snowboards is carving down the OpenStack trail. Thursday, April 28, 1:30pm-2:10pm. Mario Blandini • Jim Merritt. https://www.openstack.org/summit/austin-2016/summit-schedule/events/8841
• FILE this: Swift API, then S3 API, and now POSIX access to OpenStack Swift. Thursday, April 28, 2:20pm-3:00pm. Joe Arnold • John Dickinson. https://www.openstack.org/summit/austin-2016/summit-schedule/events/8704
I want to know more…
• Detailed overview and guides: http://swiftstack.com/openstack-swift/
• Swift All-In-One step-by-step deployment guide: http://docs.openstack.org/developer/swift/development_saio.html
• API docs: http://api.openstack.org/api-ref-objectstorage-v1.html
• For contributors: #openstack-swift on freenode IRC
• SwiftStack free trial: https://swiftstack.com/try-it-now/
• Book: OpenStack Swift: Using, Administering, and Developing for Swift Object Storage (ISBN-13: 978-1491900826), by Joe Arnold
Book signing (A25):
• April 26, 10:45am - 11:15am
• April 27, 01:15pm - 01:45pm
• April 28, 12:45pm - 01:15pm
Thank You! Albert Chen achen@swiftstack.com Eric Rife erife@swiftstack.com