Storage 101 What is a storage array?
Agenda • Definitions • What is a SAN Fabric • What is a storage array • Front-end connections • Controllers • Back-end connections • Physical Disks • Provisioning • Performance • Future – Distributed storage
Definitions • SAN – Storage Area Network • This is generally used as a catch-all term for all of the following definitions • For storage personnel, SAN does NOT equal storage array • LUN – Logical Unit Number, also known as a volume • WWN – World Wide Name • The MAC address of storage networks • Fabric – The network that connects hosts to storage • iSCSI – Internet SCSI • SCSI – Small Computer System Interface, the block-level command protocol most of these transports carry • FC – Fibre Channel • FCoE – Fibre Channel over Ethernet • FCIP – Fibre Channel over IP • Storage Array – A storage device that provides block-level access to volumes • DAS/DASD – Direct Attached Storage • Storage directly attached to a server without any network • NAS – Network Attached Storage • A storage device that provides file-level access to volumes • RAID – Redundant Array of Independent Disks • A way to combine multiple physical disks into a logical entity with different performance and protection characteristics (see the capacity sketch after this list)
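To make the RAID trade-off concrete, here is a minimal Python sketch (not from the original deck) of the usable-capacity math for the common levels; the disk count and size are illustrative.

def usable_tb(n_disks, disk_tb, level):
    """Return usable capacity in TB for a RAID group of n_disks."""
    data_disks = {
        "raid1": n_disks / 2,    # every block is mirrored
        "raid10": n_disks / 2,   # striped mirrors, still half the raw space
        "raid5": n_disks - 1,    # one disk's worth of parity
        "raid6": n_disks - 2,    # two disks' worth of parity
    }[level]
    return data_disks * disk_tb

print(usable_tb(8, 3, "raid6"))  # 8 x 3TB disks -> 18.0 TB usable of 24 TB raw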
What is a SAN fabric • A network comprising hosts, the storage arrays they access, and the storage switches that provide the connectivity between them
SAN Fabric Details • A SAN fabric has hosts that connect to the network • Each host has a physical connection and some logical addresses • pWWN (Port WWN) is the equivalent of a MAC address for the port on the host that is connected to the network (see the formatting sketch below) • FCID is a dynamic address that also represents the connection • Only HP-UX 11i v2 and earlier use this directly • Typically hosts connect into a storage switch • These look and operate much like traditional network switches • These switches contain both host ports and storage ports, or, in storage terms, initiators and targets • Storage arrays also connect into these switches to complete the network
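A WWN is a 64-bit identifier conventionally written as eight colon-separated hex byte pairs. A tiny sketch of that formatting; the raw value is a made-up example, not a real port:

def format_wwn(raw_hex):
    """Format a 16-hex-digit WWN as eight colon-separated byte pairs."""
    assert len(raw_hex) == 16
    return ":".join(raw_hex[i:i + 2] for i in range(0, 16, 2))

print(format_wwn("10000000c9abcdef"))  # -> 10:00:00:00:c9:ab:cd:ef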
What is a storage array? • A storage array is a system whose components together provide storage for hosts to consume • The components are front-end ports, controllers, back-end ports, and physical disk drives
Front-end connections • Front-end connections are used by individual hosts to connect to the storage array and utilize the volumes available • This can be a direct connection in a small or medium-sized SAN, or in a DAS environment • The physical transport mechanism can be fibre or copper • The logical transport protocols can be block-level protocols such as iSCSI, FC, or FCoE • Some arrays also support file-level protocols, acting as NAS devices • The larger arrays tend to have more front-end connections to aggregate bandwidth and provide load balancing • Volumes are typically presented to hosts via one or more front-end connections
Controllers • Controllers are the brains that translate requests from the front-end ports and determine how to fulfill them • Controllers run code optimized for moving data and performing the mathematical calculations needed to support RAID levels • Controllers also have a certain amount of on-board memory, or cache, to reduce the amount of data that has to come from spinning disks • Many arrays perform some level of read-ahead caching and write caching to optimize performance (a simplified read-cache sketch follows below) • They also run diagnostic and management routines to support the operations of the array
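As a rough illustration of the read-caching idea, here is a generic least-recently-used block cache in Python. This is a teaching sketch, not any vendor's actual cache algorithm:

from collections import OrderedDict

class ReadCache:
    """LRU block cache keyed by logical block address (LBA)."""
    def __init__(self, capacity_blocks):
        self.capacity = capacity_blocks
        self.blocks = OrderedDict()

    def read(self, lba, fetch_from_disk):
        if lba in self.blocks:               # cache hit: serve from memory
            self.blocks.move_to_end(lba)
            return self.blocks[lba]
        data = fetch_from_disk(lba)          # cache miss: go to the spindles
        self.blocks[lba] = data
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)  # evict the least recently used block
        return data

cache = ReadCache(capacity_blocks=1024)
print(cache.read(42, lambda lba: b"block-%d" % lba))  # miss, then cached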
Back-end connections • Back-end connections run from the controllers to the physical disk shelves or disks • These carry the commands that instruct the disks to retrieve or write blocks of data • These connections are usually transparent to all but the most sophisticated storage engineers • Often these have specific fan-out ratios, where each disk shelf may have two or four connections that split the available bandwidth in some way • Back-end connections are rarely a bottleneck
Physical Disks • These days physical disks come in all shapes and sizes • Spinning drives come in capacities anywhere from 146GB to 3TB, with the space increasing year over year (though not the performance) • These drives also come in various rotational speeds, anywhere from 5400 RPM in a laptop drive to 15000 RPM in an enterprise-class drive, which directly affects performance (see the estimate sketch below) • Non-spinning drives, also known as SSDs, come in capacities that don't yet match spinning drives, though there are SSD cards with up to 960GB of storage space available • These physical disks directly impact the performance of the storage array system and are usually the bottleneck in most enterprise-class storage systems
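The rotational-speed effect can be estimated from the average seek time plus rotational latency (half a revolution on average). A back-of-the-envelope sketch; the seek times used here are typical published figures, not measurements:

def estimated_iops(rpm, avg_seek_ms):
    """Rough random-I/O rate for a single spinning disk."""
    rotational_latency_ms = (60_000 / rpm) / 2   # half a revolution, in ms
    service_time_ms = avg_seek_ms + rotational_latency_ms
    return 1000 / service_time_ms

print(round(estimated_iops(15_000, 3.5)))  # ~182 IOPS for an enterprise drive
print(round(estimated_iops(5_400, 9.0)))   # ~69 IOPS for a laptop drive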
Provisioning • Provisioning storage is a multi-step process (sketched below) • Configure the host with any software, including multi-path support • Alias the host port WWN • Zone the host port alias to a storage array WWN • Activate the updated zone configuration • Create the host representation on the storage array • Create the volume on the storage array • Present (LUN mask) the volume to the correct host • Format the volume for use
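The ordering matters more than the tooling, so here is the workflow as a Python skeleton. Every object and method name below is a hypothetical placeholder, since the real commands differ per switch and array vendor:

def provision_volume(host, fabric, array, size_gb):
    """Hypothetical end-to-end provisioning workflow; numbers match the steps above."""
    host.install_multipath_software()             # 1. host-side software prep
    alias = fabric.create_alias(host.pwwn)        # 2. alias the host port WWN
    fabric.create_zone(alias, array.target_pwwn)  # 3. zone host alias to array port
    fabric.activate_zoneset()                     # 4. push the updated zone configuration
    array.register_host(host.pwwn)                # 5. host representation on the array
    volume = array.create_volume(size_gb)         # 6. carve the volume
    array.lun_mask(volume, host)                  # 7. present it to the right host only
    host.format_volume(volume)                    # 8. put a filesystem on it
    return volume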
Performance • There are many statistics you can use to monitor your storage devices, but two key ones impact performance more than most • IOPS – Input/Output Operations Per Second • This is based on the number of disks that support the volume being used and the RAID level of the volume • 15k RPM disks provide roughly 200 IOPS raw, before any RAID write penalty • RAID 1 has a 1:2 ratio for writes: for every 1 write command sent to the array, 2 commands are sent to the disks • RAID 5 has a 1:4 ratio, while RAID 6 has a 1:6 ratio • For RAID 6: read the existing data block, read parity 1, read parity 2, write the data, write parity 1, write parity 2 (the XOR parity calculation itself is not an I/O) • Read commands are always 1:1 • For an application that requires 10,000 IOPS with a 50/50 read-to-write ratio on a RAID 6 volume: • 5,000 read IOPS, translating into 25 physical disks • 5,000 write IOPS, translating into 30,000 back-end operations requiring 150 physical disks • The total requirement is 175 physical disks just to support the performance needed! (See the sketch below) • Bandwidth – covered on the next slide
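Here is the spindle math from the example above as a small Python sketch, assuming the ~200 IOPS per 15k disk figure stated on the slide:

import math

def disks_required(total_iops, read_fraction, write_penalty, iops_per_disk=200):
    """Spindles needed to satisfy a workload after the RAID write penalty."""
    read_iops = total_iops * read_fraction
    backend_write_iops = total_iops * (1 - read_fraction) * write_penalty
    read_disks = math.ceil(read_iops / iops_per_disk)
    write_disks = math.ceil(backend_write_iops / iops_per_disk)
    return read_disks + write_disks

# 10,000 IOPS, 50/50 read/write, RAID 6 (1:6 write penalty)
print(disks_required(10_000, 0.5, 6))  # -> 175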
Performance • Bandwidth • This is based on the speed of the connections from the host to the array, as well as how much oversubscription is taking place within the SAN fabric • Fibre Channel currently supports 16Gb full duplex, though 8Gb is more common • That's roughly 1600 MBps in each direction: about 1.6GB of data each second one way, or 3.2GB bi-directionally • FCoE currently supports 10Gb, though the roadmap includes 40Gb and 100Gb • 10Gb is roughly 1200 MBps in each direction, while 100Gb is roughly 12000 MBps, nearly 12GB per second! (See the transfer-time sketch below) • Besides the speed, there is the matter of oversubscription
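To see what those link speeds mean in practice, a small sketch computing how long a bulk transfer takes in one direction; the throughput table holds approximate usable rates per direction, not exact line rates:

# Approximate usable throughput per direction, in MBps
LINK_MBPS = {"8Gb FC": 800, "16Gb FC": 1600, "10Gb FCoE": 1200, "100Gb Ethernet": 12000}

def transfer_seconds(dataset_gb, link):
    """Time to move a dataset of dataset_gb over one link, ignoring protocol overhead."""
    return dataset_gb * 1000 / LINK_MBPS[link]

print(round(transfer_seconds(400, "8Gb FC")))   # 400 GB in ~500 s
print(round(transfer_seconds(400, "16Gb FC")))  # ~250 s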
Performance • Oversubscription – the practice of providing less aggregate bandwidth than the endpoints could demand in total • In an environment with 100 servers, each with dual 8Gb FC connections, a total of 1600Gb is directed at a storage array via some SAN switch • The storage array may only have a total of eight 8Gb FC connections, for 64Gb of aggregate bandwidth • That's a ratio of 1600:64, or 25:1 (see the sketch below) • This is done in networking all the time and is now standard practice in the storage world • The assumption is that all 100 hosts will never need to transmit at their full bandwidth simultaneously
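The slide's ratio, reproduced as a quick sketch:

def oversubscription_ratio(hosts, links_per_host, host_gb, array_ports, port_gb):
    """Ratio of total host-facing bandwidth to total array-facing bandwidth."""
    host_bandwidth = hosts * links_per_host * host_gb   # total Gb the hosts could push
    array_bandwidth = array_ports * port_gb             # total Gb the array can accept
    return host_bandwidth / array_bandwidth

print(oversubscription_ratio(100, 2, 8, 8, 8))  # -> 25.0, i.e. 25:1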
Storage Futures • Converged Infrastructure • Datacenters designed today talk about converged infrastructures • One HP blade enclosure can encompass server, networking, and storage components that need to be configured holistically • Virtualization has helped speed this convergence, though organizational design usually lags far behind • Storage arrays are beginning to support target-based zoning • The goal is to reduce the administration needed to configure host-to-storage mappings, letting the storage array handle more of the administration intelligently without human intervention
Storage Futures • Over the last few years storage has begun transitioning from "big old iron" to distributed systems where data is spread across multiple nodes for capacity and performance • EMC Isilon • HP IBRIX • Nutanix • VMware vSAN • Nexenta • As always in IT, the pendulum is swinging back toward distributed storage platforms, where each node hosts a small amount of data instead of one big platform hosting all of it
Storage Futures • Data protection is maturing beyond traditional RAID levels such as 1, 1+0, 5, 6, etc. • RAID levels do offer protection against disk failure; however, they mostly do not protect against data corruption • RAID levels also have performance implications, usually negative, for the applications residing on them • These days the solution is to create multiple copies of files or blocks based on some rules • Most of the large public cloud providers use this approach, including Amazon S3 (Simple Storage Service) • It just so happens that, by default, anything stored in S3 has three copies! • The 'utopia' is a world where each application carries metadata that declares the protection level and performance characteristics it requires • This would enable applications to run internally or externally yet provide the same experience regardless (a minimal sketch follows below) • This is the essence of the Software-Defined Data Center (SDDC): application requirements define where applications run, without any human intervention
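A minimal sketch of the metadata-driven idea: an application tags its data with a protection policy, and the storage layer chooses replica placement. The policy names, node names, and placement scheme here are all hypothetical simplifications:

# Hypothetical policy names mapping to replica counts (an S3-style default is 3 copies)
PROTECTION = {"standard": 3, "reduced": 2, "archive": 1}

def place_replicas(object_id, policy, nodes):
    """Pick that many distinct nodes for the object's copies."""
    copies = PROTECTION[policy]
    # Simplified round-robin from a hashed start; real systems use consistent hashing
    start = hash(object_id) % len(nodes)
    return [nodes[(start + i) % len(nodes)] for i in range(copies)]

print(place_replicas("invoice-42", "standard", ["node-a", "node-b", "node-c", "node-d"]))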