Storage Management W.lilakiatsakun
Storage Technology • JBOD (Just a Bunch Of Disks) • RAID (Redundant Arrays of Inexpensive Disks) • ESS (Enterprise Storage System) • SSA (Serial Storage Architecture)
JBOD (Just a Bunch Of Disks) (2) • Depending on the Host Bus Adapter (HBA), a JBOD can be used as individual disks or in any RAID configuration supported by the HBA. • Concatenation (SPAN) • Concatenation, or spanning, of disks is not one of the numbered RAID levels, but it is a popular method for combining multiple physical disk drives into a single virtual disk. • It provides no data redundancy. As the name implies, disks are merely concatenated together, end to beginning, so they appear to be a single large disk.
JBOD (Just a Bunch Of Disks) (3) • Because it consists of an array of independent disks, it can be thought of as a distant relative of RAID. Concatenation is sometimes used to turn several odd-sized drives into one larger useful drive, which cannot be done with RAID 0. • For example, JBOD could combine 3 GB, 15 GB, 5.5 GB, and 12 GB drives into a single 35.5 GB logical drive, which is often more useful than the individual drives separately.
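The spanning arithmetic above can be sketched as follows. This is a minimal illustration, assuming sizes in GB and ignoring block granularity; the helper names `build_span` and `locate` are hypothetical. It concatenates the four example drives end to beginning and maps an offset in the logical drive back to a physical disk.

```python
from bisect import bisect_right
from itertools import accumulate


def build_span(sizes_gb):
    """Cumulative end offsets of disks concatenated end to beginning."""
    return list(accumulate(sizes_gb))


def locate(offset_gb, ends):
    """Map an offset in the spanned volume to (disk index, offset within that disk)."""
    if not 0 <= offset_gb < ends[-1]:
        raise ValueError("offset outside the spanned volume")
    disk = bisect_right(ends, offset_gb)  # first disk whose end lies past the offset
    start = ends[disk - 1] if disk > 0 else 0
    return disk, offset_gb - start


ends = build_span([3, 15, 5.5, 12])  # the example drives, in GB
print(ends[-1])          # 35.5
print(locate(2, ends))   # (0, 2)
print(locate(20, ends))  # (2, 2): disk 2 starts at 18 GB
```

Because the disks are simply laid end to end, losing any one disk loses the part of the address space it held, which is why spanning offers no redundancy.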
Redundant arrays of inexpensive disks (RAID) • The organization distributes the data across multiple smaller disks, offering protection from a crash that could wipe out all data on a single, shared disk. • Benefits of RAID include the following • Increased storage capacity per logical disk volume • High data transfer or I/O rates that improve information throughput • Lower cost per megabyte of storage
RAID0 (stripe set or striped volume) • RAID Level 0 splits data evenly across two or more disks (striped) with no parity information for redundancy. • It is important to note that RAID 0 provides zero data redundancy. • RAID 0 is normally used to increase performance • A RAID0 can be created with disks of differing sizes, but the storage space added to the array by each disk is limited to the size of the smallest disk
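A minimal sketch of RAID 0 striping, assuming simple round-robin placement with a configurable strip size in blocks (real controllers call this the stripe unit or chunk size and choose their own values); `raid0_locate` and `raid0_capacity` are hypothetical helpers. It also shows why mixed disk sizes waste space: each member contributes only as much as the smallest disk.

```python
def raid0_capacity(disk_sizes):
    """Usable capacity: every member contributes only as much as the smallest disk."""
    return len(disk_sizes) * min(disk_sizes)


def raid0_locate(block, n_disks, blocks_per_strip=1):
    """Map a logical block to (disk, block offset on that disk) under round-robin striping."""
    strip = block // blocks_per_strip  # which strip the block falls in
    disk = strip % n_disks             # strips rotate across the disks
    row = strip // n_disks             # stripe row on that disk
    return disk, row * blocks_per_strip + block % blocks_per_strip


print(raid0_capacity([500, 750, 1000]))           # 1500
print([raid0_locate(b, 3)[0] for b in range(6)])  # [0, 1, 2, 0, 1, 2]
print(raid0_locate(6, 3, blocks_per_strip=2))     # (0, 2)
```

Consecutive blocks land on different disks, so a large sequential transfer keeps every drive busy at once; there is no parity to compute, and no copy to fall back on.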
RAID0 – Summary (1) • RAID 0 uses a very simple design, is easy to implement, and offers a large performance advantage. • I/O performance is greatly improved by spreading the I/O load across many channels and drives; the best performance is achieved when data is striped across multiple controllers with only one drive per controller.
RAID0 – Summary (2) • No parity calculation overhead is involved • Not a "True" RAID because it is NOT fault-tolerant. The failure of just one drive will result in all data in an array being lost.
RAID1 (mirroring) • A RAID 1 creates an exact copy (mirror) of a set of data on two or more disks. • This is useful when read performance or reliability is more important than data storage capacity. • Such an array can only be as big as the smallest member disk. • A classic RAID 1 mirrored pair contains two disks, which increases reliability over a single disk.
RAID1 – Summary (1) • RAID Level 1 requires a minimum of 2 drives to implement. • For highest performance, the controller must be able to perform two concurrent separate Reads per mirrored pair or two duplicate Writes per mirrored pair. • 100% redundancy of data means no rebuild is necessary in case of a disk failure, just a copy to the replacement disk. • Transfer rate per block is equal to that of a single disk. • Simplest RAID storage subsystem design.
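The mirroring behaviour can be modelled with a toy class, assuming in-memory lists stand in for disks; `MirrorSet` and its methods are hypothetical names. Writes are duplicated to every healthy member, reads alternate between members, and recovering from a failure is a plain copy rather than a parity rebuild.

```python
class MirrorSet:
    """Toy RAID 1 model: writes go to every healthy member, reads come from any one."""

    def __init__(self, n_disks, n_blocks):
        if n_disks < 2:
            raise ValueError("RAID 1 needs at least 2 drives")
        self.disks = [[None] * n_blocks for _ in range(n_disks)]
        self.failed = set()
        self._turn = 0  # round-robin pointer so reads alternate between members

    def healthy(self):
        return [i for i in range(len(self.disks)) if i not in self.failed]

    def write(self, block, data):
        for i in self.healthy():
            self.disks[i][block] = data  # duplicate write on each member

    def read(self, block):
        members = self.healthy()
        if not members:
            raise OSError("all mirror members have failed")
        i = members[self._turn % len(members)]
        self._turn += 1
        return self.disks[i][block]

    def fail(self, i):
        self.failed.add(i)
        self.disks[i] = [None] * len(self.disks[i])  # contents of the dead disk are gone

    def replace(self, i):
        """No parity rebuild: copy a surviving member onto the replacement disk."""
        source = next(j for j in self.healthy() if j != i)
        self.disks[i] = list(self.disks[source])
        self.failed.discard(i)


m = MirrorSet(2, 4)
m.write(0, b"A1")
m.fail(0)
print(m.read(0))                 # b'A1', served by the surviving disk
m.replace(0)
print(m.disks[0] == m.disks[1])  # True
```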
RAID1 – Summary (2) • Highest disk overhead of all RAID types - inefficient due to the duplication of Write tasks. • Typically the RAID function is done by system software, loading the CPU/Server and possibly degrading throughput at high activity levels. • Hardware implementation is strongly recommended. • May not support hot swap of failed disk when implemented in "software".
RAID 0+1 (A Mirror of Stripes) • RAID Level 0+1 is implemented as a mirrored array whose segments are RAID 0 arrays. • RAID Level 0+1 requires a minimum of 4 drives to implement.
RAID 0+1 – Summary (1) • RAID 0+1 provides high data transfer performance. • It also has the same guaranteed fault tolerance as RAID level 5: it survives any single drive failure. • RAID 0+1 has the same overhead for fault tolerance as mirroring alone. • The high I/O rates are achieved thanks to multiple stripe segments. • RAID 0+1 provides an excellent solution for sites that need high performance but are not concerned with achieving maximum reliability.
RAID 0+1 – Summary (2) • A single drive failure will cause the whole array to become, in effect, a RAID Level 0 array. • It has a high overhead and is very expensive. • All the drives must move in parallel to the proper track, lowering sustained performance. • It has very limited scalability at a very high inherent cost.
RAID 10 (A Stripe of Mirrors) • RAID 10 is implemented as a striped array whose segments are RAID 1 arrays. • RAID Level 10 requires a minimum of 4 drives to implement.
RAID 10 – Summary (1) • RAID 10 has the same fault tolerance as RAID level 1 and achieves high I/O rates by striping across its RAID 1 segments. • It has the same overhead for fault tolerance as mirroring alone. • It provides an excellent solution for sites that would otherwise have gone with RAID 1 but need an additional performance boost. • It is very expensive, with a high overhead. • All drives must move in parallel to the proper track, lowering sustained performance. • It has very limited scalability at a very high inherent cost.
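The practical difference between RAID 0+1 and RAID 10 shows up when two drives fail. The sketch below assumes a four-drive array in which drives 0 and 1 form one group and drives 2 and 3 the other (mirror pairs for RAID 10, stripe sets for RAID 0+1), and it only checks whether the data is still readable, ignoring performance. Both layouts survive any single failure, but RAID 10 survives 4 of the 6 possible double failures while RAID 0+1 survives only 2. The names used are hypothetical.

```python
from itertools import combinations

DRIVES = range(4)
GROUPS = [{0, 1}, {2, 3}]  # assumed grouping of the four drives


def raid10_survives(failed):
    """Stripe of mirrors: data is lost only if some mirror pair loses both members."""
    return all(not pair <= failed for pair in GROUPS)


def raid01_survives(failed):
    """Mirror of stripes: data survives while at least one whole stripe set is intact."""
    return any(not (stripe & failed) for stripe in GROUPS)


double_failures = [set(c) for c in combinations(DRIVES, 2)]
print(sum(raid10_survives(f) for f in double_failures), "of", len(double_failures))  # 4 of 6
print(sum(raid01_survives(f) for f in double_failures), "of", len(double_failures))  # 2 of 6
```

This matches the RAID 0+1 summary: once one drive fails, its whole stripe set is out of service and the array is effectively a RAID 0, so a second failure in the other stripe set is fatal.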
RAID3 (Parallel access with a dedicated parity disk) • RAID Level 3 uses byte-level striping with a dedicated parity disk. • As a side effect, RAID 3 generally cannot service multiple requests simultaneously, because any single block of data is spread across all members of the set and resides in the same location on each. • So any I/O operation requires activity on every disk.
RAID3 – Summary (1) • Level 3 requires only one dedicated disk in the array to hold parity information. • The server's data is then striped across the remaining drives, usually one byte at a time. • The parity drive keeps track of all the information on the striped drives and uses it to restore data if one of those drives fails. • Because of the parity information that is stored and because Write operations take place at the byte level, Read/Write operations often take longer than in other RAID configurations.
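Parity reconstruction works because XOR is its own inverse. The sketch below assumes three data disks plus one parity disk, held as Python `bytes`, with zero padding to fill the last stripe row; all function names are hypothetical. Even a short read touches every data disk, because consecutive bytes land on different disks.

```python
from functools import reduce


def stripe_bytes(data, n_data):
    """Byte-level striping: byte i goes to data disk i % n_data (zero-padded to a full row)."""
    data = data + bytes((-len(data)) % n_data)
    return [data[d::n_data] for d in range(n_data)]


def xor_disks(disks):
    """Byte-wise XOR across equally sized disks; used both to build and to apply parity."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*disks))


def unstripe(disks, length):
    """Reassemble the original byte stream, dropping the padding."""
    return bytes(b for row in zip(*disks) for b in row)[:length]


msg = b"RAID level 3!"
data_disks = stripe_bytes(msg, 3)
parity = xor_disks(data_disks)  # contents of the dedicated parity disk

# Disk 1 fails: XOR the parity with the surviving data disks to rebuild it.
rebuilt = xor_disks([data_disks[0], data_disks[2], parity])
print(rebuilt == data_disks[1])                                     # True
print(unstripe([data_disks[0], rebuilt, data_disks[2]], len(msg)))  # b'RAID level 3!'
```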
RAID3 – Summary (2) • RAID Level 3 requires a minimum of 3 drives to implement. • Very high Read data transfer rate. • Very high Write data transfer rate. • Disk failure has an insignificant impact on throughput. • Low ratio of ECC (Parity) disks to data disks means high efficiency.
RAID3 – Summary (3) • Transaction rate equal to that of a single disk drive at best (if spindles are synchronized). • Controller design is fairly complex. • Very difficult and resource-intensive to implement as a "software" RAID because of the parity generation and checking.
RAID5 (Independent access with distributed parity) • A RAID 5 uses block-level striping with parity data distributed across all member disks. • A minimum of 3 disks is generally required for a complete RAID 5 configuration. • In the example layout, blocks A1 and B1 both reside on disk 0 and block B2 resides on disk 1. A read request for block A1 would be serviced by disk 0; a simultaneous read request for block B1 would have to wait, but a read request for B2 could be serviced concurrently by disk 1.
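The placement described in the example can be reproduced with a small mapping function. It assumes one particular parity rotation (parity starts on the last disk and moves one disk to the left on each stripe, with data filling the remaining disks left to right); actual controllers may use other rotations. `raid5_locate` and `layout` are hypothetical helpers.

```python
def raid5_locate(block, n_disks):
    """Map a logical data block to (disk, stripe) under an assumed rotating-parity layout."""
    stripe, k = divmod(block, n_disks - 1)  # n_disks - 1 data blocks per stripe
    p = n_disks - 1 - stripe % n_disks      # parity disk for this stripe
    return (k if k < p else k + 1), stripe  # data skips over the parity disk


def layout(n_disks, n_stripes):
    """Label every position as in the example: A1, A2, ... and Ap for parity."""
    rows = []
    for s in range(n_stripes):
        letter = chr(ord("A") + s)
        row = [letter + "p"] * n_disks  # start with parity everywhere, then overwrite data slots
        for k in range(n_disks - 1):
            disk, _ = raid5_locate(s * (n_disks - 1) + k, n_disks)
            row[disk] = f"{letter}{k + 1}"
        rows.append(row)
    return rows


for row in layout(4, 4):
    print(row)
print(raid5_locate(0, 4), raid5_locate(3, 4), raid5_locate(4, 4))  # (0, 0) (0, 1) (1, 1)
```

With four disks the rows come out as A1 A2 A3 Ap, B1 B2 Bp B3, C1 Cp C2 C3 and Dp D1 D2 D3: A1 and B1 share disk 0, which is why a concurrent read of B1 has to wait while B2 can be served by disk 1.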
RAID 5 – Summary (1) • Level 5 also relies on parity information to provide redundancy and fault tolerance using independent data disks with distributed parity blocks. • Each entire data block is written onto a data disk; parity for blocks in the same rank is generated on Writes, recorded in a distributed location and checked on Reads. • Unlike RAID 3, which keeps all parity on one dedicated disk, RAID 5 spreads parity information across all member drives. • Requirements: RAID Level 5 requires a minimum of 3 drives to implement.
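Generating parity on writes does not require reading the whole rank. A common approach, shown here as a sketch with hypothetical helper names, is read-modify-write: read the old data block and the old parity, then compute new parity = old parity XOR old data XOR new data. Each small write therefore costs two reads and two writes, one reason RAID 5 write performance is only moderate.

```python
def xor(a, b):
    """Byte-wise XOR of two equal-length blocks."""
    return bytes(x ^ y for x, y in zip(a, b))


def updated_parity(old_data, new_data, old_parity):
    """Read-modify-write: P_new = P_old XOR D_old XOR D_new (other data disks untouched)."""
    return xor(xor(old_parity, old_data), new_data)


d1, d2, d3 = b"\x0f\xf0", b"\x33\x33", b"\x55\xaa"  # one rank of three data blocks
parity = xor(xor(d1, d2), d3)

new_d2 = b"\xff\x00"
new_parity = updated_parity(d2, new_d2, parity)  # 2 reads (d2, parity) + 2 writes
print(new_parity == xor(xor(d1, new_d2), d3))    # True: same as recomputing from scratch
```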
RAID 5 – Summary (2) • It has the highest Read data transaction rate and a medium Write data transaction rate. • A low ratio of ECC (Parity) disks to data disks means high efficiency along with a good aggregate transfer rate. • Disk failure has a medium impact on throughput. • It also has the most complex controller design. • It is often difficult to rebuild in the event of a disk failure (compared to RAID level 1), and the individual block data transfer rate is the same as that of a single disk.
SSA (Serial Storage Architecture) (1) • Serial Storage Architecture (SSA) defines a high-performance serial link for the attachment of input/output devices. • It has been optimized for storage applications such as hard disk drives, host adapter cards, and array controllers. • SSA has many advantages over existing parallel interfaces such as the Small Computer System Interface (SCSI-2). • It uses compact cables and connectors, and it has better performance, connectivity, and reliability. • However, to facilitate migration, SSA retains much of the SCSI-2 logical protocol.
SSA (Serial Storage Architecture) (2) • Current SSA implementations, such as the IBM 7133 Disk Subsystem, provide a peak data rate of 20 MB/s in each direction. • However, a typical loop configuration with one host adapter can provide a total sustained bandwidth of up to 73 MB/s, and higher speeds are becoming available. • The physical medium is usually a copper cable up to 20 meters long, but fiber optics can also be used for longer distances.
SSA (Serial Storage Architecture) (4) • Architecture overview • SSA is defined in three layers: • SSA-PH1 defines the electrical specifications, cables, and connectors. • SSA-TL1 is a general-purpose transport layer. It defines the transmission protocol, configuration, and error recovery. • SSA-S2P is a mapping of the SCSI-2 queuing model, command set, status, and sense bytes.
Storage Area Network • The Storage Networking Industry Association (SNIA) defines the SAN as a network whose primary purpose is the transfer of data between computer systems and storage elements. • A SAN consists of a communication infrastructure, which provides physical connections, and a management layer, which organizes the connections, storage elements, and computer systems so that data transfer is secure and robust.
SAN definition • A SAN is a specialized, high-speed network attaching servers and storage devices. • It is sometimes referred to as “the network behind the servers.” • A SAN introduces the flexibility of networking to enable one server or many heterogeneous servers to share a common storage utility, which may comprise many storage devices, including disk, tape, and optical storage.
SAN Components • SAN Connectivity • The connectivity of storage and server components, typically using Fibre Channel (FC). • SAN Storage • Tape / RAID / JBOD (Just a Bunch Of Disks) / SSA (Serial Storage Architecture) • SAN Server • Windows / Unix / Linux, etc.
Switched Fabric • An infrastructure specially designed to handle storage communications is called a fabric. • A typical Fibre Channel SAN fabric is made up of a number of Fibre Channel switches. • Today, all major SAN equipment vendors also offer some form of Fibre Channel routing solution, which brings substantial scalability benefits to the SAN architecture by allowing data to cross between different fabrics without merging them.
Fibre Channel protocol • Fibre Channel is a layered protocol. It consists of 5 layers, namely: • FC0 The physical layer, which includes cables, fiber optics, connectors, pinouts, etc. • FC1 The data link layer, which implements the 8b/10b encoding and decoding of signals. • FC2 The network layer, defined by the FC-FS (Framing and Signaling) standard, is the core of Fibre Channel and defines the main protocols. • FC3 The common services layer, a thin layer that could eventually implement functions like encryption or RAID. • FC4 The Protocol Mapping layer, in which other protocols, such as SCSI, are encapsulated into an information unit for delivery to FC2.
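The cost of the 8b/10b encoding at FC1 is easy to quantify: every 8 data bits are sent as a 10-bit symbol, so only 80% of the line rate carries data. The sketch assumes the nominal line rates of the 8b/10b-encoded Fibre Channel generations (1, 2, 4 and 8GFC; faster generations such as 16GFC use 64b/66b encoding instead) and ignores frame headers and other FC2 overhead, so real throughput is somewhat lower. `fc_payload_mb_s` is a hypothetical helper.

```python
def fc_payload_mb_s(line_rate_gbaud):
    """Data rate left after 8b/10b coding: 8 payload bits per 10 line bits."""
    data_bits_per_s = line_rate_gbaud * 1e9 * 8 / 10
    return data_bits_per_s / 8 / 1e6  # bits -> bytes -> MB (10**6 bytes)


# Nominal line rates of the 8b/10b-encoded Fibre Channel generations, in Gbaud.
LINE_RATES = {"1GFC": 1.0625, "2GFC": 2.125, "4GFC": 4.25, "8GFC": 8.5}

for name, gbaud in LINE_RATES.items():
    print(f"{name}: {fc_payload_mb_s(gbaud):.2f} MB/s before framing overhead")
```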
Storage Management • Monitoring disk use • A disk monitor agent scans the server volumes to collect disk use information. • Hierarchical storage management • Files are archived according to certain criteria. • Prevention against data loss • To protect against and recover from loss • Outsourcing storage management
Monitoring disk use • One or more of the following categories of information can be collected: • Volumes: date and time data was collected, server name, volumes scanned, capacity, total space used and available • Directories: date and time data was collected, server, volume and directory names, creation date and time, file count, directory size (in bytes), owner name, groups of which the owner is a member • Directory and file owners: date and time data was collected, server and volume names, groups of which the owner is a member, total number of files, total space used
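A disk monitor agent of the kind described above could collect the volume and directory categories with the Python standard library alone. This is a minimal sketch: `volume_report` and `directory_report` are hypothetical names, the last-modified time stands in for the creation date (which is not portably available), and owner and group lookups are left out because they are platform-specific.

```python
import os
import shutil
import tempfile
from datetime import datetime


def now():
    return datetime.now().isoformat(timespec="seconds")


def volume_report(path):
    """Volume category: capacity, space used and space available, in bytes."""
    usage = shutil.disk_usage(path)  # named tuple (total, used, free)
    return {"collected_at": now(), "volume": path,
            "capacity": usage.total, "used": usage.used, "available": usage.free}


def directory_report(path):
    """Directory category: file count and total size of everything below path."""
    file_count = size = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            try:
                st = os.lstat(os.path.join(root, name))
            except OSError:
                continue  # file vanished or is unreadable; skip it
            file_count += 1
            size += st.st_size
    modified = datetime.fromtimestamp(os.stat(path).st_mtime)
    return {"collected_at": now(), "directory": path, "file_count": file_count,
            "size_bytes": size, "modified": modified.isoformat(timespec="seconds")}


# Demo on a throwaway directory holding two small files.
with tempfile.TemporaryDirectory() as demo:
    os.makedirs(os.path.join(demo, "sub"))
    for rel, text in [("a.txt", "hello"), (os.path.join("sub", "b.txt"), "abc")]:
        with open(os.path.join(demo, rel), "w") as f:
            f.write(text)
    report = directory_report(demo)
    vol = volume_report(demo)
print(report["file_count"], report["size_bytes"])  # 2 8
```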
Hierarchical storage management • When disk space runs out, data files need to be backed up (as archived files or to backup tape). • With the right tools, users are assured of having enough disk space to accommodate new files. • When a file system reaches a predefined threshold of X percent full, automated procedures are initiated that determine which files are eligible for archiving and have already been backed up. • The file catalog is then updated to indicate that those files have been archived, and they are deleted from the disk file system.
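The threshold-driven procedure can be sketched as below. The eligibility rule is an assumption, since the text only says files are archived according to certain criteria: here a file must already be backed up, and the least recently accessed files are archived first until usage falls back below the threshold. `FileRecord` and `hsm_migrate` are hypothetical, and the list of records plays the role of the file catalog.

```python
from dataclasses import dataclass


@dataclass
class FileRecord:
    path: str
    size: int
    last_access: float  # e.g. a POSIX timestamp
    backed_up: bool
    archived: bool = False


def hsm_migrate(catalog, capacity, threshold_pct):
    """If usage has reached threshold_pct of capacity, archive eligible files
    (already backed up, least recently accessed first) until usage is below it.
    Returns the paths that were archived."""
    used = sum(f.size for f in catalog if not f.archived)
    limit = capacity * threshold_pct / 100
    archived = []
    if used < limit:
        return archived
    eligible = sorted((f for f in catalog if f.backed_up and not f.archived),
                      key=lambda f: f.last_access)
    for f in eligible:
        if used < limit:
            break
        f.archived = True  # catalog update: the file now lives on archive media
        used -= f.size     # ...and its space on the disk file system is freed
        archived.append(f.path)
    return archived


catalog = [
    FileRecord("/data/old.log", 40, last_access=100.0, backed_up=True),
    FileRecord("/data/report.doc", 30, last_access=500.0, backed_up=True),
    FileRecord("/data/new.tmp", 20, last_access=50.0, backed_up=False),
]
moved = hsm_migrate(catalog, capacity=100, threshold_pct=80)
print(moved)  # ['/data/old.log']
```

The oldest file, `/data/new.tmp`, is skipped because it has not been backed up yet, so archiving and deleting it would lose data.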
Prevention against data loss (1/2) • Backups sent off-site at regular intervals • Include software as well as all data, to facilitate recovery • Create an insurance copy on microfilm or similar media and store the records off-site. • Use a remote backup facility if possible to minimize data loss. • Storage Area Networks (SANs) spanning multiple sites make data immediately available without the need to recover or synchronize it.
Prevention against data loss (2/2) • Surge protectors: to minimize the effect of power surges on delicate electronic equipment • Uninterruptible Power Supply (UPS) and/or backup generator • Fire prevention: more alarms, accessible extinguishers • Anti-virus software and other security measures
Techniques and technology • Mirroring • Disk mirroring: Redundant Array of Inexpensive Disks level 1 (RAID 1) • Server mirroring: web / FTP / email • RAID: RAID 0 to 6 and combinations • On-site data storage • Backup: tape / optical disk • Off-site data storage (backup sites) • Cold sites • Warm sites • Hot sites
Mirroring • Mirroring can occur locally or remotely. • Local mirroring means that a server has a second hard drive that stores a copy of its data. • Remote mirroring means that a remote server contains an exact duplicate of the data. The second drive is called a mirrored drive. • When a write request is issued, data is written to the original drive and then copied to the mirrored drive, providing a mirror image of the primary drive. • If one of the hard drives fails, the data is still available on the other, so it is protected from loss.
Disk mirroring (RAID1) • The replication of logical disk volumes onto separate physical hard disks in real time to ensure continuous availability, currency and accuracy. • A mirrored volume is a complete logical representation of separate volume copies
Server mirroring • Mirror sites are most commonly used to provide multiple sources of the same information, and are of particular value as a way of providing reliable access to large downloads. • Web server • To preserve a website or page, especially when it is closed or is about to be closed • Load balancing • Email server • To protect against loss of email information • FTP server • To allow faster downloads for users at a specific geographical location • Load balancing
Back up site • A backup site is a location where a business can easily relocate following a disaster, such as fire, flood, or terrorist threat. This is an integral part of the disaster recovery plan of a business. • A backup site can be another location operated by the business, or contracted via a company that specializes in disaster recovery services. • In some cases, a business will have an agreement with a second business to operate a joint disaster recovery facility.
Cold Sites • A cold site is the most inexpensive type of backup site for a business to operate. • It provides office space in which to operate. • It does not include backed-up copies of data and information from the original location of the business, nor does it include hardware already set up. • The lack of hardware keeps the startup costs of a cold site minimal, but requires additional time following the disaster to get the operation running at a capacity close to that prior to the disaster.
Warm Sites • A warm site is a location where the business can relocate to after the disaster that is already stocked with computer hardware similar to that of the original site, but does not contain backed up copies of data and information.