Data Storage and RAID Today Brandon Krakowsky Jeffrey Doto
Presentation Topics: • Who Relies on Data Storage? • Why is data storage so important? • Sarbanes-Oxley and HIPAA. • Hard Disk Failure. • What is a RAID? • Different types of RAID and their uses. • Enterprise vs. Consumer Storage. • Demonstration.
Information Overload • What the heck is an exabyte? • 1 billion gigabytes • The world generated 161 exabytes of digital information last year • IDC estimates that this will grow to 988 exabytes in 2010 • almost 1 zettabyte! • 185 exabytes of storage available last year • IDC estimates that this will grow to 601 exabytes in 2010 • We need more storage!
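The figures on this slide can be checked with quick arithmetic. The sketch below assumes decimal (SI) units, where 1 gigabyte is 10^9 bytes and 1 exabyte is 10^18 bytes; the variable names are illustrative only.

```python
# Decimal (SI) units are assumed: 1 GB = 10**9 bytes, 1 EB = 10**18 bytes.
GB = 10**9
EB = 10**18
ZB = 10**21

gigabytes_per_exabyte = EB // GB  # how many gigabytes fit in one exabyte

# IDC estimates for 2010 quoted on the slide, in exabytes.
generated_2010 = 988
available_2010 = 601

# Gap between information generated and storage available.
shortfall_eb = generated_2010 - available_2010

# How close 988 exabytes is to a full zettabyte.
fraction_of_zettabyte = generated_2010 * EB / ZB

print(gigabytes_per_exabyte)   # 1000000000
print(shortfall_eb)            # 387
print(fraction_of_zettabyte)   # 0.988
```

On these estimates, generated information would exceed available storage by 387 exabytes in 2010.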
Proliferation of the Internet • How many web pages are there? • If you “Google” anything, you’ll get at least a billion choices • Web pages used to be just text and graphics • Now, audio & video clips are prevalent • Hosting companies need to deal with data storage on a whole new level
On-Demand Audio & Video • What about companies who specialize in On-Demand audio/video delivery? • YouTube • Google Video • They make it so easy to upload content • How do these companies deal with managing all of this data?
Digital Audio • Remember Napster? • Who buys CDs anymore? • What about companies who provide downloadable audio content? • iTunes • Rhapsody • MP3.com • Most of these companies provide video as well! • Also, Podcasting & Vodcasting are becoming more popular
Photographic Content • Everybody is a photographer these days! • Camera Phone • Digital Camera • Hosting companies allow users to upload photos easily • Flickr • Photobucket • Where are all these photos stored?
Database Driven Applications • Database driven websites rely heavily on data integrity • Companies like Amazon, eBay, and Citizens Bank all have huge backbones • They rely on storage! • National Security Agency has a database of phone call records of “tens of millions” of Americans • Blogs & Wikis • Is this data backed up? • Can you imagine if you lost your MySpace account?
E-Mail • Most popular mode of communication • When you send a message, where does it go? • If you’re like most, e-mail is a lifeline • For large companies, email backup is a must!
Sarbanes-Oxley and HIPAA • Sarbanes-Oxley (SOX) was signed into law by George W. Bush in 2002 in the wake of the Enron scandal. • Changed the way publicly-held businesses were responsible for data retention. • Enormously profitable for storage industry.
Financial Impact of SOX • Estimated annual compliance spending: $17 billion to $28.8 billion • Great for storage industry • Data retention: • Net Effect: Double the retention length and the number of copies = a lot more storage! • Source: The Economist, March 4, 2004; Information Week, March 2, 2006
SOX: A Boon and A Burden. • While it has been a great source of financial gain for storage and IT vendors, it has been a huge headache for CIOs and IT staff. • Estimated man hours: countless. • New York Times: dedicated 200 employees in 2003, 105 full time on compliance project. • Washington Post: Spent $5 million on outside consultants, created 10 full time positions. • CISA: Certified Information Systems Auditor • Source: Information Week, March 2, 2006; Business Matters, March 2005.
HIPAA: Health Insurance Portability and Accountability Act • Signed into law in 1996. • Desired effect was to promote EDI, or Electronic Data Interchange, among various healthcare bodies. • Protect Patient Privacy • Protect Security of Patient Information
Data Management for the User • As a user, why do I care? • Where do you store all of that music you illegally downloaded? • Again, sites like YouTube and Flickr allow you to upload your own media content • Where do you store all of your home-grown movies? • How do you back up your photo library? • Hard drives fill up fast!
Microprocessor Technological Advances • As microprocessor technology improves, so does memory size • How does this benefit overall computer performance? • It doesn’t unless secondary storage progresses at the same rate • Increased microprocessor speed opens the door to newer processor-intensive applications • Users need more space
Magnetic Disk Technology • MTTF: Mean Time to Failure. • Not a question of “will it fail?” but “when will it fail?” • Current drives run at speeds from 5,400 to 15,000 RPM. • Electromechanical parts: spindle motor and actuator arm are both prone to failure; magnetic media can degrade. • Enterprise vs. consumer storage is discussed later.
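To see why MTTF matters more as disk counts grow, a common simplification is to assume that disks fail independently at a constant rate. Under that assumption, an array with no redundancy has an MTTF equal to the single-disk MTTF divided by the number of disks. The sketch below applies that rule; the 500,000-hour disk rating is a hypothetical figure, not one taken from the presentation.

```python
def array_mttf_hours(disk_mttf_hours: float, num_disks: int) -> float:
    """MTTF of a non-redundant array, assuming independent failures
    at a constant rate (a simplifying assumption)."""
    if num_disks < 1:
        raise ValueError("num_disks must be at least 1")
    return disk_mttf_hours / num_disks


HOURS_PER_YEAR = 24 * 365

disk_mttf = 500_000.0  # hypothetical single-disk rating, in hours

for n in (1, 4, 22):
    years = array_mttf_hours(disk_mttf, n) / HOURS_PER_YEAR
    print(f"{n:>2} disks: about {years:.1f} years until the first failure")
```

The more disks an array holds, the sooner one of them is expected to fail, which is the motivation for adding redundancy.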
RAID • “Redundant Array of Independent Disks” • First proposed in the paper “A Case for Redundant Arrays of Inexpensive Disks”, published in 1988 • Method of combining several disk drives into one logical unit, identified by a “Logical Unit Number” (LUN) • Appears as a single storage unit to the host system
2 Most Important Features • Reliability • RAID makes use of “redundancy” • Data is redundantly distributed over all or some of the disks providing fault tolerance and data protection • Performance • Disk performance is enhanced because multiple disks are working in parallel
RAID Level 0 • No Redundancy • Uses a technique called “striping” • Data is broken down into blocks • Consecutive blocks are written to separate disks in turn • Provides excellent write performance • Data is spread out • No data protection • If one disk fails, all data in the array is lost
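Striping can be sketched in a few lines. This is a minimal illustration that models each disk as an in-memory list of blocks and assigns blocks round-robin; the function names `stripe` and `unstripe` are made up for the example, and real controllers work on fixed-size sectors rather than Python byte strings.

```python
from typing import List


def stripe(data: bytes, num_disks: int, block_size: int) -> List[List[bytes]]:
    """Split data into blocks and deal them round-robin across the disks."""
    disks: List[List[bytes]] = [[] for _ in range(num_disks)]
    for offset in range(0, len(data), block_size):
        block_index = offset // block_size
        disks[block_index % num_disks].append(data[offset:offset + block_size])
    return disks


def unstripe(disks: List[List[bytes]]) -> bytes:
    """Read the blocks back in their original order."""
    blocks = []
    rows = max(len(disk) for disk in disks)
    for row in range(rows):
        for disk in disks:
            if row < len(disk):
                blocks.append(disk[row])
    return b"".join(blocks)


disks = stripe(b"ABCDEFGHIJ", num_disks=3, block_size=2)
print(disks)            # [[b'AB', b'GH'], [b'CD', b'IJ'], [b'EF']]
print(unstripe(disks))  # b'ABCDEFGHIJ'
```

Because every disk holds only part of each file, losing any one list in `disks` leaves gaps that nothing else in the array can fill.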
RAID Level 1 • Uses a technique called “mirroring” • All data is written to at least two separate disks • If one disk fails, there’s a copy • Data is protected as long as at least one copy survives • Write performance is compromised • All data is written twice • Read performance can be better than a single disk • Data can be read from multiple disks at once
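Mirroring can be illustrated with a small in-memory model. The `Mirror` class below is hypothetical and only shows the idea: every write goes to all working disks, and a read is served by any disk that has not failed.

```python
class Mirror:
    """Toy RAID 1 model: each disk is a dict mapping block number to data."""

    def __init__(self, num_disks: int = 2) -> None:
        self.disks = [dict() for _ in range(num_disks)]
        self.failed = set()

    def write(self, block_no: int, data: bytes) -> None:
        # Every write is repeated on each working disk (the write penalty).
        for index, disk in enumerate(self.disks):
            if index not in self.failed:
                disk[block_no] = data

    def fail(self, disk_index: int) -> None:
        # Simulate a dead drive: its contents are gone.
        self.failed.add(disk_index)
        self.disks[disk_index] = {}

    def read(self, block_no: int) -> bytes:
        # Any surviving copy can satisfy the read.
        for index, disk in enumerate(self.disks):
            if index not in self.failed:
                return disk[block_no]
        raise IOError("all mirrors have failed")


array = Mirror()
array.write(0, b"payroll records")
array.fail(0)
print(array.read(0))  # b'payroll records'
```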
RAID Level 2 • Uses a technique similar to “striping” • Words are split at the bit level • Each bit is written to a separate disk • Hamming codes are generated for each word • Spread across separate Error Correcting Code disks • Data is cross-referenced with codes to ensure data integrity • Write performance is compromised since Hamming codes need to be calculated each time • Virtually no commercial implementations • Too expensive
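The Hamming-code idea can be shown with the classic (7,4) code, which protects 4 data bits with 3 check bits. This is a sketch only: it assumes that in a RAID 2 layout each of the 7 bits would sit on its own disk (4 data disks plus 3 ECC disks), and the function names are illustrative.

```python
def hamming_encode(data_bits):
    """Encode 4 data bits into a 7-bit codeword (positions 1..7).
    Check bits live at positions 1, 2 and 4."""
    c = [0] * 8  # index 0 is unused so indices match bit positions
    c[3], c[5], c[6], c[7] = data_bits
    c[1] = c[3] ^ c[5] ^ c[7]
    c[2] = c[3] ^ c[6] ^ c[7]
    c[4] = c[5] ^ c[6] ^ c[7]
    return c[1:]


def hamming_correct(codeword):
    """Return (data_bits, error_position); error_position is 0 if clean."""
    c = [0] + list(codeword)
    s1 = c[1] ^ c[3] ^ c[5] ^ c[7]
    s2 = c[2] ^ c[3] ^ c[6] ^ c[7]
    s4 = c[4] ^ c[5] ^ c[6] ^ c[7]
    position = s1 + 2 * s2 + 4 * s4  # the syndrome names the bad position
    if position:
        c[position] ^= 1  # flip the faulty bit back
    return [c[3], c[5], c[6], c[7]], position


codeword = hamming_encode([1, 0, 1, 1])
print(codeword)                   # [0, 1, 1, 0, 0, 1, 1]

codeword[4] ^= 1                  # corrupt bit position 5 (list index 4)
print(hamming_correct(codeword))  # ([1, 0, 1, 1], 5)
```

The three extra disks per four data disks are a large part of why this level is considered too expensive.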
RAID Level 3 • Uses a technique called “bit-interleaved parity” • Words are split at the bit level • Each bit is written to a separate disk • Parity bits are generated for each word • Stored on a separate parity disk • Read and write performance is compromised since all the disks are used for every operation
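Bit-level parity can be sketched as follows. The example assumes one bit per data disk and a single XOR parity bit on a dedicated parity disk; the helper names are made up for illustration.

```python
def split_word(word: int, num_data_disks: int):
    """Spread the low bits of a word over the data disks and compute parity."""
    bits = [(word >> i) & 1 for i in range(num_data_disks)]
    parity = 0
    for bit in bits:
        parity ^= bit  # XOR of all data bits
    return bits, parity


def rebuild_bit(bits, parity: int, failed_disk: int) -> int:
    """Recover the bit on a failed disk from the survivors and the parity bit."""
    value = parity
    for index, bit in enumerate(bits):
        if index != failed_disk:
            value ^= bit
    return value


bits, parity = split_word(0b10110100, num_data_disks=8)
print(bits, parity)                 # [0, 0, 1, 0, 1, 1, 0, 1] 0
print(rebuild_bit(bits, parity, 2)) # 1
```

Because the failed disk is known, a single parity bit is enough to rebuild it; no Hamming code is needed to locate the error.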
RAID 4: Block Interleaved Parity • Writes data in blocks instead of bits. • Advantage: high read performance. • Disadvantages: dedicated parity drive causes a severe write bottleneck; requires a complex hardware controller. • Requires at least 3 disks to implement.
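The write bottleneck follows from how parity is updated. In a minimal sketch assuming XOR parity, a small write to one data block must also read and rewrite the parity block, using new parity = old parity XOR old data XOR new data. The function names below are illustrative.

```python
def xor_blocks(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length blocks."""
    return bytes(x ^ y for x, y in zip(a, b))


def small_write(data_disks, parity: bytes, disk_index: int, new_block: bytes) -> bytes:
    """Update one data block and return the new parity block.
    Only the target disk and the parity disk are touched."""
    old_block = data_disks[disk_index]
    new_parity = xor_blocks(xor_blocks(parity, old_block), new_block)
    data_disks[disk_index] = new_block
    return new_parity


data = [b"\x01\x02", b"\x10\x20", b"\xff\x00"]
parity = xor_blocks(xor_blocks(data[0], data[1]), data[2])

parity = small_write(data, parity, 1, b"\xaa\xbb")

# The shortcut gives the same result as recomputing parity from scratch.
print(parity == xor_blocks(xor_blocks(data[0], data[1]), data[2]))  # True
```

Every such write lands on the same parity drive, so that drive limits how many writes the array can absorb at once.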
RAID 5: Block Interleaved Distributed Parity • Solves RAID 4 bottleneck. • Parity distributed over all drives. Allows multiple concurrent reads and writes, which increases efficiency. • Advantage: most versatile overall; file, web, database, and internet servers can all use it. • Disadvantage: requires a complex controller. • Requires at least 3 disks to implement.
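The effect of distributing parity can be seen by counting which disk receives the parity block for each stripe. The rotation rule below is one possible layout chosen for illustration; actual controllers use several different rotation schemes.

```python
from collections import Counter


def raid4_parity_disk(stripe: int, num_disks: int) -> int:
    """RAID 4: parity always lives on the last disk."""
    return num_disks - 1


def raid5_parity_disk(stripe: int, num_disks: int) -> int:
    """RAID 5 (one possible rotation): parity moves one disk per stripe."""
    return (num_disks - 1 - stripe) % num_disks


def parity_write_load(parity_disk_fn, num_disks: int, num_stripes: int) -> Counter:
    """Count how many stripes place their parity block on each disk."""
    return Counter(parity_disk_fn(s, num_disks) for s in range(num_stripes))


print(dict(parity_write_load(raid4_parity_disk, 4, 8)))  # {3: 8}
print(dict(parity_write_load(raid5_parity_disk, 4, 8)))  # {3: 2, 2: 2, 1: 2, 0: 2}
```

With rotation, parity updates for different stripes land on different drives, so writes to separate stripes can proceed in parallel.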
RAID Level 6: Block Interleaved Striping with Dual Error Protection • Advantage: Implements both parity (P) and Reed-Solomon codes (Q) to protect against the loss of two drives. • Can be thought of as an extension to RAID 5. • Disadvantage: requires a more complex controller with high overhead; requires N+2 disks.
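The P and Q blocks can be sketched with arithmetic in the finite field GF(2^8). This example assumes the reduction polynomial 0x11d with generator 2, a common choice for this kind of code; it is a simplified illustration with made-up function names, not any particular controller's implementation. P is the plain XOR of the data blocks, and Q weights block i by 2^i before XORing, which yields two independent equations and so allows two lost data blocks to be recovered.

```python
def gf_mul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8), reducing by the polynomial 0x11d."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11d
        b >>= 1
    return result


def gf_pow(a: int, n: int) -> int:
    result = 1
    for _ in range(n):
        result = gf_mul(result, a)
    return result


def gf_inv(a: int) -> int:
    """Inverse of a nonzero byte: a**254, since a**255 == 1 in GF(2^8)."""
    return gf_pow(a, 254)


def accumulate(blocks, p: bytearray, q: bytearray, skip=()) -> None:
    """XOR each block into p, and 2**i times each block into q."""
    for i, block in enumerate(blocks):
        if i in skip:
            continue
        coeff = gf_pow(2, i)
        for j, byte in enumerate(block):
            p[j] ^= byte
            q[j] ^= gf_mul(coeff, byte)


def compute_pq(blocks):
    p, q = bytearray(len(blocks[0])), bytearray(len(blocks[0]))
    accumulate(blocks, p, q)
    return bytes(p), bytes(q)


def recover_two(blocks, p: bytes, q: bytes, x: int, y: int):
    """Rebuild data blocks x and y (both lost) from the survivors, P and Q."""
    pxy, qxy = bytearray(p), bytearray(q)
    accumulate(blocks, pxy, qxy, skip=(x, y))  # leaves Dx^Dy and gx*Dx^gy*Dy
    gx, gy = gf_pow(2, x), gf_pow(2, y)
    denom_inv = gf_inv(gx ^ gy)
    dx = bytes(gf_mul(denom_inv, qb ^ gf_mul(gy, pb)) for pb, qb in zip(pxy, qxy))
    dy = bytes(pb ^ db for pb, db in zip(pxy, dx))
    return dx, dy


data = [b"RAID", b"six!", b"demo", b"data"]
p, q = compute_pq(data)
damaged = [data[0], None, data[2], None]  # two drives lost
print(recover_two(damaged, p, q, 1, 3))   # (b'six!', b'data')
```

The field multiplications are the extra work behind the higher controller overhead noted for this level.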
Hybrid RAID: X+Y vs. Y+X • RAID 0+1: • Mirrored stripe set: minimum of 4 drives = $ • Good for imaging / general file servers / areas where the highest reliability is not a concern. • RAID 1+0: • Striped mirror set: minimum of 4 drives = $ • Good for databases. • RAID 5+1: • Mirrored RAID 5 for the truly paranoid.
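The practical difference between 0+1 and 1+0 shows up when two drives fail. The sketch below assumes a four-drive array grouped as (A, B) and (C, D), and enumerates every two-drive failure. In RAID 0+1 the groups are stripe sets that mirror each other, so one whole group must stay intact. In RAID 1+0 the groups are mirror pairs that are striped together, so each pair needs at least one working drive. All names are illustrative.

```python
from itertools import combinations

DISKS = ("A", "B", "C", "D")
GROUPS = (("A", "B"), ("C", "D"))


def raid01_survives(failed) -> bool:
    """Mirror of stripes: at least one stripe set must have no failed disk."""
    return any(all(d not in failed for d in group) for group in GROUPS)


def raid10_survives(failed) -> bool:
    """Stripe of mirrors: every mirror pair needs at least one working disk."""
    return all(any(d not in failed for d in group) for group in GROUPS)


def count_survivable(survives, num_failures: int = 2) -> int:
    """Count the failure combinations the array survives."""
    return sum(survives(set(combo)) for combo in combinations(DISKS, num_failures))


print(count_survivable(raid01_survives))  # 2 (out of 6 two-drive failures)
print(count_survivable(raid10_survives))  # 4 (out of 6 two-drive failures)
```

Both layouts survive any single failure, but 1+0 tolerates twice as many of the two-drive cases with the same four drives.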
RAID Z • Uses the 128-bit ZFS file system from Sun’s Solaris 10 OS • Available on Mac OS X Leopard • Advantage: OS calculates parity, no need for external controller, can correct mistakes impossible to correct in RAID 5. • Disadvantage: could take a performance hit if storage is close to full.
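One kind of mistake that plain parity cannot fix is silent corruption: parity reveals that a stripe is inconsistent but not which block is wrong. Storing a checksum alongside the data makes it possible to find the bad block by trial reconstruction. The sketch below is a simplified illustration of that idea only; it is not ZFS's on-disk format or algorithm, and the function names and the use of SHA-256 are assumptions made for the example.

```python
import hashlib


def xor_all(blocks) -> bytes:
    """Byte-wise XOR of equal-length blocks."""
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for j, byte in enumerate(block):
            out[j] ^= byte
    return bytes(out)


def checksum(blocks) -> bytes:
    return hashlib.sha256(b"".join(blocks)).digest()


def self_heal(blocks, parity: bytes, expected: bytes):
    """Return a corrected copy of the stripe, or raise if it cannot be fixed."""
    blocks = list(blocks)
    if checksum(blocks) == expected:
        return blocks
    # Assume each block in turn is the corrupt one, rebuild it from the
    # others plus parity, and keep the version whose checksum matches.
    for i in range(len(blocks)):
        others = blocks[:i] + blocks[i + 1:]
        candidate = list(blocks)
        candidate[i] = xor_all(others + [parity])
        if checksum(candidate) == expected:
            return candidate
    raise ValueError("stripe is unrecoverable")


good = [b"aaaa", b"bbbb", b"cccc"]
parity = xor_all(good)
expected = checksum(good)

corrupted = [b"aaaa", b"bXbb", b"cccc"]      # one block silently damaged
print(self_heal(corrupted, parity, expected))  # [b'aaaa', b'bbbb', b'cccc']
```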
Enterprise vs. Consumer Storage • Enterprise quality storage requires much more engineering • Environment plays a big role: • Chassis vibration, humidity, volatile solvents, heat, constant use…
Enterprise vs. Consumer Storage • Consumer: Seagate Barracuda, 7,200 RPM, 250 GB, SATA II drive, $75.00 (SATA connector) • Enterprise: Seagate Cheetah, 15,000 RPM, 147 GB, SCSI drive, $1,100 (80-pin SCSI cable)
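The price gap is easier to see as cost per gigabyte. The short calculation below uses only the prices and capacities listed for the two drives; the variable names are illustrative.

```python
# Price (USD) and capacity (GB) as listed for the two drives.
drives = {
    "Seagate Barracuda (SATA II)": (75.00, 250),
    "Seagate Cheetah (SCSI)": (1100.00, 147),
}

cost_per_gb = {name: price / capacity for name, (price, capacity) in drives.items()}

for name, cost in cost_per_gb.items():
    print(f"{name}: ${cost:.2f} per GB")

ratio = cost_per_gb["Seagate Cheetah (SCSI)"] / cost_per_gb["Seagate Barracuda (SATA II)"]
print(f"The Cheetah costs about {ratio:.0f} times as much per GB")
```

At these prices the Barracuda works out to $0.30 per GB and the Cheetah to roughly $7.48 per GB, about 25 times as much.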
Demonstration • Old Sun software-based RAID unit. • Employs Fibre Channel connection. • Houses 22 SCSI disks. • Hard drive demonstration: see the arm move over the disk while writing a large file.