Managing Explosive Data Growth
• EDUCAUSE 2008
• Presenter: Ronan Glynn
Case Study: MSKCC
• Established in 1884
• 9000+ Employees
• Regularly rated #1 US Cancer Hospital
• In 1948, Sloan-Kettering Institute formed
• 1500+ Employees
• SKI Computing Group formed in 2003
Our User Environment
• 45% PC
• 55% Mac
2003: New Research Building Announced
• Space for 100+ new labs
• New Graduate School
• All documents and materials to be online
• At the time, many labs stored research locally
2003-2005: Growth Spurt
• Scaled from 0TB to 5TB to 15TB
• Architecture is FC disk in a high-end enterprise array
• Tape backup sets with off-site rotation
• Still in the process of migrating data online
2003-2005: Growth Spurt
• Meet with labs & analyze their data usage patterns
• Evaluate vendor hardware, data sizes and projected growth
• Learn about user tolerance for clients & “extra clicks”
• Storage budget is drawn up and analyzed
Results of User/Data Analysis
• Data is 99% unstructured
• Typically used in a 1-year cycle
• Use after 3 years is extremely low
• Users do not know how much storage they need
• Users want to minimize workflows
• Data redundancy concept is not being grasped
Unique Situations
• Labs survive on grant funding. If a grant runs out, the lab leaves MSKCC
• When a lab leaves, two copies of its data are needed: one for MSKCC and one for the departing lab
• This created a need for near-line storage
• Some lab equipment is unsuitable for data storage; other labs have a specific need for ultra-fast data writes
• VIP users had to be catered for differently
Research automation accelerates data growth
• Genescan Device
• High-Throughput Screening
Data Growth vs. Drive Capacity Growth
• Storage growth is outpacing storage budget growth by 150% per year for SKI
• Drive capacity increases not keeping pace with data growth
• Most apparent with Enterprise-class drives like FC, SCSI & SAS
Searching for a solution
• Must be Mac compatible
• Client installs not feasible
• Macs cannot use DFS, have issues with file locking and resource forks
• No login scripts possible
• Data archives a necessity
2006: New Building Opens
• Zuckerman building opens
• 23 stories (4 floors below ground). 558K sq. feet.
• Deepest bedrock excavation in history of Manhattan (75 feet)
• New labs bringing 1-5TB with them
• Innovative Storage Tiering in place
Data Storage Environment
• < 9 months
• 9 months - 3 years
• > 3 years
Storage Environment Details
• Tier 1 (50TB): EMC Celerra NAS array with FC & high-density SATA
• Tier 2 (32TB): EMC Centera, an ATA-based CAS solution. Auto-archives from Tier 1 based on a file-age policy
• Tier 3 (10TB): Powerfile A3 Archiver, a NAS-enabled optical jukebox that uses 200 ejectable 50GB Blu-Ray discs
• Unique situations: Fast data writes are handled by an Isilon clustered NAS solution. VIP needs are met by backup televaulting using “Time Machine” and “BackupPC”. Time Machine is built into OS X 10.5; data is written to DAS storage attached to an OS X Server. BackupPC is an open source backup solution for Windows & Linux; data is written to a share on the Tier 1 storage array.
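The age-based archiving policy can be sketched as follows. This is a minimal illustration, not the Celerra or Centera policy engine: the function name `tier_for_age`, the 30-day month, and the mapping of the three age bands from the Data Storage Environment slide onto Tiers 1 to 3 are all assumptions.

```python
from datetime import date

# Assumed thresholds, taken from the age bands on the storage slide.
# A "month" is approximated as 30 days and a "year" as 365 days.
TIER2_MIN_AGE_DAYS = 9 * 30    # roughly 9 months or older -> Tier 2
TIER3_MIN_AGE_DAYS = 3 * 365   # older than roughly 3 years -> Tier 3


def tier_for_age(last_modified: date, today: date) -> int:
    """Return the hypothetical storage tier (1, 2 or 3) for a file."""
    age_days = (today - last_modified).days
    if age_days > TIER3_MIN_AGE_DAYS:
        return 3
    if age_days >= TIER2_MIN_AGE_DAYS:
        return 2
    return 1


if __name__ == "__main__":
    today = date(2008, 10, 1)
    for modified in (date(2008, 6, 1), date(2007, 6, 1), date(2004, 6, 1)):
        print(modified, "-> Tier", tier_for_age(modified, today))
```

Keeping the thresholds as named constants makes the policy easy to adjust if usage analysis later shows a different access cycle.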
Powerfile Blu-Ray Archiver
• WORM long-term archiving
• Stores 50GB of data on a dual-layer disc
• Low power & cooling costs
• Blu-Ray is now the industry HD standard
• Disc size set to grow to 100GB
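The Tier 3 capacity follows directly from the disc count and disc size quoted for the Powerfile jukebox. A quick check, assuming decimal units (1TB = 1000GB) and assuming the jukebox keeps the same 200 discs when larger media arrive; `jukebox_capacity_tb` is a hypothetical helper name:

```python
def jukebox_capacity_tb(disc_count: int, disc_size_gb: int) -> float:
    """Raw capacity in TB, using decimal units (1 TB = 1000 GB)."""
    return disc_count * disc_size_gb / 1000


# 200 dual-layer 50GB discs, as deployed in Tier 3.
current_tb = jukebox_capacity_tb(200, 50)
# The same 200 discs at the anticipated 100GB size (an assumption).
future_tb = jukebox_capacity_tb(200, 100)

print(current_tb, future_tb)
```

Under these assumptions the archive tier would double in capacity without adding hardware.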
Isilon: Main Campus Storage
• Main Campus storage required for labs that need near real-time data writes to disk
• Virtualized storage accommodates rapidly evolving storage growth
• Isilon clustered NAS solution selected to meet these labs' needs, with 33TB usable space
• Fast data writes on a private VLAN satisfy demand for high throughput on data copies
BackupPC
• Free, Linux-based solution
• Schedule set by admin console
• Backup by hostname
• Users can have their own restore interface
• Compression & pooling features
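Pooling means that identical files backed up from different machines are stored only once. The sketch below illustrates the general idea with a content hash and compression; it is not the backup tool's actual implementation, and the `Pool` class, its method names, and the host and path values are hypothetical.

```python
import hashlib
import zlib


class Pool:
    """Toy content-addressed pool: identical file contents are stored once."""

    def __init__(self) -> None:
        self._blobs: dict[str, bytes] = {}            # digest -> compressed data
        self._index: dict[tuple[str, str], str] = {}  # (host, path) -> digest

    def backup(self, host: str, path: str, data: bytes) -> bool:
        """Record a file; return True if new data had to be stored."""
        digest = hashlib.sha256(data).hexdigest()
        self._index[(host, path)] = digest
        if digest in self._blobs:
            return False                              # content already pooled
        self._blobs[digest] = zlib.compress(data)
        return True

    def restore(self, host: str, path: str) -> bytes:
        """Return the original bytes for a previously backed-up file."""
        return zlib.decompress(self._blobs[self._index[(host, path)]])

    def stored_blobs(self) -> int:
        """Number of distinct file contents held in the pool."""
        return len(self._blobs)


if __name__ == "__main__":
    pool = Pool()
    pool.backup("lab-pc-1", "results.csv", b"a,b\n1,2\n")
    pool.backup("lab-pc-2", "copy-of-results.csv", b"a,b\n1,2\n")
    print(pool.stored_blobs())
```

Two hosts holding the same file consume the space of one compressed copy, which matters when many lab workstations share instrument output.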
Time Machine
• Account created on OS X Server
• Point Time Machine to the Mac server
• Incremental backups every hour (system files can be excluded)
• Users set the schedule and restore files themselves
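An incremental backup copies only what changed since the previous run. The sketch below shows that selection step with an exclusion list; it is a simplified illustration and not how Time Machine itself detects changes. The function name, the tuple layout, and the example paths are assumptions.

```python
from typing import Iterable


def files_to_back_up(
    files: Iterable[tuple[str, float]],  # (path, modification time in seconds)
    last_backup_time: float,
    excluded_prefixes: tuple[str, ...] = ("/System/",),
) -> list[str]:
    """Return paths modified after the last backup, skipping excluded trees."""
    selected = []
    for path, mtime in files:
        if path.startswith(excluded_prefixes):
            continue                     # e.g. system files the user excluded
        if mtime > last_backup_time:
            selected.append(path)
    return selected


if __name__ == "__main__":
    snapshot = [
        ("/Users/lab/notes.txt", 200.0),
        ("/Users/lab/old.txt", 50.0),
        ("/System/Library/cache", 300.0),
    ]
    print(files_to_back_up(snapshot, last_backup_time=100.0))
```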