Tivoli Storage Management at Monash University 2nd May 2006 Presentation for TUGANZ By Tony Cataldo
Topics Covered • Who am I? • Monash Uni CV • TSM @ Monash Uni • Pre-August 2004 • Post-August 2004 • Issues Along the Way • Future • Acknowledgements/References • Questions
Monash University CV • Established by Act of Parliament in 1958, first students in 1961 [1] • Campuses in VIC – Berwick, Caulfield, Clayton, Gippsland, Parkville, Peninsula • International – Malaysia, South Africa • International Presence – London (UK), Prato (Italy) • Approx. 52,400 student enrolments and 5,952.5 (2,821.9 academic & 3,130.6 general) staff as at 31st March 2005 [2] • Email provided to all students and staff • 230 staff dedicated to providing IS services to students & staff • ITS/Infrastructure Services: 85 staff • Notting Hill Pub – 1km from my office (walking distance) [1] History from Monash Uni web (http://www.monash.edu.au/about/history.html) [2] Statistics from Monash Uni web (http://www.monash.edu.au/about/stats.html)
TSM @ Monash Uni • Implementation primarily for corporate data and business critical applications: • Email – for both students & staff • MyMonash (web portal) • WebCT (learning portal) • Callista (student admissions/admin) • SAP (Corporate ERP) • 350+ servers (Solaris 8/9/10, Linux-RH, WINxx, Netware), each running the TSM client (see the option-file sketch below)
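A minimal sketch of what the client option file (dsm.sys) on one of those Unix servers might look like; the server name and address below are hypothetical examples, not Monash's actual configuration:

    * dsm.sys – minimal TSM backup/archive client stanza (names hypothetical)
    SErvername        TSM1
       COMMMethod       TCPip
       TCPPort          1500
       TCPServeraddress tsm1.example.edu
       PASSWORDAccess   generate
       SCHEDMODe        polling

With PASSWORDACCESS GENERATE the client stores and renews its own password, which keeps scheduled backups unattended across a fleet this size.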
TSM @ Monash Uni cont. • Lots of time, resources, people & money have been committed to this project • 15 people completed TSM training (3 or 5 days of training) • estimate 90 - 150* coffee & donuts consumed during this time (* calc assumes: 15 people x 2 donuts x 3 or 5 days) • Platforms not supported by Monash: VAX VMS, SGI IRIS, WIN98, Solaris 7
Pre-August 2004 • A year or two prior, Monash Uni had a tender in the market for: • Disk Storage Facility (DSF) • Backup/Restore Facility (BRF) • Submissions/presentations from all major IT vendors • These were evaluated against approximately 100 criteria (corporate, technical, price, risk) of various weightings • Short-listed vendors were invited to present again, then reference sites were visited • Proof of concept (POC) in early 2004 • Existing 24 TBs of SATA storage
People Resources • Infrastructure Services (approx. 85 staff in total) • Production Facilities: group size 9 • 1 person full time – Cyrus • 4 operators (rotating shifts 8am – 10pm) • 1 Data Center Mgr – Chris B • 1 Manager/PM – Steve W • Network: group size 25 • 2 people who primarily manage fibre installation & configuration of 4 Cisco 9505s – two fabrics, blue & red (soon to be 6 switches – Research fabric) • Shared Systems: group size 21 • 3 people in Data Services Team: manage backup (TSM & Legato), SAN (FC & SATA with virtualisation) & HSM – Stuart L, George, Tony • 15 Sys Admins – who manage servers on a day-to-day basis on behalf of client groups • A number of staff involved in the initial POC (Russell K, Richard, George, others)
TSM System Resources (split over 2 data centers) • Servers • 2 x Sun Microsystems V440 (1 @ each site) • 8GB RAM, 4 x 36GB internal hdd, 4422 GB SAN hdd • 4 HBAs (2 JNI for hdd, 2 Sun/QLogic for tape library) • Storage Array • 2 x IBM DS4500 with FC hdds (RAID 5) • 26.1 TBs of FC hdd (nearly fully utilised) • 2 x Infortrend JBOD SATA with FalconStor IPStor virtualisation software • 24 TBs of SATA storage implemented (approx. 75% utilised) • DR Servers • 2 x Sun Microsystems V440 (1 @ each site) • 8GB RAM, 4 x 36GB internal hdd • 1 being used as the "TPC for Data" machine
TSM Logical Diagram [draw diagram on whiteboard/flip board]
Post-August 2004 • Have migrated 320 of 350 servers to the new backup infrastructure • Callista, SAP, Novell and Library server migration has been completed • Currently storing an average of 22 TBs per week on tape • Data protection and DR resilience are improved, as data on tape is replicated in both silos (see the storage pool sketch below)
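Replicating tape between the two silos is the standard TSM primary/copy storage pool pattern; a hedged sketch of the server commands involved, with hypothetical library and pool names:

    /* hypothetical names; issued from a dsmadmc administrative session */
    define devclass LTO2_A devtype=lto format=ultrium2c library=LIB3584_A
    define devclass LTO2_B devtype=lto format=ultrium2c library=LIB3584_B
    define stgpool TAPE_PRIMARY LTO2_A maxscratch=500
    define stgpool TAPE_COPY LTO2_B pooltype=copy maxscratch=500
    backup stgpool TAPE_PRIMARY TAPE_COPY wait=yes

In practice the BACKUP STGPOOL step would run on a daily administrative schedule, so the copy pool in the second silo keeps tracking the primary.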
Issues Along the Way • Issues with Solaris servers with a large # of small files • use of snapshots (scripting with the storage array) • Issues with Netware servers with over 4 million files • 6-8 months for patch(es) from IBM/Novell • Critical incident with IBM (TSM & SAN) • Lack of understanding/culture/comms between companies; confusion between hardware/software support & contracts • New group of IBM people involved in resolving issues • Lack of LTO2 drives in libraries • Reclamation was frequently killed or switched off (see the sketch below). Now have additional drives. • Faulty LTO3 drive caused the tape library (IBM 3584) to become unavailable to TSM • Drive zero was replaced, after much anguish.
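Suspending and resuming reclamation is normally done by adjusting the pool's reclaim threshold rather than killing the process; a hedged sketch with a hypothetical pool name:

    /* reclaim=100 effectively disables reclamation: no volume is ever 100% reclaimable */
    update stgpool TAPE_PRIMARY reclaim=100
    /* off-peak, resume by reclaiming volumes that are at least 60% reclaimable */
    update stgpool TAPE_PRIMARY reclaim=60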
Issues Along the Way cont. • The Solaris 9/RDAC combination only allows a maximum of 32 LUNs; with 2 logical TSM systems on each server, we are limited to 15 LUNs per TSM server • Converting storage pool LUNs from 101GB to 201GB • Initially disk storage pools were used, then combined with file storage pools • In the process of re-configuring LUNs and TSM storage pools to improve TSM threading (read/write) to hdd pools (see the sketch below)
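The disk pool/file pool combination maps to TSM's DISK and FILE device classes; a hedged sketch, with hypothetical directories and sizes, of how a sequential FILE pool might be defined, with MOUNTLIMIT governing how many volumes (and therefore concurrent read/write threads) can be open at once:

    /* hypothetical directories and sizes */
    define devclass BIGFILE devtype=file mountlimit=16 maxcapacity=200G directory=/tsmstg/lun01,/tsmstg/lun02
    define stgpool FILEPOOL BIGFILE maxscratch=100

Spreading the DIRECTORY list across separate LUNs is what lets those threads land on different spindles rather than queueing on one device.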
Future • Storage Resource Management (TPC for Data) • Migration of all LTO2 drives & tapes to LTO3 drives & tapes • Support for AFS on Linux (RH) • In the SAN space, currently testing iSCSI (see the sketch below) • Will be conducting a review of storage virtualisation this year (currently using IPStor on SATA storage)
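Solaris 10 ships a software iSCSI initiator, so testing can begin without extra hardware; a hedged sketch with a hypothetical discovery address, assuming the standard iscsiadm(1M) syntax:

    # hypothetical target address on the standard iSCSI port
    iscsiadm add discovery-address 192.0.2.10:3260
    iscsiadm modify discovery --sendtargets enable
    devfsadm -i iscsi        # build device nodes for discovered LUNs
    iscsiadm list target     # verify the target has been discovered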
Acknowledgements/References • Acknowledgements • Steve White; various ppt & discussion • Cyrus Khavar/Chris Bourke; discussion & answers • George Scott/Stuart Lamble – discussion, answers, coverage • Russell Keil – discussion on POC, SAN diagram • References • www.monash.edu.au