CHECO Fall 2013 Conference
ERP Storage – What Works, What Does Not
Presenter: Rick Beck, Director, IT Application Services
September 17, 2013
Background
• Previous environment
  • RedHat Linux database servers with 32GB RAM
  • EVA SAN with Fibre Channel at 8Gb throughput
  • 5 production Oracle databases (Banner, ODS/EDW, help desk, Luminis, Web CMS) on 3 servers
• Reason for change
  • Servers were 6 years old; storage was 5 years old and at capacity
  • Slowdowns during peak usage (first week of school; first day of timed registration)
New Environment
• Virtualized with Oracle Virtual Machine (OVM) and Oracle Linux on 3 physical servers
• 64GB RAM dedicated to the production Banner database
• Failover with Data Guard to a standby database (not active)
• HP P2000 SAN (small-business SAN)
  • 24 3TB drives at 7,200 RPM
• 3 3TB solid state drives, local on the server
• 2 vdisks
  • Configured with 512GB LUNs and RAID 10 (see the sketch below)
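The deck does not include the storage commands themselves. As a rough illustration of how SAN LUNs like these are typically presented to Oracle, the sketch below creates an ASM disk group over the 512GB LUNs; the disk group name, device paths, and LUN count are hypothetical, not taken from the actual build.

    -- Hypothetical sketch: an ASM disk group built over SAN LUNs presented
    -- to the database server. Names, paths, and LUN count are illustrative.
    CREATE DISKGROUP DATA EXTERNAL REDUNDANCY
      DISK '/dev/oracleasm/disks/LUN01',
           '/dev/oracleasm/disks/LUN02',
           '/dev/oracleasm/disks/LUN03',
           '/dev/oracleasm/disks/LUN04';
    -- EXTERNAL REDUNDANCY assumes the SAN already provides RAID 10, so ASM
    -- stripes across the LUNs without mirroring them a second time.

ASM then spreads database files across whatever LUNs are in the group, which is why LUN layout and spindle speed mattered so much for Banner's I/O profile later in this story.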
The Players
• Infrastructure team – storage purchase and setup
• Database director and DBAs
• Oracle contractor with experience from 55 conversions to OVM and ASM
Implementation
• Hardware purchased in June 2012
• Project kickoff with contractors – October 2012
Staffing Issues
• Departures before the implementation began:
  • The database director
  • 2 of the 3 DBAs (the 3rd DBA left before the first production go-live)
  • The person who architected the storage solution
Plan Issues
• Production environments were to be converted first, then cloned to test after the production go-lives
• No consideration was given to the middle tier:
  • Jobsub server and 10 application servers
• Plan changes added an additional $119k to the project
SAN Issues
• The HP P2000 needed drivers to work with OVM
• Solid state drives could not be mixed with the other drives on the SAN and were traded in
Other Issues
• Firewall issues arose because the new database servers were on a different subnet than the application servers
  • Troubleshooting took significant time because the firewall was always a suspect
• System admins did not receive training on OVM until just before the production go-live
Results
• Instead of a 5-fold increase in speed, the system was sluggish, with I/O-intensive activities running up to 5 times slower!
Mitigation: Analysis
• Consultant reviewed logs to determine whether Oracle or OVM configuration issues were to blame
  • Changes resulted in minimal performance gains
• All analysis pointed to storage performance
• OEM and Quest Spotlight showed that system slowdowns during peak OLTP and batch job processing were due to I/O waits (generic example query below)
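The deck does not show the queries behind these tools. As a generic sketch of what OEM or Spotlight surface when a database is I/O bound, the top wait events can be pulled straight from the instance; this is an illustration, not the consultant's actual analysis.

    -- Generic sketch: top non-idle wait events since instance startup.
    -- On an I/O-bound Banner system, events such as "db file sequential read"
    -- and "db file scattered read" would be expected near the top.
    SELECT *
      FROM (SELECT event,
                   total_waits,
                   ROUND(time_waited_micro / 1e6) AS seconds_waited
              FROM v$system_event
             WHERE wait_class <> 'Idle'
             ORDER BY time_waited_micro DESC)
     WHERE ROWNUM <= 10;

AWR or Statspack reports covering the peak registration windows would tell the same story with finer time resolution.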
Mitigation: Actions Taken
• Cabinet had 24 more bays, so 24 additional drives were purchased: 600GB at 15,000 RPM
• Configured 2 vdisks:
  • 4.8TB as RAID 10 – used for Banner production
  • 3TB as RAID 5 (needed the space) – used for the warehouse database (ODS/EDW)
• ASM disk sizes were set to a 1.99TB maximum, as that was all ASM could handle
• Used ASM to move the database files to the faster disks while the system was up (see the sketch below)
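The exact commands are not in the deck. The standard ASM pattern for this kind of online move is to add the new, faster disks to the disk group and drop the old ones in a single statement, letting the rebalance migrate extents while the database stays open. A hedged sketch with hypothetical disk group, path, and disk names:

    -- Hypothetical sketch of an online migration to faster storage with ASM.
    -- Disk group name, device paths, and disk names are illustrative only;
    -- note that DROP DISK takes ASM disk names rather than device paths.
    -- Run against the ASM instance; the database remains open throughout.
    ALTER DISKGROUP DATA
      ADD  DISK '/dev/oracleasm/disks/FAST01',
                '/dev/oracleasm/disks/FAST02'
      DROP DISK DATA_0000, DATA_0001
      REBALANCE POWER 8;

    -- Rebalance (i.e., migration) progress can be watched here until
    -- EST_MINUTES reaches zero.
    SELECT operation, state, power, est_minutes
      FROM v$asm_operation;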
Mitigation: Results
• I/O waits are no longer a problem
• Survived fall term startup with the best Banner performance in years
• The ODS/EDW data warehouse still has load speed issues
Lessons Learned
• Need broader involvement in planning:
  • The solid state drives were supposed to be local disk – the people who knew that had left
  • Middle-tier issues would have surfaced earlier
• Needed a vendor call to get Oracle (for OVM), HP, and the implementation contractor to discuss technical issues before project startup
Lessons Learned (Continued)
• Cutting corners may not end up saving money
• Need to develop internal knowledge of unfamiliar technologies:
  • The new DBAs did not have OVM or ASM experience and are only now getting up to speed
• Oracle ASM can only handle virtual disk sizes up to 1.99TB (see the note below)
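The deck states the size ceiling as a lesson rather than showing how it surfaces. As a small illustrative check, the sizes and header status of the disks ASM can see are visible in V$ASM_DISK before they are added to a disk group; the query is a generic sketch, not from the presentation.

    -- Generic sketch: review the size and status of every disk visible to ASM
    -- before adding it to a disk group, e.g. to confirm the LUNs were carved
    -- below the size ceiling mentioned above.
    SELECT path,
           ROUND(total_mb / 1024) AS size_gb,
           header_status
      FROM v$asm_disk
     ORDER BY path;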
Questions?
Rick Beck
beckr@msudenver.edu