“Virtually Unstoppable”
July 25, 2007
Ken Fell, CIO, New York Independent System Operator
Introduction
By mid-2006, the NYISO was averaging less than 2% utilization across nearly 500 UNIX servers and was facing growing power, cooling and space requirements. The financial situation was equally bleak. Enter virtualization. By mid-2008, the NYISO will realize a 500% increase in server utilization, a 25% reduction in server footprint and savings of more than $6M.
About the NYISO
• The NYISO manages the electricity transmission grid and wholesale energy markets for all of New York State
• 19.2 million customers
• $50B in transactions since 1999
• Not-for-profit organization
• 400 employees, 100 of them in IT
• Considered a technology company by its Board of Directors
Goals
• The overarching business goals were:
  • Reducing TCO
  • Increasing IT agility
• The goals of virtualization were:
  • Increasing server utilization
  • Reducing data center power, cooling and footprint
  • Reducing cycle time for new environments
  • Standardizing platforms and technologies
Technologies
• In the beginning…
  • 120 Sun B100 blade servers
  • 38 Sun V480 mid-range servers
  • 158 physical servers in total
  • Running Solaris 8 or Solaris 9, supporting TIBCO and BEA application servers
  • Production and development/test servers segregated by subnet and host
  • Note: the test environment is further segregated for performance, integration, unit and user acceptance testing
• The transition…
  • 9 Sun V890 mid-range servers
    • 1 V890: Dev/Unit Testing (35 containers)
    • 1 V890: Integration Testing (48 containers)
    • 1 V890: shared Production DR/Performance DR (27 containers)
    • 3 V890s: Performance and Availability Testing (49 containers)
    • 3 V890s: Production and DR (39 containers)
  • A total of 198 virtual machines on 9 physical servers
• Hitachi Enterprise Storage
  • Enterprise storage was selected to make the Solaris Containers portable. The design allows a Performance Test server to be moved quickly onto the production subnet and attached to the production Hitachi disks in the event of a catastrophic server failure. This let us select more cost-effective server hardware with fewer swappable parts.
• Veritas Storage Foundation Suite
  • Veritas File System was selected to aid the portability of the Solaris Containers through Veritas's advanced import and export capabilities.
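The host and container counts listed for the transition can be cross-checked with a few lines of Python. This is a minimal sketch. The `ALLOCATION` table, `LEGACY_SERVERS` constant and `summarize` helper are hypothetical names introduced here for illustration. Every figure comes directly from the slide.

```python
# Container allocation on the nine Sun V890 hosts, as listed on the slide.
# Each entry: (role, number of V890 hosts, number of containers).
# ALLOCATION, LEGACY_SERVERS and summarize are hypothetical names for this sketch.
ALLOCATION = [
    ("Dev/Unit Testing", 1, 35),
    ("Integration Testing", 1, 48),
    ("Shared Production DR/Performance DR", 1, 27),
    ("Performance and Availability Testing", 3, 49),
    ("Production and DR", 3, 39),
]

LEGACY_SERVERS = 120 + 38  # Sun B100 blades + Sun V480 mid-range servers


def summarize(allocation):
    """Return (hosts, containers, containers per host, old-to-new host ratio)."""
    hosts = sum(h for _, h, _ in allocation)
    containers = sum(c for _, _, c in allocation)
    return hosts, containers, containers / hosts, LEGACY_SERVERS / hosts


if __name__ == "__main__":
    hosts, containers, density, ratio = summarize(ALLOCATION)
    print(f"{containers} containers on {hosts} hosts")
    print(f"{density:.1f} containers per host on average")
    print(f"{ratio:.1f}:1 physical consolidation")
```

The average of 22 containers per host hides an uneven split across roles. The single integration host carries 48 containers, while the three Production and DR hosts average 13 each.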
Technologies
• Other factors…
  • This effort presented an opportunity to combine application upgrades with the server migration. During these upgrades, we were able to provide additional environments for our standard release management processes.
• Current status…
  • We have migrated all of our TIBCO infrastructure and 50% of our BEA infrastructure to Solaris Containers.
  • Peak load is below 30% of system capacity.
  • Plans are in place to virtualize our Tru64 environment, and we are looking at using HP-UX on Itanium.
  • Phases 2 and 3 of virtualization are under way
    • These include our Windows platforms, using VMware, and our Oracle platforms, using IBM LPARs
    • An estimated reduction from 180 Windows servers to fewer than 30
    • An Oracle and Tru64 platform reduction from 160 Tru64 servers to fewer than 10 HP-UX servers
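The Phase 2 and 3 targets imply the consolidation ratios below. This is a minimal sketch. The `reduction` helper and the `PLANNED` table are hypothetical names. The slide gives the targets as "fewer than" 30 and 10 servers, so the sketch treats those numbers as upper bounds. The results are therefore minimum expected reductions.

```python
def reduction(before: int, after: int) -> tuple[float, float]:
    """Return (consolidation ratio, percent reduction) for a server-count change."""
    if before <= 0 or after <= 0:
        raise ValueError("server counts must be positive")
    return before / after, 100.0 * (before - after) / before


# Phase 2 and 3 targets from the slide (hypothetical table for this sketch).
# Target counts are upper bounds ("fewer than"), so results are minimums.
PLANNED = {
    "Windows (VMware)": (180, 30),
    "Tru64 -> HP-UX": (160, 10),
}

if __name__ == "__main__":
    for platform, (before, after) in PLANNED.items():
        ratio, pct = reduction(before, after)
        print(f"{platform}: at least {ratio:.0f}:1, at least {pct:.1f}% fewer servers")
```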
Key Decisions
• Selecting the right platform was important
  • The NYISO could have begun its virtualization effort on either Solaris or AIX
  • Solaris was selected for its perceived ease of administration and our greater expertise with the OS
• Selecting the right hardware was important
  • The NYISO could have selected high-end or low-end hardware for the implementation
  • Low-end hardware was selected for its lower cost and its virtualization clustering advantages
Challenges
• Solaris Containers presented unexpected networking challenges
  • Multiple default routes were not well supported across zones
• ISV licensing models were immature
  • Software licenses could not be limited to a zone
• Educating business units when “they want their own server”
  • Project teams needed to shift from hardware requirements to availability, performance and capacity requirements
• Migrating to new hardware and implementing virtualization simultaneously created an “expand to contract” scenario
Critical Success Factors
• Engaging vendors early helped avoid costly design mistakes
  • We engaged Sun during the initial architecture of the project and held at least 3 design meetings with Sun to gather requirements and propose several implementation scenarios
• Limiting the scope of the initial migration greatly improved the project’s chances of success
  • Virtualization touches the entire IT organization
  • We limited the scope to one tier of our n-tier environment
  • By limiting the scope, we could concentrate on migration, development and testing without overstretching a resource-constrained organization, while minimizing visibility to the business
• Creating a business case helped demonstrate value and generate support from other business units
  • The business case was instrumental in giving this effort positive visibility and in gaining the support of our Board of Directors and CEO
  • This visibility took our strategy from a proof of concept to one of the top 5 initiatives at the NYISO
• The CIO was, and remains, an important champion and facilitator of the effort
  • This support let us keep priorities focused and eliminate delays
Results
• By mid-2008, over $6M in hardware and software savings will be realized
• By 2010, total savings will reach nearly $11M
• By mid-2008, the total server count will be reduced from 450+ to 300
• Cycle time for new environments has dropped from 2 weeks to 45 minutes
• Unfortunately, administrative overhead has not been reduced
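The cycle-time and server-count results can be put into ratios with Python's standard `datetime.timedelta`. This is a minimal sketch under stated assumptions. The slide does not say whether "2 weeks" means calendar time or working time, so the sketch computes both. The working-time variant assumes 10 working days of 8 hours each. The "450+" server count is taken as exactly 450, which gives a lower bound on the reduction. `speedup` and `percent_reduction` are hypothetical helper names.

```python
from datetime import timedelta

# Figures from the Results slide.
OLD_CYCLE = timedelta(weeks=2)                # assumption: two calendar weeks
OLD_CYCLE_WORKING = timedelta(hours=10 * 8)   # assumption: 10 working days of 8 hours
NEW_CYCLE = timedelta(minutes=45)

SERVERS_BEFORE = 450  # slide says "450+"; 450 gives a lower bound on the reduction
SERVERS_AFTER = 300


def speedup(old: timedelta, new: timedelta) -> float:
    """How many times shorter the new provisioning cycle is (hypothetical helper)."""
    return old / new  # timedelta / timedelta yields a float


def percent_reduction(before: int, after: int) -> float:
    """Percentage drop from `before` to `after` (hypothetical helper)."""
    return 100.0 * (before - after) / before


if __name__ == "__main__":
    print(f"calendar-time speedup: {speedup(OLD_CYCLE, NEW_CYCLE):.0f}x")
    print(f"working-time speedup: {speedup(OLD_CYCLE_WORKING, NEW_CYCLE):.0f}x")
    print(f"server count reduced by at least "
          f"{percent_reduction(SERVERS_BEFORE, SERVERS_AFTER):.1f}%")
```

Either reading of "2 weeks" yields a two-orders-of-magnitude improvement in provisioning time. The exact factor depends on how the original two weeks were measured.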
Lessons Learned
• Virtualization requires proper planning, and it is not a panacea
  • The complexity of a virtualized environment can eliminate any cost savings you expect from reduced administration
  • The flexibility of the Solaris Containers led us to develop several iterations of deployments, which in turn caused significant support and training issues
  • Standardize when possible
• Be sure to involve vendors
  • We did not keep Sun engaged throughout the project, which delayed critical patches needed to fix a technical patching flaw
• Be sure to get commitment from the business