
My Research Experiences on Computer Performance Optimization

  1. My Research Experiences on Computer Performance Optimization
     Shih-Hao Hung, Ph.D.
     Sun Microsystems Inc.
     Confidential Information – Do Not Forward

  2. Computer Performance
     • Demands for high performance:
       • Getting jobs done faster
       • Getting more jobs done at the same time
       • Getting complex jobs done in time
     • Price-performance trade-offs:
       • Getting jobs done efficiently
       • Getting jobs done with limited resources
       • Capacity planning

  3. Performance Optimization

  4. My Research Background
     • High-Performance/Technical Computing: Parallel Performance Project, University of Michigan, 1993-2000
       • Parallelization for high-performance applications
       • Performance characterization tools
       • Performance optimization methodologies
     • Commercial Applications Optimization: Performance and Availability Engineering, Sun Microsystems, Inc., 2000-present
       • Database server optimization
       • Network stack optimization
       • Webserver optimization
       • Security infrastructure optimization

  5. Parallel Performance Project
     • Started by Prof. Edward S. Davidson of the University of Michigan in 1990; funded by NSF, Ford Motor Co., UM Center of Parallel Computing, IBM, DoD, etc.
     • Produced 11 Ph.D.’s in 10 years.
     • Worked on state-of-the-art parallel supercomputers and realistic applications.
     • Covered many aspects of computer architecture, from CPU pipelines to clustered systems.
     • Optimization by all means: instruction scheduling, memory locality, parallelization, etc., via compiler techniques and hand-tuning.

  6. Parallel Computing
     • Very hot in the 90’s:
       • People rushed to build large MPPs.
       • Looked good in theory, but lacked practical tools and experience.
       • Most existing apps were difficult to parallelize.
       • Failed to keep pace with Moore’s Law: the R&D cycle was too expensive and too long to catch up with the rise in CPU clock rates from MHz to GHz.
     • Looking ahead:
       • Throughput computing and commercial workloads drive MP.
       • Chip density and area favor SMT & CMP designs.
       • Struggling to find ways to keep the same growth in GHz.
       • Multi-core processors and multiprocessor systems are becoming the norm in the coming years.

  7. Optimizing Parallel Applications
     • Very complex, difficult problems:
       • Program parallelization
       • Load balance
       • Scheduling
       • Minimizing interprocessor communication
       • Architecture-dependent optimization
     • Today:
       • Still lots of open problems.
       • Parallelizing compilers are far from automatic solutions.
     • Tomorrow:
       • Further research and practical solutions will be in high demand as MP systems become popular at all levels.
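
The load-balance problem listed on this slide can be illustrated with a small sketch. It is not taken from the slides: the function names, the triangular workload, and the max/mean imbalance ratio are all assumptions chosen for illustration. It compares how a block (contiguous) and a cyclic (round-robin) assignment of loop iterations spread uneven work over four processors.

```python
def partition_block(costs, nprocs):
    """Give each processor one contiguous chunk of iterations."""
    base, extra = divmod(len(costs), nprocs)
    loads, start = [], 0
    for p in range(nprocs):
        size = base + (1 if p < extra else 0)
        loads.append(sum(costs[start:start + size]))
        start += size
    return loads


def partition_cyclic(costs, nprocs):
    """Deal iterations out round-robin: iteration i goes to processor i % nprocs."""
    return [sum(costs[p::nprocs]) for p in range(nprocs)]


def imbalance(loads):
    """Ratio of the busiest processor's load to the mean load (1.0 is perfect balance)."""
    return max(loads) / (sum(loads) / len(loads))


# Hypothetical workload: iteration i costs i units, as in a triangular loop nest.
costs = list(range(1, 17))
block_loads = partition_block(costs, 4)    # [10, 26, 42, 58]
cyclic_loads = partition_cyclic(costs, 4)  # [28, 32, 36, 40]

print(block_loads, round(imbalance(block_loads), 3))
print(cyclic_loads, round(imbalance(cyclic_loads), 3))
```

With this workload the cyclic assignment is closer to balanced, but it also interleaves neighboring iterations across processors, which can raise communication and hurt locality. That tension is one reason the slide treats load balance, scheduling, and communication as coupled problems.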

  8. Hierarchical Performance Bounds

  9. Example: FCRASH
     • Vehicle crash simulation at Ford.
     • Finite-element code contains over 10,000 Fortran lines and 14 parallel loops.
     • Profiled on a NUMA system (HP/Convex SPP-1000).
     • P-gap: imperfect parallelization
     • C’-gap: inter-cluster communications
     • L & M’-gaps: load balancing issues
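
The transcript does not preserve how the P-gap is defined, so the following is only a generic illustration of why imperfect parallelization costs performance. It uses Amdahl's law, a standard model that is swapped in here and is not claimed to be the bound used in the slides. The function name and the sample fractions are assumptions.

```python
def amdahl_speedup(parallel_fraction, nprocs):
    """Upper bound on speedup when only part of the run time is parallelized.

    parallel_fraction: share of single-processor run time that runs in parallel (0..1).
    nprocs: number of processors applied to the parallel part.
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if nprocs < 1:
        raise ValueError("nprocs must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / nprocs)


# Hypothetical case: 90% of the run time sits in parallel loops.
for n in (2, 8, 64):
    print(n, round(amdahl_speedup(0.9, n), 2))
```

Under this model a code that is 90% parallel can never run more than 10 times faster, however many processors are added, which is why the unparallelized remainder is worth measuring separately from communication and load-balance losses.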

  10. Goal-directed Optimization

  11. Performance Tuning

  12. Modeling a Parallel Application

  13. Model-Driven Simulation

  14. Performance Tuning Results
     • SP - initial parallel version
     • SD - changing domain decomposition to reduce load imbalance (L-gap) and communications (C’-gap)
     • SD2 - SD + array padding to reduce false-sharing communications (Unmodeled-gap)
     • SD3 - SD2 + eliminating thread migration to reduce communications (Unmodeled-gap)
     • SD4 - SD3 + eliminating unnecessary synchronization barriers (S’-gap)
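
The array-padding step (SD2) can be sketched as follows. This is an illustration of the general technique, not the FCRASH change itself: the 64-byte cache line, the 8-byte element size, and the function names are assumptions, and the array is assumed to start on a cache-line boundary. The idea is to space per-thread data far enough apart that no two threads write to the same cache line.

```python
CACHE_LINE_BYTES = 64  # assumed line size; the real value depends on the machine
ELEM_BYTES = 8         # assumed element size, e.g. one double-precision value


def padded_stride(elem_bytes=ELEM_BYTES, line_bytes=CACHE_LINE_BYTES):
    """Elements to skip between per-thread slots so each slot starts a new cache line."""
    return -(-line_bytes // elem_bytes)  # ceiling division


def cache_lines(nthreads, stride, elem_bytes=ELEM_BYTES, line_bytes=CACHE_LINE_BYTES):
    """Cache-line index holding each thread's slot, for a line-aligned array."""
    return [(t * stride * elem_bytes) // line_bytes for t in range(nthreads)]


unpadded = cache_lines(4, stride=1)              # [0, 0, 0, 0]: all four slots share one line
padded = cache_lines(4, stride=padded_stride())  # [0, 1, 2, 3]: one line per thread

print(unpadded)
print(padded)
```

Padding trades memory for fewer coherence misses: with these assumed sizes each thread's slot occupies a full 64-byte line instead of 8 bytes, so it is usually applied only to data that several threads update frequently.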

  15. Sun Microsystems
     • Proud of its vision and innovative technologies.
     • Faces fierce competition in the server business:
       • OS: Microsoft, Linux
       • CPU: Intel, IBM
       • High-end market: IBM, HP
       • Low-end market: Dell and other x86 vendors
     • Still going for the next big thing:
       • Network computing (Java, JDS, JES, GridEngine)
       • Throughput computing (Niagara 1 & 2, Rock)
       • Solaris 10 & x86 support

  16. Performance Engineering
     • Performance problems everywhere…
     • Deal with important commercial applications:
       • Database
       • Network infrastructure & applications
       • Throughput computing
       • Security infrastructure
     • Solve problems by:
       • Identifying issues
       • Improving products
       • Influencing future development

  17. Networking Infrastructure
     • Gigabit Ethernet driver optimization
     • TCP/IP stack optimization
     • Multi-data transmission and Jumbo Frames
     • TCP Offloading Engine (TOE)
     • InfiniBand vs 10GE
     • On-chip high-speed Ethernet support

  18. Networking Applications
     • Optimizing SunOne servers:
       • Webserver
       • Directory server
       • Application server
       • Portal server
     • Tweaking benchmarks:
       • SPECweb99 & 2004
       • SPECweb99_SSL
       • TPC-W (W = Web commerce)

  19. Security Infrastructure
     • Crypto accelerators
     • On-chip crypto support
     • Secure Socket Layer (SSL) & HTTPS acceleration
     • IPsec & VPN acceleration
     • Crypto optimization
     • Solaris Cryptographic Framework

  20. Crypto Acceleration

  21. HTTP/SSL Performance
     • HTTP, 100% Keep Alive (chart label: http)
     • HTTP, 0% Keep Alive (chart label: http tcp)
     • HTTPS, 100% Keep Alive, no encryption, SHA1 hashing (chart label: http sha1)
     • HTTPS, 100% Keep Alive, RC4 encryption, SHA1 hashing (chart label: http sha1 rc4)
     • HTTPS, 0% Keep Alive, 100% session creation (RSA), RC4, SHA1 (chart label: http tcp sha1 rc4 rsa)
     • HTTPS, 0% Keep Alive, 100% session resumption (RSA-reuse), RC4, SHA1 (chart label: http tcp sha1 rc4 rsa_reuse)

  23. IPsec Performance

  24. Solaris Cryptographic Framework

  25. Throughput Computing - Niagara

  26. Niagara-2 4-Core Server Competition – Nov. 2007

  27. Rock

  28. Conclusion
     • We will see radical changes in computer systems in the near future, and system-wide hardware-software co-optimization is key to unleashing their potential:
       • High-density chips
       • Multi-core CPUs
       • Highly scalable systems
       • Enormous network & I/O capacity
       • Built-in security support
     • Performance is an expertise that is best acquired from experience.
     • Methodology and collaboration are our formulas for success.
