My Research Experiences on Computer Performance Optimization
Shih-Hao Hung, Ph.D.
Sun Microsystems Inc.
Confidential Information – Do Not Forward
Computer Performance
• Demands for high performance:
  • Getting jobs done faster
  • Getting more jobs done at the same time
  • Getting complex jobs done in time
• Price-performance trade-offs:
  • Getting jobs done efficiently
  • Getting jobs done with limited resources
  • Capacity planning
Performance Optimization
My Research Background
• High-Performance/Technical Computing
  Parallel Performance Project, University of Michigan, 1993-2000
  • Parallelization for high-performance applications
  • Performance characterization tools
  • Performance optimization methodologies
• Commercial Applications Optimization
  Performance and Availability Engineering, Sun Microsystems, Inc., 2000-present
  • Database server optimization
  • Network stack optimization
  • Webserver optimization
  • Security infrastructure optimization
Parallel Performance Project
• Started by Prof. Edward S. Davidson of the University of Michigan in 1990; funded by NSF, Ford Motor Co., the UM Center of Parallel Computing, IBM, DoD, and others.
• Produced 11 Ph.D.s in 10 years.
• Works on state-of-the-art parallel supercomputers and realistic applications.
• Covers many aspects of computer architecture, from CPU pipelines to clustered systems.
• Optimization by all means: instruction scheduling, memory locality, parallelization, etc., via compiler techniques and hand-tuning.
Parallel Computing
• Very hot in the 90s:
  • People rushed to build large MPPs.
  • Looked good in theory, but practical tools and experience were lacking.
  • Most existing applications are difficult to parallelize.
  • Failed to keep pace with Moore's Law: the R&D cycle was too expensive and too long to keep up with the rise in CPU clock rates from MHz to GHz.
• Looking ahead:
  • Throughput computing and commercial workloads drive MP.
  • Chip density and area favor SMT and CMP designs.
  • Struggling to find ways to keep clock rates (GHz) growing at the same pace.
  • Multi-core processors and multiprocessor systems will become the norm in the coming years.
Optimizing Parallel Applications
• Very complex, difficult problems:
  • Program parallelization
  • Load balance
  • Scheduling
  • Minimizing interprocessor communication
  • Architecture-dependent optimization
• Today:
  • Still lots of open problems.
  • Parallelizing compilers are far from automatic solutions.
• Tomorrow:
  • Further research and practical solutions will be in high demand as MP systems become popular at all levels.
Hierarchical Performance Bounds
Example: FCRASH
• Vehicle crash simulation at Ford.
• Finite-element code with over 10,000 lines of Fortran and 14 parallel loops.
• Profiled on a NUMA system (HP/Convex SPP-1000).
• P-gap: imperfect parallelization
• C’-gap: inter-cluster communication
• L- and M’-gaps: load-balancing issues
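
The load-balancing gaps listed above can be illustrated with a toy calculation. The sketch below is not the FCRASH code and not the project's bounds model. It assumes a hypothetical list of per-element costs and compares a naive equal-count split with a simple equal-work split, using the common max-over-mean load ratio as the imbalance measure. All function names and cost values are assumptions made for illustration.

```python
from typing import List


def split_equal_count(costs: List[float], parts: int) -> List[List[float]]:
    """Contiguous chunks holding (nearly) the same number of elements."""
    n = len(costs)
    bounds = [i * n // parts for i in range(parts + 1)]
    return [costs[bounds[i]:bounds[i + 1]] for i in range(parts)]


def split_equal_work(costs: List[float], parts: int) -> List[List[float]]:
    """Contiguous chunks cut whenever the running cost reaches the next
    multiple of total/parts (a simple greedy heuristic, not optimal)."""
    target = sum(costs) / parts
    chunks: List[List[float]] = [[] for _ in range(parts)]
    running = 0.0
    k = 0
    for c in costs:
        # Advance to the next chunk once the work assigned so far has
        # reached this chunk's cumulative share; the last chunk stays open.
        if k < parts - 1 and running >= (k + 1) * target:
            k += 1
        chunks[k].append(c)
        running += c
    return chunks


def imbalance(chunks: List[List[float]]) -> float:
    """Max chunk load divided by mean chunk load (1.0 means perfectly balanced)."""
    loads = [sum(ch) for ch in chunks]
    return max(loads) / (sum(loads) / len(loads))


# Hypothetical workload: 12 cheap elements followed by 4 expensive ones.
costs = [1] * 12 + [5] * 4
by_count = imbalance(split_equal_count(costs, 4))  # 2.5
by_work = imbalance(split_equal_work(costs, 4))    # 1.25
```

With this made-up workload, splitting by element count leaves one processor with 2.5 times the average load, while splitting by estimated work brings the ratio down to 1.25. The slowest processor determines the parallel run time, so the ratio is a rough proxy for time lost to imbalance.
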
Goal-directed Optimization
Performance Tuning
Modeling a Parallel Application
Model-Driven Simulation
Performance Tuning Results
• SP - initial parallel version
• SD - changing domain decomposition to reduce load imbalance (L-gap) and communications (C’-gap)
• SD2 - SD + array padding to reduce false-sharing communications (Unmodeled-gap)
• SD3 - SD2 + eliminating thread migration to reduce communications (Unmodeled-gap)
• SD4 - SD3 + eliminating unnecessary synchronization barriers (S’-gap)
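
The array-padding step in SD2 can be pictured with a small address calculation. The sketch below assumes a 64-byte cache line, 8-byte elements, and an array that starts on a cache-line boundary. These values are assumptions, not figures from the talk, and the helper names are hypothetical. It shows which cache line each thread's element lands on with and without padding.

```python
from typing import List

LINE_SIZE = 64  # assumed cache-line size in bytes
ELEM_SIZE = 8   # assumed element size in bytes (e.g. a double)


def cache_line_of(index: int, elem_size: int = ELEM_SIZE,
                  line_size: int = LINE_SIZE) -> int:
    """Cache-line number holding element `index` of a line-aligned array."""
    return (index * elem_size) // line_size


def lines_per_thread(num_threads: int, stride: int) -> List[int]:
    """Thread t updates element t * stride; return the line each thread touches."""
    return [cache_line_of(t * stride) for t in range(num_threads)]


# Unpadded: adjacent elements, so all four threads write to the same line
# and that line bounces between processors (false sharing).
unpadded = lines_per_thread(4, stride=1)                     # [0, 0, 0, 0]

# Padded: one element per cache line, so each thread owns its own line.
padded = lines_per_thread(4, stride=LINE_SIZE // ELEM_SIZE)  # [0, 1, 2, 3]
```

Padding trades memory for fewer coherence misses: each thread's data occupies a full line even though only one element of it is used.
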
Sun Microsystems
• Proud of its vision and innovative technologies.
• Faces fierce competition in the server business:
  • OS: Microsoft, Linux
  • CPU: Intel, IBM
  • High-end market: IBM, HP
  • Low-end market: Dell and other x86 vendors
• Still going for the next big thing:
  • Network computing (Java, JDS, JES, GridEngine)
  • Throughput computing (Niagara 1 & 2, Rock)
  • Solaris 10 & x86 support
Performance Engineering
• Performance problems everywhere…
• Deal with important commercial applications:
  • Database
  • Network infrastructure & applications
  • Throughput computing
  • Security infrastructure
• Solve problems by:
  • Identifying issues
  • Improving products
  • Influencing future development
Networking Infrastructure
• Gigabit Ethernet driver optimization
• TCP/IP stack optimization
• Multi-data transmission and Jumbo Frames
• TCP Offload Engine (TOE)
• InfiniBand vs. 10GE
• On-chip high-speed Ethernet support
Networking Applications
• Optimizing SunOne servers
  • Webserver
  • Directory server
  • Application server
  • Portal server
• Tweaking benchmarks
  • SPECweb99 & SPECweb2004
  • SPECweb99_SSL
  • TPC-W (W = Web commerce)
Security Infrastructure
• Crypto accelerators
• On-chip crypto support
• Secure Sockets Layer (SSL) & HTTPS acceleration
• IPsec & VPN acceleration
• Crypto optimization
• Solaris Cryptographic Framework
Crypto Acceleration
HTTP/SSL Performance
[Figure labels: http | http tcp | http sha1 | http sha1 rc4 | http tcp sha1 rc4 rsa | http tcp sha1 rc4 rsa_reuse]
• HTTP, 100% Keep Alive
• HTTP, 0% Keep Alive
• HTTPS, 100% Keep Alive, no encryption, SHA1 hashing
• HTTPS, 100% Keep Alive, RC4 encryption, SHA1 hashing
• HTTPS, 0% Keep Alive, 100% session creation (RSA), RC4, SHA1
• HTTPS, 0% Keep Alive, 100% session resumption (RSA-reuse), RC4, SHA1
IPsec Performance
Solaris Cryptographic Framework
Throughput Computing - Niagara
Niagara-2 4-Core Server Competition – Nov. 2007
Rock
Conclusion
• Computer systems will change radically in the near future, and system-wide hardware-software co-optimization is key to unleashing their potential:
  • High-density chips
  • Multi-core CPUs
  • Highly scalable systems
  • Enormous network & I/O capacity
  • Built-in security support
• Performance expertise is best acquired through experience.
• Methodology and collaboration are our formula for success.