


Presentation Transcript


  1. Performance Optimization in Apache 2.0 Development: How we made Apache faster, and what we learned from the experience
  Brian.Pane@cnet.com
  O’Reilly Open Source Convention, San Diego, CA, July 24, 2002

  2. Agenda • Introductions • Performance optimization approach • Specific optimizations in Apache 2.0 • General strategy for open-source software performance improvement • Results and Next Steps

  3. Goals for Apache 2.0 Performance • Make the httpd faster • But what does that mean? • How will we measure speed? • What are we willing to sacrifice for speed? • And why does performance matter?

  4. Optimization Strategy: Part 1 Know your project’s priorities: • Metrics that matter • Rules of the game

  5. Performance Guidelines • Metrics that matter for Apache: • Throughput • HTTP requests per second • Resource utilization • CPU, memory • Rules of the game for Apache: • Keep the server portable, reliable, configurable, maintainable, and compatible

  6. Making Strategic Tradeoffs • Use these metrics and rules to make effective tradeoffs • Example: Table data structures • Slow, O(n)-time lookups; a significant bottleneck • But 3rd party code depended upon the array-based implementation (wasn’t well abstracted) • Solution: keep the O(n) design, but optimize it heavily (improve the throughput metric, but maintain compatibility)
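
As an illustration of the tradeoff described on this slide, here is a minimal sketch of keeping an O(n) array-backed table but rejecting most non-matching entries with a cached key checksum before falling back to a full case-insensitive compare. The `entry_t`/`table_t`/`table_get` names are illustrative, not the real APR API (APR's tables live in `apr_tables.h`):

```c
#include <ctype.h>
#include <string.h>
#include <strings.h>

/* Illustrative sketch, not the real APR implementation: keep the
 * O(n) array layout (for third-party compatibility) but cache a
 * cheap case-folded checksum of each key so most non-matching
 * entries are rejected with an integer compare instead of a
 * strcasecmp(). */

#define TABLE_MAX 16

typedef struct {
    const char *key;
    const char *val;
    unsigned checksum;           /* folded first bytes of key */
} entry_t;

typedef struct {
    entry_t elts[TABLE_MAX];
    int nelts;
} table_t;

/* Cheap checksum: lowercase the first 4 bytes of the key. */
static unsigned key_checksum(const char *key)
{
    unsigned c = 0;
    for (int i = 0; i < 4 && key[i]; i++)
        c = (c << 8) | (unsigned char)tolower((unsigned char)key[i]);
    return c;
}

static void table_set(table_t *t, const char *key, const char *val)
{
    entry_t *e = &t->elts[t->nelts++];
    e->key = key;
    e->val = val;
    e->checksum = key_checksum(key);
}

static const char *table_get(const table_t *t, const char *key)
{
    unsigned c = key_checksum(key);
    for (int i = 0; i < t->nelts; i++) {
        /* integer compare first; strcasecmp only on checksum match */
        if (t->elts[i].checksum == c &&
            strcasecmp(t->elts[i].key, key) == 0)
            return t->elts[i].val;
    }
    return NULL;
}
```

The lookup is still O(n), so third-party code that walks the array keeps working; only the constant factor shrinks.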

  7. Optimization Strategy: Part 2 Profile early, profile often

  8. Profiling Tools • We used traditional code profiling tools to find the slow functions and basic blocks • gprof • Quantify • OProfile • Plus tracing tools to profile system calls • truss • strace • And occasional custom instrumentation
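
Where the standard profilers fell short, a few lines of hand-rolled timing can serve as the "occasional custom instrumentation" mentioned above. A minimal sketch (the `elapsed_us` and `TIMED_*` names are made up for this example, not anything in the Apache tree):

```c
#include <stdio.h>
#include <time.h>

/* Illustrative ad-hoc instrumentation: time a region with the
 * monotonic clock and print elapsed microseconds to stderr. */

static long elapsed_us(struct timespec start, struct timespec end)
{
    return (end.tv_sec - start.tv_sec) * 1000000L +
           (end.tv_nsec - start.tv_nsec) / 1000L;
}

#define TIMED_BEGIN() \
    do { struct timespec t0_, t1_; \
         clock_gettime(CLOCK_MONOTONIC, &t0_);

#define TIMED_END(label) \
         clock_gettime(CLOCK_MONOTONIC, &t1_); \
         fprintf(stderr, "%s: %ld us\n", (label), elapsed_us(t0_, t1_)); \
    } while (0)
```

Usage would look like `TIMED_BEGIN(); handle_request(r); TIMED_END("handle_request");`, bracketing whatever region a profiler flagged as suspicious.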

  9. Profile-Driven Optimization • Profiling helps to create an informal roadmap: • Small problems: fix the code now • Medium problems: phase in API changes & faster code • Large problems: rearchitect

  10. Profile-Driven Optimization Apache 2.0 optimizations due to profiling, throughout the entire request-processing flow (Accept Connection → Read Request → Create Request Data Structures → Map URL to File → Determine Content-Type → Open File → Stream Output Through Filters → Send Response to Client → Log Request): • Faster accept(2) serialization • Less buffer copying • More scalable, multi-threaded memory allocator • Less string manipulation • Faster MIME-type mapper and config merge • Timestamp caching in access logger • Platform-specific socket I/O speedups • Complete rewrite of server-side-include parser
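
The timestamp caching in the access logger is a good example of the kind of fix profiling surfaces: time formatting showed up in profiles even though the formatted string changes only once per second. A hedged sketch of the idea (the function name is illustrative, and unlike the real logger this sketch ignores thread safety):

```c
#include <stdio.h>
#include <string.h>
#include <time.h>

/* Illustrative sketch of per-second timestamp caching: strftime()
 * is relatively expensive, but its output only changes once per
 * second, so cache the formatted string keyed on the time_t value.
 * NOT thread-safe as written; a real multi-threaded logger needs
 * per-thread or locked caches. */
static const char *cached_log_time(time_t now)
{
    static time_t last = (time_t)-1;
    static char buf[64];

    if (now != last) {           /* at most one strftime() per second */
        struct tm tm;
        gmtime_r(&now, &tm);
        strftime(buf, sizeof buf, "[%d/%b/%Y:%H:%M:%S +0000]", &tm);
        last = now;
    }
    return buf;
}
```

Under load, thousands of requests per second share one formatting call instead of paying for it individually.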

  11. Optimization Strategy: Part 3 Take advantage of improvements in the platform

  12. Platform Optimizations • 2.0 uses fast platform features if available: • sendfile(2) • unserialized or pthread-mutex-serialized accept(2) • Atomic operations

  13. Platform Optimizations • Apache Portable Runtime (APR) library abstracts the OS specifics • “Greatest common denominator” approach • Write your application code to use efficient OS features • On platforms where those features are not available, APR will emulate them • In 2.0, the concurrency model is a plug-in • We can add better threading models for specific platforms
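
The "use the efficient OS feature, emulate it elsewhere" pattern can be sketched with atomic increment: compilers with GCC-style builtins get a hardware atomic, and everything else falls back to a mutex-based emulation. `my_atomic_inc32` is an illustrative stand-in for APR's `apr_atomic_inc32`, not the real implementation:

```c
#include <pthread.h>

/* Illustrative sketch of APR-style feature emulation: return the
 * old value and increment *mem atomically, using a hardware atomic
 * when the compiler provides one and a mutex otherwise. */
static unsigned my_atomic_inc32(volatile unsigned *mem)
{
#if defined(__GNUC__)
    return __sync_fetch_and_add(mem, 1);      /* hardware atomic */
#else
    static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
    unsigned old;
    pthread_mutex_lock(&lock);                /* emulated atomic */
    old = (*mem)++;
    pthread_mutex_unlock(&lock);
    return old;
#endif
}
```

Application code calls the one portable function; only the library needs to know which path each platform takes.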

  14. Optimization Strategy, Part 4 Use the power of distributed development

  15. Distributed Development • Just like open-source debugging, open-source performance tuning scales well as more people work on a problem • “Redundant” coding has worked well: • Multiple people implementing different approaches to the same problem • Share ideas, compare results, pick the best implementation

  16. Distributed Optimization Example: SSI Parser
  From: Brian Pane
  Date: 2001-09-05 3:00:35
  Subject: remaining CPU bottlenecks in 2.0
  …Here are the top 30 functions, ranked according to their CPU utilization:
      function               CPU time (% of total)
      --------               ---------------------
      find_start_sequence    23.9
      …
  * find_start_sequence() is the main scanning function within mod_include. …

  17. Distributed Optimization Example: SSI Parser
  From: Justin Erenkrantz
  Date: 2001-09-05 8:42:46
  Subject: [PATCH] Potential replacement for find_start_sequence
  …Basically, replace the inner search with a Rabin-Karp search…
  From: Sander Striker
  Date: 2001-09-05 8:47:59
  Subject: Re: [PATCH] Potential replacement for find_start_sequence
  …Rabin-Karp introduces a lot of * and %. I'll try Boyer-Moore with precalced tables for '<!--#' and '-->'…
  From: Sascha Schumann
  Date: 2001-09-05 10:51:53
  Subject: Re: [PATCH] Potential replacement for find_start_sequence
  …I'd suggest looking at BNDM, which combines the advantages of bit-parallelism (shift-and/shift-or algorithms) and suffix automata…
  From: Ian Holsman
  Date: 2001-09-05 16:18:11
  Subject: [PATCH] Potential replacement for find_start_sequence .. --skip5
  …I can post my code to the skip5 implementation. It isn't optimized yet, but in my tests I see lower CPU utilization than the standard mod_include parser…

  18. Distributed Optimization Example: SSI Parser
  From: Justin Erenkrantz
  Date: 2001-09-05 19:08:31
  Subject: [PATCH] Round 2 of mod_include/find_start_sequence...
  …Replaced Rabin-Karp with the BNDM algorithm as implemented by Sascha. Seems to work. Can people please test/review?…
  • SSI parser performance improvement:
  • Before: 23.9% of total user CPU time
  • After: 4.8%
  • Greater than 4x improvement in 48 hours
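
For readers curious what a skip search over the SSI start tag looks like, here is a Boyer-Moore-Horspool scan for "<!--#", a simpler member of the same family as the Boyer-Moore and BNDM variants discussed in the thread. This is an illustrative sketch, not the actual mod_include code, and `find_start_tag` is a made-up name:

```c
#include <stddef.h>
#include <string.h>

/* Illustrative Boyer-Moore-Horspool scan for the SSI start tag.
 * Examines the byte aligned with the end of the pattern and skips
 * ahead by a precomputed distance on a mismatch, so most input
 * bytes are never touched (real implementations precompute the
 * skip table once, as the mailing-list thread suggests). */
static const char *find_start_tag(const char *buf, size_t len)
{
    static const char pat[] = "<!--#";
    const size_t m = sizeof(pat) - 1;              /* pattern length, 5 */
    size_t skip[256];

    /* bad-character table: default skip is the full pattern length */
    for (size_t i = 0; i < 256; i++)
        skip[i] = m;
    for (size_t i = 0; i < m - 1; i++)
        skip[(unsigned char)pat[i]] = m - 1 - i;

    for (size_t pos = 0; pos + m <= len;
         pos += skip[(unsigned char)buf[pos + m - 1]]) {
        if (memcmp(buf + pos, pat, m) == 0)
            return buf + pos;
    }
    return NULL;
}
```

On .shtml input, which is mostly plain HTML between directives, the scan skips up to five bytes at a time, which is where the CPU savings come from.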

  19. Results

  20. Results Performance on a simple file delivery test: Test case description: • Server running on Solaris 8 on Sun E4000/8x167 MHz, 2GB RAM • 20 concurrent client connections requesting 10KB non-parsed file over 100Mb/s switched network

  21. Results Performance on a server-parsed (.shtml) file test: Test case description: • Server running on Solaris 8 on Sun E4000/8x167 MHz, 2GB RAM • 20 concurrent client connections over 100Mb/s switched network • .shtml file with virtual includes of five 2KB files

  22. Conclusion Next steps for Apache: • Continue incremental performance improvements • Explore highly scalable concurrency models (multiple connections per thread)

  23. Conclusion Recommendations for other projects: • Know your project’s priorities: • Metrics that matter • Rules of the game • Profile early, profile often • Take advantage of platform improvements • Use the power of distributed development
