
Reducing Latency of Web Search Queries

Reducing Latency of Web Search Queries. Myeongjae Jeon, Computer Science, Comp 600, Fall 2013. Performance of web search: 1) Query response time: answer users quickly; reduce both mean and high-percentile latency. 2) Response quality (relevance): provide highly relevant web pages.


Presentation Transcript


  1. Reducing Latency of Web Search Queries Myeongjae Jeon, Computer Science, Comp 600, Fall 2013

  2. Performance of Web Search 1) Query response time • Answer users quickly • Reduce both mean and high-percentile latency 2) Response quality (relevance) • Provide highly relevant web pages • Quality improves with the resources and time consumed • Goal: reduce response time with no drop in quality

  3. How a Search Engine Works • All web pages are partitioned across index servers • Query processing is distributed (embarrassingly parallel) • Top-k relevant pages are aggregated [Diagram: a query reaches the aggregator, which fans it out to the index servers, each holding one partition of all web pages; every server returns its top-k pages for merging.]
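The scatter-gather pattern on this slide can be sketched in a few lines. This is a toy illustration, not the actual implementation: `query_score`, the string documents, and length-as-relevance scoring are all stand-ins.

```python
import heapq

def server_top_k(partition, query_score, k):
    # Each index server scores only its own partition and
    # returns its local top-k (score, doc_id) pairs.
    return heapq.nlargest(k, ((query_score(doc), doc) for doc in partition))

def aggregate_top_k(per_server_results, k):
    # The aggregator merges the servers' local top-k lists
    # into the global top-k.
    merged = [pair for result in per_server_results for pair in result]
    return heapq.nlargest(k, merged)

# Toy example: score = length of the document string (stand-in for relevance).
partitions = [["a", "bbb"], ["cc", "ddddd"], ["e", "ffff"]]
local_results = [server_top_k(p, len, 2) for p in partitions]
top2 = aggregate_top_k(local_results, 2)  # [(5, 'ddddd'), (4, 'ffff')]
```

Because each server sends only k pairs, aggregator traffic stays constant as the index grows, which is what makes the query embarrassingly parallel.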

  4. Issues to Address in My Work • Dealing with outliers • Wait: slow response • Drop: quality loss, resource waste • Speedup: fast response with no quality loss • Dealing with workload variability • Query execution time, degree of speedup, system load [Diagram: an outlier index server forces the aggregator to choose between a slow response (wait) and a lower-quality response (drop).]

  5. Summary of Contributions • Query parallelism • Run a query with multiple threads to reduce its response time • Adaptive parallelism • Select degree of parallelism per query at runtime • Execution time prediction • Identify long-running queries to reduce latency efficiently • Current work • Combine parallelism and prediction effectively in the server • Reduce query latency at the cluster level

  6. Query Processing and Early Termination • Web documents in the inverted index are sorted by static rank, highest to lowest • Processing the "not evaluated" part is useless: it is unlikely to contribute to the top-k relevant results [Diagram: inverted-index posting list for "Myeongjae Jeon" over Doc 1 … Doc N; only a high-rank prefix is processed, the rest is not evaluated.]
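Early termination over a rank-sorted document list can be sketched as follows. The bound used here, that a document's final score never exceeds its static rank, is an assumption made purely for illustration; the real engine's stopping condition is more involved.

```python
import heapq

def early_terminating_search(docs, dynamic_score, k):
    """docs: (doc_id, static_rank) pairs sorted by static_rank descending.
    Assumption for this sketch: a document's final score never exceeds
    its static rank, so once the k-th best score beats the next static
    rank, the unseen tail cannot change the top-k."""
    heap = []  # min-heap of (score, doc_id), size <= k
    evaluated = 0
    for doc_id, static_rank in docs:
        if len(heap) == k and static_rank <= heap[0][0]:
            break  # the "not evaluated" tail cannot enter the top-k
        evaluated += 1
        score = dynamic_score(doc_id, static_rank)
        if len(heap) < k:
            heapq.heappush(heap, (score, doc_id))
        elif score > heap[0][0]:
            heapq.heapreplace(heap, (score, doc_id))
    return sorted(heap, reverse=True), evaluated
```

With scores equal to static rank, the scan stops after exactly k documents; in practice the processed prefix is longer, but still far short of N.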

  7. Assigning Data to Threads • Purpose: keep data processing similar to sequential execution • Key approach: threads dynamically alternate over small chunks of the rank-sorted document list [Diagram: threads T1 and T2 claim alternating chunks from highest rank to lowest rank.]
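A minimal sketch of the dynamic chunk-claiming idea, assuming a shared counter protected by a lock; `work`, `chunk_size`, and the thread count are illustrative parameters, not the system's actual values.

```python
import threading

def parallel_scan(docs, work, num_threads, chunk_size=2):
    # Dynamic chunking: threads repeatedly claim the next small chunk
    # of the rank-sorted doc list, so high-ranked documents are
    # processed first, staying close to the sequential order.
    next_chunk = [0]
    lock = threading.Lock()
    results = [[] for _ in range(num_threads)]

    def worker(tid):
        while True:
            with lock:
                start = next_chunk[0]
                next_chunk[0] += chunk_size
            if start >= len(docs):
                return
            for doc in docs[start:start + chunk_size]:
                results[tid].append(work(doc))

    threads = [threading.Thread(target=worker, args=(t,))
               for t in range(num_threads)]
    for t in threads: t.start()
    for t in threads: t.join()
    return [r for per_thread in results for r in per_thread]
```

Small chunks keep the processing order close to sequential rank order while still balancing load dynamically; larger chunks would lower claiming overhead at the cost of order fidelity.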

  8. Share Thread Info Using a Global Heap • Share information to reduce wasted execution • Use a global heap and update it asynchronously • Reduce synchronization overhead through batched updates [Diagram: Thread 1 and Thread 2 each keep a local top-k and periodically sync it into the global top-k heap.]
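The batched-sync idea can be sketched like this; the class name `SharedTopK`, the score-only heap, and the batch size are illustrative choices, not the system's actual data structures.

```python
import heapq, threading

class SharedTopK:
    def __init__(self, k, batch=4):
        self.k, self.batch = k, batch
        self.heap = []                 # global min-heap of top-k scores
        self.lock = threading.Lock()

    def threshold(self):
        # Lock-free read of the global k-th best score; it may be stale,
        # which only causes wasted work, never a wrong result.
        return self.heap[0] if len(self.heap) == self.k else float("-inf")

    def flush(self, local):
        # Batched update: merge a thread's local candidates in one lock hold.
        with self.lock:
            for s in local:
                if len(self.heap) < self.k:
                    heapq.heappush(self.heap, s)
                elif s > self.heap[0]:
                    heapq.heapreplace(self.heap, s)
        local.clear()

def thread_scan(shared, scores):
    local = []                         # local candidate buffer
    for i, s in enumerate(scores, 1):
        if s > shared.threshold():     # prune against (stale) global top-k
            local.append(s)
        if i % shared.batch == 0:
            shared.flush(local)
    shared.flush(local)
```

Threads read the threshold without locking and take the lock only once per batch, which is the source of the reduced synchronization overhead on the slide.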

  9. Outline • Query parallelism • Run a query with multiple threads to reduce its response time • Adaptive parallelism • Select degree of parallelism per query at runtime • Execution time prediction • Identify long-running queries to reduce latency efficiently • Current work • Combine parallelism and prediction effectively in the server • Reduce query latency at the cluster level

  10. No Load vs. Heavy Load • No load: parallelize the query • Heavy load: execute queries sequentially [Diagram: under no load, Query 1 uses all cores; under heavy load, Queries 1–6 each run on one core.]

  11. No Speedup vs. Linear Speedup • No speedup: use the minimum parallelism degree • Linear speedup: use the maximum parallelism degree [Diagram: query execution time with 1 thread (T1) vs. 6 threads (T6) for each case.]

  12. Speedup in Reality • Most queries exhibit neither zero speedup nor linear speedup

  13. Adaptive Algorithm • Decide parallelism degree at runtime • Pick the degree p that minimizes response time: min_p [ Tp + (p·Tp/N)·K ] • Tp: my execution time with parallelism degree p • (p·Tp/N)·K: latency impact on the waiting queries • K: system load (# waiting queries) • N: number of cores
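The selection rule can be sketched as a small search over candidate degrees. The exact cost model, charging each of the K waiting queries a delay of p·Tp/N, is an assumption consistent with the slide's annotations ("my execution time" plus "latency impact on waiting queries"), not necessarily the deployed formula.

```python
def pick_degree(exec_time, K, N):
    """exec_time(p): predicted execution time Tp at parallelism degree p.
    K: number of waiting queries; N: number of cores.
    Assumed cost model: my latency Tp, plus a delay of p*Tp/N imposed
    on each of the K waiting queries by occupying p cores for Tp."""
    def cost(p):
        return exec_time(p) + (p * exec_time(p) / N) * K
    return min(range(1, N + 1), key=cost)
```

The rule recovers the extremes from slides 10–11: with no speedup and a non-empty queue it chooses sequential execution, and with linear speedup on an idle server it chooses the maximum degree.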

  14. Experimental Setup • Machine setup • Two 6-core Xeon processors (2.27 GHz) • 32 GB memory, 22 GB dedicated to caching • 90 GB web index on SSD • Workload • 100K Bing user queries • Experimental system • Index server • Client: replays the obtained queries with Poisson-distributed arrivals at varying rates (queries per second) • Query termination: early termination or reaching the end of the document list
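The open-loop client described above, replaying queries with Poisson arrivals at a configurable rate, amounts to drawing exponentially distributed inter-arrival gaps; this is a generic sketch of that load generator, not the experiment's actual client code.

```python
import random

def poisson_arrival_times(num_queries, qps, seed=0):
    # Open-loop load generator: Poisson arrivals at rate `qps` mean
    # exponentially distributed inter-arrival gaps with mean 1/qps.
    rng = random.Random(seed)
    t, times = 0.0, []
    for _ in range(num_queries):
        t += rng.expovariate(qps)
        times.append(t)
    return times
```

Varying `qps` sweeps the system load, which is what the fixed-vs-adaptive comparisons on the following slides measure.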

  15. Mean Response Time (Fixed Parallelism) • No fixed degree of parallelism performs well for all loads

  16. Mean Response Time (Adaptive Parallelism) • 47% improvement • Lower than all other fixed degrees • Much lower than sequential execution

  17. Mean Response Time (Adaptive Parallelism) • The algorithm selects among all possible degrees • Parallelism degrees are utilized unevenly to produce the best performance

  18. 95th-Percentile Response Time • 52% improvement • Similar improvements in 99th-percentile response time

  19. Outline • Query parallelism • Run a query with multiple threads to reduce its response time • Adaptive parallelism • Select degree of parallelism per query at runtime • Execution time prediction • Identify long-running queries to reduce latency efficiently • Current work • Combine parallelism and prediction effectively in the server • Reduce query latency at the cluster level

  20. Speeding Up Long-Running Queries Is Important for Consistently Low Latency • Varying execution times • Many short-running queries • Few long-running queries • High tail latency • 99th-percentile execution time: 15× the average, 56× the median

  21. Parallelization Is Effective Only for Long-Running Queries • Short-running queries: large overhead, almost no speedup • Long-running queries: high efficiency, good speedup

  22. Designing the Predictor • Key considerations: prediction accuracy and implementation cost • Our predictor uses cheap features • Keyword features: document frequency • Query features
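A toy version of a cheap-feature predictor might look like the following; the feature set, the hypothetical `df_threshold`, and the single-rule classifier are illustrative stand-ins for the actual trained model.

```python
def query_features(query, doc_freq):
    # Cheap features available before execution (an illustrative subset):
    # per-keyword document frequencies plus whole-query features.
    terms = query.split()
    freqs = [doc_freq.get(t, 0) for t in terms]
    return {
        "num_terms": len(terms),
        "min_df": min(freqs, default=0),  # rarest term bounds the match count
        "sum_df": sum(freqs),
    }

def predict_long_running(query, doc_freq, df_threshold=100_000):
    # Toy rule standing in for the trained predictor: a query whose
    # rarest term is still very frequent must scan many postings.
    f = query_features(query, doc_freq)
    return f["min_df"] >= df_threshold
```

All the inputs here come from the index's statistics, so the prediction costs far less than executing the query, which is the point of restricting the model to cheap features.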

  23. Prediction Accuracy • Query features are important • Using cheap features is advantageous • Much smaller overhead • Almost the same accuracy as using all features • A = actual long-running queries, P = predicted long-running queries
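Given the sets A and P defined on the slide, the standard accuracy metrics fall out directly; this helper is an illustration of that bookkeeping, not the evaluation script used in the study.

```python
def precision_recall(A, P):
    # A: set of actual long-running queries
    # P: set of queries predicted to be long-running
    tp = len(A & P)                       # correctly flagged queries
    precision = tp / len(P) if P else 0.0  # how many flags were right
    recall = tp / len(A) if A else 0.0     # how many slow queries were caught
    return precision, recall
```

For parallelization decisions, recall matters most: a long-running query missed by the predictor stays sequential and lands in the latency tail.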

  24. Outline • Query parallelism • Run a query with multiple threads to reduce its response time • Adaptive parallelism • Select degree of parallelism per query at runtime • Execution time prediction • Identify long-running queries to reduce latency efficiently • Current work • Combine parallelism and prediction effectively in the server • Reduce query latency at the cluster level

  25. Per-Server Optimizations: Localized, Uncoordinated Decisions • Mechanisms to combine • System load • Query execution time • Speedup of parallelism [Diagram: each index server under the aggregator applies prediction and parallelization on its own.]

  26. Policies Composed • For different objectives: • A different set of mechanisms can be selected • The same set of mechanisms can be combined in different ways

  27. Per-Server Optimizations: Potential Sources of Outliers • Sources of outliers • Misprediction • Low parallelism degree due to a load spike • Coordination is needed • Reactively, by communication among servers • Proactively, directed by the aggregator

  28. Approaches Considered • Adaptive parallelism under execution-time prediction with reactive coordination • Timeline approaches in which the aggregator directs coordination among servers

  29. Tying It All Together [Diagram: a query enters the aggregator, which coordinates the index servers; each server applies prediction and parallelization to return a fast response.]

  30. Conclusion • Effectively parallelize web search queries • Design and implement the building blocks • Study how to compose them • Reduce latency both within a server and across the cluster • The proposed techniques make a real impact
