Predictive Parallelization: Taming Tail Latencies in Web Search
Myeongjae Jeon, Saehoon Kim, Seung-won Hwang, Yuxiong He, Sameh Elnikety, Alan L. Cox, Scott Rixner
Microsoft Research, POSTECH, Rice University
Performance of Web Search
1) Query response time
• Answer users quickly (e.g., within 300 ms)
2) Response quality (relevance)
• Provide highly relevant web pages
• Quality improves with the resources and time consumed
Focus: improving response time without compromising quality
Background: Query Processing Stages
Pipeline: Query → doc index search (100s–1000s of good matching docs) → 2nd-phase ranking (10s of the best matching docs) → snippet generator (a few sentences for each doc) → Response
Example SLA: 300 ms latency
Focus: stage 1 (doc index search)
Goal
Speed up index search (stage 1) without compromising result quality
• Improve user experience
• Serve a larger index
• Afford a more sophisticated 2nd phase
How Index Search Works
• Partition all web pages across index servers (massively parallel)
• Distribute query processing across the partitions (embarrassingly parallel)
• Aggregate the per-server top-k pages into the global top-k
Problem: a single slow server makes the entire cluster slow
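The scatter-gather pattern on this slide can be sketched as follows. This is a toy model, not the production system: the `score` function, the data layout, and the thread-based fan-out are all illustrative assumptions.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor

def score(query, doc):
    # Toy relevance score: number of shared terms (illustrative only).
    return len(set(query.split()) & set(doc.split()))

def search_partition(partition, query, k):
    # Each index server scores its own shard and returns a local top-k.
    return heapq.nlargest(k, ((score(query, d), d) for d in partition))

def aggregate_top_k(partitions, query, k):
    # The aggregator fans the query out to all partitions in parallel
    # and merges the per-server top-k lists into a global top-k.
    with ThreadPoolExecutor() as pool:
        locals_ = pool.map(lambda p: search_partition(p, query, k), partitions)
    return heapq.nlargest(k, (pair for local in locals_ for pair in local))
```

Because `aggregate_top_k` waits for every partition, its latency is the maximum of the per-server latencies, which is exactly why one slow server slows the whole cluster.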
Observation
• A query is processed on every server, so response time is determined by the slowest one
• We need to reduce the tail latencies of individual servers
Examples
• A long query (outlier) at one index server delays the aggregator, turning a fast response into a slow one
• Terminating the long query in the middle of processing → fast response, but quality drops
Parallelism for Tail Reduction
• Opportunity: tails are few, and idle cores are available
• Challenge: tails are very long, and the workload is CPU-intensive
(Figures: latency distribution; latency breakdown at the 99th percentile)
Predictive Parallelism for Tail Reduction
• Short queries: many, but almost no speedup from parallelization
• Long queries: few, and good speedup
Predictive Parallelization Workflow
• At each index server, an execution-time predictor predicts the (sequential) execution time of the query with high accuracy
Predictive Parallelization Workflow
• Using the predicted time, a resource manager selectively parallelizes long queries and runs short queries sequentially
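The resource-manager decision described above amounts to a threshold test. A minimal sketch, where the 80 ms cutoff matches the long-query definition used elsewhere in the talk, but the parallelism degree of 8 is an assumption for illustration:

```python
def plan_query(query, predictor, threshold_ms=80, degree=8):
    # Resource-manager sketch: only queries predicted to run longer
    # than the threshold get multiple cores; the rest run sequentially.
    # `predictor` returns an estimated sequential execution time in ms.
    return degree if predictor(query) > threshold_ms else 1
```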
Predictive Parallelization
Focus of today's talk:
• Prediction: identifying long queries through machine learning
• Parallelization: running long queries in parallel with high efficiency
Brief Overview of the Predictor
• In our workload, 4% of queries take > 80 ms
• At least 3% of queries must be identified as long (75% recall), with high precision
• Prediction overhead must be 0.75 ms or less
• Existing approaches: lower accuracy and higher cost
Accuracy: Predicting Early Termination
• Documents in each inverted index (e.g., for "SIGIR") are sorted by static rank, highest to lowest
• Only a limited prefix of the list contributes to the top-k relevant results; the rest is not evaluated
• The size of that prefix depends on the keyword (more exactly, on its score distribution)
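Early termination over a static-rank-ordered posting list can be sketched like this. The fixed `budget` stands in for the keyword-dependent stopping rule the slide alludes to; `score_fn` and the budget are illustrative assumptions, not the engine's actual scorer.

```python
import heapq

def top_k_with_early_termination(postings, k, score_fn, budget):
    # `postings` is sorted by static rank (best first); traversal
    # stops after `budget` documents, so low-static-rank docs at the
    # tail of the list are never evaluated.
    heap = []  # min-heap holding the current top-k (score, doc_id)
    for i, doc_id in enumerate(postings):
        if i >= budget:
            break  # early termination: remaining docs not evaluated
        s = score_fn(doc_id)
        if len(heap) < k:
            heapq.heappush(heap, (s, doc_id))
        elif s > heap[0][0]:
            heapq.heapreplace(heap, (s, doc_id))
    return sorted(heap, reverse=True)
```

The predictor's job, in these terms, is to estimate how much work such a traversal will do for a given keyword before it actually runs.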
Space of Features
• Term features [Macdonald et al., SIGIR 12]
• IDF, NumPostings
• Score aggregates (arithmetic, geometric, and harmonic means; max; variance; gradient)
• Query features
• NumTerms (before and after rewriting)
• Relaxed
• Language
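As a concrete example of the score-aggregate term features listed above, per-term IDF values can be collapsed into query-level statistics. This is a sketch in the spirit of those features; the geometric and harmonic means assume positive inputs, and the exact feature definitions are the paper's, not this code's.

```python
import math
from statistics import mean, variance

def term_features(idfs):
    # Aggregate per-term IDF values into query-level features:
    # arithmetic/geometric/harmonic means, max, and variance.
    geo = math.exp(mean(math.log(x) for x in idfs))
    har = len(idfs) / sum(1.0 / x for x in idfs)
    return {
        "arith_mean": mean(idfs),
        "geo_mean": geo,
        "harm_mean": har,
        "max": max(idfs),
        "var": variance(idfs) if len(idfs) > 1 else 0.0,
    }
```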
New Features: Query
• Queries in modern search engines carry rich clues:
<Fields related to query execution plan> rank=BM25F enablefresh=1 partialmatch=1 language=en location=us ….
<Fields related to search keywords> SIGIR (Queensland or QLD)
Space of Features
• All features are cached to ensure responsiveness (avoiding disk access)
• Term features require a 4.47 GB memory footprint (for 100M terms)
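A quick back-of-envelope check of the quoted footprint (assuming GB here means GiB, which is an assumption on my part):

```python
# 4.47 GiB of cached term features spread over 100M terms
# works out to roughly 48 bytes per term.
bytes_total = 4.47 * 2**30            # 4.47 GiB in bytes
per_term = bytes_total / 100_000_000  # bytes per term
assert 47 < per_term < 49             # ~48 bytes/term
```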
Feature Analysis and Selection
• The accuracy gain of each feature in the boosted regression tree suggests a cheaper feature subset
Prediction Performance
• Query features are important
• Using cheap features is advantageous: IDF (from the term features) plus query features gives much smaller overhead (90+% less) with accuracy similar to using all features
(A = actual long queries, P = predicted long queries)
Algorithms
• Classification vs. regression: comparable accuracy, but regression offers more flexibility (a continuous time estimate supports any threshold)
• Algorithms evaluated:
• Linear regression
• Gaussian process regression
• Boosted regression tree
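A minimal sketch of training a boosted-regression-tree predictor, the third algorithm above. The features and latencies here are synthetic, scikit-learn is assumed as the library, and the hyperparameters are illustrative, not the paper's:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
# Synthetic training data: 3 query-level features per query,
# with latency (ms) driven mostly by the first feature plus noise.
X = rng.random((500, 3))
y = 20 + 200 * X[:, 0] + rng.normal(0, 5, 500)

# Boosted regression tree mapping features -> predicted latency (ms).
model = GradientBoostingRegressor(n_estimators=100, max_depth=3)
model.fit(X, y)
pred = model.predict(X[:1])
```

Because the model outputs a continuous time estimate, the long-query threshold (e.g., 80 ms) can be applied after training, which is the flexibility argument for regression over classification.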
Accuracy of Algorithms
Summary:
• 80% of long queries (> 80 ms) identified
• 0.6% of short queries mispredicted
• 0.55 ms prediction time, with low memory overhead
Predictive Parallelism
• Key idea: parallelize only long queries, using a threshold on predicted execution time
• Evaluation: compare Predictive against three baselines
• Sequential
• Fixed
• Adaptive
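The four policies compared above can be contrasted in one function. This is an illustrative sketch: the threshold, the parallelism degree `p`, and the load cutoff for the adaptive baseline are assumptions, not the evaluated configurations.

```python
def degree_of_parallelism(policy, predicted_ms, load, threshold=80, p=8):
    # Illustrative comparison of the four evaluated policies.
    if policy == "sequential":
        return 1                        # never parallelize
    if policy == "fixed":
        return p                        # parallelize every query
    if policy == "adaptive":
        return p if load < 0.5 else 1   # react to system load only
    if policy == "predictive":
        return p if predicted_ms > threshold else 1  # this paper
    raise ValueError(policy)
```

The fixed and adaptive baselines pay parallelization overhead on the many short queries; the predictive policy spends cores only where they buy real speedup.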
99th-Percentile Response Time
• 50% throughput increase
• Outperforms "parallelize all"
Related Work
• Search query parallelism
• Fixed parallelization [Frachtenberg, WWWJ 09]
• Adaptive parallelization using system load only [Raman et al., PLDI 11]
→ High overhead due to parallelizing all queries
• Execution time prediction
• Keyword-specific features only [Macdonald et al., SIGIR 12]
→ Lower accuracy and high memory overhead for our target problem
Thank You! Your query to Bing is now parallelized if predicted as long. Execution time predictor query Resource manager long short