
Online Expansion of Rare Queries for Sponsored Search

This paper presents a novel method for determining and displaying relevant ads on a search engine results page in real-time, improving ad relevance and user engagement. By expanding queries and utilizing a scoring system, the paper enhances the efficiency of ad placements. The research addresses the challenge of tail queries and proposes a unique architecture for query feature extraction and ad feature weighting. Results indicate significant improvements in ad relevance.




  1. Online Expansion of Rare Queries for Sponsored Search Defended by Mykell Miller

  2. Summary: The Short Version This paper describes and evaluates a method of determining which ads to display on a search engine result page. Users input varied queries, so it is beneficial to show ads pertaining not only to the query itself but to related queries as well. However, previous methods of finding these related queries and matching them to ads take a long time and are therefore done offline. This paper describes a method that allows some of the work to be done on the fly without much overhead.

  3. Why it’s good: The Short Version • Useful • Ads fund search engines • If ads were more relevant, Jared might actually click on them • The method shows statistically significant improvement in making ads more relevant, at a low overhead • Interesting • Interestingness is subjective, but this is MY defense • Well-written • Well-organized • I could actually understand the math because they very clearly told me what all the variables meant • They defined all the relevant terms and summarized all the references so I didn’t have to read 32 other papers. • Time Travel • This paper is only three weeks old • A paper that was published in April cited it

  4. Now for the long version…

  5. What this paper is about • Broad matching: an ad is displayed when its bid phrase is similar to, but not exactly the same as, the query the user inputted

  6. What this paper is about • Sponsored Search • A.K.A. Paid search advertising • On Search Engine Result Pages • All major web search engines do this • Context Match • A.K.A. Contextual Advertising • On other websites • What we looked at last Wednesday

  7. More on Sponsored Search • The authors assume a pay-per-click model • Google, Yahoo, and Microsoft all use this model • Bid Phrases • A bid phrase is the query that will trigger the display of an ad • Bidding system • An advertiser pays the search company whatever amount it chooses to associate its ad with a bid phrase • If an advertiser pays more, its ad gets a higher ranking • Example: • High Bidders pays $1,000,000,000,000,000,000,000 for the bid phrase “Dummy Query” • Low Bidders pays $1 for the bid phrase “Dummy Query” • When I search for “Dummy Query,” I see High Bidders’ ad first, then Low Bidders’ ad
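The slide's bid-only ranking can be sketched in a few lines. This is purely illustrative of the example above, not the paper's (or any real engine's) auction logic:

```python
def rank_ads(ads):
    """Rank (advertiser, bid) pairs for one bid phrase, highest bid first."""
    return sorted(ads, key=lambda ad: ad[1], reverse=True)

# The slide's example: two advertisers bidding on "Dummy Query".
ads_for_phrase = [("Low Bidders", 1.0), ("High Bidders", 1e21)]
ranking = rank_ads(ads_for_phrase)
# High Bidders' ad is shown first, then Low Bidders' ad.
```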

  8. More on Sponsored Search

  9. Why Do This Paper? • 30-40% of search engine result pages have no ads on them because Google, Yahoo, etc. don’t know what queries are similar to the bid phrase • Previous work has developed systems that are far too inefficient to use in real life

  10. My Own Experiment • Query: Banana Bread • Query: Nut-Free Banana Bread • Query: Vegan Banana Bread

  11. Why do tail queries have so few ads? • They are often harder to interpret than more common (head and torso) queries • They rarely match any bid phrase exactly • There is little historical click data • Search engines don’t like posting irrelevant ads

  12. What does this paper accomplish? • Online query expansion for tail queries • New way to index query expansions for fast computation of query similarity • A way to go from pre-expanded queries to expanding related queries on the fly • A ranking and scoring method

  13. The Architecture of their system

  14. Query Feature Extraction • Unigrams • Process them via • Stemming • Reducing words like “Extraction” and “Extracting” to the stem “Extract” • Stop words • Ignoring very common words (e.g., “the”, “of”) that carry little meaning • Phrases • Multi-word phrases come from a dictionary of ~10 million phrases gathered from query logs and web pages • Semantic Classes • The authors developed a hierarchical taxonomy of 6,000 semantic classes • Each query is annotated with its 5 most likely semantic classes
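A minimal sketch of the unigram stage described above, assuming a toy suffix-stripping stemmer and a tiny stop list as stand-ins for the real components:

```python
# Toy stand-ins, not the paper's actual stemmer or stop list.
STOP_WORDS = {"the", "of", "for", "a", "in"}

def crude_stem(word):
    # Strip a few common suffixes; a real system would use a proper stemmer.
    for suffix in ("ation", "ing", "ion", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

def unigram_features(query):
    # Lowercase, drop stop words, stem what remains.
    tokens = query.lower().split()
    return [crude_stem(t) for t in tokens if t not in STOP_WORDS]
```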

  15. Related Query Retrieval • Now we have a pseudo-query made up of features • Compare this pseudo-query against an inverted index of pre-expanded queries and pull out related pseudo-queries • The system retrieves candidates by their shared features, then calculates similarity using a dot product
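The retrieval step above can be sketched as follows, with sparse feature vectors as dicts and a toy inverted index mapping each feature to the stored queries that contain it (the data here is made up):

```python
def dot(u, v):
    # Only features present in both sparse vectors contribute.
    return sum(w * v[f] for f, w in u.items() if f in v)

def retrieve(pseudo_query, index, stored):
    # Gather candidates via the inverted index, then score by dot product.
    candidates = set()
    for feature in pseudo_query:
        candidates |= index.get(feature, set())
    scored = [(q, dot(pseudo_query, stored[q])) for q in candidates]
    return sorted(scored, key=lambda item: item[1], reverse=True)

# Made-up stored pseudo-queries and their feature weights.
stored = {
    "vegan banana bread": {"banana": 0.8, "bread": 0.6, "vegan": 0.5},
    "banana pancakes": {"banana": 0.9, "pancake": 0.7},
}
index = {"banana": {"vegan banana bread", "banana pancakes"},
         "bread": {"vegan banana bread"}}
results = retrieve({"banana": 1.0, "bread": 1.0}, index, stored)
```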

  16. Query Expansion • Q* is the set of features describing the original query and its related queries • The weight of a given feature in Q* is a linear combination of its weight in the original query and in the related queries • This expansion is efficient because you only look at the features that appear in related queries
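The linear combination above might be sketched like this; `alpha` is an assumed mixing parameter, not a value taken from the paper:

```python
def expand(original, related, alpha=0.7):
    """Weight of feature f in Q* = alpha * (weight in original query)
    + (1 - alpha) * (average weight across related queries).
    alpha is an illustrative assumption, not the paper's setting."""
    expanded = {f: alpha * w for f, w in original.items()}
    n = len(related)
    for rq in related:
        for f, w in rq.items():
            expanded[f] = expanded.get(f, 0.0) + (1 - alpha) * w / n
    return expanded

q_star = expand({"banana": 1.0}, [{"banana": 0.5, "vegan": 0.8}])
```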

  17. Ad Feature Weighting • Extract the same features from the bid phrases of ad groups as from queries (unigrams, phrases, semantic classes) • Since the query-side weighting would unfairly favor short ad groups, the ad side uses the BM25 weighting scheme instead
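For reference, a standard BM25 term weight (the scheme the slide names) can be computed as follows; `k1` and `b` are conventional defaults, not values from the paper:

```python
import math

def bm25_weight(tf, df, n_docs, doc_len, avg_len, k1=1.2, b=0.75):
    """Standard BM25 term weight: IDF times a saturating,
    length-normalized term frequency. Longer documents (here, longer
    ad groups) are penalized, which counters the short-ad-group bias."""
    idf = math.log((n_docs - df + 0.5) / (df + 0.5) + 1.0)
    norm = tf * (k1 + 1) / (tf + k1 * (1 - b + b * doc_len / avg_len))
    return idf * norm
```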

  18. Title Match Boosting • Increases the score of ads whose titles match the original query very well

  19. Scoring Function • The end result of all this • A weighted sum of dot products between features and the title match boost
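The scoring function described above can be sketched as a weighted sum of per-feature-type dot products (unigrams, phrases, semantic classes) plus the title-match boost; the weight values here are placeholders, not the paper's trained weights:

```python
def dot(u, v):
    # Sparse dot product over shared features.
    return sum(w * v.get(f, 0.0) for f, w in u.items())

def score(query_vecs, ad_vecs, title_boost,
          weights=(1.0, 1.0, 1.0), boost_weight=1.0):
    """Weighted sum of per-feature-type dot products plus the title boost.
    The weights are illustrative placeholders."""
    similarity = sum(w * dot(qv, av)
                     for w, qv, av in zip(weights, query_vecs, ad_vecs))
    return similarity + boost_weight * title_boost
```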

  20. Now on to the results!

  21. Test Set • Test set: 400 random rare queries from Yahoo • 121 were in the lookup table, 279 were not • The roughly 10% of rare queries in foreign languages were eliminated • Human editors judged the top 3 ads for each query • 3,556 judgments in total • The system was built from Yahoo’s full ad inventory and 100 million queries drawn from U.S. Yahoo query logs

  22. Metrics • Discounted Cumulative Gain (DCG) • “a measure of effectiveness of a Web search engine algorithm or related applications, often used in information retrieval. Using a graded relevance scale of documents in a search engine result set, DCG measures the usefulness, or gain, of a document based on its position in the result list. The gain is accumulated cumulatively from the top of the result list to the bottom with the gain of each result discounted at lower ranks.” –Wikipedia • DCG is a number; higher numbers are better • Precision-Recall Curves • Precision: Fraction of results returned that are relevant • Recall: Fraction of relevant results that are returned • A way to visualize it; higher is better
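The DCG definition quoted above reduces to a short formula, using the common log2 rank discount:

```python
import math

def dcg(relevances):
    """Discounted Cumulative Gain: each result's graded relevance,
    discounted by log2 of its rank (rank 1 is undiscounted)."""
    return sum(rel / math.log2(rank + 1)
               for rank, rel in enumerate(relevances, start=1))
```

Putting the most relevant results first yields a higher DCG, e.g. `dcg([3, 2, 1]) > dcg([1, 2, 3])`.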

  23. Ad Matching Algorithms Tested • Baseline • The original, unexpanded version of the query vector • Offline Expansion • Expands the original query by pre-processing offline only • Online Expansion • Expands the original query by processing online only • Online + Offline Expansion • Expands the original query using both offline and online expansion algorithms

  24. Test Results: Queries not found in lookup table • Tested the baseline vs online expansion • The online expansion gave statistically significant improvements

  25. Test Results: Queries found in lookup table • Tested all 4 algorithms • Best: offline expansion • Second best: online + offline expansion • Difference between the two was not statistically significant

  26. Test Results: Full Set • Tested all four algorithms • Online expansion alone offers a statistically significant improvement over the baseline • Best: the hybrid online + offline expansion

  27. Efficiency • The table lookup takes only 1 ms • Least efficient when a query is not in the lookup table • When a query is not in the lookup table, there is a 50% overhead • This is bad • But given the small proportion of queries not in the lookup table, the estimated average is 12.5% overhead • This is good
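The 12.5% figure follows from simple expectation arithmetic: only lookup-table misses incur the 50% overhead. The 25% miss rate used below is an assumed value that reproduces the slide's average, not a number stated on the slide:

```python
def avg_overhead(miss_rate, miss_overhead=0.5):
    """Expected overhead when only lookup misses incur extra work."""
    return miss_rate * miss_overhead

# Assumed miss rate of 25% gives the slide's 12.5% average overhead.
estimate = avg_overhead(0.25)
```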
