Search Science @ eBay. … from a Web Search perspective Tim Converse Senior Director Head of Search Science Engineering. Outline. The eBay Search problem (starting from Web search) What’s the same? What’s different? Search Science @ eBay Who we are What we do Current frontier.
Search Science @ eBay … from a Web Search perspective • Tim Converse, Senior Director, Head of Search Science Engineering
Outline • The eBay Search problem (starting from Web search): What’s the same? What’s different? • Search Science @ eBay: who we are, what we do • Current frontier
My background • Inktomi => Yahoo! Web Search: spam detection / doc classification • Powerset => Bing Web Search (Microsoft): ML ranking and metrics for NLP-driven engine; led Search Summaries group for Bing • Jybe => Yahoo!: co-founded small mobile personalization co.; personalization of main Yahoo! page • Joined eBay November 2013
The eBay Search Results Page (SRP) • Annotated screenshots highlighting: auction listing, fixed-price listing, category refinements, aspect refinements, Related Searches, sort type
Search Intents • Web Search (Broder ’02): Informational (info in document or summary); Navigational (get me to a known site or page); Transactional (I want to buy something) • eBay Search: Transactional; most common: want to buy something specific; less common: browsing/window-shopping
eBay Scale! [Numbers from 2012 / 2013] • ~130 million buyers and sellers • 250 million queries/day • ~1 billion listings • 100 petabytes of data (Hadoop / Teradata) • ~2 billion pageviews/day • $75 billion of merchandise sold in 2012 (mostly as a result of searches)
Search Science Mission • Mission: help the user find the item they want as quickly and easily as possible • Users typically care about three things in search results: Relevance, Trust, Value • Success measures: revenue per user (primary); human-judged relevance tests • Search is a product of Search Front End, Search Back End, and Search Science
Recall • Responsible for mapping a query to a set of items to return • Primarily accomplished by query rewriting • User query transformed into much larger back-end query • Terms mapped to: synonyms, plural/singular, categories, aspects • e.g. [red dress] => [category = dresses, color = red] • Phrases enforced ([iphone 5]) • Whole-query expansions for popular queries • Process driven by data-mining on queries/clicks/purchases
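A minimal Python sketch of the rewriting step on this slide. The lookup tables (`SYNONYMS`, `STRUCTURED`, `PHRASES`) and the function `rewrite_query` are hypothetical stand-ins, not eBay's data or API; in a real system such mappings would be mined from queries, clicks and purchases.

```python
# Hypothetical lookup tables; real ones would be mined from query/click/purchase logs.
SYNONYMS = {"dress": ["dress", "dresses", "gown"]}
STRUCTURED = {"red": ("color", "red"), "dress": ("category", "dresses")}
PHRASES = {("iphone", "5")}


def rewrite_query(query):
    """Expand a user query into a larger back-end query (illustrative only)."""
    tokens = query.lower().split()
    backend = {"phrases": [], "terms": [], "constraints": []}
    i = 0
    while i < len(tokens):
        # Enforce known two-word phrases so [iphone 5] is not matched as loose terms.
        if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) in PHRASES:
            backend["phrases"].append(tokens[i] + " " + tokens[i + 1])
            i += 2
            continue
        tok = tokens[i]
        # Each remaining term becomes an OR-group of itself plus synonyms/plurals.
        backend["terms"].append(SYNONYMS.get(tok, [tok]))
        # Terms with a known structured meaning also add a category/aspect constraint.
        if tok in STRUCTURED:
            backend["constraints"].append(STRUCTURED[tok])
        i += 1
    return backend
```

Under these assumed tables, `rewrite_query("red dress")` yields the constraints `color = red` and `category = dresses`, mirroring the slide's example.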
Ranking • Given a query and a recall set, rank the set • Largely machine-learned (with business rules on top) • Separate machine-learned models for auction and fixed-price • Auction / fixed-price interleaving handled separately • A number of targets for machine learning: clicks; revenue per query/item (why this is a good idea) • Features: query/item match features, clicks & sales, seller metrics, …
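Because the auction and fixed-price lists come from separate models, their scores need not be comparable, so one simple way to combine them is to fill result slots by a repeating pattern. The sketch below illustrates that idea only; the slide does not say how eBay interleaves, and the function name and default pattern are assumptions.

```python
from collections import deque


def interleave(auction, fixed_price, pattern=("fixed", "fixed", "auction")):
    """Merge two ranked lists by repeating a slot pattern.

    The relative order inside each input list is preserved.
    """
    queues = {"auction": deque(auction), "fixed": deque(fixed_price)}
    merged = []
    while any(queues.values()):
        progressed = False
        for slot in pattern:
            if queues[slot]:
                merged.append(queues[slot].popleft())
                progressed = True
        if not progressed:
            # The pattern never names the format that is left over:
            # append the remainder in rank order and stop.
            for q in queues.values():
                merged.extend(q)
                q.clear()
    return merged
```

For example, `interleave(["a1", "a2"], ["f1", "f2", "f3"])` returns `["f1", "f2", "a1", "f3", "a2"]`.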
Refinements and Universal Search • Responsible for most non-item SRP elements: • Aspect refinements • Category refinements • Related searches • Inline elements (universal search) • Snippets • Mostly driven by offline analysis of user behavior
Refinements and Universal Search • Annotated SRP screenshots highlighting: category refinements, aspect refinements, Related Searches
Metrics, Tools, Monitoring • Human judgments for training sets, internal metrics • Tools for scraping, studying queries and result sets • Intelligent alerting on relevance and system problems
Spam • Effort broader than Search Science • Alternatives: • Educate (explain to sellers why they shouldn’t game the system) • Block (don’t allow sellers to add spammy listings in the first place) • Neutralize (algorithmically remove the advantage of abuse) • Enforce (remove bad listings and sellers) • Search Science helps neutralize and flag for enforcement
Details of our anti-spam algorithms [This page intentionally left blank]
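Query segmentation by frequency • Measures: unique queries, result set size, Immediate Filtering, Immediate Conversion • Head: thousands of unique queries; result sets in the millions; baseline for both rate measures • Torso: hundreds of thousands of unique queries; result sets in the tens of thousands; one of the two rate measures at 1/3 Head, the other at 3x Head • Tail: tens of millions of unique queries; result sets in the hundreds; one of the two rate measures at 5x Head, the other at 1/5 Head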
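A small Python sketch of frequency-based segmentation: rank unique queries by how often they occur and cut the ranking into head, torso and tail. The function name and the rank cutoffs `head_size` and `torso_size` are illustrative assumptions; the slide does not state how the segment boundaries are chosen.

```python
from collections import Counter


def segment_queries(query_log, head_size, torso_size):
    """Label each unique query 'head', 'torso' or 'tail' by frequency rank."""
    # Normalize lightly so trivially different spellings count together.
    counts = Counter(q.strip().lower() for q in query_log)
    segments = {}
    # most_common() yields unique queries from most to least frequent.
    for rank, (query, _count) in enumerate(counts.most_common()):
        if rank < head_size:
            segments[query] = "head"
        elif rank < head_size + torso_size:
            segments[query] = "torso"
        else:
            segments[query] = "tail"
    return segments
```

Per-segment statistics such as result set size or filtering rate can then be aggregated by joining this mapping back onto the query log.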
Query understanding (annotation) • Query: [white gold hoop earrings] • Stage 1 (bag of words): hoop, gold, earrings, white • Stage 2 (phrasing and rewriting): “white gold” hoop earrings, or color=gold, color=white, hoop, earrings, category=jewelry • Stage 3: ProductType=earrings, style=hoop, material=[white gold], color=white, category=jewelry
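The three stages can be sketched in Python as follows. The phrase dictionary `PHRASES`, the `ANNOTATIONS` table and all function names are hypothetical and cover only the slide's example query; this is not a description of eBay's annotator.

```python
# Hypothetical dictionaries covering only the slide's example query.
PHRASES = {("white", "gold")}
ANNOTATIONS = {
    "white gold": [("material", "white gold"), ("color", "white")],
    "hoop": [("style", "hoop")],
    "earrings": [("ProductType", "earrings"), ("category", "jewelry")],
}


def bag_of_words(query):
    """Stage 1: unordered set of terms; phrase structure is lost."""
    return set(query.lower().split())


def phrase_segments(query):
    """Stage 2: group adjacent tokens that form a known phrase."""
    tokens = query.lower().split()
    segments, i = [], 0
    while i < len(tokens):
        if tuple(tokens[i:i + 2]) in PHRASES:
            segments.append(" ".join(tokens[i:i + 2]))
            i += 2
        else:
            segments.append(tokens[i])
            i += 1
    return segments


def annotate(query):
    """Stage 3: map each segment to structured aspect=value pairs."""
    result = {}
    for segment in phrase_segments(query):
        for aspect, value in ANNOTATIONS.get(segment, []):
            result[aspect] = value
    return result
```

With these tables, `annotate("white gold hoop earrings")` produces the Stage 3 annotation from the slide, while `bag_of_words` shows why Stage 1 alone cannot tell white gold from a white item and a gold item.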
Deterministic sorts • For sorting, we offer • Best Match (relevance sort) • Time: ending soonest • Time: newly listed • Price + Shipping: lowest first • Price + Shipping: highest first • Distance: nearest first • Deterministic (non-best-match) sorts can surface very irrelevant items
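The sketch below illustrates why a deterministic sort can surface irrelevant items: sorting by price plus shipping pushes a cheap accessory above the product the user searched for. The `Listing` fields, the `relevance` score and the `min_relevance` floor are assumptions made for illustration; the floor is one possible mitigation, not something the slide attributes to eBay.

```python
from dataclasses import dataclass


@dataclass
class Listing:
    title: str
    price: float
    shipping: float
    relevance: float  # hypothetical best-match score in [0, 1]


def sort_lowest_total(listings, min_relevance=0.0):
    """Sort by price + shipping, lowest first.

    Listings scoring below min_relevance are dropped before sorting;
    the default of 0.0 keeps everything (a pure deterministic sort).
    """
    eligible = [item for item in listings if item.relevance >= min_relevance]
    return sorted(eligible, key=lambda item: item.price + item.shipping)
```

With a phone at 200 + 10 and a screen protector at 1 + 0.50, the plain sort ranks the screen protector first; raising `min_relevance` above the protector's score leaves only the phone.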
Explicit result set construction for diversity • “3rd-phase ranking” (after recall and ranking) • Examine result set, enforce diverse mixes of products/interpretations
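One simple way to enforce a diverse mix after ranking is a greedy pass that caps how many items of the same group may appear in the top of the list. The sketch below assumes this capping approach; the function name and the parameters `key`, `max_per_group` and `window` are hypothetical, and the slide does not specify the actual method.

```python
def diversify(ranked_items, key, max_per_group, window):
    """Rebuild the top `window` positions so no group exceeds max_per_group.

    Items that are passed over keep their relative rank order and follow
    the diversified top of the list.
    """
    top, deferred, counts = [], [], {}
    for item in ranked_items:
        group = key(item)
        if len(top) < window and counts.get(group, 0) < max_per_group:
            top.append(item)
            counts[group] = counts.get(group, 0) + 1
        else:
            deferred.append(item)
    return top + deferred
```

Here `key` maps an item to its product type or query interpretation, so a result list dominated by one product type gives up some top positions to other types.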
Personalization and contextualization • Personalization • What does this person like to {click, buy}? • Product types, price ranges, condition, shipping • Session contextualization • What [product type, price range, condition, etc.] item did this person {click, buy} in the immediately preceding query? • Contextualization > personalization
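A minimal sketch of session contextualization: items sharing attributes with the item clicked in the immediately preceding query receive a small score boost before re-sorting. The additive boost, the attribute names and the function `contextualize` are assumptions for illustration, not the method used at eBay.

```python
def contextualize(scored_items, previous_click, boost=0.1):
    """Re-rank (score, attributes) pairs using the previous click.

    Each attribute an item shares with previous_click adds `boost`
    to its score; items are returned best first.
    """
    def adjusted(pair):
        score, attrs = pair
        shared = sum(1 for k, v in previous_click.items() if attrs.get(k) == v)
        return score + boost * shared

    return sorted(scored_items, key=adjusted, reverse=True)
```

For instance, if the previous click was a used, low-priced item, a used low-priced listing scored 0.75 moves ahead of a new high-priced listing scored 0.80, since two shared attributes add 0.2.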
Image understanding for ranking • Example: search results for the query “red shoes”
Technologies we like and use • Hadoop ecosystem • Scala (Scoobi, Scalding) • R (gbm)