Why tune relevance
Because we want to find the single best item among a large group of possible candidates.
Multiple levels of control
Relevancy: Ranking, Precision, Recall
Levels of control:
• Application Model
• Business Rules
• InPerspective™
• Core Algorithmic Model
FAST Relevancy Framework: Multiple levels of control
• Application Model. Accessible to: End Users. Control mechanisms: sorting order, navigation, relevance feedback.
• Business Rules. Accessible to: Business Managers. Control mechanisms: query and document "boosting" (BMCP).
• InPerspective™. Accessible to: Administrator. Control mechanisms: "Rank Profile".
• Core Algorithmic Model. Accessible to: Developer. Control mechanisms: algorithm "weights".
FAST Relevancy Framework: InPerspective™
• Freshness
  • How fresh is the document compared to the time of the query?
• Completeness
  • How well does the query match superior contexts like the title or the URL?
  • Example: query = "Mexico". Is "Mexico" or "University of New Mexico" the best match?
• Authority
  • Is the document considered an authority for this query?
  • Examples: web link cardinality, article references, product revenue, page impressions, ...
• Statistics
  • How well do the contents of this document match the query overall?
  • Examples: proximity, context weights, tf-idf, degree of linguistic normalization, and more
• Quality
  • What is the quality of the document?
  • Examples: Homepage? Press release? ...
• Distance
  • What is the distance from where I am?
FAST Relevancy Framework: Rank Profile
Three example rank profiles, each with its own setting for the same components:
• Rank-Profile: Default (Intranet)
• Rank-Profile: Financial News
• Rank-Profile: Wealth Management
Components set in each profile: Authority, Freshness, Proximity, Context, Body, Description, URL, Keywords, Title
[Slide graphic: per-component settings for each profile]
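One way to picture how a rank profile could change the ordering of results is a weighted average over per-component signals. The following is only a minimal sketch under stated assumptions: the `RankProfile` class, the four component names, the signal values, and every weight are hypothetical illustration values, not FAST defaults or the FAST API.

```python
from dataclasses import dataclass

# Component names borrowed from the slide; everything else is hypothetical.
COMPONENTS = ("authority", "freshness", "proximity", "context")


@dataclass(frozen=True)
class RankProfile:
    name: str
    weights: dict  # component name -> non-negative weight (made-up values)

    def score(self, signals: dict) -> float:
        """Weighted average of per-component signals, each assumed in [0, 1]."""
        total = sum(self.weights.get(c, 0.0) for c in COMPONENTS)
        if total == 0:
            return 0.0
        weighted = sum(
            self.weights.get(c, 0.0) * signals.get(c, 0.0) for c in COMPONENTS
        )
        return weighted / total


def rank(profile: RankProfile, docs: dict) -> list:
    """Return document ids ordered from highest to lowest score."""
    return sorted(docs, key=lambda d: profile.score(docs[d]), reverse=True)


# Two illustrative profiles: one neutral, one that favors freshness.
default = RankProfile(
    "Default (Intranet)",
    {"authority": 1, "freshness": 1, "proximity": 1, "context": 1},
)
news = RankProfile(
    "Financial News",
    {"authority": 1, "freshness": 4, "proximity": 1, "context": 2},
)

# Made-up signals for two documents matching the same query.
docs = {
    "old_reference": {"authority": 0.9, "freshness": 0.1, "proximity": 0.7, "context": 0.8},
    "fresh_story": {"authority": 0.4, "freshness": 0.9, "proximity": 0.5, "context": 0.5},
}

print(rank(default, docs))  # authoritative document first
print(rank(news, docs))     # fresh document first
```

With these illustrative numbers the same two documents swap places when the profile changes, which is the effect a per-application rank profile is meant to achieve.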
FAST Unity™: What It Does, How It Works, and What Value It Provides
FAST ESP Federation: FAST Unity at a Glance
FAST Sources
• FAST ESP 5.x
• FAST Data Search 4.x
• FAST ImPulse
• FAST AdMomentum
• FAST RetrievalWare
External Sources
• Microsoft SharePoint 2003 & 2007
• Web search engines: Google, Yahoo, OpenSearch, Gigablast
• Web services: Match.com, PriceGrabber, Google Image
• Advertising services: Google AdSense
[Diagram labels: Front-end Search Application; Search Index; Web Search Engine; Web Site; Internal Sources; External Sources (e.g. another ESP instance)]
Look and feel - Unity
• Calls-to-Action
• Featured Content
• Ads
• Multimedia
• User-generated Content
• Subscription Feeds
• Third-party Content
Example: Web 2.0 Model
• One query - multiple result sets
• Results are returned asynchronously
• Delivered directly to the browser
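The "one query, multiple result sets, returned asynchronously" pattern above can be sketched with Python's standard `asyncio` module. This is an illustration only, not FAST Unity code: `query_source`, `federated_search`, the source names, and the simulated delays are all hypothetical stand-ins for real network calls.

```python
import asyncio


async def query_source(name: str, query: str, delay: float) -> tuple:
    # Stand-in for a real network call; `delay` simulates source latency.
    await asyncio.sleep(delay)
    return name, [f"{name} result for {query!r}"]


async def federated_search(query: str, sources: dict) -> list:
    """Fan one query out to all sources and collect result sets as they finish."""
    tasks = [
        asyncio.create_task(query_source(name, query, delay))
        for name, delay in sources.items()
    ]
    arrived = []
    for finished in asyncio.as_completed(tasks):
        name, results = await finished
        # A real front end would push this result set to the browser here.
        arrived.append((name, results))
    return arrived


# Hypothetical sources with simulated latencies in seconds.
SOURCES = {"internal_index": 0.03, "web_search": 0.01, "ads": 0.02}

result_sets = asyncio.run(federated_search("mexico", SOURCES))
order = [name for name, _ in result_sets]
print(order)  # fastest source first, slowest last
```

Because each result set is handled as soon as its source responds, a slow source does not hold back the others, which is the practical benefit of delivering results asynchronously.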
FAST ESP - Scalability
[Diagram labels: Business Applications, Business Managers, End-Users, Developers, IT Managers; Site Search, eCommerce, Analytics, Intranet, Compliance, Fraud Detection, eDirectories, Market Intelligence, Surveillance; Scalability, Accuracy, Availability, Security, Flexibility]
3D Scalability: #Documents - #Users - Index Latency
[Scaling chart axes: documents, freshness]
Single Search Node Performance
Hardware: Dual Pentium 4, 3 GHz, 4 GB RAM, 3 x SCSI 15K rpm, HW RAID-0 derivative
• 20-50 million documents, up to 1 TB of information
• 100-500 queries per second
• 20-50 ms query response time
• Down to 50 ms indexing latency
• Indexing 50+ documents per second while maintaining search performance
FAST Scalability Facts:
• Deployments with >40 TB
• Deployments with >3B documents
• Deployments with 1 to 1000+ servers
• Deployments with 1000s of queries per second
• Deployments with >500 updates per second
• 20-50 ms query response time
• Sub-second indexing latency
• Crawling >200 documents per second per server
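The single-node figures above suggest a rough sizing calculation for a larger deployment. The sketch below assumes a common search-cluster layout that the slide does not spell out: the index is partitioned into columns by document count, every query visits one node per column, and whole rows of nodes are replicated to raise query throughput. The `plan_grid` function and the example workload are hypothetical.

```python
import math


def plan_grid(total_docs: int, peak_qps: int, docs_per_node: int, qps_per_node: int) -> dict:
    """Estimate a partition/replica grid under the assumptions stated above."""
    # Columns: how many index partitions are needed to hold all documents.
    columns = math.ceil(total_docs / docs_per_node)
    # Rows: how many replicas of the full index are needed for the query load,
    # assuming one row sustains `qps_per_node` queries per second.
    rows = math.ceil(peak_qps / qps_per_node)
    return {"columns": columns, "rows": rows, "nodes": columns * rows}


# Example: 3 billion documents at 1,000 QPS, using the slide's upper bound of
# 50 million documents per node and its lower bound of 100 QPS per node.
plan = plan_grid(
    total_docs=3_000_000_000,
    peak_qps=1_000,
    docs_per_node=50_000_000,
    qps_per_node=100,
)
print(plan)  # {'columns': 60, 'rows': 10, 'nodes': 600}
```

This ignores indexing load, fault-tolerance spares, and hardware differences, so it is a back-of-the-envelope estimate rather than a sizing rule.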
Query Performance of FAST Search vs. RDBMS
Proven High QPS, Low-Latency Access, Database Offloading
[Charts: QPS and latency, ESP5 vs. RDBMS]
• Structured data: 5 million records; 13 fields per record
• Structured queries: 22 SQL queries (representative in ERP)
• #1: FAST ESP4 w/ disk
  • Mean = 99 ms
  • St.dev. = 36 ms
• #2: Oracle w/ memory mapping
  • Mean = 4 057 ms
  • St.dev. = 9 368 ms
Identical HW: single node, 2 CPU, 4 GB RAM, 3 SCSI disks
Identical data: auction data from eBay, 3.6 million docs
Identical queries: 200 queries defined by Oracle
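To put the latency figures above side by side, the short calculation below derives the ratio of mean latencies and the relative spread (standard deviation divided by mean) for each system. It uses only the four numbers quoted on the slide; the variable names are illustrative.

```python
# Latency figures quoted on the slide, in milliseconds.
fast = {"mean_ms": 99, "stdev_ms": 36}
oracle = {"mean_ms": 4057, "stdev_ms": 9368}

# Ratio of mean latencies: how many times longer the RDBMS took on average.
speedup = oracle["mean_ms"] / fast["mean_ms"]

# Coefficient of variation: spread relative to the mean (higher = less predictable).
cv_fast = fast["stdev_ms"] / fast["mean_ms"]
cv_oracle = oracle["stdev_ms"] / oracle["mean_ms"]

print(f"mean latency ratio: about {speedup:.0f}x")
print(f"relative spread: FAST {cv_fast:.2f}, Oracle {cv_oracle:.2f}")
```

On these numbers the mean latency differs by roughly a factor of 41, and the RDBMS latencies also vary far more relative to their mean, which matters for worst-case response times.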
ESP5 Scalability: Efficiency Per Server & Linear Scaling
[Architecture diagram labels: Documents; Pluggable Content Dispatcher; Content Refinement; Index; Search; Query; Query Processing; Query & Result Distribution; Result Processing]
ESP5: Raising the Bar
Enabling the Adaptive Information Warehouse
Quadrants: Scalable, High Performing, Reliable, Flexible
• Linear scaling of feeding capacity
• Archival solutions @ 40 PB
• 14G Search solution (14X Google)
• Feed @ >6000 updates/s
• Querying @ >2000 QPS
• 100M documents per server
• >2X indexing throughput
• Consistent low latency
• Reduced disk footprint
• Feeding architecture improved
• Simplified state management
• Improved fault-tolerance
• Out-of-the-box monitoring
• End2End SOA philosophy
• Studio & programmatic extensibility
• Semantic index
• SAN/NAS optimizations
FAST ESP Competence Analysis
• Performance and scalability with commodity servers
• Multi-language support (70+ languages)
• Easy-to-use management tool and security control
• Relevancy/precision: find what users want
• Navigation: quickly find what users want within a few clicks
• Add-on applications, including recommendation, advertising promotion, mobile access, DB cleansing/offloading, …
• 200+ connectors for the most popular silos on the market
• Extensibility and integration with an open architecture
• Market leading (#1)
• Large R&D investment and commitment