Amazon CloudSearch Meetup, August 15, 2012
Welcome
• Housekeeping
• Slides will be posted
• Drawing
Agenda
• Introduction to CloudSearch
    • Jon Handler, CloudSearch Solutions Architect
• Relevance and Ranking
    • Jack Conradson, Software Engineer
• Case Study: Reddit
    • Keith Mitchell, Programmer
• Q&A
Inverted Index US President
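As a rough illustration of the inverted index named on this slide, the sketch below maps each term to the documents that contain it and answers a query such as "US President" by intersecting posting lists. The toy corpus and the function names are assumptions for illustration only; nothing here reflects how CloudSearch builds its index internally.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each lowercased term to the sorted list of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


def search(index, query):
    """Return ids of documents that contain every query term (AND semantics)."""
    terms = query.lower().split()
    if not terms:
        return []
    postings = [set(index.get(term, ())) for term in terms]
    return sorted(set.intersection(*postings))


# Hypothetical toy corpus, not taken from the talk.
docs = {
    1: "the US president signed the bill",
    2: "the president of the company resigned",
    3: "US open tennis results",
}
index = build_inverted_index(docs)
print(search(index, "US President"))
```

Only whitespace tokenization and lowercasing are applied here; a real search engine would also handle punctuation, stemming, and ranking.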
Search On The Web
• Relevance/Ranking
• Faceting
• Range Searching
• Fielded Searching
• Boolean Queries
• Complex Relevance
Amazon CloudSearch
• Fully managed, full-featured search service
• Automatically scales for data & traffic
• Handles both structured and unstructured data
• Near real-time indexing
• Up and running in less than 1 hour
Amazon CloudSearch Architecture
• Search client (www.example.com): sends search requests, receives search results
• Search developer: sends documents, creates and manages domains, uses the Search Tester
• Search endpoint → search service (Search API, Console): search documents
• Document service endpoint → document service (Document Service API, Command Line Tools, Console): add, update, delete documents
• Configuration service endpoint → configuration service (Configuration API, Command Line Tools, Console): create, configure, delete domains
• Access control on each of the three services
Automatic Scaling: Data & Traffic
• Data (document quantity and size): the index is split into partitions 1 to n, each on its own search instance
• Traffic (search request volume and complexity): each partition is replicated as copies 1 to n, each copy on its own search instance
Use Case
• Million Song Dataset: http://labrosa.ee.columbia.edu/millionsong/
• Search documents are songs
• Attributes: title, artist names, years, genre, artist familiarity
• We’ll use this to create a “Build Your Playlist” web application.
SDF Documents
    [
      {"type": "add",
       "id": "sombzze12a8c134960",
       "version": 5,
       "lang": "en",
       "fields": {
         "title": "Cajun Twisters",
         "artist_name": "Adam Ant",
         "year": "1993",
         "song_id": "sombzze12a8c134960",
         "artist_familiarity": 449425,
         "genre": ["alternative", "electronic", "instrumental", "rock"]
       }
      },
      …
    ]
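A batch in the shape shown on this slide could be generated from song records with a few lines of Python. This is a minimal sketch: the helper name `make_add_operation` is hypothetical, the field layout simply mirrors the slide, and lowercasing the id is an assumption based on the lowercase id in the example.

```python
import json


def make_add_operation(song, version=1, lang="en"):
    """Wrap one song record in an SDF "add" operation (layout copied from the slide)."""
    return {
        "type": "add",
        # Assumption: ids are kept lowercase, as in the slide's example.
        "id": song["song_id"].lower(),
        "version": version,
        "lang": lang,
        "fields": song,
    }


songs = [
    {
        "title": "Cajun Twisters",
        "artist_name": "Adam Ant",
        "year": "1993",
        "song_id": "sombzze12a8c134960",
        "artist_familiarity": 449425,
        "genre": ["alternative", "electronic", "instrumental", "rock"],
    },
]

# An SDF batch is a JSON array of operations.
batch = json.dumps([make_add_operation(s, version=5) for s in songs], indent=2)
print(batch)
```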
Configuration
• cs-configure-from-sdf
    • Analyzes source files for fields and types (heuristic)
• Individually
PHP Integration
    // Build the search request URL and fetch the JSON response.
    // $keyword and $bqParam are expected to be URL-encoded already.
    $results = file_get_contents(
        "http://search-mn-songs-5bbplyghbb5tk257rsb7iamlsy." .
        "us-east-1.cloudsearch.amazonaws.com" .
        "/2011-02-01/search?q=" . $keyword . $bqParam .
        "&return-fields=title,artist_name,year,genre_result,artist_familiarity&" .
        "facet=year_facet,genre&" .
        "facet-year_facet-sort=alpha&" .
        "facet-genre-sort=alpha&" .
        "facet-genre-top-n=100000&" .
        "facet-year_facet-top-n=100000&" .
        "t-year=1985..&" .
        "t-title=a..&" .
        "rank=-" . $rank);
    // Decode the JSON response into a PHP object.
    $resultsObj = json_decode($results);
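For comparison, the same request URL can be assembled in Python with the standard library doing the URL encoding. This is only a sketch of the URL construction, and no request is sent. The function name `build_search_url` is hypothetical, and treating the slide's `$bqParam` as an optional `bq` parameter is an assumption, since the slide does not show its contents.

```python
from urllib.parse import urlencode

# Search endpoint host as shown on the slide.
SEARCH_HOST = (
    "search-mn-songs-5bbplyghbb5tk257rsb7iamlsy."
    "us-east-1.cloudsearch.amazonaws.com"
)


def build_search_url(keyword, rank_field, bq=None):
    """Assemble the search URL from the slide, URL-encoding every value."""
    params = [("q", keyword)]
    if bq:
        # Assumption: $bqParam on the slide carries a bq (boolean query) parameter.
        params.append(("bq", bq))
    params += [
        ("return-fields", "title,artist_name,year,genre_result,artist_familiarity"),
        ("facet", "year_facet,genre"),
        ("facet-year_facet-sort", "alpha"),
        ("facet-genre-sort", "alpha"),
        ("facet-genre-top-n", "100000"),
        ("facet-year_facet-top-n", "100000"),
        ("t-year", "1985.."),
        ("t-title", "a.."),
        ("rank", "-" + rank_field),  # leading "-" sorts descending
    ]
    # safe="," keeps comma-separated lists readable in the query string.
    return "http://" + SEARCH_HOST + "/2011-02-01/search?" + urlencode(params, safe=",")


url = build_search_url("guns roses", "artist_familiarity")
print(url)
```

Encoding the keyword matters as soon as a user types a space, an ampersand, or a quote; plain string concatenation would otherwise produce a malformed query string.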
Common Feature Requests
• Field Weighted Relevance
• Additional Regions and Languages
• High Availability
• Tighter integration with other AWS services (Dynamo/S3)
• Support For Very Large Use Cases
• Geo Sorting
Field Weights Use Case
• Music Search
• Dataset composed of the following fields:
    • Title
    • Album
    • Artist
    • Lyrics
    • Popularity
• Results without field weights
    • May end up with results based heavily on lyrics when searching for an artist’s name (Guns N' Roses vs. roses, guns)
• Results with field weights
    • Possibly apply a greater weight to artist than lyrics
FWV in Rank Expressions
• Rank expressions can be used within CloudSearch to customize relevance computations for better search results.
    • song_relevance = text_relevance + popularity
• It is natural to extend rank expressions to allow field-weighted values (FWV) using JSON objects.
    • song_relevance = cs.text_relevance({weights: {artist=3.0, song=4.0}, default_weight=0.5}) + 0.5*popularity
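To make the effect of field weights concrete, here is a toy calculation using the weights from the slide. The slides do not say how CloudSearch combines per-field text scores, so the weighted sum below is an assumption, and both function names and the per-field scores are hypothetical.

```python
def field_weighted_relevance(field_scores, weights, default_weight):
    """Sum per-field text scores, scaling each by its field weight.

    Assumption: a plain weighted sum; the real scoring formula is not
    described in the slides.
    """
    return sum(
        score * weights.get(field, default_weight)
        for field, score in field_scores.items()
    )


def song_relevance(field_scores, popularity):
    """Mirror of the slide's expression: weighted text relevance + 0.5 * popularity."""
    text_score = field_weighted_relevance(
        field_scores,
        weights={"artist": 3.0, "song": 4.0},
        default_weight=0.5,
    )
    return text_score + 0.5 * popularity


# Same raw text score, matched in different fields (hypothetical values).
artist_match = song_relevance({"artist": 1.0}, popularity=0.0)
lyrics_match = song_relevance({"lyrics": 1.0}, popularity=0.0)
print(artist_match, lyrics_match)
```

With these weights, a match in the artist field outranks an equally strong match in the lyrics, which falls back to the default weight.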
Query-Time Rank Expressions
• Each set of defined rank expressions may take a while to be deployed to your search domain.
• Query-time rank expressions would allow rank expressions to be defined during a query, without waiting for deployment.
    • q='guns roses'&rank-qtre=cs.text_relevance({weights: {artist=3.0, song=4.0}, default_weight=0.5})&return-fields=qtre&rank=-qtre
Resources
• Amazon CloudSearch Overview Page http://aws.amazon.com/cloudsearch/
• FAQs
• Community Forum
• Documentation & Getting Started Tutorial (IMDb)
• Demos and Tutorials
    • What Is Amazon CloudSearch
    • Introducing Amazon CloudSearch (Features)
    • Building a Search Application Using Amazon CloudSearch
    • Getting Started Tutorial
Upcoming Events
• Las Vegas, November 27-29
• Enterprise Search Summit/KMWorld, DC, Oct. 17-19
• Bay Area Amazon CloudSearch Group: Oct. 24