Lucene/Solr Architecture
• Request Handlers: /admin, /select, /spell
• Response Writers: XML, Binary, JSON
• Update Handlers: XML, CSV, binary
• Extracting Request Handler (PDF/WORD)
• Apache Tika
• Search Components: Query, Highlighting, Spelling, Statistics, Faceting, Debug, More like this, Clustering
• Update Processors: Signature, Logging, Indexing
• Solr core: Schema, Config, Query Parsing, Distributed Search, Data Import Handler (SQL/RSS), Analysis, Faceting, Filtering, Search, Caching, Highlighting, Index Replication
• Apache Lucene: Core Search (IndexReader/Searcher), Indexing (IndexWriter), Text Analysis
Lucene/Solr plugins
• RequestHandlers – handle a request at a URL like /select
• SearchComponents – part of a SearchHandler, a componentized request handler
  • Include Query, Facet, Highlight, Debug, Stats
  • Distributed Search capable
• UpdateHandlers – handle an indexing request
• Update Processor Chains – per-handler componentized chains that handle updates
• Query Parser plugins
  • Mix and match query types in a single request
  • Function plugins for Function Query
• Text Analysis plugins: Analyzers, Tokenizers, TokenFilters
• ResponseWriters – serialize & stream the response to the client
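The relationship between a request handler, its search components, and a response writer can be sketched in a few lines. This is a toy model in plain Python, not Solr's Java plugin API: every name in it (`SearchHandler`, `query_component`, `dispatch`, the sample documents) is hypothetical and exists only to show the shape of the design.

```python
import json

# Hypothetical in-memory "index"; a real handler would query Lucene.
DOCS = [
    {"id": "1", "title": "blue cheese", "type": "food"},
    {"id": "2", "title": "cheese knife", "type": "tool"},
    {"id": "3", "title": "red wine", "type": "food"},
]

def query_component(params, response):
    """Select documents whose title contains the query term."""
    term = params.get("q", "")
    response["docs"] = [d for d in DOCS if term in d["title"]]

def facet_component(params, response):
    """Count matching documents per value of the requested facet field."""
    field = params.get("facet.field")
    if field is None:
        return  # faceting was not requested
    counts = {}
    for doc in response.get("docs", []):
        value = doc.get(field)
        counts[value] = counts.get(value, 0) + 1
    response["facets"] = {field: counts}

class SearchHandler:
    """A componentized handler: each component adds to a shared response."""
    def __init__(self, components):
        self.components = components

    def handle(self, params):
        response = {}
        for component in self.components:
            component(params, response)
        return response

def json_writer(response):
    """Serialize the response; stands in for a response writer plugin."""
    return json.dumps(response, sort_keys=True)

# Handlers are registered by URL path, writers by the "wt" parameter.
HANDLERS = {"/select": SearchHandler([query_component, facet_component])}
WRITERS = {"json": json_writer}

def dispatch(path, params):
    response = HANDLERS[path].handle(params)
    return WRITERS[params.get("wt", "json")](response)
```

Calling `dispatch("/select", {"q": "cheese", "facet.field": "type", "wt": "json"})` runs the query component, then the facet component, then the JSON writer. Adding a capability means appending a component to the list rather than writing a new handler.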
Lucene/Solr Query Plugin Architecture

Declarative Analysis per-field
- Tokenizer to split text
- TokenFilter to transform tokens
- Analyzer for completely custom analysis
- Separate query / index analyzer

QParser plugins
- Support different query syntaxes
- Support different query execution
- Function Query supports pluggable custom functions
- Excellent support for nesting/mixing different query types in the same request

schema.xml
  // declaratively defines types
  // and analyzers for fields
  <fieldType name="text1">
    <filter="whitespace">
    <filter="customFilter" …>
    <filter="synonyms" file=..>
    <filter="porter" except=..>
  <field name="title" type="text1"
  <field name="cust1" class=…

Analyzer for "title": Whitespace Tokenizer, CustomFilter, SynonymFilter, Porter Stemmer
Analyzer for "cust1": potentially completely custom architecture not using tokenizer/filters

solrconfig.xml
  < index configuration />
  < caching configuration />
  < request handler config />
  < search component config />
  < update processor config />
  < misc – HTTP cache, JMX >
  <parser name="mycustom" …
  <func name="custom" class=…

QParsers: MyCustom QParser, Lucene QParser, Function Range Q, XML QParser, DisMax QParser, Function QParser
Functions: sum, max, pow, log, sqrt, custom
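The analyzer for "title" on this slide is a tokenizer followed by an ordered list of token filters. A minimal sketch of that pipeline in plain Python follows. It assumes a few stand-ins that are not in the slides: lowercasing plays the role of the custom filter, the synonym map is made up, and `crude_stem_filter` only strips a trailing "s" (it is not the Porter algorithm).

```python
def whitespace_tokenizer(text):
    """Split text on runs of whitespace."""
    return text.split()

def lowercase_filter(tokens):
    """Stand-in for the slide's custom filter (assumption)."""
    return [t.lower() for t in tokens]

# Hypothetical synonym map; a real setup would load it from a file.
SYNONYMS = {"tv": ["tv", "television"]}

def synonym_filter(tokens):
    """Expand each token into itself plus any configured synonyms."""
    out = []
    for token in tokens:
        out.extend(SYNONYMS.get(token, [token]))
    return out

def crude_stem_filter(tokens, exceptions=frozenset({"news"})):
    """Strip a trailing 's' from longer tokens, skipping protected words.

    A deliberately naive placeholder for a real stemmer.
    """
    result = []
    for token in tokens:
        if token in exceptions or len(token) <= 3 or not token.endswith("s"):
            result.append(token)
        else:
            result.append(token[:-1])
    return result

def make_analyzer(tokenizer, filters):
    """Compose one tokenizer and an ordered list of filters."""
    def analyze(text):
        tokens = tokenizer(text)
        for token_filter in filters:
            tokens = token_filter(tokens)
        return tokens
    return analyze

# Per-field analyzers, mirroring the declarative per-field setup.
FIELD_ANALYZERS = {
    "title": make_analyzer(
        whitespace_tokenizer,
        [lowercase_filter, synonym_filter, crude_stem_filter],
    ),
}
```

Filter order matters here: the synonym map is keyed on lowercase tokens, so it has to run after lowercasing. A field that needs separate index-time and query-time behavior would simply register two analyzers built the same way.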
Lucene/Solr Request Plugins
Example request: http://.../select?q=cheese&wt=json
Example response (start): {“response”={ “docs”={
• Request handlers by path
  • /select: RequestHandler built from search components
  • /admin/luke: Request Handler (non-component based)
  • /mypath: Request Handler (custom)
• Search components in /select: Query Component, Facet Component, Highlight Component, Debug Component, with Distributed Search
• Response writers (Query Response): XML response writer, XSLT response writer, Binary response writer, JSON response writer, Custom response writer
• Additional plug-n-play search components: TermVector, QueryElevation, Spellcheck, Terms, MoreLikeThis, Statistics, Clustering, My Custom
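The example request on this slide combines a handler path (/select) with a query (`q`) and a response writer choice (`wt`). A small helper for assembling such URLs might look like the sketch below. The slide elides the host, so `http://solr.example/solr` is a placeholder, and `build_select_url` is a hypothetical helper rather than part of any Solr client library.

```python
from urllib.parse import urlencode

def build_select_url(base_url, query, writer="json", extra=None):
    """Build a /select URL with q and wt, plus optional extra parameters.

    `extra` is a plain dict because Solr parameter names such as
    "facet.field" contain dots and cannot be Python keyword arguments.
    """
    params = {"q": query, "wt": writer}
    if extra:
        params.update(extra)
    return f"{base_url.rstrip('/')}/select?{urlencode(params)}"

# Placeholder base URL (assumption); the slide does not give a host.
url = build_select_url("http://solr.example/solr", "cheese")
```

Switching the response format is then a matter of changing `writer`, which corresponds to selecting a different response writer plugin on the server side.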
Lucene/Solr Indexing
• Input: PDF files and XML documents (<doc> <title>) sent via HTTP POST
• Update paths
  • /update: XML Update Handler
  • /update/csv: CSV Update Handler
  • /update/xml: XML Update with custom processor chain
  • /update/extract: Extracting RequestHandler (PDF, Word, …)
• Update Processor Chain (per handler): Remove Duplicates processor, Custom Transform processor, Logging processor, Lucene Index processor
• Text Index Analyzers
• Data Import Handler: Database pull, RSS pull, Simple transforms (sources: SQL DB pull, RSS feed pull)
• Lucene Index
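The per-handler update processor chain on this slide passes each incoming document through a sequence of processors before it reaches the index. The sketch below models that flow in plain Python under some simplifying assumptions: all names are hypothetical, the index is a dict, and the duplicate check drops a repeated document outright, which is a simplification and not necessarily how Solr's own signature-based deduplication behaves.

```python
import hashlib

def make_dedupe_processor(fields):
    """Drop documents whose signature over `fields` was already seen."""
    seen = set()
    def process(doc):
        raw = "\x1f".join(str(doc.get(f, "")) for f in fields)
        signature = hashlib.sha256(raw.encode("utf-8")).hexdigest()
        if signature in seen:
            return None  # returning None stops the chain for this document
        seen.add(signature)
        return {**doc, "signature": signature}
    return process

def make_logging_processor(log):
    """Record each document that gets this far in the chain."""
    def process(doc):
        log.append(f"add id={doc.get('id')}")
        return doc
    return process

def make_index_processor(index):
    """Final step: store the document (stand-in for writing to Lucene)."""
    def process(doc):
        index[doc["id"]] = doc
        return doc
    return process

def run_chain(processors, doc):
    """Run a document through the chain; return True if it was indexed."""
    for processor in processors:
        doc = processor(doc)
        if doc is None:
            return False
    return True

# One chain per update path, mirroring "Update Processor Chain (per handler)".
index, log = {}, []
CHAINS = {
    "/update/xml": [
        make_dedupe_processor(["title", "body"]),
        make_logging_processor(log),
        make_index_processor(index),
    ],
}
```

Because each processor only sees a document and returns a document (or `None`), a custom transform can be added by inserting one more function into the list for a given path.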