Using the Lucene Search Engine

Team

Concepts

• Lucene • Full Text Search • Cross Platform • Lucene Document • Inverted Index
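To make the inverted index concept concrete, here is a minimal sketch in plain Java. It is not Lucene code: the class `InvertedIndexSketch` and its methods are hypothetical names chosen for illustration, and a real Lucene index stores much more (positions, frequencies, stored fields). It only shows the core idea of mapping each term to the documents that contain it.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Toy inverted index (illustration only, not Lucene's implementation):
// each term maps to the sorted set of ids of the documents containing it.
final class InvertedIndexSketch {
    private final Map<String, Set<Integer>> postings = new TreeMap<>();

    // Lower-case the text and split on anything that is not a letter or digit.
    static List<String> tokenize(String text) {
        List<String> terms = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!t.isEmpty()) {
                terms.add(t);
            }
        }
        return terms;
    }

    // Record every term of the text against the given document id.
    void add(int docId, String text) {
        for (String term : tokenize(text)) {
            postings.computeIfAbsent(term, k -> new TreeSet<>()).add(docId);
        }
    }

    // Return the ids of documents that contain every query term (AND semantics).
    Set<Integer> search(String query) {
        Set<Integer> result = null;
        for (String term : tokenize(query)) {
            Set<Integer> docs = postings.getOrDefault(term, Collections.<Integer>emptySet());
            if (result == null) {
                result = new TreeSet<>(docs);
            } else {
                result.retainAll(docs);
            }
        }
        return result == null ? new TreeSet<>() : result;
    }
}
```

A lookup touches only the postings of the query terms and never rescans the documents themselves, which is why querying an index stays fast as the collection grows.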

Lucene

iViewXT

Search Improvements

Test Document Collections • UAT

Super Mario

Implementation Derek

Performance

Lucene Implementation

Lucene Implementation: Indexing

Lucene Indexing

Lucene Indexing Step 1 of 5

Lucene Indexing

Text Extraction
• Lucene is not a complete application.
• PDF files: text extraction
• Microsoft files: text extraction

Lucene Implementation

Searching:

Searching: Step 1 of 6

Searching: Step 4&5 of 6

Searching:

Luke - Lucene Index Toolbox
• Client application that links directly into your index.
• Java Web Start app
• http://www.getopt.org/luke/
• Handy for testing searches and performance.

Some problems encountered
• Max clause count exception: take care when automatically adding wildcards.
• Performance: do the work while indexing, not while searching.
• Pagination: get one page at a time from the Hits.
• Our security model: stored the collection of allowed containers in UserSession.
• Visibility of the indexing job: added logging such as "Indexing document 426 of 204,532".
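The pagination point can be sketched as follows. This is plain Java, not the Lucene API: `HitPager`, `totalHits`, and `loadHit` are hypothetical stand-ins for the result count and the per-hit document lookup that the search library provides, and page numbers are assumed to start at zero. The idea is simply to load only the hits that fall on the requested page.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntFunction;

// Illustration only: 'loadHit' stands in for whatever call fetches hit number i
// from the search results; it is invoked just for the hits on the requested page.
final class HitPager {
    static <T> List<T> page(int totalHits, int pageNumber, int pageSize, IntFunction<T> loadHit) {
        if (totalHits < 0 || pageNumber < 0 || pageSize <= 0) {
            throw new IllegalArgumentException("totalHits and pageNumber must be >= 0, pageSize > 0");
        }
        // Use long arithmetic so a large page number cannot overflow int.
        long first = (long) pageNumber * pageSize;
        int start = (int) Math.min(first, totalHits);
        int end = (int) Math.min(first + pageSize, totalHits);

        List<T> page = new ArrayList<>();
        for (int i = start; i < end; i++) {
            page.add(loadHit.apply(i));
        }
        return page;
    }
}
```

With 25 hits and a page size of 10, page 2 loads hits 20 to 24 only, and a page past the end comes back empty rather than failing.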

Resources (general)
• http://lucene.apache.org/
• http://www.ibm.com/developerworks/web/library/wa-lucene2/
• http://www.ibm.com/developerworks/library/wa-lucene/
• An open source document management system in PHP with a Java Lucene search engine.
• Handy Ajax autocomplete component.

Resources (text extraction)
• http://pdfbox.org : Text extractor for PDF files.
• JXL, http://jexcelapi.sourceforge.net/ : Text extractor for Excel files.
• Text extractor for Word documents.
• API to access Microsoft format files (xls/doc/ppt). I would recommend this one over JXL or text-mining above.

Summary
• Lucene querying is fast (take care what you do with the results).
• Indexing is slow (make the indexing job visible).
• Use Luke.
• Add lots to the index (do the work while indexing).
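Making the indexing job visible can be as simple as a progress line in the log. The sketch below reproduces the message format quoted on the problems slide. It is an assumption-laden illustration, not the project's code: `IndexingProgress`, `indexAll`, and the `interval` parameter are hypothetical, and the `indexer` callback stands in for the real call that adds a document to the index.

```java
import java.util.List;
import java.util.Locale;
import java.util.function.Consumer;

// Illustration only: the names here are hypothetical, not from the original project.
final class IndexingProgress {
    // Format a progress line with thousands separators, e.g. for (426, 204532).
    static String message(int current, int total) {
        return String.format(Locale.US, "Indexing document %,d of %,d", current, total);
    }

    // Index every document, logging before each 'interval'-th document and before the last one.
    static <D> void indexAll(List<D> docs, Consumer<D> indexer, Consumer<String> log, int interval) {
        if (interval <= 0) {
            throw new IllegalArgumentException("interval must be > 0");
        }
        int total = docs.size();
        for (int i = 0; i < total; i++) {
            int current = i + 1;
            if (current % interval == 0 || current == total) {
                log.accept(message(current, total));
            }
            indexer.accept(docs.get(i));
        }
    }
}
```

Logging every document, as the slide's example suggests, corresponds to `interval = 1`; a larger interval keeps the log readable on collections with hundreds of thousands of documents.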

END
