1 / 29

Topic: Lucene

Learn about Lucene, a high-performance, open-source search software library that revolutionized Information Retrieval. From its inception in late 1997 to being widely adopted by web search engines, discover how Lucene enhances data retrieval. This article covers its history, features, capabilities, and API, empowering you to harness its potential for efficient text indexing and searching across various document types.

trunyon
Download Presentation

Topic: Lucene

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Topic: Lucene Presenter Ken Hoang

  2. History • Doug Cutting • Lucene (Doug’s wife middle name) • Search software • Start writing in late 1997 using Java • 2000 – SourceForge (open source) • 2001 - Apache adopted Lucene • 2004 - widely used by web search engines

  3. Why Lucene was written. • The explosion of the Internet • Too much information in storage • Inefficient if crawl hundreds of thousands web pages.

  4. Solution • Lucene was created - more dynamic ways of finding information - high performance - scalable Information Retrieval(IR) library - open-source project implement in Java - under Apache Jakarta License

  5. What Lucene is • A software library • Not full-featured search application • Full–featured search applications built on top of Lucene

  6. What Lucene can do • Allows to add indexing and searching capabilities to applications • Text indexing • Make searchable data converted to a textual format • Fast random access to words

  7. What Lucene can do (cont’) • Search words in - Web pages on remote servers - Documents stored in local files systems - Simple text files - MS Word documents - HTML or PDF files

  8. Lucene API • A handful of classes • Indexing - IndexWriter - Directory - Analyzer - Document - Field

  9. Lucene API (cont’) • Analysis - converting field text into terms (index representation) • http://lucene.apache.org/java/docs/api/overview-summary.html • http://lucene.apache.org/java/docs/

  10. Application Web Manual Input User DB Present Search Results Get Users’ Query File System Gather Data Lucene Index Documents Search Index Index

  11. Application • Autofocus

  12. Question?

More Related