
Mastering Full-Text Search with Lucene

Learn how to use Lucene for efficient full-text search of your data set and how to integrate it with JPA/Hibernate. Includes a review of full-text search options, from 'LIKE' queries to embeddable search libraries, along with their limitations and best practices.

bhollinger


Presentation Transcript


  1. Word Up! Using Lucene for full-text search of your data set

  2. Full-text search • Review of full-text search options • Focus on Lucene • Integrating Lucene with JPA/Hibernate

  3. Full-text search options • ‘LIKE’ queries • SQL extensions • Kludge with web search engine • Kludge with web search appliance • Embeddable search library

  4. ‘LIKE’ queries

  5. ‘LIKE’ queries • Simple, straightforward • Fast, easy to implement • Large result set • Limited fuzziness (wildcard or regex)
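
The "limited fuzziness" point can be illustrated with a minimal sketch that mimics `LIKE` matching in plain Java. The `LikeMatcher` class and its sample titles are hypothetical, and the case-insensitive matching is an assumption (real databases vary by collation). Wildcards widen the match, but a misspelling such as "Lucine" is still missed by `%lucene%`.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical helper that emulates SQL LIKE semantics in memory.
final class LikeMatcher {

    // Translate a LIKE pattern to a regex: % = any run of characters, _ = exactly one character.
    static Pattern toRegex(String like) {
        StringBuilder sb = new StringBuilder();
        for (char c : like.toCharArray()) {
            if (c == '%') {
                sb.append(".*");
            } else if (c == '_') {
                sb.append('.');
            } else {
                sb.append(Pattern.quote(String.valueOf(c))); // everything else is literal
            }
        }
        // Assumption: case-insensitive comparison, as with many default collations.
        return Pattern.compile(sb.toString(), Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
    }

    // Keep only the rows that match the whole LIKE pattern.
    static List<String> filter(List<String> rows, String like) {
        Pattern p = toRegex(like);
        return rows.stream().filter(r -> p.matcher(r).matches()).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> titles = List.of("Intro to Lucene", "Lucine basics", "Lucene in Action");
        // The misspelled "Lucine basics" is not returned.
        System.out.println(filter(titles, "%lucene%"));
    }
}
```

Catching the misspelling requires either a hand-written pattern such as `Luc_ne%` or a search tool that scores approximate matches.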

  6. Full-text search extensions • No standard syntax (Sybase, MSSQL, DB2, etc. all different) • Administrative overhead for text search indices • Other limitations

  7. Kludge with search engine • External indexing/search software • ht://Dig • mnoGoSearch • Sphinx • Xapian • Not necessarily pure Java • Can be database-intensive • Lag in updating search index

  8. Kludge with search appliance • “Black-box” solutions • Thunderstone • Google Search Appliance • Your data set mixes with public content • Doesn’t always work as advertised • Can’t fine-tune search

  9. Embeddable search library

  10. Search library • Example: Apache Lucene • Deploys as part of your application • 100% Java • Fuzzy full-text search (Levenshtein algorithm) • Searches against text, numeric, and Boolean fields with multiple options • Can be integrated with JPA/Hibernate via Hibernate Search or Compass
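
For readers unfamiliar with the Levenshtein algorithm mentioned above, here is a minimal sketch of the textbook dynamic-programming edit distance. It is not Lucene's internal implementation, and the `Levenshtein` class name is hypothetical. The distance counts the single-character insertions, deletions, and substitutions needed to turn one string into another; a fuzzy search accepts terms within a small distance of the query term.

```java
// Hypothetical helper: classic two-row dynamic-programming edit distance.
final class Levenshtein {

    static int distance(String a, String b) {
        int[] prev = new int[b.length() + 1]; // distances for the previous prefix of a
        int[] curr = new int[b.length() + 1]; // distances for the current prefix of a

        // Turning the empty string into the first j characters of b takes j insertions.
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }

        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // deleting i characters of a yields the empty string
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                int insert = curr[j - 1] + 1;
                int delete = prev[j] + 1;
                int substitute = prev[j - 1] + cost;
                curr[j] = Math.min(Math.min(insert, delete), substitute);
            }
            int[] tmp = prev; // swap rows for the next iteration
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    public static void main(String[] args) {
        System.out.println(distance("lucene", "lucine")); // one substitution apart
    }
}
```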

  11. About Lucene • Search index stored on file system (also JDBC and BDB options) • Can store/retrieve data to/from search index (Lucene Projections) • Can index HTML, XML, Office docs, PDFs, Exchange mail with external tools • Supports extended and multi-byte character sets by default

  12. More about Lucene • Indexes records as Lucene Document object • Lucene Document doesn’t have to be a literal document – can be any arbitrary object • Document can have any number of name-value pairs • Synchronizing your data with search index is someone else’s problem …
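
The idea of a document as a bag of name-value pairs can be sketched with a toy inverted index in plain Java. This is an illustration only, not Lucene's API: the `ToyIndex` class, its tokenization rule (lowercase, split on non-word characters), and the sample fields are all assumptions.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical toy index: each "document" is just a map of field names to values.
final class ToyIndex {
    private final List<Map<String, String>> docs = new ArrayList<>();
    // Maps "field:token" to the ids of the documents containing that token in that field.
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    // Stores the document and indexes every token of every field; returns the document id.
    int add(Map<String, String> doc) {
        int id = docs.size();
        docs.add(new LinkedHashMap<>(doc));
        for (Map.Entry<String, String> e : doc.entrySet()) {
            for (String token : e.getValue().toLowerCase(Locale.ROOT).split("\\W+")) {
                if (token.isEmpty()) {
                    continue;
                }
                postings.computeIfAbsent(e.getKey() + ":" + token, k -> new TreeSet<>()).add(id);
            }
        }
        return id;
    }

    // Exact-term lookup within one field; returns the stored documents.
    List<Map<String, String>> search(String field, String term) {
        String key = field + ":" + term.toLowerCase(Locale.ROOT);
        List<Map<String, String>> hits = new ArrayList<>();
        for (int id : postings.getOrDefault(key, Collections.emptySet())) {
            hits.add(docs.get(id));
        }
        return hits;
    }

    public static void main(String[] args) {
        ToyIndex index = new ToyIndex();
        index.add(Map.of("title", "Word Up", "body", "Using Lucene for full-text search"));
        index.add(Map.of("title", "Marker 42", "body", "A map marker, not a literal document"));
        System.out.println(index.search("body", "lucene"));
    }
}
```

Note that nothing here keeps the index in step with a database: every insert, update, and delete has to be mirrored by the caller, which is the synchronization problem the slide refers to.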

  13. Integrating with JPA / Hibernate • Most common method: Hibernate Search • Supports only Hibernate provider • Automatically updates search index when object persisted to database • Entity classes mapped to separate indexes • Entity fields mapped to Lucene index fields using Java annotations

  14. Integrating with JPA/Hibernate … • Alternate method: Compass Project • Supports Hibernate, OpenJPA, others • No release since 2009 – effectively unsupported

  15. Annotated class example …

      @Indexed
      @Entity
      @Cacheable(true)
      @Table(name="MARKER", schema="MAPLINK")
      public class Marker extends MarkerA implements Serializable {

          @Id
          @Column(name="MKR_MARKERID")
          @Field(store=Store.YES)
          private long mkrMarkerid;

          @Column(name="MKR_LAT", nullable = true)
          @Field(store=Store.YES)
          @NumericField
          private Double mkrLat;

          @Column(name="MKR_LONG", nullable = true)
          @Field(store=Store.YES)
          @NumericField
          private Double mkrLong;

      • @Indexed – tells Hibernate that this entity class should be indexed

  16. Annotated class example …

      @Indexed
      @Entity
      @Cacheable(true)
      @Table(name="MARKER", schema="MAPLINK")
      public class Marker extends MarkerA implements Serializable {

          @Id
          @Column(name="MKR_MARKERID")
          @Field(store=Store.YES)
          private long mkrMarkerid;

          @Column(name="MKR_LAT", nullable = true)
          @Field(store=Store.YES)
          @NumericField
          private Double mkrLat;

          @Column(name="MKR_LONG", nullable = true)
          @Field(store=Store.YES)
          @NumericField
          private Double mkrLong;

      • @Field – tells Hibernate to create a matching name-value pair in the search index for this entity class
      • Store.YES – stores the value for retrieval directly from the index, without touching the database

  17. Annotated class example …

      @Indexed
      @Entity
      @Cacheable(true)
      @Table(name="MARKER", schema="MAPLINK")
      public class Marker extends MarkerA implements Serializable {

          @Id
          @Column(name="MKR_MARKERID")
          @Field(store=Store.YES)
          private long mkrMarkerid;

          @Column(name="MKR_LAT", nullable = true)
          @Field(store=Store.YES)
          @NumericField
          private Double mkrLat;

          @Column(name="MKR_LONG", nullable = true)
          @Field(store=Store.YES)
          @NumericField
          private Double mkrLong;

      • @NumericField – index as a numeric value, enables greater than / less than / range searches
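
Why a numeric index matters for range searches can be shown with a small stand-alone sketch. It uses a sorted map from the Java standard library rather than Lucene or Hibernate Search, whose numeric indexing is more elaborate. The `NumericRangeIndex` class and the sample latitudes are hypothetical. The point is that values kept in numeric order can answer "between X and Y" directly, whereas values indexed as text sort lexicographically ("10" before "9").

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical sketch: ids grouped by a numeric field value, kept in sorted order.
final class NumericRangeIndex {
    private final NavigableMap<Double, List<Long>> byValue = new TreeMap<>();

    // Registers an entity id under its numeric value (for example, a latitude).
    void add(long id, double value) {
        byValue.computeIfAbsent(value, k -> new ArrayList<>()).add(id);
    }

    // Returns ids whose value lies in [from, to], ordered by value.
    // Assumes from <= to; TreeMap.subMap throws IllegalArgumentException otherwise.
    List<Long> range(double from, double to) {
        List<Long> ids = new ArrayList<>();
        for (List<Long> bucket : byValue.subMap(from, true, to, true).values()) {
            ids.addAll(bucket);
        }
        return ids;
    }

    public static void main(String[] args) {
        NumericRangeIndex latitudes = new NumericRangeIndex();
        latitudes.add(1L, 40.7);
        latitudes.add(2L, 34.05);
        latitudes.add(3L, 41.9);
        System.out.println(latitudes.range(40.0, 42.0)); // ids 1 and 3 fall inside the range
    }
}
```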

  18. Let’s take a Luke at the index …

  19. Practical search exercise

  20. Questions!
