Learn about Lucene for efficient full-text search of your dataset, integrating with JPA/Hibernate, 'LIKE' queries, and embedding search libraries. Includes a review of search options, limitations, and best practices.
Word Up! Using Lucene for full-text search of your data set
Full-text search • Review of full-text search options • Focus on Lucene • Integrating Lucene with JPA/Hibernate
Full-text search options • ‘LIKE’ queries • SQL extensions • Kludge with web search engine • Kludge with web search appliance • Embeddable search library
‘LIKE’ queries • Simple, straightforward • Fast, easy to implement • Large result set • Limited fuzziness (wildcard or regex)
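To make the "limited fuzziness" point concrete, here is a rough sketch of everything a LIKE pattern can express: `%` for any run of characters and `_` for exactly one. The class `LikePattern` and its methods are hypothetical helpers written for this illustration, not part of JDBC or JPA. The sketch ignores ESCAPE clauses and assumes case-insensitive matching, which in a real database depends on the collation.

```java
import java.util.regex.Pattern;

// Hypothetical helper: emulates SQL LIKE matching in plain Java.
final class LikePattern {

    // Translate a LIKE pattern into a regex: '%' becomes ".*", '_' becomes ".",
    // and everything else is quoted so it matches literally.
    static Pattern toRegex(String like) {
        StringBuilder regex = new StringBuilder();
        StringBuilder literal = new StringBuilder();
        for (char c : like.toCharArray()) {
            if (c == '%' || c == '_') {
                if (literal.length() > 0) {
                    regex.append(Pattern.quote(literal.toString()));
                    literal.setLength(0);
                }
                regex.append(c == '%' ? ".*" : ".");
            } else {
                literal.append(c);
            }
        }
        if (literal.length() > 0) {
            regex.append(Pattern.quote(literal.toString()));
        }
        // Assumption: case-insensitive matching (collation-dependent in a real database).
        return Pattern.compile(regex.toString(), Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
    }

    // True when the whole value matches the LIKE pattern.
    static boolean matches(String like, String value) {
        return toRegex(like).matcher(value).matches();
    }
}
```

With this helper, `LikePattern.matches("%lucene%", "Apache Lucene in Action")` is true, but a one-letter typo such as "Apache Lucen in Action" no longer matches. Wildcards cannot absorb misspellings, which is the gap fuzzy full-text search is meant to fill.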
Full-text search extensions • No standard syntax (Sybase, MSSQL, DB2, etc. all different) • Administrative overhead for text search indices • Other limitations
Kludge with search engine • External indexing/search software • ht://Dig • mnoGoSearch • Sphinx • Xapian • Not necessarily pure Java • Can be database-intensive • Lag in updating search index
Kludge with search appliance • “Black-box” solutions • Thunderstone • Google Search Appliance • Your data set mixes with public content • Doesn’t always work as advertised • Can’t fine-tune search
Search library • Example: Apache Lucene • Deploys as part of your application • 100% Java • Fuzzy full-text search (Levenshtein algorithm) • Searches against text, numeric, boolean fields with multiple options • Can be integrated with JPA/Hibernate via Hibernate Search, Compass
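The Levenshtein algorithm mentioned on this slide measures how many single-character insertions, deletions, or substitutions turn one string into another. The following is a minimal textbook sketch of that edit distance; the class name `Levenshtein` is chosen for illustration, and this is not Lucene's internal implementation.

```java
// Textbook dynamic-programming edit distance, keeping only two rows of the table.
final class Levenshtein {

    static int distance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];

        // Turning the empty prefix of a into the first j chars of b takes j insertions.
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // deleting the first i chars of a
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                int insertion = curr[j - 1] + 1;
                int deletion = prev[j] + 1;
                int substitution = prev[j - 1] + cost;
                curr[j] = Math.min(Math.min(insertion, deletion), substitution);
            }
            int[] tmp = prev; // reuse the two rows instead of allocating new ones
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }
}
```

For example, `Levenshtein.distance("lucene", "lucine")` is 1, so a fuzzy search that tolerates one edit would treat the misspelling as a hit.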
About Lucene • Search index stored on file system (also JDBC and BDB options) • Can store/retrieve data to/from search index (Lucene Projections) • Can index HTML, XML, Office docs, PDFs, Exchange mail with external tools • Supports extended and multi-byte character sets by default
More about Lucene • Indexes records as Lucene Document object • Lucene Document doesn’t have to be a literal document – can be any arbitrary object • Document can have any number of name-value pairs • Synchronizing your data with search index is someone else’s problem …
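As a rough illustration of the "document as name-value pairs" idea, the sketch below stores each record as a map of field names to values and keeps an inverted index from terms to document ids. `TinyIndex` is a hypothetical toy class using only the Java standard library; it does not use or mirror Lucene's actual API, and its tokenizer (lowercase, split on non-letter and non-digit characters) is an assumption made for brevity.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical toy index: documents are name-value pairs, terms point back to document ids.
final class TinyIndex {
    private final List<Map<String, String>> documents = new ArrayList<>();
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    // Store a document and index every term of every field value; returns the new document id.
    int add(Map<String, String> fields) {
        int id = documents.size();
        documents.add(new LinkedHashMap<>(fields));
        for (String value : fields.values()) {
            for (String term : value.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
                if (!term.isEmpty()) {
                    postings.computeIfAbsent(term, t -> new TreeSet<>()).add(id);
                }
            }
        }
        return id;
    }

    // Ids of all documents containing the term in any field (empty set when none do).
    Set<Integer> search(String term) {
        return postings.getOrDefault(term.toLowerCase(Locale.ROOT), Collections.<Integer>emptySet());
    }

    // Retrieve the stored name-value pairs for a document id.
    Map<String, String> get(int id) {
        return documents.get(id);
    }
}
```

Nothing in this toy keeps the index in step with a database. Any insert, update, or delete in the source data has to be replayed against the index by the application, which is the synchronization problem the slide alludes to.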
Integrating with JPA / Hibernate • Most common method: Hibernate Search • Supports only Hibernate provider • Automatically updates search index when object persisted to database • Entity classes mapped to separate indexes • Entity fields mapped to Lucene index fields using Java annotations
Integrating with JPA/Hibernate … • Alternate method: Compass Project • Supports Hibernate, OpenJPA, others • No release since 2009 – effectively unsupported
Annotated class example …
    @Indexed
    @Entity
    @Cacheable(true)
    @Table(name="MARKER", schema="MAPLINK")
    public class Marker extends MarkerA implements Serializable {

        @Id
        @Column(name="MKR_MARKERID")
        @Field(store=Store.YES)
        private long mkrMarkerid;

        @Column(name="MKR_LAT", nullable = true)
        @Field(store=Store.YES)
        @NumericField
        private Double mkrLat;

        @Column(name="MKR_LONG", nullable = true)
        @Field(store=Store.YES)
        @NumericField
        private Double mkrLong;
• @Indexed – tells Hibernate Search that this entity class should be indexed
Annotated class example … (same Marker class as on the previous slide) • @Field – tells Hibernate Search to create a matching name-value pair in the search index for the annotated field • Store.YES – stores the value in the index so it can be retrieved directly from there, without touching the database
Annotated class example … (same Marker class as on the previous slides) • @NumericField – indexes the value as a number, which enables greater-than, less-than, and range searches
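Why numeric indexing matters for range searches: compared as text, "10" sorts before "9", so a range over string-encoded numbers gives wrong answers. The sketch below shows the general idea of a sorted numeric index using a `TreeMap`. `NumericRangeIndex` is a hypothetical class for illustration only; Lucene's own numeric encoding works differently, and the latitude values in the usage note are made up.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical sketch: a sorted map from numeric value to the ids of records holding that value.
final class NumericRangeIndex {
    private final NavigableMap<Double, List<Long>> byValue = new TreeMap<>();

    // Register a record id under its numeric field value.
    void add(long id, double value) {
        byValue.computeIfAbsent(value, v -> new ArrayList<>()).add(id);
    }

    // Ids of all records with min <= value <= max, in ascending order of value.
    List<Long> range(double min, double max) {
        List<Long> ids = new ArrayList<>();
        if (min > max) {
            return ids; // empty range; subMap would otherwise throw
        }
        for (List<Long> bucket : byValue.subMap(min, true, max, true).values()) {
            ids.addAll(bucket);
        }
        return ids;
    }
}
```

Indexing three markers with latitudes 45.5, 47.6, and 40.7 under ids 1, 2, and 3, a call to `range(45.0, 48.0)` returns ids 1 and 2 without scanning the marker at 40.7.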