
Introduction to Apache Lucene/Solr



Presentation Transcript


  1. Introduction to Apache Lucene/Solr CSCI 572: Information Retrieval and Search Engines Summer 2010

  2. Outline • What is Lucene/Solr? • Where did it come from? • What are the current versions of Lucene/Solr? • What can it do?

  3. Apache Lucene • The brainchild of Doug Cutting • Free-text indexing library that implements most of the functionality I’ve talked to you about • Query Models, Ranking, Indexing • Core API is implemented in Java • C++/C, Ruby, Python APIs as well, but these have small communities or are automatically generated • Initially hosted on SourceForge, moved to Apache in 2001
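
To make "indexing, query models, ranking" on the slide above concrete, here is a toy inverted index in Python. It is a sketch only and does not use or reproduce Lucene's API: the function names (`tokenize`, `build_index`, `search`), the sample documents, and the simple tf-idf style score are all assumptions chosen for illustration.

```python
from collections import defaultdict
import math

def tokenize(text):
    # Lowercase and split on whitespace; real analyzers do much more.
    return text.lower().split()

def build_index(docs):
    # Inverted index: term -> {doc_id: term frequency in that document}.
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term][doc_id] = index[term].get(doc_id, 0) + 1
    return index

def search(index, num_docs, query):
    # Score each document by summing tf * idf over the query terms.
    scores = defaultdict(float)
    for term in tokenize(query):
        postings = index.get(term, {})
        if not postings:
            continue
        idf = math.log(num_docs / len(postings)) + 1.0
        for doc_id, tf in postings.items():
            scores[doc_id] += tf * idf
    # Highest score first; ties are broken by document id.
    return sorted(scores, key=lambda d: (-scores[d], d))

# Hypothetical sample documents.
docs = {
    1: "Lucene is a search library",
    2: "Solr is a search server built on Lucene",
    3: "Tomcat is an application server",
}
index = build_index(docs)
results = search(index, len(docs), "lucene search")
print(results)
```

A real library layers analysis, compressed on-disk postings, and a much richer scoring model on top of this basic term-to-documents mapping.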

  4. Apache Solr • Originally developed at CNET • Web service layer built on top of the Lucene library • Provides a schema and an understanding of field types, with conversion to and from their internal representation • Scales to very large deployments; deployed on top of an application server like Tomcat or Jetty • Programming-language-independent APIs • Sharding, replication, faceting, highlighting, explain, more like this and other functionality provided easily

  5. How to get started • Lucene (2.9.2 and 3.0.1 stable) • Put your Java hat on • Have Eclipse ready or your favorite IDE • Download lucene-core-<version>.jar from • http://repo1.maven.org/maven2/org/apache/lucene/ • Download src and build from • http://www.apache.org/dyn/closer.cgi/lucene/java/ • Check out some example Java code that demonstrates indexing and querying from Otis Gospodnetic • http://onjava.com/pub/a/onjava/2003/01/15/lucene.html

  6. How to get started • Solr • Grab a release of Solr (1.4.0 stable) • http://www.apache.org/dyn/closer.cgi/lucene/solr/ • Unpack into e.g., /usr/local/solr • Deploy onto Tomcat • Install Tomcat into /usr/local/tomcat • Create a solr.xml file and drop it into /usr/local/tomcat/conf/Catalina/localhost/ • Create the solr.home JNDI property and point it to /usr/local/solr/solr • Start Tomcat • Head over to $solr/example/example-docs • curl http://localhost:8983/solr/update -H 'Content-type:text/xml; charset=utf-8' --data-binary @artists.xml
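
The curl command on the slide above posts an XML update message to Solr. As a rough sketch, the same kind of message can be built in Python with the standard library. The `<add><doc><field name="...">` layout is Solr's XML update format; the field names `id` and `name`, the sample values, and the helper names are assumptions, and the URL is copied from the slide. The request object is only constructed here, not sent.

```python
import urllib.request
import xml.etree.ElementTree as ET

def build_add_message(docs):
    # Build <add><doc><field name="...">value</field>...</doc></add>.
    add = ET.Element("add")
    for doc in docs:
        doc_el = ET.SubElement(add, "doc")
        for name, value in doc.items():
            field = ET.SubElement(doc_el, "field", name=name)
            field.text = str(value)
    return ET.tostring(add, encoding="unicode")

def build_update_request(url, message):
    # Supplying a body makes this a POST, mirroring curl --data-binary.
    return urllib.request.Request(
        url,
        data=message.encode("utf-8"),
        headers={"Content-type": "text/xml; charset=utf-8"},
    )

# Hypothetical document; field names must match your schema.xml.
message = build_add_message([{"id": "artist-1", "name": "Miles Davis"}])
request = build_update_request("http://localhost:8983/solr/update", message)
print(message)
```

Passing `request` to `urllib.request.urlopen` would send it. Newly added documents typically become searchable only after a commit, for example by posting `<commit/>` to the same update URL.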

  7. Modifying your schema.xml • Field Types • Analyzers • Tokenizers • http://wiki.apache.org/solr/SchemaXml
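
To illustrate what an analyzer attached to a field type does, here is a toy analysis chain in Python: one tokenizer followed by a sequence of token filters. This is not Solr code. The function names, the regular expression, and the stop-word list are assumptions, loosely mimicking a tokenizer followed by lowercase and stop-word filters.

```python
import re

# Hypothetical stop-word list; Solr normally reads one from a file.
STOP_WORDS = {"a", "an", "the", "of", "and"}

def tokenizer(text):
    # Split text into runs of word characters.
    return re.findall(r"\w+", text)

def lowercase_filter(tokens):
    return [t.lower() for t in tokens]

def stop_filter(tokens):
    return [t for t in tokens if t not in STOP_WORDS]

def analyze(text):
    # An analyzer is a tokenizer followed by an ordered chain of filters.
    tokens = tokenizer(text)
    for token_filter in (lowercase_filter, stop_filter):
        tokens = token_filter(tokens)
    return tokens

tokens = analyze("The Best of Miles Davis")
print(tokens)
```

Filter order matters: the stop filter here only matches because lowercasing runs first.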

  8. Solr Faceting • facet=on&facet.field=&facet.field=… • http://wiki.apache.org/solr/SimpleFacetParameters
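
As a sketch of how the faceting parameters above fit into a request, the following Python builds a query URL with repeated `facet.field` parameters and pairs up facet counts. The field names `genre` and `country`, the host and port, and the sample counts are assumptions. The flat `[term, count, term, count, ...]` list is how facet counts commonly appear in Solr's JSON output, but check the response from your own instance.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

def facet_url(base, query, fields):
    # facet.field may be repeated, once per field to facet on.
    params = [("q", query), ("wt", "json"), ("facet", "on")]
    params += [("facet.field", f) for f in fields]
    return base + "?" + urlencode(params)

def facet_pairs(flat):
    # Turn [term, count, term, count, ...] into (term, count) pairs.
    return list(zip(flat[0::2], flat[1::2]))

# Hypothetical endpoint and field names.
url = facet_url("http://localhost:8983/solr/select", "*:*", ["genre", "country"])
# Hypothetical facet counts for one field.
counts = facet_pairs(["jazz", 12, "rock", 7])
print(url)
print(counts)
```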

  9. Advanced Topics • Standing up cores • Sharding • Replication • Zookeeper and Cloud
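
One way to picture sharding is as a routing decision at index time plus a fan-out at query time. The Python below is a minimal sketch under stated assumptions: the shard addresses are hypothetical, and routing by an MD5 hash of the document id is just one simple scheme, not something Solr prescribes. At query time, Solr's distributed search takes a comma-separated `shards` parameter listing the shards to consult.

```python
import hashlib

# Hypothetical shard addresses (host:port/path, no scheme).
SHARDS = ["localhost:8983/solr", "localhost:7574/solr"]

def shard_for(doc_id, shards=SHARDS):
    # Stable hash of the id, so a document always lands on the same shard.
    digest = hashlib.md5(doc_id.encode("utf-8")).hexdigest()
    return shards[int(digest, 16) % len(shards)]

def shards_param(shards=SHARDS):
    # Value for the shards= query parameter of a distributed search.
    return ",".join(shards)

target = shard_for("artist-1")
print(target)
print(shards_param())
```

Simple modulo routing like this remaps most documents if the number of shards changes, so it suits a fixed shard count.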

  10. Development currently in flux • Stick with release versions • Depending on trunk won’t really help • Lucene and Solr have merged

  11. Wrapup • Lots more information at • http://lucene.apache.org • http://lucene.apache.org/solr/ • http://lucene.apache.org/java/ • Possible projects • Geospatial search • Improving existing code and contributing back to Apache SIS and to Apache Solr • Improving date faceting • Rewriting the ResponseWriter framework

  12. Acknowledgements • Material inspired by discussions and talks on the Apache mailing lists for Solr and Lucene, and by discussions with the rest of the Lucene community
