Introduction to Apache Lucene/Solr CSCI 572: Information Retrieval and Search Engines Summer 2010
Outline • What is Lucene/Solr? • Where did it come from? • What are the current versions of Lucene/Solr? • What can it do?
Apache Lucene • The brainchild of Doug Cutting • Free-text indexing library that implements most of the functionality I’ve talked to you about • Query models, ranking, indexing • Core API is implemented in Java • Ports and bindings exist for C/C++, Ruby, and Python, but they have small communities or are automatically generated • Initially hosted on SourceForge, moved to Apache in 2001
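To make "indexing, query models, ranking" concrete, here is a minimal sketch of an inverted index with a simple TF-IDF score. It is a toy written for illustration, not Lucene's implementation or scoring formula; `ToyIndex`, `tokenize`, and the sample documents are all made-up names and data.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace (a stand-in for a real analyzer)."""
    return text.lower().split()


class ToyIndex:
    """A tiny in-memory inverted index: term -> {doc_id: term frequency}."""

    def __init__(self):
        self.postings = defaultdict(dict)
        self.doc_count = 0

    def add(self, doc_id, text):
        # Record how often each term occurs in this document.
        for term, tf in Counter(tokenize(text)).items():
            self.postings[term][doc_id] = tf
        self.doc_count += 1

    def search(self, query):
        """Return doc ids ranked by a simple TF-IDF sum over the query terms."""
        scores = defaultdict(float)
        for term in tokenize(query):
            matches = self.postings.get(term, {})
            if not matches:
                continue
            # Rarer terms get a higher weight.
            idf = math.log(1 + self.doc_count / len(matches))
            for doc_id, tf in matches.items():
                scores[doc_id] += tf * idf
        # Highest score first; ties broken by doc id for stable output.
        return sorted(scores, key=lambda d: (-scores[d], d))


index = ToyIndex()
index.add("d1", "Lucene is a search library")
index.add("d2", "Solr is a search server built on Lucene")
index.add("d3", "Tomcat is an application server")
print(index.search("search server"))
```

Only d2 contains both query terms, so it ranks first; d1 and d3 each match one term and tie.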
Apache Solr • Originally developed at CNET • Web service layer built on top of the Lucene library • Provides a schema and an understanding of field types, with conversion to and from their internal representation • Scales to very large deployments; runs inside an application server such as Tomcat or Jetty • Programming-language-independent APIs • Sharding, replication, faceting, highlighting, explain, more-like-this, and other functionality provided easily
How to get started • Lucene (2.9.2 and 3.0.1 stable) • Put your Java hat on • Have Eclipse or your favorite IDE ready • Download lucene-core-<version>.jar from • http://repo1.maven.org/maven2/org/apache/lucene/ • Or download the source and build it from • http://www.apache.org/dyn/closer.cgi/lucene/java/ • Check out some example Java code from Otis Gospodnetic that demonstrates indexing and querying • http://onjava.com/pub/a/onjava/2003/01/15/lucene.html
How to get started • Solr • Grab a release of Solr (1.4.0 stable) • http://www.apache.org/dyn/closer.cgi/lucene/solr/ • Unpack into e.g., /usr/local/solr • Deploy onto Tomcat • Install Tomcat into /usr/local/tomcat • Create a solr.xml context file and drop it into /usr/local/tomcat/conf/Catalina/localhost/ • Set the solr/home JNDI property and point it to /usr/local/solr/solr • Start Tomcat • Head over to $solr/example/exampledocs • curl http://localhost:8983/solr/update -H 'Content-type:text/xml; charset=utf-8' --data-binary @artists.xml • (8983 is the port of Solr's bundled Jetty example; a default Tomcat install listens on 8080, so adjust the port to match your setup)
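The curl command above can also be issued from code. The sketch below builds the same kind of POST request with Python's standard library, assuming the update URL from the slide; the document fields (`id`, `name`) are hypothetical and must match whatever your schema.xml defines. It also prepares a `<commit/>` message, since Solr generally only makes added documents searchable after a commit. The requests are built but not sent, so the snippet runs without a Solr instance.

```python
import urllib.request
import xml.etree.ElementTree as ET

# Same URL as in the curl command; change the port if your container differs.
SOLR_UPDATE_URL = "http://localhost:8983/solr/update"


def build_add_xml(docs):
    """Serialize a list of dicts as a Solr <add><doc>...</doc></add> message."""
    add = ET.Element("add")
    for doc in docs:
        doc_el = ET.SubElement(add, "doc")
        for name, value in doc.items():
            field = ET.SubElement(doc_el, "field", name=name)
            field.text = str(value)
    return ET.tostring(add, encoding="utf-8")


def build_update_request(body, url=SOLR_UPDATE_URL):
    """Build (but do not send) the POST that curl --data-binary would issue."""
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-type": "text/xml; charset=utf-8"},
        method="POST",
    )


# Hypothetical document; the field names are assumptions, not from artists.xml.
payload = build_add_xml([{"id": "artist-1", "name": "Example Artist"}])
add_request = build_update_request(payload)
commit_request = build_update_request(b"<commit/>")

# To send them against a running Solr, uncomment:
# urllib.request.urlopen(add_request)
# urllib.request.urlopen(commit_request)
```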
Modifying your schema.xml • Field Types • Analyzers • Tokenizers • http://wiki.apache.org/solr/SchemaXml
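How field types, analyzers, and tokenizers relate can be sketched in a few lines: a field type names an analyzer, and an analyzer is a tokenizer followed by zero or more token filters. The code below is a conceptual toy, not Solr's classes or schema.xml syntax; the function names, the two field types, and the stopword list are illustrative assumptions.

```python
import re

STOPWORDS = {"a", "an", "and", "of", "the"}  # illustrative list only


def word_tokenizer(text):
    """Split text into runs of word characters."""
    return re.findall(r"\w+", text)


def lowercase_filter(tokens):
    return [t.lower() for t in tokens]


def stop_filter(tokens):
    return [t for t in tokens if t not in STOPWORDS]


def make_analyzer(tokenizer, *filters):
    """An analyzer is one tokenizer followed by a chain of token filters."""
    def analyze(text):
        tokens = tokenizer(text)
        for token_filter in filters:
            tokens = token_filter(tokens)
        return tokens
    return analyze


# Two hypothetical field types: one keeps the value verbatim as a single
# token, the other tokenizes, lowercases, and drops stopwords.
FIELD_TYPES = {
    "string": make_analyzer(lambda text: [text]),
    "text": make_analyzer(word_tokenizer, lowercase_filter, stop_filter),
}

title = "The Art of Search"
print(FIELD_TYPES["string"](title))
print(FIELD_TYPES["text"](title))
```

The same input yields one exact-match token under the first type and two normalized tokens under the second, which is why the choice of field type determines what queries can match.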
Solr Faceting • facet=on&facet.field=&facet.field=… • http://wiki.apache.org/solr/SimpleFacetParameters
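As a rough illustration of the parameters above, the sketch below builds a facet query URL with one `facet.field` per field and reads the counts out of a JSON response. It assumes the standard `/solr/select` handler with `wt=json` and Solr's default flat `[value, count, value, count, ...]` layout for field facets; the field names and the sample response are made up.

```python
from urllib.parse import urlencode

SOLR_SELECT_URL = "http://localhost:8983/solr/select"  # assumed host and port


def facet_query_url(query, facet_fields, base_url=SOLR_SELECT_URL):
    """Build a select URL that repeats facet.field once per requested field."""
    params = [("q", query), ("wt", "json"), ("facet", "on")]
    params += [("facet.field", name) for name in facet_fields]
    return base_url + "?" + urlencode(params)


def facet_counts(response, field):
    """Turn the flat [value, count, value, count, ...] list into a dict."""
    flat = response["facet_counts"]["facet_fields"][field]
    return dict(zip(flat[0::2], flat[1::2]))


# Hypothetical field names; use fields that exist in your schema.
url = facet_query_url("*:*", ["genre", "country"])
print(url)

# Made-up response fragment in the shape Solr returns for wt=json.
sample_response = {
    "facet_counts": {"facet_fields": {"genre": ["rock", 12, "jazz", 5]}}
}
print(facet_counts(sample_response, "genre"))
```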
Advanced Topics • Standing up cores • Sharding • Replication • Zookeeper and Cloud
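For sharding, one common arrangement is for the client to decide which shard each document is indexed on and then to query all shards at once through Solr's `shards` request parameter. The sketch below shows that idea under stated assumptions: the two shard addresses are hypothetical, and hashing the document id is just one possible routing scheme, not something Solr prescribes.

```python
import zlib
from urllib.parse import urlencode

# Hypothetical shard addresses in the host:port/path form the shards
# parameter expects (no "http://" prefix).
SHARDS = ["localhost:8983/solr", "localhost:7574/solr"]


def shard_for(doc_id, shards=SHARDS):
    """Pick a shard with a stable hash so a given id always lands on the same one."""
    return shards[zlib.crc32(doc_id.encode("utf-8")) % len(shards)]


def distributed_query_url(query, shards=SHARDS):
    """Send the query to the first shard and ask it to fan out to all of them."""
    params = {"q": query, "shards": ",".join(shards)}
    return "http://" + shards[0] + "/select?" + urlencode(params)


print(shard_for("artist-1"))
print(distributed_query_url("name:example"))
```

A stable hash matters here: Python's built-in `hash()` for strings varies between runs, so it would route the same id to different shards after a restart.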
Development currently in flux • Stick with release versions • Depending on trunk won’t really help • Lucene and Solr have merged
Wrapup • Lots more information at • http://lucene.apache.org • http://lucene.apache.org/solr/ • http://lucene.apache.org/java/ • Possible projects • Geospatial search • Improving existing code and contributing back to Apache SIS and to Apache Solr • Improving date faceting • Rewriting the ResponseWriter framework
Acknowledgements • Material inspired by discussions and talks on the Apache mailing lists for Solr and Lucene, and by discussions with the rest of the Lucene community