Topics • Me. You? • Quick Overview of Lucene and Solr • Solr demo • Where are we now? • What’s in a version number? • Looking Ahead • Apache Lucene 3.1 and beyond • Apache Solr 3.1 and beyond
Me • You? • Lucene? • Solr? • New to Search? • Other Search Engines? • Crawling? • Database? • Scale?
Lucene • Lucene is a mature, high-performance Java API that provides search capabilities to applications • Supports indexing, searching, and a number of other commonly used search features (highlighting, spell checking, etc.) • Not a crawler, and doesn’t know anything about Adobe PDF, MS Word, etc. • Created in 1997 and now part of the Apache Software Foundation • Note that Lucene does not provide distributed index (shard) support
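Indexing and searching, as listed above, both revolve around an inverted index: a map from each term to the documents that contain it. The following is a minimal sketch in plain Java, assuming naive lowercase tokenization and AND-only queries. The class and its methods (`InvertedIndexSketch`, `add`, `searchAll`) are hypothetical illustrations, not Lucene's API.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;

// Toy inverted index: term -> sorted set of IDs of the documents containing it.
// Conceptual sketch only; NOT Lucene's API (no analyzers, scoring, or on-disk segments).
class InvertedIndexSketch {
    private final Map<String, SortedSet<Integer>> postings = new HashMap<>();

    // Naive "analysis": lowercase, then split on anything that is not a letter or digit.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Indexing: record docId in the posting list of every term in the text.
    void add(int docId, String text) {
        for (String term : tokenize(text)) {
            postings.computeIfAbsent(term, k -> new TreeSet<>()).add(docId);
        }
    }

    // Searching (AND semantics): intersect the posting lists of all query terms.
    SortedSet<Integer> searchAll(String query) {
        SortedSet<Integer> result = null;
        for (String term : tokenize(query)) {
            SortedSet<Integer> docs = postings.getOrDefault(term, Collections.emptySortedSet());
            if (result == null) {
                result = new TreeSet<>(docs);
            } else {
                result.retainAll(docs);
            }
        }
        return result == null ? new TreeSet<>() : result;
    }

    public static void main(String[] args) {
        InvertedIndexSketch index = new InvertedIndexSketch();
        index.add(1, "Lucene is a Java search library");
        index.add(2, "Solr is a search server built on Lucene");
        index.add(3, "Java build tools: Ant and Maven");
        System.out.println(index.searchAll("lucene search")); // [1, 2]
        System.out.println(index.searchAll("java"));          // [1, 3]
    }
}
```

Lucene adds much more on top of this idea (analysis chains, relevance scoring, compressed on-disk segments), but the term-to-documents lookup is why a query does not have to scan every document.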
Solr • Solr is the Lucene-based search server that provides the infrastructure most users need to work with Lucene • Without knowing Java! • Also provides: • Easy setup and configuration • Faceting • Highlighting • Replication/Sharding • Lucene Best Practices http://search.lucidimagination.com
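Faceting, one of the features listed above, means returning per-value counts for a field across the documents that matched a query (for example, how many hits fall into each category). Solr computes these counts server-side when a request includes `facet=true` and `facet.field=...`. The sketch below only illustrates what those numbers mean; the class `FacetSketch` and its in-memory document representation (a `Map` of field name to value) are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Conceptual sketch of field faceting: for the documents that matched a query, count
// how many carry each value of one field, most frequent first. Not Solr's implementation.
class FacetSketch {
    static List<Map.Entry<String, Integer>> facet(List<Map<String, String>> matchedDocs, String field) {
        Map<String, Integer> counts = new HashMap<>();
        for (Map<String, String> doc : matchedDocs) {
            String value = doc.get(field);
            if (value != null) { // documents without the field are not counted
                counts.merge(value, 1, Integer::sum);
            }
        }
        List<Map.Entry<String, Integer>> sorted = new ArrayList<>(counts.entrySet());
        // Highest count first; ties broken alphabetically so the output is deterministic.
        sorted.sort((a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        return sorted;
    }

    public static void main(String[] args) {
        List<Map<String, String>> matched = List.of(
                Map.of("id", "1", "cat", "electronics"),
                Map.of("id", "2", "cat", "software"),
                Map.of("id", "3", "cat", "electronics"),
                Map.of("id", "4"));
        System.out.println(facet(matched, "cat")); // [electronics=2, software=1]
    }
}
```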
Quick Solr Demo • Pre-reqs: • Apache Ant 1.7.x • SVN • svn co https://svn.apache.org/repos/asf/lucene/dev/trunk solr-trunk • cd solr-trunk/solr/ • ant example • cd example • java -jar start.jar • In a second terminal (start.jar keeps running in the foreground), from the example directory: cd exampledocs; java -jar post.jar *.xml • http://localhost:8983/solr/browse
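Once the demo server is running, it can also be queried over plain HTTP. The sketch below only builds a request URL for the standard `/select` handler, assuming the example server is reachable at `http://localhost:8983/solr`; the query `name:ipod` and the facet field `cat` assume the example schema and the sample documents posted above. The class `SolrQueryUrl` is a hypothetical helper, not part of Solr.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.StringJoiner;

// Builds a query URL for a Solr server's /select handler.
// Assumption: the demo server is running at http://localhost:8983/solr.
class SolrQueryUrl {
    // Parameters are passed as alternating key/value strings so that a key may repeat
    // (Solr accepts, e.g., several facet.field parameters in one request).
    static String build(String baseUrl, String... keyValuePairs) {
        if (keyValuePairs.length % 2 != 0) {
            throw new IllegalArgumentException("expected alternating key/value pairs");
        }
        StringJoiner query = new StringJoiner("&");
        for (int i = 0; i < keyValuePairs.length; i += 2) {
            query.add(encode(keyValuePairs[i]) + "=" + encode(keyValuePairs[i + 1]));
        }
        return baseUrl + "/select?" + query;
    }

    private static String encode(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8); // Charset overload needs Java 10+
    }

    public static void main(String[] args) {
        String url = build("http://localhost:8983/solr",
                "q", "name:ipod",
                "facet", "true",
                "facet.field", "cat",
                "wt", "json");
        // Prints: http://localhost:8983/solr/select?q=name%3Aipod&facet=true&facet.field=cat&wt=json
        System.out.println(url);
    }
}
```

Pasting the printed URL into a browser (or any HTTP client) while the demo server is up should return the matching documents as JSON, with facet counts for `cat`.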
Where are we now? • Current releases • Apache Lucene 3.0.2 and 2.9.2 • Apache Solr 1.4.1 • Last March, the Lucene and Solr development communities merged to reduce duplication, ease development, etc. • Shared dev mailing list: dev@lucene.apache.org • User communities are still separate • java-user@lucene.apache.org, solr-user@lucene.apache.org
Where are we now? • Is the next release Solr 1.5 or 3.1? • Solr 3.1 (99% certain!) • Two main branches of development for both Lucene and Solr • Trunk (i.e., 4.0) • https://svn.apache.org/repos/asf/lucene/dev/trunk/ • No guarantee of backward compatibility (but best efforts are made) • 3.x Branch • https://svn.apache.org/repos/asf/lucene/dev/branches/branch_3x/ • Aims to stay backward compatible with the 1.4.x release • Most changes are applied to both branches, but not all
Words to the Wise “Some or all of the following statements may contain projections or other forward-looking statements regarding future events or implementations in Lucene/Solr” “The statements are not meant to be inclusive of all changes”
Apache Lucene 3.1 and Beyond • https://svn.apache.org/repos/asf/lucene/dev/trunk/lucene/CHANGES.txt • Performance Improvements in many areas • Bytes instead of Strings: lower memory use • Phrase scoring • Packed Ints • Analysis Contributions • Many new languages/dialects supported: Hindi, Indic, Arabic, Armenian, Persian, Indonesian, etc., on top of existing support for English, most European languages, Chinese, Japanese, and Korean
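"Packed Ints" refers to storing arrays of small integers using only as many bits per value as the largest value needs, rather than a full 32- or 64-bit slot. The following is a minimal sketch of that idea, assuming non-negative values and a fixed bits-per-value for the whole array; `PackedIntsSketch` is a hypothetical class and not Lucene's PackedInts implementation, which supports more formats.

```java
// Minimal packed-integer array: every value is stored in exactly `bitsPerValue` bits
// inside a long[], so values that fit in 10 bits cost 10 bits instead of 32 or 64.
// Illustrative sketch of the idea only; not Lucene's PackedInts implementation.
class PackedIntsSketch {
    private final long[] blocks;
    private final int bitsPerValue;
    private final long mask;

    PackedIntsSketch(int count, int bitsPerValue) {
        if (bitsPerValue < 1 || bitsPerValue > 63) {
            throw new IllegalArgumentException("bitsPerValue must be in 1..63");
        }
        this.bitsPerValue = bitsPerValue;
        this.mask = (1L << bitsPerValue) - 1;
        long totalBits = (long) count * bitsPerValue;
        this.blocks = new long[(int) ((totalBits + 63) / 64)];
    }

    // Smallest number of bits that can represent every value in [0, maxValue].
    static int bitsRequired(long maxValue) {
        return Math.max(1, 64 - Long.numberOfLeadingZeros(maxValue));
    }

    int storageLongs() {
        return blocks.length;
    }

    void set(int index, long value) {
        if (value < 0 || value > mask) {
            throw new IllegalArgumentException("value does not fit in " + bitsPerValue + " bits");
        }
        long bitPos = (long) index * bitsPerValue;
        int block = (int) (bitPos >>> 6); // which long holds the first bit
        int shift = (int) (bitPos & 63);  // bit offset inside that long
        // Clear the slot, then write the low bits (bits shifted past bit 63 are dropped).
        blocks[block] = (blocks[block] & ~(mask << shift)) | (value << shift);
        int spill = shift + bitsPerValue - 64; // bits that overflow into the next long
        if (spill > 0) {
            long spillMask = (1L << spill) - 1;
            blocks[block + 1] = (blocks[block + 1] & ~spillMask) | (value >>> (64 - shift));
        }
    }

    long get(int index) {
        long bitPos = (long) index * bitsPerValue;
        int block = (int) (bitPos >>> 6);
        int shift = (int) (bitPos & 63);
        long value = blocks[block] >>> shift;
        int spill = shift + bitsPerValue - 64;
        if (spill > 0) {
            value |= blocks[block + 1] << (64 - shift); // append the high bits
        }
        return value & mask;
    }

    public static void main(String[] args) {
        long[] values = {5, 0, 1000, 999, 42, 7, 1023};
        int bits = bitsRequired(1023); // 10 bits per value
        PackedIntsSketch packed = new PackedIntsSketch(values.length, bits);
        for (int i = 0; i < values.length; i++) {
            packed.set(i, values[i]);
        }
        // 7 values x 10 bits = 70 bits -> 2 longs, versus 7 ints (224 bits) unpacked.
        System.out.println(bits + " bits/value, " + packed.storageLongs() + " longs");
        System.out.println(packed.get(6)); // 1023 (this value straddles two longs)
    }
}
```

The trade-off is a few shifts and masks per access in exchange for a much smaller memory footprint, which matters for large per-document arrays.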
Lucene 3.1 and Beyond • Expert Level • Flex APIs • Different codecs for the index • Total control over what is in the index • Pluggable scoring models • (Near) Real Time Search • Make newly indexed documents instantly available for search • See https://svn.apache.org/repos/asf/lucene/dev/branches/realtime_search/ • Much, much more
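"Pluggable scoring models" means the ranking formula sits behind an interface, so it can be swapped without touching indexing or search code. The sketch below shows that design with two interchangeable models: a simplified tf-idf and Okapi BM25. The `ScoringModel` interface and its per-term statistics are hypothetical; Lucene's own extension point (the `Similarity` class) has a different, richer API.

```java
import java.util.List;

// Hypothetical pluggable scoring contract: ranking code depends only on this interface.
interface ScoringModel {
    // Score one query term against one document, given simple index statistics.
    double score(int termFreq, int docLength, double avgDocLength, int docFreq, int numDocs);
}

// Simplified tf-idf with length normalization (square-root tf, log idf).
class TfIdfModel implements ScoringModel {
    @Override
    public double score(int termFreq, int docLength, double avgDocLength, int docFreq, int numDocs) {
        double idf = 1.0 + Math.log((double) numDocs / (docFreq + 1));
        return Math.sqrt(termFreq) * idf / Math.sqrt(docLength);
    }
}

// Okapi BM25: term frequency saturates (controlled by k1) and is normalized
// by document length relative to the average (controlled by b).
class Bm25Model implements ScoringModel {
    private final double k1;
    private final double b;

    Bm25Model(double k1, double b) {
        this.k1 = k1;
        this.b = b;
    }

    @Override
    public double score(int termFreq, int docLength, double avgDocLength, int docFreq, int numDocs) {
        double idf = Math.log(1.0 + (numDocs - docFreq + 0.5) / (docFreq + 0.5));
        double lengthNorm = k1 * (1.0 - b + b * docLength / avgDocLength);
        return idf * termFreq * (k1 + 1.0) / (termFreq + lengthNorm);
    }
}

class ScoringDemo {
    public static void main(String[] args) {
        List<ScoringModel> models = List.of(new TfIdfModel(), new Bm25Model(1.2, 0.75));
        for (ScoringModel model : models) {
            // Same term and document length; only the term frequency changes.
            double once = model.score(1, 100, 100.0, 10, 1000);
            double tenTimes = model.score(10, 100, 100.0, 10, 1000);
            System.out.printf("%s: tf=1 -> %.3f, tf=10 -> %.3f%n",
                    model.getClass().getSimpleName(), once, tenTimes);
        }
    }
}
```

Because callers only see `ScoringModel`, switching from tf-idf to BM25 (or tuning `k1` and `b`) is a one-line change at construction time.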
Apache Solr 3.1 and Beyond • http://svn.apache.org/repos/asf/lucene/dev/trunk/solr/CHANGES.txt • Solr Cloud • Make it easy to deploy and manage truly large scale search applications • 10B+ (100B? 1T?) docs with subsecond search/faceting • See http://wiki.apache.org/solr/SolrCloud • (Near) Real Time Search
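Searching billions of documents implies splitting the index into shards and combining their answers. The sketch below shows the general scatter/gather idea under stated assumptions: documents are routed to shards by hashing their ID, each shard returns its local top-k hits, and a coordinator merges them. The class `ShardedSearchSketch` is hypothetical and is not SolrCloud's actual routing or merging code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical scatter/gather sketch of distributed search. Not SolrCloud's implementation.
class ShardedSearchSketch {
    static final class Hit {
        final String id;
        final double score;

        Hit(String id, double score) {
            this.id = id;
            this.score = score;
        }

        @Override
        public String toString() {
            return id + "=" + score;
        }
    }

    // Hash partitioning: the same ID always maps to the same shard.
    static int shardFor(String docId, int numShards) {
        return Math.floorMod(docId.hashCode(), numShards);
    }

    // Merge per-shard results into a global top-k (highest score first, ties by ID).
    static List<Hit> mergeTopK(List<List<Hit>> perShard, int k) {
        PriorityQueue<Hit> queue = new PriorityQueue<>((a, b) -> {
            int byScore = Double.compare(b.score, a.score);
            return byScore != 0 ? byScore : a.id.compareTo(b.id);
        });
        for (List<Hit> shardHits : perShard) {
            queue.addAll(shardHits);
        }
        List<Hit> top = new ArrayList<>();
        while (!queue.isEmpty() && top.size() < k) {
            top.add(queue.poll());
        }
        return top;
    }

    public static void main(String[] args) {
        System.out.println("doc-42 -> shard " + shardFor("doc-42", 4));
        List<List<Hit>> perShard = List.of(
                List.of(new Hit("a", 0.9), new Hit("b", 0.4)),
                List.of(new Hit("c", 0.8), new Hit("d", 0.7)));
        System.out.println(mergeTopK(perShard, 3)); // [a=0.9, c=0.8, d=0.7]
    }
}
```

For the merge to be correct, each shard must return at least its own top k hits, because the global top k could all come from a single shard.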
Apache Solr 3.1 and Beyond • Spatial Search • “Find me all the Lulu authors that live within 50 miles of HQ” • Boost, sort, filter documents by distance and other spatial information • http://wiki.apache.org/solr/SpatialSearch http://www.openstreetmap.org/?lat=44.9744&lon=-93.2484&zoom=14&layers=B000FTFT
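A query such as "authors within 50 miles of HQ" combines a distance filter with a distance sort. The sketch below shows both using the haversine great-circle formula on a spherical Earth. It takes HQ from the coordinates in the map link above; the author names and their coordinates (roughly St. Paul and Rochester, Minnesota) are illustrative assumptions. `SpatialFilterSketch` is hypothetical and does not reproduce Solr's spatial implementation or query syntax, which is described on the SpatialSearch wiki page.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Sketch of a distance filter plus distance sort using the haversine formula.
// Hypothetical helper; not Solr's spatial implementation.
class SpatialFilterSketch {
    static final double EARTH_RADIUS_MILES = 3958.8; // mean Earth radius

    static final class Author {
        final String name;
        final double lat;
        final double lon;

        Author(String name, double lat, double lon) {
            this.name = name;
            this.lat = lat;
            this.lon = lon;
        }
    }

    // Great-circle distance in miles between two (latitude, longitude) points in degrees.
    static double haversineMiles(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        return 2 * EARTH_RADIUS_MILES * Math.asin(Math.min(1.0, Math.sqrt(a)));
    }

    // Keep authors within radiusMiles of (lat, lon), nearest first.
    static List<String> withinMiles(List<Author> authors, double lat, double lon, double radiusMiles) {
        List<Author> hits = new ArrayList<>();
        for (Author author : authors) {
            if (haversineMiles(lat, lon, author.lat, author.lon) <= radiusMiles) {
                hits.add(author);
            }
        }
        hits.sort(Comparator.comparingDouble(
                (Author candidate) -> haversineMiles(lat, lon, candidate.lat, candidate.lon)));
        List<String> names = new ArrayList<>();
        for (Author author : hits) {
            names.add(author.name);
        }
        return names;
    }

    public static void main(String[] args) {
        double hqLat = 44.9744, hqLon = -93.2484; // coordinates from the map link on this slide
        List<Author> authors = List.of(
                new Author("author-in-st-paul", 44.9537, -93.0900),
                new Author("author-in-rochester", 44.0121, -92.4802),
                new Author("author-at-hq", 44.9744, -93.2484));
        System.out.println(withinMiles(authors, hqLat, hqLon, 50)); // [author-at-hq, author-in-st-paul]
    }
}
```

A real search engine avoids computing the exact distance for every document, typically by first narrowing candidates with a cheap bounding-box or grid filter; this sketch skips that step for clarity.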
Solr 3.1 and Beyond • Group By/Field Collapsing • http://wiki.apache.org/solr/FieldCollapsing • Roll up results that have a common “token” • Examples: • All documents from the same URL • All documents by the same author that match • All documents in the same price range • Auto-suggest • Pivoted Faceting
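Field collapsing, as described above, rolls up results that share a key (such as the same URL or author) so that one group does not crowd out the rest of the result list. The sketch below keeps the best N documents per group and orders groups by their best-scoring document. The class `FieldCollapseSketch` and the "site" grouping key in the example are hypothetical; Solr's grouping parameters and behavior are documented on the FieldCollapsing wiki page.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of group-by / field collapsing over an already-scored result list.
class FieldCollapseSketch {
    static final class Doc {
        final String id;
        final String groupKey;
        final double score;

        Doc(String id, String groupKey, double score) {
            this.id = id;
            this.groupKey = groupKey;
            this.score = score;
        }

        @Override
        public String toString() {
            return id;
        }
    }

    // Keep the best `perGroup` docs for each group key; groups appear in order of their best score.
    static Map<String, List<Doc>> collapse(List<Doc> results, int perGroup) {
        List<Doc> sorted = new ArrayList<>(results);
        sorted.sort((x, y) -> Double.compare(y.score, x.score)); // stable sort, highest first
        // Visiting docs best-first means each group is inserted when its best doc is seen.
        Map<String, List<Doc>> groups = new LinkedHashMap<>();
        for (Doc doc : sorted) {
            List<Doc> group = groups.computeIfAbsent(doc.groupKey, k -> new ArrayList<>());
            if (group.size() < perGroup) {
                group.add(doc);
            }
        }
        return groups;
    }

    public static void main(String[] args) {
        List<Doc> results = List.of(
                new Doc("a1", "siteA", 0.9),
                new Doc("b1", "siteB", 0.85),
                new Doc("a2", "siteA", 0.8),
                new Doc("a3", "siteA", 0.7),
                new Doc("c1", "siteC", 0.5));
        System.out.println(collapse(results, 1)); // {siteA=[a1], siteB=[b1], siteC=[c1]}
        System.out.println(collapse(results, 2)); // {siteA=[a1, a2], siteB=[b1], siteC=[c1]}
    }
}
```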
Resources • http://lucene.apache.org • /solr • /java • http://www.lucidimagination.com • solr-user@lucene.apache.org • java-user@lucene.apache.org