Building Intelligent Search Applications with Apache Solr and PHP5
About the Presenter
Israel Ekpo
• Software Architect with Bonnier Corporation
• Author of Apache Solr PECL extension
• Website: http://www.israelekpo.com
• Email: iekpo@php.net
• Twitter: @israelekpo
Why Search?
• Find a needle in a haystack.
• Retrieve information quickly.
• Retrieve relevant results.
• Narrow down result sets.
• Sell more products to customers.
• Display content of interest to visitors.
• Keep users on the web application.
• Increase the number of returning users.
How to Implement Search
• MySQL (full-text search with MyISAM)
• Sphinx
• Lucene
• Apache Solr
Apache Solr Search Tool of Choice?
Apache Solr Features
• HTTP interface for clients (any language)
• Standalone, powerful full-text search server
• REST-like HTTP/XML and JSON APIs
• Hit highlighting
• Faceted search
• Dynamic clustering
• Database integration
• Open Source (FREE)
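Because Solr exposes a plain HTTP interface, a search request is essentially a URL. The sketch below builds one in PHP. The host name, the `solr` path, and the helper name `build_select_url` are assumptions made for illustration; adjust them to your own installation.

```php
<?php
/**
 * Build the URL for a Solr select request (hypothetical helper).
 * Host, port and path are assumptions; adjust them to your setup.
 */
function build_select_url($host, $port, $path, array $params)
{
    // Ask for JSON unless the caller picked another response writer.
    $params += array('wt' => 'json');

    return sprintf(
        'http://%s:%d/%s/select?%s',
        $host,
        $port,
        trim($path, '/'),
        http_build_query($params, '', '&')
    );
}

$url = build_select_url('localhost', 8983, 'solr', array('q' => 'company:Microsoft', 'rows' => 5));
echo $url, "\n";
// http://localhost:8983/solr/select?q=company%3AMicrosoft&rows=5&wt=json
```

Any HTTP client in any language can fetch such a URL and decode the response, which is what makes the "any language" claim above practical.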
Where do I get Solr?
• http://www.apache.org/dyn/closer.cgi/lucene/solr/
• Current version is 1.4.0
How do I Install It or Set it Up?
• Solr can run as a standalone search server.
• For production purposes, however, it is recommended to set it up with a servlet container or application server such as Jetty, Tomcat or Glassfish.
Setting Up Tomcat to Work with Solr
Verifying Availability of Java 1.6 Dependencies
    $ dpkg --get-selections | grep sun-java
    sun-java6-bin    install
    sun-java6-jdk    install
    sun-java6-jre    install
If any of these packages is missing, install them:
    $ sudo aptitude install sun-java6-jdk sun-java6-bin sun-java6-jre
Getting Tomcat 6
• http://tomcat.apache.org/download-60.cgi
• Get the URL to the Core Binary distribution from your closest mirror, then download and extract it:
    $ wget http://apache.cs.utah.edu/tomcat/tomcat-6/v6.0.20/bin/apache-tomcat-6.0.20.tar.gz
    $ tar -zxvf apache-tomcat-6.0.20.tar.gz
Setting up Tomcat 6
    $ sudo mv apache-tomcat-6.0.20 /usr/local/tomcat
• The $JAVA_HOME and $JAVA_OPTS variables are required, so we have to declare them in the ~/.bashrc file.
• The $JAVA_HOME variable will be set to /usr/lib/jvm/java-6-sun
• The Solr home will be set to /usr/local/tomcat/solr
Contents of .bashrc file
    export JAVA_HOME=/usr/lib/jvm/java-6-sun
    export JAVA_OPTS="$JAVA_OPTS -Dsolr.solr.home=/usr/local/tomcat/solr -Dsolr.data.dir=/usr/local/tomcat/solr/data -Dsolr.abortOnConfigurationError=true"
• For a multi-core configuration, remove -Dsolr.data.dir=/usr/local/tomcat/solr/data from the options.
Tomcat Manual Setup Complete
• The setup for Tomcat is now complete.
• It can be started and stopped using the following commands:
    $ sudo /usr/local/tomcat/bin/startup.sh
    $ sudo /usr/local/tomcat/bin/shutdown.sh
Setting up Tomcat Admin and Users
• This will be done in the $CATALINA_HOME/conf/tomcat-users.xml file
    $ sudo vim /usr/local/tomcat/conf/tomcat-users.xml

    <tomcat-users>
      <role rolename="manager"/>
      <role rolename="admin"/>
      <role rolename="webuser"/>
      <user username="admin" password="Ch8ng3me" roles="admin,manager,webuser"/>
      <user username="frontend" password="Ch8ng3me" roles="webuser"/>
    </tomcat-users>
Port number for HTTP Connector
• The default port for the Java HTTP Connector is 8080; the example below changes it to 8983.
• The HTTP Connector should be the first /Server/Service/Connector element node.
• The second /Server/Service/Connector node is for the Java AJP Connector.
    $ sudo vim /usr/local/tomcat/conf/server.xml

    <Server ...>
      <Service ...>
        <Connector port="8983" ... />
        ...
      </Service>
    </Server>
Character Encoding for Non-ASCII
• Find the /Server/Service/Connector element node and add or set its "URIEncoding" attribute to "UTF-8".
    $ sudo vim /usr/local/tomcat/conf/server.xml

    <Server ...>
      <Service ...>
        <Connector ... URIEncoding="UTF-8"/>
        ...
      </Service>
    </Server>
Automatic Startups and Shutdowns
• Create the file /etc/init.d/tomcat and enter the startup and shutdown commands:
    $ sudo vim /etc/init.d/tomcat
Automatic Shutdown and Startups
    #!/bin/sh
    # Start, stop or restart Tomcat depending on the first argument.
    case "$1" in
      start)
        sh /usr/local/tomcat/bin/startup.sh
        ;;
      stop)
        sh /usr/local/tomcat/bin/shutdown.sh
        ;;
      restart)
        sh /usr/local/tomcat/bin/shutdown.sh
        sh /usr/local/tomcat/bin/startup.sh
        ;;
    esac
    exit 0
Automatic Shutdown and Startups
We have to make the /etc/init.d/tomcat script executable:
    $ sudo chmod 0755 /etc/init.d/tomcat
The final step is to create symbolic links to the /etc/init.d/tomcat script in the shutdown and startup runlevel folders:
    $ sudo ln -s /etc/init.d/tomcat /etc/rc1.d/K99tomcat
    $ sudo ln -s /etc/init.d/tomcat /etc/rc2.d/S99tomcat
We are done!
Setting up Apache Solr
1. Download Solr 1.4.0
    $ wget http://mirror.csclub.uwaterloo.ca/apache/lucene/solr/1.4.0/apache-solr-1.4.0.zip
2. Unzip the compressed folder
    $ unzip apache-solr-1.4.0.zip
3. Copy the solr.war file to the Tomcat webapps directory:
    $ sudo cp -p apache-solr-1.4.0/example/webapps/solr.war /usr/local/tomcat/webapps/solr.war
Setting Up Solr
4. We now have to set up the Solr home. Copy the example Solr home, example/solr, as a template for your own.
    $ sudo cp -pr apache-solr-1.4.0/example/solr /usr/local/tomcat/solr
From here on, $SOLR_HOME is /usr/local/tomcat/solr.
The default $SOLR_HOME/conf/solrconfig.xml file sets the data directory for the index to ./solr/data, relative to the current working directory. Change this to the absolute path of $SOLR_HOME/data:
    $ sudo vim /usr/local/tomcat/solr/conf/solrconfig.xml
    <dataDir>${solr.data.dir:/usr/local/tomcat/solr/data}</dataDir>
Setting Up Solr
5. We are almost done. We now have to restart the servlet container:
    $ sudo /etc/init.d/tomcat restart
Setting Up Solr
6. If you are setting up Solr in multi-core mode, you need to set up the solr.xml file in the $SOLR_HOME folder.
The name, instance directory and data directory of each core are specified in solr.xml, and each core has its own schema.xml and solrconfig.xml files.
This can be accomplished by copying the original conf folder into the instance directory of each core and then changing the contents of the files to match your settings.
Running Solr in Multi-Core Mode
    <?xml version='1.0' encoding='UTF-8'?>
    <solr persistent="true" sharedLib="lib">
      <cores adminPath="/admin/cores" shareSchema="false">
        <core name="confooevents" instanceDir="confooevents">
          <property name="dataDir" value="/usr/local/tomcat/solr/data/confooevents" />
        </core>
        <core name="confoospeakers" instanceDir="confoospeakers">
          <property name="dataDir" value="/usr/local/tomcat/solr/data/confoospeakers" />
        </core>
        <core name="confoosuggest" instanceDir="confoosuggest">
          <property name="dataDir" value="/usr/local/tomcat/solr/data/confoosuggest" />
        </core>
      </cores>
    </solr>
schema.xml
    <fields>
      <field name="speaker_id" type="tint" indexed="true" stored="true" required="true" />
      <field name="speaker_name" type="string" indexed="true" stored="true" multiValued="false" omitNorms="true"/>
      <field name="company" type="string" indexed="true" stored="true" multiValued="false" omitNorms="true"/>
      <field name="talks" type="string" indexed="true" stored="true" multiValued="true" omitNorms="true"/>
      <field name="number_talks" type="tint" indexed="true" stored="true" multiValued="false" omitNorms="true"/>
      <field name="bio" type="text" indexed="true" stored="true" multiValued="false"/>
      <field name="text" type="text" indexed="true" stored="false" multiValued="true"/>
      <!-- Copying from display to default search fields. see text above and defaultSearchField below -->
      <copyField source="speaker_name" dest="text" />
      <copyField source="company" dest="text" />
      <copyField source="talks" dest="text" />
      <copyField source="bio" dest="text" />
    </fields>
    <!-- This is the primary key for this index -->
    <uniqueKey>speaker_id</uniqueKey>
solrconfig.xml
    <dataDir>${solr.data.dir:/usr/local/tomcat/solr/confoospeakers/data}</dataDir>
Apache Solr PECL Extension: Interacting with Solr using PHP
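How to Get the Solr PECL Extension
• Install with the PECL installer: pecl install solr-beta
• Or build from source, using the tarball from http://pecl.php.net/package/solr:
• Extract the tarball
• Enter the extension directory
• Run phpize
• Run ./configure
• Run make and make install
• Adjust the php.ini settings
• Run php -me and confirm that solr appears in the module list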
Adding Documents to Solr
    $options = array (
        'hostname' => SOLR_SERVER_HOSTNAME,
        'port'     => SOLR_SERVER_PORT,
        'path'     => SOLR_PATH_SPEAKERS,
        'timeout'  => SOLR_SERVER_TIMEOUT,
    );

    /* Creating SolrClient instance */
    $client = new SolrClient($options);
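The slide does not show how the SOLR_* constants are defined. The sketch below is one possible set of definitions for the multi-core Tomcat setup described earlier. The host name, timeout and the helper `solr_options_for_core` are assumptions for illustration; only the port (8983) and the core names come from the configuration shown in this deck.

```php
<?php
// Hypothetical values: the slides do not show how these constants are defined.
define('SOLR_SERVER_HOSTNAME', 'localhost');
define('SOLR_SERVER_PORT', 8983);                    // HTTP connector port from server.xml
define('SOLR_SERVER_TIMEOUT', 10);                   // seconds
define('SOLR_PATH_SPEAKERS', 'solr/confoospeakers'); // web application name plus core name

/**
 * Return SolrClient options for one core (hypothetical helper).
 */
function solr_options_for_core($core)
{
    return array(
        'hostname' => SOLR_SERVER_HOSTNAME,
        'port'     => SOLR_SERVER_PORT,
        'path'     => 'solr/' . $core,
        'timeout'  => SOLR_SERVER_TIMEOUT,
    );
}

$options = solr_options_for_core('confoospeakers');
// With the solr extension loaded: $client = new SolrClient($options);
```

Keeping the core name as the only varying part makes it easy to create one client per core (events, speakers, suggestions) from the same settings.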
Adding Documents to Solr
    /* Creating new input document */
    $doc = new SolrInputDocument();
    $doc->addField('speaker_id', $speaker_id);
    $doc->addField('speaker_name', $speaker_name);
    $doc->addField('company', $company);

    /* Adding document to the index */
    $client->addDocument($doc);

    /* Finalizing Changes. Do not forget this step */
    $client->commit();
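For comparison, the same kind of document can be written in Solr's XML update format, which is what an add request looks like when sent over the HTTP/XML API directly. The sketch below builds such a message by hand; the helper name `build_add_xml` and the field values are made up for illustration.

```php
<?php
/**
 * Build a Solr XML <add> message for a single document (hypothetical helper).
 * $fields maps field names to a value or to a list of values.
 */
function build_add_xml(array $fields)
{
    $xml = '<add><doc>';
    foreach ($fields as $name => $values) {
        // A multi-valued field is sent as one <field> element per value.
        foreach ((array) $values as $value) {
            $xml .= sprintf(
                '<field name="%s">%s</field>',
                htmlspecialchars($name, ENT_QUOTES, 'UTF-8'),
                htmlspecialchars((string) $value, ENT_QUOTES, 'UTF-8')
            );
        }
    }
    return $xml . '</doc></add>';
}

$xml = build_add_xml(array(
    'speaker_id'   => 7,
    'speaker_name' => 'Jane Doe',
    'talks'        => array('Intro to Solr', 'PHP & Search'),
));
echo $xml, "\n";
```

A message like this is posted to the core's update URL and, as with the client code above, only becomes searchable after a commit.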
Removing Documents from Solr
    /* Creating SolrClient instance */
    $client = new SolrClient($options);

    /* Remove the target document if you know the UniqueKey */
    $client->deleteById($speaker_id);

    /* Finalizing Changes. Do not forget this step */
    $client->commit();
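Searching for Data in the Index
SolrQuery Syntax
• http://wiki.apache.org/solr/SolrQuerySyntax
• http://lucene.apache.org/java/2_9_1/queryparsersyntax.html
Examples:
    q=PHP                      (term in the default search field)
    q=company:Microsoft        (term in a specific field)
    q=number_talks:[1 TO 3]    (inclusive range)
    q="apache solr"~10         (proximity: both words within 10 positions)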
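Because characters such as `:`, `"`, `[` and `~` carry meaning in this syntax, user input should be escaped before it is placed in a query. The sketch below is a hand-rolled escaper written for illustration; the name `escape_query_chars` is hypothetical. The PECL extension also provides `SolrUtils::escapeQueryChars()` for the same purpose.

```php
<?php
/**
 * Backslash-escape characters that are special in the Lucene query syntax.
 * Hypothetical helper, hand-rolled for illustration.
 */
function escape_query_chars($input)
{
    // Prefix each special character with a backslash.
    return preg_replace('/[\\\\+\-!():^\[\]"{}~*?|&]/', '\\\\$0', $input);
}

$user_input = 'AT&T "Labs"';

// Phrase query against the company field, with the user input escaped.
$q = 'company:"' . escape_query_chars($user_input) . '"';
echo $q, "\n";
// company:"AT\&T \"Labs\""
```

The resulting string can be passed to `$query->setQuery()` in the search examples.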
Searching for Data
    /* Creating SolrClient instance */
    $client = new SolrClient($options);

    $query = new SolrQuery();
    $query->setQuery($search_string);

    $query_response = $client->query($query);
    $response = $query_response->getResponse();
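The slide stops at `$response`. A Solr select response carries the hit count in `response->numFound` and the matching documents in `response->docs`; the sketch below walks that structure. It uses a stand-in object decoded from JSON with made-up values so it runs without a server, on the assumption that the object returned by `getResponse()` exposes the same properties. The helper name `format_results` is hypothetical.

```php
<?php
/**
 * Return one line per matching speaker (hypothetical helper).
 * Expects an object shaped like a Solr select response:
 * $response->response->numFound and $response->response->docs.
 */
function format_results($response)
{
    $lines = array(sprintf('%d match(es)', $response->response->numFound));
    foreach ($response->response->docs as $doc) {
        $lines[] = sprintf('#%d %s (%s)', $doc->speaker_id, $doc->speaker_name, $doc->company);
    }
    return $lines;
}

// Stand-in response decoded from JSON; the values are made up.
$response = json_decode('{"response":{"numFound":1,"start":0,"docs":[
    {"speaker_id":7,"speaker_name":"Jane Doe","company":"Example Corp"}]}}');

echo implode("\n", format_results($response)), "\n";
```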
Highlighting Hits
    /* Creating SolrClient instance */
    $client = new SolrClient($options);

    $query = new SolrQuery();
    $query->setQuery($search_string);

    $query->setHighlight(true);
    $query->setHighlightUsePhraseHighlighter(true);
    $query->setHighlightMaxAnalyzedChars(10000);
    $query->setHighlightFragsize(5000);

    $query->addHighlightField('speaker_name_t');
    $query->addHighlightField('company_t');
    $query->addHighlightField('bio');

    $query->setHighlightSimplePre('<strong>');
    $query->setHighlightSimplePost('</strong>');

    $query_response = $client->query($query);
    $response = $query_response->getResponse();
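Solr returns the highlighted fragments in a separate `highlighting` section keyed by each document's unique key, so they have to be matched back to the documents. The sketch below does this for a response fetched over HTTP with `wt=json` and decoded into arrays; the sample data and the helper name `highlighted_value` are assumptions for illustration.

```php
<?php
/**
 * Return the highlighted snippet for a field, or the stored value
 * when Solr produced no snippet (hypothetical helper).
 */
function highlighted_value(array $doc, array $highlighting, $key_field, $field)
{
    $id = (string) $doc[$key_field];
    if (!empty($highlighting[$id][$field])) {
        return $highlighting[$id][$field][0]; // first fragment
    }
    return isset($doc[$field]) ? $doc[$field] : '';
}

// Stand-in response; the values are made up.
$data = json_decode('{
  "response": {"docs": [{"speaker_id": 7, "company": "Example Corp", "bio": "Writes PHP extensions."}]},
  "highlighting": {"7": {"bio": ["Writes <strong>PHP</strong> extensions."]}}
}', true);

foreach ($data['response']['docs'] as $doc) {
    echo highlighted_value($doc, $data['highlighting'], 'speaker_id', 'bio'), "\n";
    echo highlighted_value($doc, $data['highlighting'], 'speaker_id', 'company'), "\n";
}
```

Falling back to the stored value keeps the output complete for fields in which the search terms did not occur.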
Dynamic Faceting of Results
    /* Creating SolrClient instance */
    $client = new SolrClient($options);

    $query = new SolrQuery();
    $query->setQuery($search_string);

    $query->setFacet(true);
    $query->setFacetMinCount(1);
    $query->addFacetField('company');
    $query->addFacetField('number_talks');

    $query_response = $client->query($query);
    $response = $query_response->getResponse();
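Facet counts come back in the `facet_counts` section of the response. When the response is fetched over HTTP with `wt=json`, Solr by default returns each facet field as a flat list that alternates value and count. The sketch below turns such a list into a value-to-count map; the sample counts and the helper name `pair_facet_counts` are made up for illustration.

```php
<?php
/**
 * Convert a flat [value, count, value, count, ...] list into
 * an associative array of value => count (hypothetical helper).
 */
function pair_facet_counts(array $flat)
{
    $counts = array();
    for ($i = 0; $i + 1 < count($flat); $i += 2) {
        $counts[$flat[$i]] = $flat[$i + 1];
    }
    return $counts;
}

// Stand-in response; the counts are made up.
$data = json_decode('{"facet_counts":{"facet_fields":{
    "company":["Bonnier Corporation",2,"Microsoft",1]}}}', true);

$companies = pair_facet_counts($data['facet_counts']['facet_fields']['company']);
foreach ($companies as $company => $count) {
    printf("%s (%d)\n", $company, $count);
}
```

Each value can then be rendered as a link that adds a filter such as `company:"Microsoft"` to the query, which is how facets narrow a result set.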
AutoSuggest when Searching
    /* Creating SolrClient instance */
    $client = new SolrClient($options);

    $query = new SolrQuery();

    $auto_complete_type = 'terms';

    if ('terms' == $auto_complete_type) {
        $query->setTerms(true);
        if (strlen($auto_suggest_string)) {
            $query->setTermsPrefix($auto_suggest_string);
        }
        $query->setTermsField('speaker_name');
    } else {
        $query->addField('speaker_name');
        $query->setQuery("{!prefix f=speaker_name}$auto_suggest_string");
    }
Downloads and Feedback
• http://joind.in/1398
• http://www.israelekpo.com/works