This article explains the architecture of Searchy, a federated search engine that incorporates agents for various data sources like LDAP, SQL, the Google API, and Searchy itself. It provides information on how to install and configure a Searchy agent and offers a guide on using Searchy for federated data access. The article also includes details on the TF-emc2Wiki, a protected resource with full and read-only access.
Federated search engine and the (PAPIzed) TF-emc2Wiki
The Searchy Architecture • Each source incorporates an agent, available through a SOAP interface • Uses RDF as internal representation • Agents for LDAP, SQL, the Google API, and Searchy itself
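The federated idea on this slide can be sketched in a few lines: one query is sent to every agent and the answers are merged. This is only an illustration under stated assumptions. The agents here are plain Python callables returning dictionaries, whereas real Searchy agents are reached through SOAP and exchange RDF; the function and agent names are hypothetical.

```python
def federated_search(query, agents):
    """Send one query to every agent and merge the answers.

    `agents` maps a source name to a callable (hypothetical stand-in for a
    remote agent). Returns (results, errors) so that one failing source
    does not hide the answers from the others.
    """
    results, errors = [], {}
    for name, search in agents.items():
        try:
            records = search(query)
        except Exception as error:
            errors[name] = str(error)
            continue
        # Tag every record with the source it came from.
        results.extend({"source": name, **record} for record in records)
    return results, errors


# Two made-up agents: one answers, one fails.
def ldap_agent(query):
    return [{"dc:title": "Directory entry about " + query}]


def sql_agent(query):
    raise ConnectionError("database unreachable")


results, errors = federated_search(
    "middleware", {"ldap": ldap_agent, "sql": sql_agent}
)
print(results)
print(errors)
```

Collecting errors separately is a design choice of this sketch: a federated search typically remains useful when a single backend is down.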
Searchy test installation • To evaluate federated data access using Searchy • Build a directory of middleware resources • Using each organization's data sources • Installing a Searchy agent in your systems • Initially, RedIRIS runs the main search interface http://www.rediris.es/busquedas/searchy/middleware/index.en.phtml • Prepare a report with your feedback as a deliverable
Installing your Searchy agent • Download and unpack the latest Searchy distribution • http://jsearchy.sourceforge.net/ • You only need J2SE >= 1.4 • Select your data sources (backends) • SQL • LDAP • Web servers (Google API for a restricted search) • Configure your agent • Use the sample agent configuration file in the conf directory • Or the simplified configuration to be distributed in the list • Support at http://lists.sourceforge.net/lists/listinfo/jsearchy-users • Register your agent • Host and port • searchy-emc2@rediris.es
Configuring your Searchy agent • Searchy configuration is contained in an XML file • conf/agent.xml • Three main elements • <transport> • General parameters of the agent • <provider> • Access parameters to the different data sources • More than one provider can be used for an agent • <map> • Takes care of the data transformations • Queries received by the agent into queries to the provider(s) • Responses from the providers into metadata to be sent by the agent
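To make the three-element layout concrete, the following sketch parses a made-up configuration and checks that every provider points at a map that exists. Only the element names transport, provider and map come from the slides. The root element name and the attribute names (id, type, map, name) are assumptions for illustration; consult the sample file in the conf directory for the real schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical configuration: root element and attribute names are assumed,
# NOT taken from the real conf/agent.xml.
SAMPLE = """
<agent>
  <transport id="example-agent"/>
  <provider id="ldap-directory" type="LDAP" map="ldap-map"/>
  <provider id="web-search" type="Google" map="web-map"/>
  <map name="ldap-map"/>
  <map name="web-map"/>
</agent>
"""


def summarize(xml_text):
    """Count transports, list providers, and report providers without a map."""
    root = ET.fromstring(xml_text)
    providers = [
        (p.get("id"), p.get("type"), p.get("map")) for p in root.iter("provider")
    ]
    map_names = {m.get("name") for m in root.iter("map")}
    missing = [pid for pid, _, map_name in providers if map_name not in map_names]
    return {
        "transports": len(root.findall("transport")),
        "providers": providers,
        "missing_maps": missing,
    }


summary = summarize(SAMPLE)
print(summary)
```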
The <transport> element • Basic configuration parameters • Identifier for the agent • Providers to be used • Port to listen at and maximum number of connections • Log configuration (using log4j) • Vocabulary to be used by the metadata • A subset of Dublin Core is going to be used: • dc:title, dc:subject and dc:description for queries • dc:title, dc:subject, dc:description, dc:creator (and URL!) for responses • ACLs to be applied when receiving connections • Simple rules based on hostname or IP addresses • Pilot config only accepts connections from certain RedIRIS hosts
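The ACL idea (simple rules based on hostname or IP address) can be sketched as follows. This is not Searchy's implementation: the rule syntax (CIDR networks and shell-style hostname patterns), the function name, and the example hosts are all assumptions chosen for illustration.

```python
import fnmatch
import ipaddress


def is_allowed(client_ip, client_host, rules):
    """Return True if the client matches at least one rule.

    A rule is either an IP address/network in CIDR notation or a
    shell-style hostname pattern (hypothetical rule format).
    """
    address = ipaddress.ip_address(client_ip)
    for rule in rules:
        try:
            network = ipaddress.ip_network(rule, strict=False)
        except ValueError:
            # Not an IP address or network: treat the rule as a hostname pattern.
            if fnmatch.fnmatchcase(client_host.lower(), rule.lower()):
                return True
        else:
            if address.version == network.version and address in network:
                return True
    return False


# Example rules using documentation-only addresses and domains.
rules = ["192.0.2.0/24", "*.example.org"]
print(is_allowed("192.0.2.17", "client.example.net", rules))
print(is_allowed("198.51.100.5", "search.example.org", rules))
print(is_allowed("198.51.100.5", "other.example.com", rules))
```

Hostname rules are only as trustworthy as the reverse DNS lookup behind them, so IP-based rules are usually the stricter choice.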
The <provider> element • Identifier, type and applicable map • The remaining parameters depend on the type • Three types included in the pilot config • Google • The account key to be used when connecting to the WS interface • SQL • A valid JDBC driver class name • Connection data: URL using the jdbc method, hostname, port, database, username, password • LDAP • URL for the LDAP server • Root and search scope • Other LDAP parameters: follow referrals, timeout, ...
The <map> element • Map name and applicable vocabulary • Elements describing input/output transformations • <URL>: Do not fiddle with it unless you know what you're doing! • One element per input term (type="query") • How the query term is translated into the backend query language <dc:title type="query">SELECT titleDB, subjectDB, creatorDB, descriptionDB FROM table WHERE (titleDB="%query%")</dc:title> • One element per output term (type="response") • How result fields (enclosed between %) are transformed to build the term contents in the response <dc:description type="response">%snippet%</dc:description>
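The %name% placeholders in the map templates can be understood as plain text substitution. The sketch below shows that mechanism under the assumption that it works like a simple search-and-replace; it is not Searchy's actual code, and the function name and sample values are hypothetical.

```python
import re

# A placeholder is a word enclosed between percent signs, e.g. %query%.
PLACEHOLDER = re.compile(r"%(\w+)%")


def fill_template(template, values):
    """Replace each %name% with values[name]; unknown names are left as-is."""
    def substitute(match):
        name = match.group(1)
        return str(values.get(name, match.group(0)))
    return PLACEHOLDER.sub(substitute, template)


# Input direction: a query term becomes a backend query.
query_template = 'SELECT titleDB FROM table WHERE (titleDB="%query%")'
backend_query = fill_template(query_template, {"query": "middleware"})

# Output direction: a result field becomes the content of a response term.
response_template = "%snippet%"
description = fill_template(
    response_template, {"snippet": "A directory of middleware resources"}
)

print(backend_query)
print(description)
```

Plain substitution into SQL is open to injection if the query term is not escaped, so any real use of this pattern would need escaping or parameterized queries.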
The (PAPIzed) TF-emc2Wiki • Available at http://www.rediris.es/wiki/tf-emc2/ • Protected by PAPI • Possibility of full and read-only access • We'll be happy to run interoperability tests with other AAIs • We'll include all the users in the mailing list • Username: your e-mail address • Password: you'll receive one that you can (should) change • Those already with access to the JRA5Wiki will be automatically enabled