WP13.4: ELIXIR EB-eye Feasibility Study
Silvano Squizzato, External Services Team (EB-eye)
ELIXIR Work package Meeting, May 20th 2008
Challenges in searching biological data
• Diversity of the data sets (format, size, content…).
• Many data providers already have their own search mechanism in place.
• Heterogeneity of the search results (display, granularity…).
• Navigation between different resources (cross-references) not always consistent.
ELIXIR WP13.4 EB-eye Feasibility Study
EB-eye - The search engine at EBI
• A fast, efficient, scalable search engine: www.ebi.ac.uk/ebisearch
• A single access point to all the main resources hosted at the EBI.
• Based on Apache Lucene technology.
• Exposes both a web interface and a web services interface.
• Displays search results as Google-like lists of entries.
• Acts as a gateway to more than 40 distinct datasets (260 million entries).
• Presents results that are up-to-date with the data resources.
• Allows users to navigate the network of cross-references.
• Searches most of the EBI resources in one go.
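To make the "search everything in one go" idea concrete, here is a minimal sketch of an inverted index spanning several datasets, with hits grouped per dataset. It is a toy illustration in plain Python, not Apache Lucene and not the EB-eye implementation; the dataset names, entry identifiers, and the functions `build_index` and `search` are all hypothetical.

```python
from collections import defaultdict


def build_index(datasets):
    """Map each lowercased token to the set of (dataset, entry_id) pairs containing it."""
    index = defaultdict(set)
    for dataset, entries in datasets.items():
        for entry_id, text in entries.items():
            for token in text.lower().split():
                index[token].add((dataset, entry_id))
    return index


def search(index, query):
    """Return entries that contain every query token, grouped by dataset."""
    tokens = query.lower().split()
    if not tokens:
        return {}
    # Intersect the posting sets of all tokens (AND semantics).
    hits = set.intersection(*(index.get(t, set()) for t in tokens))
    grouped = defaultdict(list)
    for dataset, entry_id in sorted(hits):
        grouped[dataset].append(entry_id)
    return dict(grouped)


# Hypothetical miniature datasets standing in for distinct resources.
datasets = {
    "proteins": {
        "P1": "peptidase inhibitor protein",
        "P2": "G protein-coupled receptor",
    },
    "genomes": {"G1": "receptor gene human"},
}
index = build_index(datasets)
print(search(index, "receptor"))
```

A single query returns hits from every dataset at once, which is the property a summary view of per-dataset results relies on.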
EB-eye - Summary overview
EB-eye - Data view
EB-eye - Web services: http://www.ebi.ac.uk/Tools/webservices/services/eb-eye
EB-eye – Web services clients
WP13.4 ELIXIR EB-eye Feasibility Study
• Investigate the adoption of the EB-eye technology to search third party data resources.
• Identify the viable methods of using the EB-eye engine in different contexts.
• Integrate new data repositories in the EB-eye with designated data provider partners.
• Verify the coherence of search results coming from diverse sources.
Partners involved
• MEROPS (Sanger, UK)
  • http://merops.sanger.ac.uk
  • An information resource for peptidases and the proteins that inhibit them.
  • Neil D. Rawlings at the Sanger Institute.
• GPCRDB (Vriend – Nijmegen, NL)
  • http://www.gpcr.org/7tm
  • Information System for G Protein-Coupled Receptors.
  • G. Vriend, B. Vroling at the CMBI, Nijmegen, The Netherlands.
• Ensembl
  • http://www.ensembl.org
  • Genome databases for vertebrates and other eukaryotic species.
• Ensembl Genomes
  • http://www.ensemblgenomes.org
  • Extends Ensembl across the taxonomic space.
• Sanger Institute
  • Working to replace its current search engine with the EB-eye technology.
Approach I - Import third-party data into the EB-eye
• Data integrated in the existing EB-eye architecture
  • Requires only schema-based dumps appropriate for the EB-eye.
  • Least expensive option.
• Minimal effort:
  • Data providers completely delegate the indexing / searching to the EB-eye.
• The data becomes integrated with other data sets:
  • Quality Assurance.
  • Navigation through the cross-references.
• MEROPS and GPCRDB fully integrated (Nov 08)
  • XML data dumps are automatically generated via Perl scripts and are publicly accessible.
  • The data providers agreed on which fields to index and which cross-references to make available.
  • Only a few revision cycles were necessary to obtain good data dumps.
• Ensembl Genomes added to the EB-eye (Apr 09)
  • The EB-eye Web services are also used by the Ensembl Genomes web site.
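As an illustration of what a schema-based dump with indexed fields and cross-references might look like, the sketch below builds a small XML document. The partners generated their dumps with Perl scripts; this example uses Python instead, and the element and attribute names (`database`, `entry`, `field`, `ref`, `dbname`, `dbkey`) as well as the sample entry are assumptions chosen for illustration, not the actual EB-eye schema.

```python
import xml.etree.ElementTree as ET


def build_dump(dataset_name, entries):
    """Serialise entries (id, fields, cross-references) into one XML string.

    The layout is a hypothetical one, not the real EB-eye dump format.
    """
    root = ET.Element("database")
    ET.SubElement(root, "name").text = dataset_name
    entries_el = ET.SubElement(root, "entries")
    for entry in entries:
        entry_el = ET.SubElement(entries_el, "entry", id=entry["id"])
        # Fields the provider agreed to have indexed.
        fields_el = ET.SubElement(entry_el, "fields")
        for name, value in entry["fields"].items():
            ET.SubElement(fields_el, "field", name=name).text = value
        # Cross-references the provider agreed to make available.
        xrefs_el = ET.SubElement(entry_el, "cross_references")
        for db, key in entry["xrefs"]:
            ET.SubElement(xrefs_el, "ref", dbname=db, dbkey=key)
    return ET.tostring(root, encoding="unicode")


# Hypothetical entry; identifiers are made up for the example.
sample = [
    {
        "id": "E1",
        "fields": {"description": "an example peptidase"},
        "xrefs": [("UNIPROT", "P00001")],
    }
]
dump = build_dump("EXAMPLE", sample)
print(dump)
```

Keeping the indexed fields and the cross-references in separate sections mirrors the two things the data providers had to agree on.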
Approach II - Full export of the EB-eye technology
• Data integrated in the existing EB-eye architecture
  • Hardware requirements might be expensive.
  • Expertise and learning curves to run and administer a full production system.
  • Local dependencies for the EB-eye installations: know-how and manpower required to maintain the local infrastructure.
• EB-eye team is collaborating with the Sanger Institute
  • Support for third-party customisations.
  • Additional documentation for third parties:
    • Software architecture.
    • User manuals (admin / end-user).
Approach III - Export of part of the EB-eye
• Partial export of the EB-eye: indexing engine
  • Hybrid integration model.
  • Searching infrastructure runs centrally at EBI.
  • Data providers have full control of the indexing and re-distribution of locally produced indices.
  • Too abstract for most users.
• An attempt towards a federated search mechanism
  • It is not necessary since users can consume the EB-eye Web Services to integrate data into their own portals (see Ensembl Genomes).
Conclusions
• The EB-eye is a flexible and scalable solution for new third-party data sources.
• Quickest and most effective mechanism of integration:
  • Exporting content data into the EB-eye system.
  • The new data sets added to the EB-eye become part of a coherent chain of cross-references.
• Limitations to the distribution of the EB-eye
  • Not available at the moment as a downloadable software package.
  • Limited human resources to support tier installations.
Future directions for the EB-eye interoperability
• Attention to the quality of data provided
  • Data used by EB-eye should be up-to-date with mothership portals.
  • Cross-references need to be consistent between different resources to avoid:
    • The display of broken links to non-existing entries.
    • Discrepancies between different data sets.
• Proposed features
  • External references.
  • Export of search results.
  • Running tools from the results (i.e. using Web Services).
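The consistency requirement on cross-references can be sketched as a simple check that every reference points at an entry that actually exists in the target data set. This is only an illustrative sketch; the function name `find_broken_xrefs`, the data layout, and the sample identifiers are hypothetical and do not describe how the EB-eye performs its quality checks.

```python
def find_broken_xrefs(entries_by_dataset, xrefs):
    """Return cross-references whose target entry does not exist.

    entries_by_dataset: dict mapping a dataset name to the set of its entry ids.
    xrefs: iterable of (source_entry, target_dataset, target_id) tuples.
    """
    broken = []
    for source, target_dataset, target_id in xrefs:
        # An unknown target dataset is treated as having no entries.
        known_ids = entries_by_dataset.get(target_dataset, set())
        if target_id not in known_ids:
            broken.append((source, target_dataset, target_id))
    return broken


# Hypothetical data: one valid link, one stale id, one unknown dataset.
entries_by_dataset = {
    "proteins": {"P1", "P2"},
    "genomes": {"G1"},
}
xrefs = [
    ("G1", "proteins", "P1"),
    ("G1", "proteins", "P9"),
    ("P2", "structures", "S1"),
]
print(find_broken_xrefs(entries_by_dataset, xrefs))
```

Running such a check before each index update would flag links that would otherwise be displayed as broken.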
Acknowledgements
• Rodrigo Lopez
• Mickael Goujon
• Franck Valentin
• Silvano Squizzato
• Sanger
• CMBI