Distributed Dynamic Diversity Databases for Life
www.4d4life.eu
An e-infrastructure for biodiversity informatics
Richard White
Richard White, KIS talk, COMSC, Cardiff University, 26 October 2009
Objective
To build a new state-of-the-art “e-infrastructure” for the Catalogue of Life, which will
• form the basis for a sustainable Catalogue of Life
• replace the current system at the end of the 4D4Life project
What is the Catalogue of Life?
• “A coherent classification and species checklist of the world’s plants, animals, fungi and microbes is fundamental for accessing information about biodiversity. The Catalogue of Life provides the world with a unique service: a dynamically updated global index of validated scientific names, synonyms and common names integrated within a single taxonomic hierarchy.”
(from the 4D4Life web site)
• “The Catalogue of Life was initiated as a European Scientific Infrastructure under FP5 and has a distributed knowledge architecture. Its federated e-compendium of the world’s organisms grows rapidly (now covering well over one million species), and has established a formidable user base, including major global biodiversity portals as well as national biodiversity resources and individual users worldwide.”
(more from the 4D4Life web site)
• “… this 4D4Life Project will establish the Catalogue of Life as a state of the art e-science facility based on an enhanced service-based distributed architecture …”
• “… available for integration into analytical and synthetic distributed networks such as those developing in conservation, climate change, invasive species, molecular biodiversity and regulatory domains …”
(even more from the 4D4Life web site)
• “… 4D4Life will strengthen the development of Global Species Databases that provide the core of the service, and extend the geographical reach of the programme beyond Europe by realising a Multi-Hub Network integrating data from China, New Zealand, Australia, N. America and Brazil.”
• Apologies to those who’ve seen the next three diagrams many times …
Data organisation
[Diagram. Levels shown:]
• Taxonomic hierarchy (or hierarchies)
• Species: Global species databases (GSDs) and interim checklists: the species index
• Species information sources (SISs): regional faunas and floras, specialist or sectoral databases, web pages etc.
Note that this diagram does not show the hierarchical attachment points of GSDs’ hierarchies to the main “management” hierarchy.
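The attachment points that the diagram omits can be pictured with a small sketch. It is only an illustration under stated assumptions: the `Taxon` class, the `attach` and `lineage` helpers, the source labels and the abbreviated example classification are all hypothetical and are not taken from the Catalogue of Life software.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Taxon:
    """One node in a hierarchy; `source` records who supplied it (hypothetical field)."""
    name: str
    rank: str
    source: str
    children: list[Taxon] = field(default_factory=list)

    def find(self, name: str) -> Taxon | None:
        """Depth-first search for a node by name."""
        if self.name == name:
            return self
        for child in self.children:
            hit = child.find(name)
            if hit is not None:
                return hit
        return None


def attach(management_root: Taxon, attachment_point: str, gsd_root: Taxon) -> None:
    """Hang a GSD's own hierarchy under a named node of the management hierarchy."""
    node = management_root.find(attachment_point)
    if node is None:
        raise KeyError(f"no attachment point named {attachment_point!r}")
    node.children.append(gsd_root)


def lineage(node: Taxon, name: str, path: tuple[str, ...] = ()) -> list[str]:
    """Return the names from the root down to `name`, or [] if it is absent."""
    path = path + (node.name,)
    if node.name == name:
        return list(path)
    for child in node.children:
        found = lineage(child, name, path)
        if found:
            return found
    return []


# Abbreviated, illustrative management hierarchy (intermediate ranks omitted).
management = Taxon("Plantae", "kingdom", "management", [
    Taxon("Fabales", "order", "management", [
        Taxon("Fabaceae", "family", "management"),
    ]),
])

# Hierarchy supplied by a hypothetical legume GSD.
gsd = Taxon("Vicia", "genus", "example-legume-gsd", [
    Taxon("Vicia faba", "species", "example-legume-gsd"),
])

attach(management, "Fabaceae", gsd)
path_to_species = lineage(management, "Vicia faba")
```

Keeping the supplying database on every node is one way to let the management hierarchy and each GSD's hierarchy be maintained separately while still being browsed as a single tree.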
Wrappers to facilitate data transfer
[Diagram: User interface; Data collector (CAS); three Wrappers; three GSDs]
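The wrapper idea can be sketched as follows: each GSD keeps its own native format, a wrapper translates it into one shared record shape, and the data collector only ever talks to wrappers. This is a minimal sketch, not the project's implementation. The class names, field names and sample rows are assumptions made for illustration.

```python
import csv
import io
from dataclasses import dataclass
from typing import Iterable, Iterator, Protocol


@dataclass(frozen=True)
class ChecklistRecord:
    """Common record shape the collector understands (hypothetical)."""
    scientific_name: str
    status: str
    source: str


class Wrapper(Protocol):
    """Anything that can yield ChecklistRecords counts as a wrapper."""
    def records(self) -> Iterable[ChecklistRecord]: ...


class CsvGsdWrapper:
    """Wraps a GSD that exports CSV with 'name' and 'status' columns."""
    def __init__(self, source: str, text: str) -> None:
        self.source = source
        self.text = text

    def records(self) -> Iterator[ChecklistRecord]:
        for row in csv.DictReader(io.StringIO(self.text)):
            yield ChecklistRecord(row["name"].strip(), row["status"].strip(), self.source)


class DictGsdWrapper:
    """Wraps a GSD that returns dicts with 'taxon' and a boolean 'valid' flag."""
    def __init__(self, source: str, rows: list[dict]) -> None:
        self.source = source
        self.rows = rows

    def records(self) -> Iterator[ChecklistRecord]:
        for row in self.rows:
            status = "accepted" if row["valid"] else "synonym"
            yield ChecklistRecord(row["taxon"], status, self.source)


def collect(wrappers: Iterable[Wrapper]) -> list[ChecklistRecord]:
    """Data collector: harvest every wrapper without knowing its native format."""
    harvested: list[ChecklistRecord] = []
    for wrapper in wrappers:
        harvested.extend(wrapper.records())
    return harvested


# Illustrative sample data for two hypothetical GSDs.
csv_text = "name,status\nVicia faba,accepted\nFaba vulgaris,synonym\n"
collected = collect([
    CsvGsdWrapper("gsd-a", csv_text),
    DictGsdWrapper("gsd-b", [{"taxon": "Apis mellifera", "valid": True}]),
])
```

Adding a further GSD then means writing one more wrapper class; the collector itself does not change.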
A distributed catalogue ...
[Diagram: three GSDs; CAS; Web front-end; Other software clients of Catalogue of Life (e.g. using it as their “taxonomic backbone”)]
Benefits of a new architecture
Practical
• Address issues of maintainability, manageability and extensibility of the system and the Catalogue
• Use community standards
Political
• “Facilitate structured information exchange within the project networks”
• “Synthesise a globally significant resource for science”
• “Disseminate this in an array of modern Web services and products”
[Bisby, Brussels, 10 Nov 2008]
People
• Work Package 7:
  • Cardiff University School of Computer Science: Alex Hardisty, Andrew Jones, Richard White (leader), two Research Assistants (being appointed).
• Design Team:
  • Alex Hardisty (chairman), Andrew Jones, Richard White, Sara Oldfield (Services Team, BGCI), Thierry Bourgoin (Paris), Jiří Kvaček (Prague), Peter Schalk & Wouter Addink (ETI, Amsterdam), Frank Bisby & Yuri Roskov (Reading)
Tasks in WP7
• 7.1: Requirements gathering and specification
  • Requirements list
  • Design principles
• 7.2: System architecture design
• 7.3: Data formats and protocols
• 7.4: Proof of concept demo
• 7.5: Test implementation
• 7.6: Enhanced prototype
Task 7.1: Requirements gathering
• Requirements list
  • Input from
    • previous documents, project proposal
    • users and partners, data providers, regional hubs, management
    • Catalogue of Life partners Species 2000 and ITIS
  • On
    • scenarios, use cases, users, data
    • needs for secure access, feedback from users, alternative classifications
• Proposal for comment to
  • biodiversity informatics organisations (GBIF and TDWG)
  • institutional, project and corporate users (EoL, CBoL, EBI, NCBI, LifeWatch, …)
• Monitor current developments in TDWG, GBIF, etc.
• Review requirements
Requirements list
• Community involvement:
  • user interfaces, discoverability
  • compatibility with biodiversity informatics community infrastructure, related data sources
  • benefits for data providers, rapid sharing of changes in data with others, feedback
• Data handling and synchronisation:
  • creating and harvesting check-lists, internal CoL data management, regional hubs
  • data model and data elements: more flexibility, including alternative classifications
• Monitoring, control and management:
  • manual and automated tools, statistics, usage, alerts, response procedures, quality control, backup
Task 7.1: Specification
Design principles
• Distributed, based on services to provide most functions
• Efficient data flow & management, testability, maintainability, sustainability, scalability, response time, resilience
• Re-use existing components (services), preferably open-source (minimise work needed, encourage others)
• Compatibility of components (avoid unnecessary work)
• Compliant with relevant community standards (W3C, TDWG, GBIF, etc.)
Opportunities
• Semantic interoperability, content negotiation, Linked Data initiative
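Content negotiation, listed among the opportunities, means that one identifier for a taxon can be served as different representations depending on what the client asks for in its HTTP `Accept` header. The sketch below is a deliberately simplified illustration: the `negotiate` function and the list of supported media types are assumptions, partial wildcards such as `text/*` are ignored, and an unmatched request falls back to a default, where a stricter server might answer 406 Not Acceptable.

```python
# Representations a hypothetical catalogue service could offer.
SUPPORTED = ("application/rdf+xml", "application/json", "text/html")


def negotiate(accept_header: str,
              supported: tuple[str, ...] = SUPPORTED,
              default: str = "text/html") -> str:
    """Pick the supported media type the client prefers most.

    Preferences are ordered by q-value (highest first), then by the order
    in which they appear in the header.
    """
    candidates = []
    for position, part in enumerate(accept_header.split(",")):
        pieces = [piece.strip() for piece in part.split(";")]
        media_type = pieces[0]
        quality = 1.0  # q defaults to 1 when not given
        for param in pieces[1:]:
            if param.startswith("q="):
                try:
                    quality = float(param[2:])
                except ValueError:
                    quality = 0.0  # treat a malformed q-value as "not wanted"
        if quality > 0:
            candidates.append((-quality, position, media_type))

    for _, _, media_type in sorted(candidates):
        if media_type in supported:
            return media_type
        if media_type == "*/*":
            return default
    return default


# A Linked Data client asking for RDF first, HTML as a fallback.
chosen = negotiate("application/rdf+xml;q=0.9, text/html;q=0.5")
```

A browser and a Linked Data client can then dereference the same URL and receive a web page and an RDF description respectively.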
Task 7.2: System architecture design
• System architecture
  • functions carried out by the services
• Conceptual data models
  • data flow diagram
  • entity-relationship diagram(s)
  • “base schema” for the CoL checklist databases
• Implementation schemas for
  • global and regional hubs' core databases
  • separate stand-alone use (e.g. Annual Checklist CD)
  • tracking taxon concept changes
  • CoL Metadatabase
Task 7.3: Data formats and protocols
• System components for
  • data providers and harvesting
  • taxon tracking and quality control
  • system monitoring and management
  • data querying and export, user interfaces
  • linking hubs (comparison, synchronisation, navigation)
• Flexible options for data transfer
  • content: CoL standard data, CDM, Darwin Core, Linnean Core, …
  • file syntax: XML, TCS, RDF, …
  • protocol: web services, DiGIR, Tapir, GBIF IPT, …
  • services: for data transfer, retrieval and querying
• For system monitoring:
  • SNMP, BPEL, ...?
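The separation of content from file syntax in the list above can be sketched as a record described once with Darwin Core term names and then serialised through interchangeable functions. This is a sketch under assumptions: the `record` wrapper element, the `export` function and the serialiser registry are hypothetical, and JSON is used here only as a convenient stand-in for a second syntax, since it is not one of the options named in the talk.

```python
import json
import xml.etree.ElementTree as ET

# Namespace of the Darwin Core terms.
DWC = "http://rs.tdwg.org/dwc/terms/"

# Content: one checklist entry keyed by Darwin Core term names.
record = {
    "scientificName": "Vicia faba",
    "taxonRank": "species",
    "family": "Fabaceae",
}


def to_xml(entry: dict[str, str]) -> str:
    """Serialise as XML, one namespaced element per Darwin Core term."""
    ET.register_namespace("dwc", DWC)
    root = ET.Element("record")  # hypothetical wrapper element, not a standard
    for term, value in entry.items():
        ET.SubElement(root, f"{{{DWC}}}{term}").text = value
    return ET.tostring(root, encoding="unicode")


def to_json(entry: dict[str, str]) -> str:
    """Serialise the same content as JSON (stand-in for another syntax)."""
    return json.dumps(entry, sort_keys=True)


# File syntax is chosen at export time; the content stays the same.
SERIALISERS = {"xml": to_xml, "json": to_json}


def export(entry: dict[str, str], syntax: str) -> str:
    """Serialise `entry` in the requested syntax."""
    try:
        serialise = SERIALISERS[syntax]
    except KeyError:
        raise ValueError(f"unsupported syntax: {syntax!r}") from None
    return serialise(entry)


xml_text = export(record, "xml")
json_text = export(record, "json")
```

Supporting a further syntax such as RDF would then mean registering one more serialiser rather than touching the content model or the transfer protocol.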
Task 7.4: Proof-of-concept demo
• This will demonstrate the feasibility of the proposed new architecture
• It will not be expected to do anything useful ☺
• It may be subject to experimentation and revision
Task 7.5: Test implementation
• Develop the proof-of-concept demo system into a Prototype System
  • With some working components
  • Carry out some of the functions expected by users
  • Tested by users
Use of existing tools
• For the infrastructure
  • service implementation packages
• For system monitoring
  • Nagios, OpenNMS, Munin, ...?
• For check-list incorporation
  • to provide existing databases as GSDs
    • wrappers, GBIF IPT
  • to create new GSDs
    • Online Taxonomic Workbench?
    • software used by NZOR?
Task 7.6: Enhanced prototype
• Like the test implementation …
• … but with more stuff
• Developed in association with WP6 (ETI)
• To be handed over to ETI for transmogrification during year 3 into the new Production System …
• … which after user acceptance trials will replace the improved current system
Issues for discussion
• Moving from the old concept of a database-driven web site to a provider of electronic services to an “eco-system”
• What are we missing?
• What is interesting or more widely applicable?
• I didn’t mention LSIDs
• Etc. …