Biodiversity Informatics at COMSC
Andrew Jones & Richard White
School of Computer Science & Informatics
Andrew.C.Jones@cs.cf.ac.uk, R.J.White@cs.cf.ac.uk
Richard White’s interests
• Design and construction of database systems to deliver biodiversity data
• Methods for making these systems
  • interoperable with other systems
  • adaptable for multiple uses
  • capable of following concept changes
    • deducing and maintaining information on changes
• (Extracting numerical information from images, e.g. in the “Morphidas” project, not described here)
Premise
• Bioinformaticians want to use information about the species whose genetic material is being studied to understand their development
• Biodiversity scientists (including taxonomists, ecologists, etc.) want to use molecular data to enhance their classifications, phylogenies and models
Biodiversity informatics
Therefore:
• Bioinformatic and biodiversity data need to be linked together in many analyses
• Links often involve the species name as the key linking element
Species naming in a nutshell (Corylus avellana L.)
• Common (vernacular) names
• Latin descriptive phrases
• Linnaeus: binomial nomenclature
• Adanson: rules for precedence etc.
• Accepted names and synonyms
• Checklists (e.g. the Catalogue of Life …)
• Data (in different formats, e.g. BUFFIE …) is usually linked to species names
• Taxon concepts (including species and higher taxa such as genera, families, etc.)
• Tracking changes in taxon concepts …
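A minimal sketch of how data attached to different names might be linked through a checklist of accepted names and synonyms. The checklist contents, the record payloads and the function names are hypothetical; the Vicia faba / Faba faba pair is borrowed from the cross-mapping slide later in this talk and is treated as a synonym pair purely for illustration.

```python
# Hypothetical mini-checklist: accepted name -> list of synonyms (illustrative only).
CHECKLIST = {
    "Vicia faba": ["Faba faba"],
    "Corylus avellana": [],
}


def build_name_index(checklist):
    """Map every name (accepted or synonym) to its accepted name."""
    index = {}
    for accepted, synonyms in checklist.items():
        index[accepted] = accepted
        for synonym in synonyms:
            index[synonym] = accepted
    return index


def link_records(records, index):
    """Group record payloads under accepted names; list names the checklist cannot resolve."""
    linked, unresolved = {}, []
    for name, payload in records:
        accepted = index.get(name)
        if accepted is None:
            unresolved.append(name)
        else:
            linked.setdefault(accepted, []).append(payload)
    return linked, unresolved


NAME_INDEX = build_name_index(CHECKLIST)

# Made-up records from two kinds of data source, keyed by whatever name each source used.
RECORDS = [
    ("Vicia faba", "sequence record A"),
    ("Faba faba", "distribution record B"),
    ("Corylus avellana", "sequence record C"),
    ("Unknown name", "record D"),
]

LINKED, UNRESOLVED = link_records(RECORDS, NAME_INDEX)
```

Keeping unresolved names in a separate list, rather than dropping them, reflects the point that names change over time: a name missing from one checklist may still be valid in another.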
Species 2000 & ITIS
• International programme to assemble data from “Global Species Databases” (GSDs) and deliver the Catalogue of Life (CoL)
• Authoritative, up-to-date checklist of all the world’s species (1.3m out of 1.8m)
• Reference list of taxon concepts (with unique identifiers) to aid indexing and cross-referencing of species data sources
• Available on DVD, through the Web (www.sp2000.org) and by using electronic (“web”) services
4D4Life project
• “Distributed Dynamic Diversity Databases for Life”, EU project 2009 – 2012
• Carry the Catalogue of Life forward with improved sustainable infrastructure
• In COMSC we are designing a new architecture and will deliver a working prototype
• Service-oriented, re-usable components
Re-usable components
• GSD editors create a data resource “GSD1”
• CoL partners create the Catalogue of Life from such resources
• A user creates a new product using the Catalogue of Life
Interoperability
• Catalogue of Life
  • GSDs are heterogeneous in
    • Content
    • Access methods
• More generally
  • Multiple data representations & exchange formats
  • Changing concepts of taxa (and geography)
ENBI project and BUFFIE
• “European Network for Biodiversity Information”, EU project 2003-2006
• Mostly reporting on standards, practices and recommendations
• In COMSC, R. Sundaravadivelu developed a prototype interoperability demonstrator (BUFFIE, “Biodiversity Users Framework For Information Exchange”)
  • Accepts data sources using different protocols and XML formats
  • Provides a merged response in an XML format and protocol of the user’s choice
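The idea of accepting differently structured XML sources and returning one merged response can be sketched as follows. This is not BUFFIE's actual design: both source formats, the output element names and all function names are invented for illustration, and real sources would also differ in access protocol, which is not modelled here.

```python
import xml.etree.ElementTree as ET

# Two hypothetical sources that describe the same kind of record in different XML shapes.
SOURCE_A = (
    "<species><entry><sciname>Vicia faba</sciname>"
    "<country>UK</country></entry></species>"
)
SOURCE_B = "<records><record name='Vicia faba' locality='France'/></records>"


def parse_source_a(xml_text):
    """Source A stores values as child elements."""
    root = ET.fromstring(xml_text)
    return [
        {"name": e.findtext("sciname"), "place": e.findtext("country")}
        for e in root.iter("entry")
    ]


def parse_source_b(xml_text):
    """Source B stores values as attributes."""
    root = ET.fromstring(xml_text)
    return [
        {"name": r.get("name"), "place": r.get("locality")}
        for r in root.iter("record")
    ]


def merge_to_xml(records):
    """Serialise records from any source into one (hypothetical) output format."""
    root = ET.Element("mergedResponse")
    for rec in records:
        item = ET.SubElement(root, "occurrence")
        ET.SubElement(item, "name").text = rec["name"]
        ET.SubElement(item, "place").text = rec["place"]
    return ET.tostring(root, encoding="unicode")


MERGED = merge_to_xml(parse_source_a(SOURCE_A) + parse_source_b(SOURCE_B))
```

Each source gets its own small parser that maps into a shared in-memory record, so supporting a further output format only requires another serialiser alongside `merge_to_xml`.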
A world of resources
• Imagine a digital world full of biodiversity data and analytical resources like these, just as there is in bioinformatics
• How will users be able to find out what resources there are and how to use them in combination to answer scientific questions?
The cross-mapping problem
[Figure: names in two taxonomies that need to be mapped onto each other]
• Taxonomy 1: Vicia faba; Caesalpinia crista L.
• Taxonomy 2: Faba faba; Caesalpinia crista L.; Caesalpinia bonduc (L.) Roxb.; Caesalpinia crista L., p.p.
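One way to picture a cross-map is as a list of typed links between names in the two taxonomies. The sketch below is an assumption-laden reading of the slide: the relation vocabulary ("equivalent", "overlaps") and the specific links chosen are hypothetical, and the original diagram may have intended different relationships.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CrossMapEntry:
    source: str    # name as used in Taxonomy 1
    target: str    # name as used in Taxonomy 2
    relation: str  # hypothetical vocabulary: "equivalent" or "overlaps"


# Assumed links, for illustration only.
CROSS_MAP = [
    CrossMapEntry("Vicia faba", "Faba faba", "equivalent"),
    CrossMapEntry("Caesalpinia crista L.", "Caesalpinia crista L.", "overlaps"),
    CrossMapEntry("Caesalpinia crista L.", "Caesalpinia bonduc (L.) Roxb.", "overlaps"),
]


def translate(name, cross_map):
    """Return every (target name, relation) pair linked to a Taxonomy 1 name."""
    return [(e.target, e.relation) for e in cross_map if e.source == name]


def is_ambiguous(name, cross_map):
    """A name is ambiguous when it maps to more than one target concept."""
    return len(translate(name, cross_map)) > 1
```

Under these assumptions, `translate("Vicia faba", CROSS_MAP)` yields a single equivalent name, whereas "Caesalpinia crista L." is flagged as ambiguous because the same name string covers only part of the original concept in Taxonomy 2.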
i4Life 4D4Life
Constraints and checklists
• (From Litchi 1)
• “A full name which is not a pro-parte name may not appear as both an accepted name and a synonym in the same checklist”
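The quoted constraint can be checked mechanically. The following is a sketch under stated assumptions, not the Litchi implementation: the `Entry` structure, the explicit `pro_parte` flag and the function name are hypothetical, and the rule is interpreted as "ignore pro-parte entries, then look for names that are both accepted and synonyms".

```python
from typing import NamedTuple


class Entry(NamedTuple):
    full_name: str
    status: str            # "accepted" or "synonym"
    pro_parte: bool = False


def find_violations(entries):
    """Return full names that occur as both accepted name and synonym,
    counting only entries that are not flagged pro parte."""
    accepted = {
        e.full_name for e in entries if e.status == "accepted" and not e.pro_parte
    }
    synonyms = {
        e.full_name for e in entries if e.status == "synonym" and not e.pro_parte
    }
    return sorted(accepted & synonyms)


# Consistent: the repeated name is a pro-parte synonym, which the rule permits.
CONSISTENT = [
    Entry("Caesalpinia crista L.", "accepted"),
    Entry("Caesalpinia bonduc (L.) Roxb.", "accepted"),
    Entry("Caesalpinia crista L.", "synonym", pro_parte=True),
]

# Inconsistent: the same non-pro-parte name is both accepted and a synonym.
INCONSISTENT = [
    Entry("Vicia faba", "accepted"),
    Entry("Vicia faba", "synonym"),
]
```

Here `find_violations(CONSISTENT)` returns an empty list, while `find_violations(INCONSISTENT)` reports "Vicia faba".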
Persistent identifiers and change
In i4Life we need to
• Use persistent identifiers for taxon concepts
  • (started in TDWG-TIP project)
• Link taxonomies and track change
  • create and maintain “cross-maps”
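As a rough illustration of why a persistent identifier helps when tracking change, the sketch below keeps the identifier fixed while the name attached to a concept changes. The identifier scheme, class name and example change are hypothetical and are not taken from i4Life or TDWG-TIP.

```python
from dataclasses import dataclass, field


@dataclass
class TaxonConcept:
    identifier: str                 # persistent: assigned once, never reused
    current_name: str
    name_history: list = field(default_factory=list)

    def rename(self, new_name, note):
        """Record the old name with a note, then switch to the new name."""
        self.name_history.append((self.current_name, note))
        self.current_name = new_name


# Hypothetical identifier and change, for illustration only.
concept = TaxonConcept("urn:example:concept:1", "Faba faba")
concept.rename("Vicia faba", "hypothetical nomenclatural change")
```

Data sources that cite `concept.identifier` rather than the name string stay linked to the same concept after the rename, and `name_history` preserves what the concept used to be called.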
Workflow problems addressed
• Incorporation of biodiversity services in workflows (BiodiversityWorld)
• Authentication in a workflow environment (ASMIMA)
• Rich annotation of services; discovery (Ewen Orme’s PhD)
• Knowledge-based assistance for workflow creators (Russell McIver’s PhD)
• Improving the User Experience (ACJ’s main contribution to BioVeL proposal)
Andrew Jones’ interests
• Naming & concepts
  • Accurately identifying concepts
  • Tracking change
• Making scientific workflow systems usable by non-computer scientists
  • Hiding “programming” complexity
  • Helping to find resources & build workflows
• Environments to support collaborative scientific research
  • E.g. “doing” taxonomy
Future projects
• We research solutions for data-handling problems faced by biologists and bioinformaticians
• If you think you might have an interesting and challenging problem, please get in touch
• Andrew.C.Jones@cs.cf.ac.uk, R.J.White@cs.cf.ac.uk