Global Working Checklist of Compositae A TICA Project Seed Funded by GBIF ECAT
Long Term Vision Peter Raven (email to Vicki Funk 9.9.2005): “…Whatever happens, we want and need one consolidated, agreed list [of Compositae species], and not a series of choices from various lists.”
How to get there? • Phase 1: • Creation, consolidation and initial editing of a list of names of taxa integrated from existing electronic checklists and floras that are (nearly) complete, and which are available in structured databases or digital form. • Followed by processing hard copy publications.
How to get there? • Phase 2: • Full or partial checklist reports for taxa available for download from the TICA website. • Taxonomists to examine taxonomy and nomenclature. • Recording of comments and corrections.
How to get there? • Phase 3: • Dealing with taxonomic differences.
GBIF ECAT Seed Fund • Duration: 1 March 2006 – 31 August 2007. • Partners: • Landcare Research, New Zealand (Lead Partner) • Missouri Botanical Garden • Royal Botanic Gardens, Kew • Botanic Garden and Botanical Museum Berlin-Dahlem • Australian National Herbarium, Centre for Plant Biodiversity Research, CSIRO • University of Tokyo • Smithsonian Institution • South African National Biodiversity Institute, Pretoria (SANBI) • Instituto de Botánica Darwinion, Buenos Aires • The International Compositae Alliance (TICA).
GBIF ECAT Seed Fund Project Team • Jerry Cooper & Ilse Breitwieser • Aaron Wilton • Kevin Richards • Christina Flann
Scope of the ECAT GBIF project • Creation, consolidation and initial editing of names of Compositae taxa integrated from existing electronic checklists and Floras that are complete, or nearly complete, and which are available in structured databases or digital form. • This will be followed by processing additional digital and hard copy publications (as many as possible within timeframe). Phase 1 of Compositae checklist
Contracted objectives of the project • The collation and integration of prioritised existing checklists into a Global Working Checklist of the Compositae; • Where possible, resolve and complete nomenclatural content (including homotypic synonyms); • Capture, examine, report and resolve (as much as possible) differences in taxon concepts; • Provide data contributors with regular reports of editorial changes; • Make the developing checklist accessible via the Internet, hosted by TICA, and eventually linked to GBIF ECAT; • Provide a framework for facilitating information flow and content revision among data contributors and the broader TICA community; • Provide a substantial information basis (including a gap analysis) and operating framework for the completion and long-term maintenance of the global checklist.
Aims of the workshop • Awareness of the project • Feedback • Phase 2/3 discussion, agreement, and planning • How to continue once the GBIF contract is finished (Aug. 2007)? • Future funding? • A decision on mechanisms for dealing with taxonomic differences needs to be made at this workshop. • Possible models: creation of an editorial board, supported by specialist subgroups, that will determine authoritative taxonomic views. What lessons can be learnt from similar projects, e.g. Euro+Med?
Global Working Checklist of Compositae Background to the project Jerry Cooper
What is GBIF? • The Global Biodiversity Information Facility • Formed in 2000. Secretariat in Copenhagen • Intergovernmental. 47 country signatories, and based on a ‘Memorandum of Understanding’ • In support of the Convention on Biological Diversity (CBD) • An Internet-based data sharing network for collection/observation/taxonomic data • Currently serves 96 million records from 707 sources, and growth remains exponential
What is ECAT? • Electronic Catalogue of Names of Known Organisms • A principal GBIF work programme • Names of taxa are the key to unlocking biodiversity data • GBIF seed funding awarded annually to start key databasing projects to deliver the ECAT
Why is ECAT a database mediated programme? • Why a database? • It makes explicit (‘unlocks’) the implicit information content of a checklist • Ease of maintenance and transparency of derivation of content • Application of Unique Identifiers facilitates digital connectivity of information across linked resources • Efficient & flexible (re)use of information in many forms
Why is ECAT a database mediated programme? • Why is it necessary to collate existing digital data as a first step? • One centralized database, or multiple, distributed, connected databases?
Why is ECAT a database mediated programme? A global database of names of taxa, and taxon concepts, will provide an essential digital backbone for unlocking and linking existing digital data, and for facilitating future taxonomic [database] checklist work.
Related global ‘names’ initiatives • Catalogue of Life Consortium (Species 2000/ITIS), uBIO, GenBank Taxonomic Framework, CBOL … • Taxonomic Databases Working Group
What are we trying to achieve in this GBIF seed project? Phase 1 IS NOT a taxonomic project • The emphasis is on: • Collating and integrating existing digital data • Applying data standards • Providing the resulting digital backbone as a service • The value of the resulting consolidated database is considerable: • Consistent nomenclature • Gap analysis • Identifying taxonomic opinion • Significant contribution to the global, digitally accessible catalogue of life • ‘Digital backbone’ of Compositae information
Scope & priorities for collation • Nomenclature • Genus/species/infraspecific epithets (+orthographic variants) • Linkage to basionym/replacement names (providing homotypic synonymy) • Standardized Authors • Linkage to place of publication
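Linking each name to its basionym or replacement name is what makes homotypic synonymy derivable: every name that traces back to the same original name belongs to one homotypic group. A minimal Python sketch, assuming each name record carries a hypothetical `basionym_of` link to another name ID (None when there is no link); the IDs are illustrative only:

```python
from collections import defaultdict


def homotypic_groups(basionym_of):
    """Group name IDs that trace back to the same original name.

    basionym_of (hypothetical structure): name ID -> ID of its basionym
    or replaced name, or None when the name has no such link.
    """

    def root(name_id):
        # Follow links back to the name that starts the chain.
        seen = set()
        while basionym_of.get(name_id) is not None:
            if name_id in seen:
                raise ValueError(f"cycle in basionym links at {name_id!r}")
            seen.add(name_id)
            name_id = basionym_of[name_id]
        return name_id

    groups = defaultdict(set)
    for name_id in basionym_of:
        start = root(name_id)
        groups[start].update({start, name_id})
    return dict(groups)


# Hypothetical IDs: n2 and n3 are combinations based on n1; n4 stands alone.
print(homotypic_groups({"n1": None, "n2": "n1", "n3": "n1", "n4": None}))
```

Cycles are reported rather than followed forever, since a loop in basionym links is a data error an editor would need to see.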
Scope & priorities for collation • Taxonomic Opinion • Heterotypic synonyms • Preferred name for synonyms according to X in publication Y (basic taxon concepts) • Position in a taxonomic hierarchy (genus-tribe-family – FGVP)
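Recording the preferred name "according to X in publication Y" as data means that differences of taxonomic opinion can be listed mechanically rather than found by eye. A sketch under assumed, hypothetical structures (`Placement`, `differences`); the names and sources in the example are placeholders:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Placement:
    """One source's treatment of a name (a basic taxon concept)."""

    name: str          # name as used by the source
    accepted: str      # preferred name for it in that source
    according_to: str  # the source: author X in publication Y


def differences(placements):
    """Return {name: {source: preferred name}} for names where sources disagree."""
    views = {}
    for p in placements:
        views.setdefault(p.name, {})[p.according_to] = p.accepted
    return {name: by_source for name, by_source in views.items()
            if len(set(by_source.values())) > 1}


# Placeholder names and sources, for illustration only.
data = [
    Placement("Genus alpha", "Genus alpha", "Source 1"),
    Placement("Genus alpha", "Genus beta", "Source 2"),
    Placement("Genus gamma", "Genus gamma", "Source 1"),
    Placement("Genus gamma", "Genus gamma", "Source 2"),
]
print(differences(data))
# {'Genus alpha': {'Source 1': 'Genus alpha', 'Source 2': 'Genus beta'}}
```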
Scope & priorities for collation • Metadata • Who provided which data • How the provided data was consolidated, edited, and any consensus derived • Unique identifiers for tracking both names & taxon concepts
Limitations to what we can achieve in phase 1 • Consensus taxonomic opinion? • Infraspecific names? • Common names? • Distribution information? • Consolidated bibliography? • Published revisions?
Key technical outputs from phase 1 • Feedback to providers on overlap/mismatch • Provision of URIs • Web site providing easy access to information • Web services • providing end users with ability to incorporate/link catalogue data into other, new/existing work and maintain currency of these data • providing GBIF ECAT with current information
Global Working Checklist of Compositae: Project Methodology Aaron Wilton
From the Agenda: Project Details • Information ownership and acknowledgement • The proposed methodology • Nomenclature and Taxonomy • Data integration methodology and the priority databases • Database contributors • Information services
Process Overview • 1. Export: provider database to data set • 2. Transform • 3. Import into the Checklist Database • 4. Integrate • 5. Edit • 6. Report: data set back to provider database • 7. Checklist Website
1. Data sets from Providers • Flexible format • Content • Nomenclature • Taxonomy • References/Literature • Important: unique IDs and modified dates • Metadata for website
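Unique IDs and modified dates matter because they let a re-supplied data set be compared with the previous delivery, so that only new, changed or withdrawn records need re-processing. A sketch, assuming each data set is reduced to a hypothetical mapping from the provider's own unique ID to the record's last-modified date:

```python
from datetime import date


def diff_dataset(previous, current):
    """Compare two deliveries of one provider's data set.

    Both arguments map provider unique ID -> last-modified date.
    Returns sorted lists of (added, modified, withdrawn) IDs.
    """
    added = sorted(current.keys() - previous.keys())
    withdrawn = sorted(previous.keys() - current.keys())
    modified = sorted(
        record_id
        for record_id in current.keys() & previous.keys()
        if current[record_id] > previous[record_id]
    )
    return added, modified, withdrawn


# Hypothetical IDs and dates.
before = {"p1": date(2006, 5, 1), "p2": date(2006, 5, 1), "p3": date(2006, 5, 1)}
after = {"p1": date(2006, 5, 1), "p2": date(2006, 9, 12), "p4": date(2006, 9, 12)}
print(diff_dataset(before, after))  # (['p4'], ['p2'], ['p3'])
```

Without stable provider IDs, a renamed record would look like one withdrawal plus one addition, and any editorial links to it would be lost.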
2. Transformation and Importation • Transformation • Convert to standard format • Largely manual • Importation • Data sets added as prepared • Maintain distinct records • Linked to provider metadata
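Because transformation is largely manual, one way to capture that manual work is a per-provider column mapping that is then applied mechanically on import. A sketch only; the standard field list, column names and provider label are all hypothetical:

```python
STANDARD_FIELDS = ("provider_id", "genus", "species", "infraspecific", "author", "year")


def transform(row, mapping, provider):
    """Convert one provider row into a (hypothetical) standard layout.

    mapping: standard field -> provider column name. Unmapped fields and
    blank cells become None. The provider label keeps each record distinct
    and linked to that provider's metadata.
    """
    record = {"provider": provider}
    for field in STANDARD_FIELDS:
        column = mapping.get(field)
        value = row.get(column) if column else None
        if isinstance(value, str):
            value = value.strip() or None  # treat blank cells as missing
        record[field] = value
    return record


mapping = {"provider_id": "NameID", "genus": "Genus", "species": "Sp", "author": "Auth"}
row = {"NameID": "123", "Genus": "Anaphalis ", "Sp": "", "Auth": "DC."}
print(transform(row, mapping, "Provider A"))
```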
4. Integration • Build list of “consensus records” • Two steps • Matching records • Calculating consensus record • Matching • Use nomenclatural data • Exact and fuzzy matches • Matched records linked to consensus record • New records assigned unique id
4. Example of matching • Provider records: • 1 Antennaria Link ex Fr. • 1 Antennaria Gaertn. • 2 Antennaria Fr. • 2 Antennaria Gaertn. • 3 Anaphalis DC. • 3 Antennaria ? • Consensus records: • Antennaria Link ex Fr. • Antennaria Gaertn. • Anaphalis DC.
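The matching step could work along these lines: names must agree exactly, authors are compared first as written, then after normalisation and fuzzily, and a provider record with no author is flagged for an editor rather than guessed. This is a sketch, not the project's integrator; the normalisation rule (reducing "A ex B" to B, the author who validly published the name), the 0.8 similarity cutoff and the `T1`-style consensus IDs are assumptions. `difflib.SequenceMatcher` is from the Python standard library.

```python
import difflib


def normalise_author(author):
    """Comparison key for an author string: 'A ex B' -> 'B', no spaces, lower case."""
    if not author:
        return ""
    if " ex " in author:
        author = author.split(" ex ")[-1]
    return author.replace(" ", "").lower()


def match(name, author, consensus, cutoff=0.8):
    """Match one provider record against consensus records.

    consensus: consensus ID -> (name, author).
    Returns (status, candidate IDs); status is 'exact', 'fuzzy',
    'ambiguous' (needs an editor) or 'new'.
    """
    same_name = {cid: a for cid, (n, a) in consensus.items() if n == name}
    if not same_name:
        return "new", []
    exact = sorted(cid for cid, a in same_name.items() if a == author)
    if exact:
        return "exact", exact
    key = normalise_author(author)
    if key:
        candidates = sorted(
            cid for cid, a in same_name.items()
            if difflib.SequenceMatcher(None, key, normalise_author(a)).ratio() >= cutoff
        )
    else:
        candidates = sorted(same_name)  # no author: every same-named record is possible
    if not candidates:
        return "new", []
    return ("fuzzy" if len(candidates) == 1 else "ambiguous"), candidates


def integrate(name, author, consensus):
    """Link a provider record, creating a consensus record when nothing matches."""
    status, ids = match(name, author, consensus)
    if status == "new":
        new_id = f"T{len(consensus) + 1}"  # simplistic ID scheme for the sketch
        consensus[new_id] = (name, author)
        ids = [new_id]
    return status, ids


consensus = {
    "T1": ("Antennaria", "Link ex Fr."),
    "T2": ("Antennaria", "Gaertn."),
    "T3": ("Anaphalis", "DC."),
}
for name, author in [("Antennaria", "Fr."), ("Antennaria", "Gaertn."), ("Antennaria", None)]:
    print(name, author, integrate(name, author, consensus))
# Antennaria Fr. ('fuzzy', ['T1'])
# Antennaria Gaertn. ('exact', ['T2'])
# Antennaria None ('ambiguous', ['T1', 'T2'])
```

A real integrator would also need orthographic variants of the name itself and a persistent ID generator; both are left out here.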
4. Calculating Consensus • Calculated from all linked records • Each field is based on the majority value, except for: • Ties • The editor's record
4. Example • Provider records (Name | Author | Year | Citation | Page): • 1: Antennaria | Gaertn. | 1821 | Fruct. Sem. Pl. 2 | 410 • 2: Antennaria | Gaertn. | 1821 | Fruct. Sem. Pl. 2 | 419 • 3: Antennaria | Gaertn. | 1791 • Consensus: Antennaria | Gaertn. | 1821 | Fruct. Sem. Pl. 2 | <null> (two warnings) • Editor: Year 1791, Page 410 • Consensus with editor's record: Antennaria | Gaertn. | 1791 | Fruct. Sem. Pl. 2 | 410
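The consensus rule can be sketched as a field-by-field majority vote. Assumptions not stated in the slides: missing values do not vote, a tie leaves the field empty, a warning is raised whenever linked records disagree, and any field in the editor's record overrides the vote. The data reproduce the Antennaria example, with record 3 carrying only name, author and year as extracted.

```python
from collections import Counter

FIELDS = ("name", "author", "year", "citation", "page")


def consensus(records, editor=None):
    """Field-by-field majority vote over the provider records linked to one name.

    Missing values (None) do not vote; a tie leaves the field as None.
    A warning is recorded whenever the records disagree. Fields present
    in the editor's record override the vote.
    """
    editor = editor or {}
    result, warnings = {}, []
    for field in FIELDS:
        if field in editor:
            result[field] = editor[field]  # editorial decision wins
            continue
        counts = Counter(r[field] for r in records if r.get(field) is not None)
        top = counts.most_common(2)
        values = sorted(counts)
        if not top:
            result[field] = None
        elif len(top) == 2 and top[0][1] == top[1][1]:
            result[field] = None
            warnings.append(f"{field}: tie between {values}")
        else:
            result[field] = top[0][0]
            if len(counts) > 1:
                warnings.append(f"{field}: records disagree {values}")
    return result, warnings


records = [
    {"name": "Antennaria", "author": "Gaertn.", "year": 1821, "citation": "Fruct. Sem. Pl. 2", "page": 410},
    {"name": "Antennaria", "author": "Gaertn.", "year": 1821, "citation": "Fruct. Sem. Pl. 2", "page": 419},
    {"name": "Antennaria", "author": "Gaertn.", "year": 1791},
]
result, warnings = consensus(records)
print(result)    # {'name': 'Antennaria', 'author': 'Gaertn.', 'year': 1821, 'citation': 'Fruct. Sem. Pl. 2', 'page': None}
print(warnings)  # ['year: records disagree [1791, 1821]', 'page: tie between [410, 419]']
print(consensus(records, editor={"year": 1791, "page": 410}))
```

Under these assumptions the majority year (1821) wins the vote even though it is the minority record that carries the earlier date, which is exactly the kind of case the editor's record exists to correct.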
5. Editing • Data priorities • Nomenclatural • References • Taxonomy • Other data • Process • Resolve data conflicts • Verify links (provider to consensus records) • Verify differences between near matches • Fill gaps • Editorial work recorded • Editor's record created to record changes and inserts • Verification flags
6. Reporting • Web services • Available to data providers • HTML or XML • Functions will provide the means to get: • Full consensus data for a name • A comparison matrix showing the TICA ID and other provider IDs • Full data by data provider • Resolution of deprecated TICA IDs • All TICA IDs • Manual • As required • Gap analysis
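Two of the listed report functions lend themselves to a short sketch: a comparison matrix (one row per TICA ID, one column per provider, with gaps visible as empty cells) and resolution of deprecated TICA IDs by following replacement links, for example after an editor merges two consensus records. The data structures, IDs and provider labels here are hypothetical; this shows only the logic, not the actual web service.

```python
def comparison_matrix(links, providers):
    """Build {tica_id: {provider: provider record ID or None}}.

    links: iterable of (tica_id, provider, provider_record_id) triples.
    A None cell means that provider has no record for the name (a gap).
    """
    rows = {}
    for tica_id, provider, record_id in links:
        rows.setdefault(tica_id, dict.fromkeys(providers))[provider] = record_id
    return rows


def resolve(tica_id, replaced_by):
    """Follow deprecated -> replacement links to the current TICA ID."""
    seen = set()
    while tica_id in replaced_by:
        if tica_id in seen:
            raise ValueError(f"cycle in replacement links at {tica_id!r}")
        seen.add(tica_id)
        tica_id = replaced_by[tica_id]
    return tica_id


links = [("T1", "P1", "i1"), ("T1", "P2", "n9"), ("T2", "P1", "i2")]
print(comparison_matrix(links, ["P1", "P2"]))
# {'T1': {'P1': 'i1', 'P2': 'n9'}, 'T2': {'P1': 'i2', 'P2': None}}
print(resolve("T5", {"T5": "T2", "T2": "T1"}))  # T1
```

Keeping deprecated IDs resolvable, rather than deleting them, means that external resources which stored an old ID still reach the current record.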
7. Website • The website presents data for: • Consensus record • Taxonomic concepts • Hybrid, preferred name • Acknowledgement of contributions • Automatically updated
Summary of Scope [table: Nomenclature, Taxonomy, Literature and Other, each set against the stages Capture, Integrate, Edit and Display]
Work Plan • Integrator Development • Nomenclature & Taxonomy (May – Sept) • Literature (Sept – Dec) • Website • Initial conversion (complete) • New reports and enhancements (Nov – Dec) • Web services (Nov/Dec 2006) • Data Editing (1 Sept 2006 to 30 August 2007) • Data sets from Providers (now – end August?)
Data Received to Date • IPNI (Compositae) • Kadereit et al. Compositae from Families and Genera of Vascular Plants • World Checklist of Seed Plants (A-I), Rafaël Govaerts • Flora of Japan • New Zealand Plant Names