A generic data import layer for the Berlin Taxonomic Information Model

Anton Güntsch, Andreas Müller & Walter G. Berendsohn, Botanic Garden and Botanical Museum Berlin-Dahlem, Dept. of Biodiversity Informatics and Laboratories

The Berlin Taxonomic Information Model • Concepts as name-reference pairs • Explicit representation of relations between concepts • Mechanisms for calculating factual data
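The first two points can be pictured with a small data structure. The following is only an illustrative sketch, not the Berlin Model schema: the class names, field names, the second reference string and the relation label are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Concept:
    """A taxonomic concept: a name as it is used in one particular reference."""
    name: str
    reference: str


@dataclass(frozen=True)
class ConceptRelation:
    """An explicit, typed relation between two concepts."""
    source: Concept
    target: Concept
    rel_type: str  # hypothetical label, e.g. "is congruent to"


# The same name in two references gives two distinct concepts.
in_database = Concept("Acrodon bellidiflorus", "Aizoaceae")
in_checklist = Concept("Acrodon bellidiflorus", "Some other checklist")  # made-up reference

relation = ConceptRelation(in_database, in_checklist, "is congruent to")
```

Keeping the reference inside the concept is what lets two sources disagree about the same name without overwriting each other.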

Berlin Model used by • Euro+Med • Med-Checklist • IOPI Species Plantarum Initiative • Algaterra • Dendroflora of El Salvador • German Standard List of Vascular Plants and Ferns • Reference List of the German Mosses • EDIT WP6

Data imports (1) • Heterogeneous sources (e.g. text files, printer-formatted data, spreadsheets, DBs) • Complex target model • Imports consume a substantial fraction of project costs, a share that is often underestimated.

Data imports (2) • Needs a great deal of human input • Can be automated

Step-by-step transformation of taxonomic information: preparation • Identify patterns • Communicate problems • Export to simple XML

Step-by-step transformation of taxonomic information: preparation

    <Aizoaceae xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
      <AcceptedTaxa>
        <Taxon>
          <ID>7814</ID>
          <Genus>Acrodon</Genus>
          <Epithet>bellidiflorus</Epithet>
          <AllAuthorsString>N.E.Br.</AllAuthorsString>
          <SubSpeciesEpi>v</SubSpeciesEpi>
          <AllAuthorsStringSubSpecies/>
          <SpeciesName>Acrodon bellidiflorus</SpeciesName>
        </Taxon>
        <Taxon>
          <ID>8566</ID>
          <Genus>Acrodon</Genus>
          <Epithet>subulatus</Epithet>
          <AllAuthorsString>(Miller) N.E.Br.</AllAuthorsString>
          <AllAuthorsStringSubSpecies/>
          <SpeciesName>Acrodon subulatus</SpeciesName>
        </Taxon>
      </AcceptedTaxa>
      <SynonymTaxa> […] </SynonymTaxa>
    </Aizoaceae>
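As a rough illustration of the "export to simple XML" step, the sketch below writes flat source records into the element layout of the sample. It assumes the source has already been read into plain dictionaries; the function name `export_simple_xml` and the row format are invented for this example and are not part of the tools named in the talk.

```python
import xml.etree.ElementTree as ET


def export_simple_xml(family, rows):
    """Write flat records as <Taxon> elements under <family>/<AcceptedTaxa>."""
    root = ET.Element(family)
    accepted = ET.SubElement(root, "AcceptedTaxa")
    for row in rows:
        taxon = ET.SubElement(accepted, "Taxon")
        for field, value in row.items():
            # One child element per source field, no interpretation yet.
            ET.SubElement(taxon, field).text = value
    return root


# Two records taken from the sample above (subset of fields).
rows = [
    {"ID": "7814", "Genus": "Acrodon", "Epithet": "bellidiflorus",
     "AllAuthorsString": "N.E.Br."},
    {"ID": "8566", "Genus": "Acrodon", "Epithet": "subulatus",
     "AllAuthorsString": "(Miller) N.E.Br."},
]

root = export_simple_xml("Aizoaceae", rows)
xml_text = ET.tostring(root, encoding="unicode")
```

The export deliberately mirrors the source fields one to one, so that pattern finding and problem reports can still be discussed in the source's own terms.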

Step-by-step transformation of taxonomic information: phase I • Transform into soft schema XML • Re-arrange, lump and split elements • Don't check "taxonomic integrity" • Tools: XSLT, Taxonomic Transformation Library (TTL), and others

Step-by-step transformation of taxonomic information: phase I

    <BMIDataSource xmlns="http://www.bgbm.org/schemas/BMI/s0.7"
                   xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
                   xsi:schemaLocation="http://www.bgbm.org/schemas/BMI/s0.7 P:\XMLSchema\ImportSchicht\BMISoft0.7.xsd">
      <MetaData> […] </MetaData>
      <ConceptReference>
        <RefCategory>database</RefCategory>
        <RefString>Aizoaceae</RefString>
      </ConceptReference>
      <PotentialTaxa>
        <PTaxon>
          <TaxonName>
            <Rank>species</Rank>
            <GenusEpi>Acrodon</GenusEpi>
            <SpeciesEpi>bellidiflorus</SpeciesEpi>
            <AllAuthors>N.E.Br.</AllAuthors>
          </TaxonName>
          <TaxonStatus>Accepted</TaxonStatus>
          <IdInSource>7814</IdInSource>
          <RelatedTaxon ref="21" relType="basionym"/>
        </PTaxon>
        […]
      </PotentialTaxa>
    </BMIDataSource>
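The talk names XSLT and the TTL as the tools for this phase. Purely to illustrate the kind of re-arrangement involved, here is a minimal Python sketch that maps accepted taxa from the simple XML onto the soft-schema element names shown in the sample. It is a simplification under several assumptions: the namespace and metadata are left out, every record is treated as rank "species", and the function name `to_soft` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the simple XML from the preparation step.
SIMPLE = """
<Aizoaceae>
  <AcceptedTaxa>
    <Taxon>
      <ID>7814</ID>
      <Genus>Acrodon</Genus>
      <Epithet>bellidiflorus</Epithet>
      <AllAuthorsString>N.E.Br.</AllAuthorsString>
    </Taxon>
  </AcceptedTaxa>
</Aizoaceae>
"""


def to_soft(simple_root):
    """Re-arrange accepted <Taxon> records into soft-schema <PTaxon> elements."""
    soft = ET.Element("BMIDataSource")  # namespace omitted in this sketch
    potential = ET.SubElement(soft, "PotentialTaxa")
    for taxon in simple_root.iterfind("AcceptedTaxa/Taxon"):
        ptaxon = ET.SubElement(potential, "PTaxon")
        name = ET.SubElement(ptaxon, "TaxonName")
        ET.SubElement(name, "Rank").text = "species"  # assumption: species only
        ET.SubElement(name, "GenusEpi").text = taxon.findtext("Genus", "")
        ET.SubElement(name, "SpeciesEpi").text = taxon.findtext("Epithet", "")
        ET.SubElement(name, "AllAuthors").text = taxon.findtext("AllAuthorsString", "")
        ET.SubElement(ptaxon, "TaxonStatus").text = "Accepted"
        ET.SubElement(ptaxon, "IdInSource").text = taxon.findtext("ID", "")
    return soft


soft = to_soft(ET.fromstring(SIMPLE))
```

Nothing is validated here; missing fields simply become empty elements, which matches the idea that integrity checks are deferred to the next phase.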

Step-by-step transformation of taxonomic information: phase II • Transform into strict schema XML • Check data integrity • Report malformed data • Tool: TTL

Step-by-step transformation of taxonomic information: phase II

    <BMIDataSource xmlns="http://www.bgbm.org/schemas/BMI/0.7" […]>
      <MetaData> […] </MetaData>
      <ConceptReference>
        <RefCategoryAbbrev>BK</RefCategoryAbbrev>
        <RefString>refString</RefString>
        <DatabaseID>4</DatabaseID>
      </ConceptReference>
      <PotentialTaxa>
        <PTaxon>
          <TaxonName>
            <SpeciesName>
              <GenusEpi>Acrodon</GenusEpi>
              <SpeciesEpi>bellidiflorus</SpeciesEpi>
              <AuthorTeam>
                <AuthorTeamCache>N.E.Br.</AuthorTeamCache>
              </AuthorTeam>
            </SpeciesName>
          </TaxonName>
          <TaxonStatusAbbrev>A</TaxonStatusAbbrev>
          <IdInSource>7814</IdInSource>
          <RelatedTaxa> […] </RelatedTaxa>
        </PTaxon>
      </PotentialTaxa>
    </BMIDataSource>
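The checks the TTL actually performs are not listed in the slides. As a hedged sketch of what "check data integrity" and "report malformed data" might look like, the code below runs a few made-up rules over soft-schema records and collects the problems per record. The rules, the function names and the partial status table are assumptions; only the abbreviation "A" for "Accepted" is taken from the sample above.

```python
import xml.etree.ElementTree as ET

# Only "A" is visible in the strict-schema sample; other codes are not shown.
STATUS_ABBREV = {"Accepted": "A"}

SOFT = """
<BMIDataSource>
  <PotentialTaxa>
    <PTaxon>
      <TaxonName><GenusEpi>Acrodon</GenusEpi><SpeciesEpi>bellidiflorus</SpeciesEpi></TaxonName>
      <TaxonStatus>Accepted</TaxonStatus>
      <IdInSource>7814</IdInSource>
    </PTaxon>
    <PTaxon>
      <TaxonName><GenusEpi>acrodon</GenusEpi><SpeciesEpi>subulatus</SpeciesEpi></TaxonName>
      <TaxonStatus>Unknown</TaxonStatus>
      <IdInSource>8566</IdInSource>
    </PTaxon>
  </PotentialTaxa>
</BMIDataSource>
"""


def check_ptaxon(ptaxon, seen_ids):
    """Return a list of problems found in one <PTaxon> (illustrative rules)."""
    problems = []
    genus = ptaxon.findtext("TaxonName/GenusEpi", "").strip()
    species = ptaxon.findtext("TaxonName/SpeciesEpi", "").strip()
    status = ptaxon.findtext("TaxonStatus", "").strip()
    source_id = ptaxon.findtext("IdInSource", "").strip()
    if not genus or not genus[0].isupper():
        problems.append("genus epithet missing or not capitalised")
    if not species or species != species.lower():
        problems.append("species epithet missing or not lower case")
    if status not in STATUS_ABBREV:
        problems.append(f"unknown taxon status {status!r}")
    if not source_id:
        problems.append("missing IdInSource")
    elif source_id in seen_ids:
        problems.append(f"duplicate IdInSource {source_id}")
    else:
        seen_ids.add(source_id)
    return problems


def check_soft(root):
    """Map record position -> problems, for malformed records only."""
    seen_ids = set()
    report = {}
    for index, ptaxon in enumerate(root.iterfind("PotentialTaxa/PTaxon")):
        problems = check_ptaxon(ptaxon, seen_ids)
        if problems:
            report[index] = problems
    return report


report = check_soft(ET.fromstring(SOFT))
```

Records that pass could then be rewritten into the strict form, while the report goes back to whoever maintains the source data.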

Step-by-step transformation of taxonomic information: phase III • Import into database • Duplicate detection and resolution • No user interaction required • Tool: Berlin Model Object Layer (BMOL)

Berlin Model Object Layer (BMOL) • Hides the database key system • Duplicate detection • Core-Module provides objects corresponding to database entities • Mapper-Module interfaces with database • Persistence-Module manages data flow between core-module and mapper-module
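The division into core, mapper and persistence modules can be sketched roughly as follows. This is not BMOL code: the class names are hypothetical, the "database" is an in-memory dictionary, and duplicates are detected by exact equality, which is an assumption since the slides do not state the matching criteria.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaxonName:
    """Core object: corresponds to a database entity but carries no key."""
    genus: str
    species: str
    authors: str


class InMemoryMapper:
    """Stand-in for a mapper module; a real one would talk to the database."""

    def __init__(self):
        self._rows = {}      # database key -> stored object
        self._next_key = 1

    def find_key(self, name):
        for key, row in self._rows.items():
            if row == name:
                return key
        return None

    def insert(self, name):
        key = self._next_key
        self._rows[key] = name
        self._next_key += 1
        return key

    def count(self):
        return len(self._rows)


class Persistence:
    """Moves core objects to the mapper; callers never see database keys."""

    def __init__(self, mapper):
        self._mapper = mapper

    def save(self, name):
        """Store the name unless an equal one exists; True if newly stored."""
        if self._mapper.find_key(name) is not None:
            return False
        self._mapper.insert(name)
        return True


mapper = InMemoryMapper()
store = Persistence(mapper)
first = store.save(TaxonName("Acrodon", "bellidiflorus", "N.E.Br."))
second = store.save(TaxonName("Acrodon", "bellidiflorus", "N.E.Br."))  # duplicate
third = store.save(TaxonName("Acrodon", "subulatus", "(Miller) N.E.Br."))
```

Because keys stay inside the mapper, swapping in a mapper for a different database would not change the code that builds and saves core objects.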

Outlook • Method has been successfully tested for import of Med-Checklist I, II & IV • Further imports planned for 2006 • Programming of additional mapper modules desirable

www.bgbm.org/biodivinf/
