A generic data import layer for the Berlin Taxonomic Information Model
Anton Güntsch, Andreas Müller & Walter G. Berendsohn
Botanic Garden and Botanical Museum Berlin-Dahlem
Dept. of Biodiversity Informatics and Laboratories
The Berlin Taxonomic Information Model
• Concepts as name-reference pairs
• Explicit representation of relations between concepts
• Mechanisms for calculating factual data
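To make the first two bullets concrete, here is a minimal Python sketch of a concept as a name-reference pair and of an explicit relation between two concepts. The class names `Concept` and `ConceptRelation`, the second reference string and the relation label are assumptions chosen for illustration; they are not taken from the Berlin Model schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Concept:
    """A taxon concept: a name as used in one particular reference."""
    name: str
    reference: str


@dataclass(frozen=True)
class ConceptRelation:
    """An explicit, typed link between two concepts."""
    source: Concept
    target: Concept
    rel_type: str  # hypothetical label, e.g. "congruent"


# The same name used in two references yields two distinct concepts.
in_database = Concept("Acrodon bellidiflorus", "Aizoaceae")
in_checklist = Concept("Acrodon bellidiflorus", "Some checklist (hypothetical)")

relation = ConceptRelation(in_database, in_checklist, "congruent")
print(in_database == in_checklist)
```

Keeping the reference inside the concept is what lets two sources disagree about the same name without overwriting each other.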
Berlin Model used by
• Euro+Med
• Med-Checklist
• IOPI Species Plantarum Initiative
• Algaterra
• Dendroflora of El Salvador
• German Standard List of Vascular Plants and Ferns
• Reference List of the German Mosses
• EDIT WP6
Data imports (1)
• Heterogeneous sources (e.g. text files, printer-formatted data, spreadsheets, databases)
• Complex target model
Imports consume a substantial fraction of project costs, and this is often substantially underestimated.
Data imports (2)
• Needs a great deal of human input
• Can be automated
Step-by-step transformation of taxonomic information: preparation
• Identify patterns
• Communicate problems
• Export to simple XML
Step-by-step transformation of taxonomic information: preparation

<Aizoaceae xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <AcceptedTaxa>
    <Taxon>
      <ID>7814</ID>
      <Genus>Acrodon</Genus>
      <Epithet>bellidiflorus</Epithet>
      <AllAuthorsString>N.E.Br.</AllAuthorsString>
      <SubSpeciesEpi>v</SubSpeciesEpi>
      <AllAuthorsStringSubSpecies/>
      <SpeciesName>Acrodon bellidiflorus</SpeciesName>
    </Taxon>
    <Taxon>
      <ID>8566</ID>
      <Genus>Acrodon</Genus>
      <Epithet>subulatus</Epithet>
      <AllAuthorsString>(Miller) N.E.Br.</AllAuthorsString>
      <AllAuthorsStringSubSpecies/>
      <SpeciesName>Acrodon subulatus</SpeciesName>
    </Taxon>
  </AcceptedTaxa>
  <SynonymTaxa> […] </SynonymTaxa>
</Aizoaceae>
Step-by-step transformation of taxonomic information: phase I
• Transform into soft-schema XML
• Re-arrange, lump and split elements
• Don't check "taxonomic integrity"
• Tools: XSLT, Taxonomic Transformation Library (TTL), and others
Step-by-step transformation of taxonomic information: phase I

<BMIDataSource xmlns="http://www.bgbm.org/schemas/BMI/s0.7"
               xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
               xsi:schemaLocation="http://www.bgbm.org/schemas/BMI/s0.7 P:\XMLSchema\ImportSchicht\BMISoft0.7.xsd">
  <MetaData> […] </MetaData>
  <ConceptReference>
    <RefCategory>database</RefCategory>
    <RefString>Aizoaceae</RefString>
  </ConceptReference>
  <PotentialTaxa>
    <PTaxon>
      <TaxonName>
        <Rank>species</Rank>
        <GenusEpi>Acrodon</GenusEpi>
        <SpeciesEpi>bellidiflorus</SpeciesEpi>
        <AllAuthors>N.E.Br.</AllAuthors>
      </TaxonName>
      <TaxonStatus>Accepted</TaxonStatus>
      <IdInSource>7814</IdInSource>
      <RelatedTaxon ref="21" relType="basionym"/>
    </PTaxon>
    […]
  </PotentialTaxa>
</BMIDataSource>
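As a rough illustration of the phase I re-arrangement, the following Python sketch maps one `<Taxon>` record of the preparation XML onto a soft-schema `<PTaxon>` element. It uses only the standard library rather than XSLT or the TTL. The function name `taxon_to_ptaxon`, the fixed rank `species` and the omission of namespaces are assumptions made for this example, not features of the authors' tooling.

```python
import xml.etree.ElementTree as ET


def taxon_to_ptaxon(taxon: ET.Element, status: str) -> ET.Element:
    """Hypothetical helper: re-arrange a simple <Taxon> into a <PTaxon>.

    Elements are only renamed and regrouped; nothing is validated here.
    """
    ptaxon = ET.Element("PTaxon")
    name = ET.SubElement(ptaxon, "TaxonName")
    # Assumption: every record in this sample is at species rank.
    ET.SubElement(name, "Rank").text = "species"
    ET.SubElement(name, "GenusEpi").text = taxon.findtext("Genus", default="")
    ET.SubElement(name, "SpeciesEpi").text = taxon.findtext("Epithet", default="")
    ET.SubElement(name, "AllAuthors").text = taxon.findtext(
        "AllAuthorsString", default=""
    )
    ET.SubElement(ptaxon, "TaxonStatus").text = status
    ET.SubElement(ptaxon, "IdInSource").text = taxon.findtext("ID", default="")
    return ptaxon


SOURCE = """<Aizoaceae><AcceptedTaxa><Taxon>
  <ID>7814</ID><Genus>Acrodon</Genus><Epithet>bellidiflorus</Epithet>
  <AllAuthorsString>N.E.Br.</AllAuthorsString>
</Taxon></AcceptedTaxa></Aizoaceae>"""

root = ET.fromstring(SOURCE)
# Records under <AcceptedTaxa> get the status "Accepted".
ptaxa = [taxon_to_ptaxon(t, "Accepted") for t in root.iter("Taxon")]
print(ET.tostring(ptaxa[0], encoding="unicode"))
```

The sketch deliberately copies values as they are, in line with the phase I rule that taxonomic integrity is not checked at this stage.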
Step-by-step transformation of taxonomic information: phase II
• Transform into strict-schema XML
• Check data integrity
• Report malformed data
• Tool: TTL
Step-by-step transformation of taxonomic information: phase II

<BMIDataSource xmlns="http://www.bgbm.org/schemas/BMI/0.7" […]>
  <MetaData> […] </MetaData>
  <ConceptReference>
    <RefCategoryAbbrev>BK</RefCategoryAbbrev>
    <RefString>refString</RefString>
    <DatabaseID>4</DatabaseID>
  </ConceptReference>
  <PotentialTaxa>
    <PTaxon>
      <TaxonName>
        <SpeciesName>
          <GenusEpi>Acrodon</GenusEpi>
          <SpeciesEpi>bellidiflorus</SpeciesEpi>
          <AuthorTeam>
            <AuthorTeamCache>N.E.Br.</AuthorTeamCache>
          </AuthorTeam>
        </SpeciesName>
      </TaxonName>
      <TaxonStatusAbbrev>A</TaxonStatusAbbrev>
      <IdInSource>7814</IdInSource>
      <RelatedTaxa> […] </RelatedTaxa>
    </PTaxon>
  </PotentialTaxa>
</BMIDataSource>
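The integrity check and error report of phase II might look roughly like the Python sketch below. It is not the TTL: the function `check_ptaxon`, the choice of required elements and the lookup table are assumptions, and only the mapping from `Accepted` to `A` is taken from the slides. The second sample record is invented to trigger the report.

```python
import xml.etree.ElementTree as ET

# Assumed lookup table; only "Accepted" -> "A" appears in the slides.
STATUS_ABBREV = {"Accepted": "A"}


def check_ptaxon(ptaxon: ET.Element) -> list:
    """Return human-readable problems found in one soft-schema <PTaxon>."""
    problems = []
    ident = ptaxon.findtext("IdInSource", default="?")
    # Assumption: a species-level record needs both epithets.
    for tag in ("GenusEpi", "SpeciesEpi"):
        if not (ptaxon.findtext(f"TaxonName/{tag}") or "").strip():
            problems.append(f"record {ident}: missing {tag}")
    status = ptaxon.findtext("TaxonStatus", default="")
    if status not in STATUS_ABBREV:
        problems.append(f"record {ident}: unknown status {status!r}")
    return problems


GOOD = ET.fromstring(
    "<PTaxon><TaxonName><GenusEpi>Acrodon</GenusEpi>"
    "<SpeciesEpi>bellidiflorus</SpeciesEpi></TaxonName>"
    "<TaxonStatus>Accepted</TaxonStatus><IdInSource>7814</IdInSource></PTaxon>"
)
# Invented malformed record, used only to demonstrate the report.
BAD = ET.fromstring(
    "<PTaxon><TaxonName><GenusEpi>Acrodon</GenusEpi></TaxonName>"
    "<TaxonStatus>Unclear</TaxonStatus><IdInSource>9999</IdInSource></PTaxon>"
)

good_report = check_ptaxon(GOOD)
bad_report = check_ptaxon(BAD)
for line in bad_report:
    print(line)
```

Collecting problems in a list instead of raising on the first one lets a whole file be reported back to the data provider in a single pass.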
Step-by-step transformation of taxonomic information: phase III
• Import into database
• Duplicate detection and resolution
• No user interaction required
• Tool: Berlin Model Object Layer (BMOL)
Berlin Model Object Layer (BMOL)
• Hides the database key system
• Duplicate detection
• Core module provides objects corresponding to database entities
• Mapper module interfaces with the database
• Persistence module manages data flow between core module and mapper module
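The two ideas of hiding the key system and detecting duplicates can be sketched in a few lines of Python. This is an in-memory stand-in, not BMOL: the class names `TaxonName` and `NameStore`, the `save` method and the normalisation rule for comparing names are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaxonName:
    """Core-style object: carries no database key."""
    genus: str
    species: str
    authors: str


class NameStore:
    """Hypothetical stand-in for a mapper module; keys stay internal."""

    def __init__(self):
        self._ids = {}  # normalised name -> surrogate key
        self._next_id = 1

    @staticmethod
    def _normalise(name):
        # Assumption: names are duplicates if they match after
        # lower-casing and removing whitespace from the author string.
        return (
            name.genus.strip().lower(),
            name.species.strip().lower(),
            "".join(name.authors.split()).lower(),
        )

    def save(self, name):
        """Store a name; return True if it was new, False if a duplicate."""
        key = self._normalise(name)
        if key in self._ids:
            return False
        self._ids[key] = self._next_id
        self._next_id += 1
        return True


store = NameStore()
first = store.save(TaxonName("Acrodon", "bellidiflorus", "N.E.Br."))
second = store.save(TaxonName("Acrodon", "bellidiflorus", "N. E. Br."))
third = store.save(TaxonName("Acrodon", "subulatus", "(Miller) N.E.Br."))
print(first, second, third)
```

Callers only ever handle `TaxonName` objects, so the surrogate keys can change without affecting import code.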
Outlook
• Method has been successfully tested for the import of Med-Checklist I, II & IV
• Further imports planned for 2006
• Programming of additional mapper modules desirable