Increasing Usability of Biodiversity Databases through Semantic Enrichment. Klaus Riede, Zoologisches Forschungsinstitut & Museum Alexander Koenig (ZFMK), Adenauerallee 150-164, 53113 Bonn, Germany
Semantic Enrichment: Some examples. Huge biodiversity databases already exist. They cover either distinct groups of organisms (e.g. FishBase, Orthoptera Species File) or distinct themes, such as threat (IUCN Red List Database, www.redlist.org) or migration (Global Register of Migratory Species, www.groms.de). Why do we need semantic enrichment?
Semantic Enrichment: Some examples. Try searching for the number of „Extinct Tropical Timber Trees“. Database: IUCN Red List Database (www.redlist.org). Query: tropical tree. Problem: plants are not classified according to life form. Plant families such as TAXODIACEAE comprise trees (e.g. Taiwania cryptomerioides, VULNERABLE), whereas CUPRESSACEAE contain both shrubs (Actinostrobus) and trees (Thuja spp.).
Semantic Enrichment: Searching for Red-Listed Trees. To search the IUCN Red List Database (www.redlist.org) for „threatened“ trees, you have to know plant taxonomy. Searching the order CONIFERALES (which contains the Taxodiaceae trees) returns 16 Critically Endangered, 43 Endangered and 93 Vulnerable taxa, but some of these are shrubs (Cupressaceae: Actinostrobus). Threatened Cupressaceae: 2 Critically Endangered (e.g. Thuja sutchuensis), 15 Endangered (e.g. Juniperus cedrus), 25 Vulnerable (e.g. Cupressus gigantea).
Semantic enrichment is necessary to search for „trees“: http://www.botanik.uni-bonn.de/conifers/index.htm
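A minimal sketch of the tree-search problem, using Python's built-in sqlite3 module as a stand-in for the Red List database: a taxonomy-only query on Cupressaceae returns shrubs and trees alike, while an added life_form column (the semantic enrichment) lets a single query return threatened trees. The table and column names, the genus-level life-form lookup and its values are hypothetical assumptions for illustration; only the species names and Red List categories come from the slides above, and Actinostrobus is stored without a category because none is given.

```python
import sqlite3

# Hypothetical taxon table holding only taxonomy and Red List category,
# as in a purely taxonomy-driven database. Actinostrobus gets NULL (no category given).
TAXA = [
    ("Taiwania cryptomerioides", "Taxodiaceae", "VU"),
    ("Actinostrobus sp.", "Cupressaceae", None),
    ("Thuja sutchuensis", "Cupressaceae", "CR"),
    ("Juniperus cedrus", "Cupressaceae", "EN"),
    ("Cupressus gigantea", "Cupressaceae", "VU"),
]

# Illustrative life-form annotation keyed by genus (assumed values for this sketch).
LIFE_FORM_BY_GENUS = {
    "Taiwania": "tree",
    "Actinostrobus": "shrub",
    "Thuja": "tree",
    "Juniperus": "tree",
    "Cupressus": "tree",
}

THREATENED = ("CR", "EN", "VU")


def build_db():
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE taxon (name TEXT PRIMARY KEY, family TEXT, redlist TEXT)")
    con.executemany("INSERT INTO taxon VALUES (?, ?, ?)", TAXA)
    return con


def enrich_with_life_form(con):
    """Semantic enrichment: add a life_form column and fill it from the genus lookup."""
    con.execute("ALTER TABLE taxon ADD COLUMN life_form TEXT")
    names = [name for (name,) in con.execute("SELECT name FROM taxon")]
    for name in names:
        genus = name.split()[0]
        con.execute("UPDATE taxon SET life_form = ? WHERE name = ?",
                    (LIFE_FORM_BY_GENUS.get(genus), name))


def members_of_family(con, family):
    """Taxonomy-only query: cannot distinguish shrubs from trees."""
    rows = con.execute("SELECT name FROM taxon WHERE family = ? ORDER BY name", (family,))
    return [name for (name,) in rows]


def threatened_trees(con):
    """Query that only becomes possible after enrichment."""
    rows = con.execute(
        "SELECT name FROM taxon WHERE life_form = 'tree' AND redlist IN (?, ?, ?) "
        "ORDER BY name", THREATENED)
    return [name for (name,) in rows]


con = build_db()
print(members_of_family(con, "Cupressaceae"))  # includes the shrub genus Actinostrobus
enrich_with_life_form(con)
print(threatened_trees(con))
```

Keying the lookup by genus keeps the sketch short; since Cupressaceae mix life forms, a real annotation would more likely be stored per species.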
Two Worlds: Relational Databases and Complex Data Sets
• Relational databases: Digital Orthoptera Specimen Access, SYSTAX, GROMS (Global Register of Migratory Species)
• Complex data sets: sounds, pictures, gene sequences (links), geographic coordinates, maps (GIS data: shapes)
Example #1: Data-Mining for Knowledge Gaps. The „Global Register of Migratory Species“ database (www.groms.de) contains literature citations on migration. Knowledge gaps were detected by searching for text strings such as „poor*“, „little known“ and „unknown“.
The relational organisation of the GROMS database allows SQL queries to be used for text-mining. A many:many relation connects references and species names:
• References table (5,500 entries): ID, author, title etc., and text (e.g. „...migration... unknown...“); related 1:many to the joint table
• Joint table (8,500 entries): Lit_ID, Species_ID
• Species table (4,355 entries): ID, taxon name, migration, Red List status etc.; related many:1 from the joint table
SQL statement: searching for non-passerine birds with poorly known migration behaviour:
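The original SQL statement is not preserved in this text, so the following is only a hedged reconstruction of what such a query could look like, run against Python's sqlite3 module. It models the three GROMS tables described above (references, joint table, species) and joins through the many:many link. The table and column names, the taxon_class/taxon_order columns used to exclude passerines, and all rows are hypothetical assumptions for illustration.

```python
import sqlite3

# Hypothetical schema modelled on the three GROMS tables described above.
SCHEMA = """
CREATE TABLE literature (id INTEGER PRIMARY KEY, author TEXT, title TEXT, citation_text TEXT);
CREATE TABLE species (id INTEGER PRIMARY KEY, taxon_name TEXT, taxon_class TEXT,
                      taxon_order TEXT, migration TEXT, redlist_status TEXT);
-- Joint table resolving the many:many relation between literature and species.
CREATE TABLE literature_species (lit_id INTEGER REFERENCES literature(id),
                                 species_id INTEGER REFERENCES species(id));
"""

# Knowledge-gap patterns from the slide; SQL LIKE uses % where the slide writes poor*.
GAP_PATTERNS = ("%poor%", "%little known%", "%unknown%")

GAP_QUERY = """
SELECT DISTINCT s.taxon_name
FROM species AS s
JOIN literature_species AS ls ON ls.species_id = s.id
JOIN literature AS l ON l.id = ls.lit_id
WHERE s.taxon_class = 'Aves'
  AND s.taxon_order <> 'Passeriformes'
  AND (l.citation_text LIKE ? OR l.citation_text LIKE ? OR l.citation_text LIKE ?)
ORDER BY s.taxon_name
"""


def build_demo_db():
    """Tiny in-memory database with hypothetical rows, for illustration only."""
    con = sqlite3.connect(":memory:")
    con.executescript(SCHEMA)
    con.executemany(
        "INSERT INTO species (id, taxon_name, taxon_class, taxon_order) VALUES (?, ?, ?, ?)",
        [(1, "Hypothetical heron", "Aves", "Ciconiiformes"),
         (2, "Hypothetical warbler", "Aves", "Passeriformes"),
         (3, "Hypothetical plover", "Aves", "Charadriiformes")])
    con.executemany(
        "INSERT INTO literature (id, citation_text) VALUES (?, ?)",
        [(1, "Migration routes are poorly documented."),
         (2, "Wintering grounds unknown."),
         (3, "A well-studied flyway population.")])
    # One species can be cited by several references and vice versa.
    con.executemany("INSERT INTO literature_species VALUES (?, ?)",
                    [(1, 1), (2, 1), (2, 2), (3, 3)])
    return con


def poorly_known_non_passerines(con):
    return [name for (name,) in con.execute(GAP_QUERY, GAP_PATTERNS)]


con = build_demo_db()
print(poorly_known_non_passerines(con))  # ['Hypothetical heron']
```

DISTINCT matters here: the heron is linked to two gap-flagging references but should be counted once. SQLite's LIKE is case-insensitive for ASCII letters, so „Unknown“ at the start of a sentence would also match.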
Result: 349 birds with insufficiently known migration behaviour (www.groms.de), mainly based on the „Handbook of the Birds of the World“ (del Hoyo et al. 1992-2003).
Example #2: Automatic Annotation of Sound Parameters. The Orthoptera Song Repository of the DORSA project was used to annotate all 5,000 sound files automatically with sound parameters. These parameters were added to the SysTax database, which stores specimen data from various museum databases, including herbaria. The annotated SysTax Oracle database can now be searched for sound parameters such as carrier frequency and pulse rate.
Deutsche Orthopteren Sammlungen (German Orthoptera Collections) - www.dorsa.de: Orthoptera type material in German museums.
Deutsche Orthopteren Sammlungen - www.dorsa.de
• Checking, identification and verification of data on type material
• Locating „historical“ types
• Designation of lectotypes
Deutsche Orthopteren Sammlungen - www.dorsa.de: the taxonomic database (OSF: Orthoptera Species File, USA) and the specimen data (German museums, sound archives; www.dorsa.de) are connected by mutual links.
Extraction of sound parameters (e.g. carrier frequency, pulse rate) using MATLAB software, in cooperation with the Dept. of Neuroinformatics, Ulm.
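The extraction itself was done in MATLAB; as a rough analogue, here is a NumPy sketch of two simple estimators, not the project's actual algorithm. The carrier frequency is taken as the strongest peak of the magnitude spectrum, and the pulse rate as the number of rising edges per second in a smoothed, normalised amplitude envelope. The threshold, smoothing window and synthetic test song are assumptions chosen for illustration.

```python
import numpy as np


def carrier_frequency(signal, sample_rate):
    """Estimate the carrier frequency (Hz) as the strongest peak of the magnitude spectrum."""
    spectrum = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    spectrum[0] = 0.0  # ignore any DC offset
    return float(freqs[np.argmax(spectrum)])


def pulse_rate(signal, sample_rate, threshold=0.5, smooth_ms=1.0):
    """Estimate pulses per second by counting rising edges of a smoothed amplitude envelope."""
    window = max(1, int(sample_rate * smooth_ms / 1000))
    envelope = np.convolve(np.abs(signal), np.ones(window) / window, mode="same")
    envelope = envelope / envelope.max()  # normalise to 0..1
    above = envelope > threshold
    onsets = np.count_nonzero(above[1:] & ~above[:-1])  # False -> True transitions
    return onsets / (len(signal) / sample_rate)


# Synthetic test song (hypothetical): 5 kHz carrier, one 20 ms pulse every 50 ms, 1 s long.
sr = 44_100
t = np.arange(sr) / sr
gate = ((t % 0.05) >= 0.02) & ((t % 0.05) < 0.04)
song = np.sin(2 * np.pi * 5000 * t) * gate

print(round(carrier_frequency(song, sr)))  # 5000
print(pulse_rate(song, sr))  # 20.0
```

The moving-average envelope keeps the sketch dependency-free; real field recordings with noise and overlapping singers would need band-pass filtering and more robust onset detection.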
Enriched sound file table: pulse distance, length, frequency etc. were added to the SYSTAX table.
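The enrichment step above amounts to adding parameter columns to an existing table and importing one result row per sound file. A minimal sketch with Python's sqlite3 standing in for the SysTax Oracle database; the table name, column names, file paths, taxa and parameter values are all hypothetical.

```python
import sqlite3


def build_sound_table():
    """Hypothetical sound-file table before enrichment: one row per recording."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE sound_file (id INTEGER PRIMARY KEY, taxon TEXT, path TEXT)")
    con.executemany("INSERT INTO sound_file (taxon, path) VALUES (?, ?)", [
        ("Species A", "songs/a.wav"),
        ("Species B", "songs/b.wav"),
        ("Species C", "songs/c.wav"),
    ])
    return con


def add_sound_parameters(con, results):
    """Enrich the table: add parameter columns, then import one result row per file path."""
    con.execute("ALTER TABLE sound_file ADD COLUMN carrier_hz REAL")
    con.execute("ALTER TABLE sound_file ADD COLUMN pulse_rate_hz REAL")
    con.executemany(
        "UPDATE sound_file SET carrier_hz = ?, pulse_rate_hz = ? WHERE path = ?",
        [(carrier, rate, path) for path, (carrier, rate) in results.items()])


def taxa_in_carrier_band(con, low_hz, high_hz):
    """SQL retrieval by a sound parameter, possible only after enrichment."""
    rows = con.execute(
        "SELECT taxon FROM sound_file WHERE carrier_hz BETWEEN ? AND ? ORDER BY taxon",
        (low_hz, high_hz))
    return [taxon for (taxon,) in rows]


# Hypothetical output of an extraction step: path -> (carrier frequency Hz, pulse rate Hz).
results = {
    "songs/a.wav": (5000.0, 20.0),
    "songs/b.wav": (12500.0, 45.0),
    "songs/c.wav": (4200.0, 8.0),
}
con = build_sound_table()
add_sound_parameters(con, results)
print(taxa_in_carrier_band(con, 4000, 6000))  # ['Species A', 'Species C']
```

Matching results back by file path assumes paths are unique in the table; a real import would more likely join on the record ID.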
Automated bioacoustic classification of ethospecies allows Rapid Assessment. Mapping with microphones allows us to answer important research questions, such as:
• species ranges/endemism
• species abundance
• species turnover
• community patterns
• activity patterns
• vulnerability to habitat degradation
• extermination rates
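The slides do not say which classifier is used, so the following swaps in a plain nearest-neighbour rule as an illustration: each recording's (carrier frequency, pulse rate) pair is assigned to the closest reference song in log-scaled parameter space, and counting assignments gives a crude abundance estimate. The ethospecies labels and reference values are hypothetical.

```python
import math
from collections import Counter

# Hypothetical reference library: (carrier frequency Hz, pulse rate Hz) per ethospecies.
REFERENCE_SONGS = {
    "ethospecies_1": (5000.0, 20.0),
    "ethospecies_2": (12500.0, 45.0),
    "ethospecies_3": (4200.0, 8.0),
}


def classify(carrier_hz, pulse_rate_hz, references=REFERENCE_SONGS):
    """Assign a recording to the reference with the smallest distance in log-parameter space."""
    def distance(ref):
        ref_carrier, ref_rate = ref
        # Log ratios make a 10% deviation count equally for both parameters.
        return math.hypot(math.log(carrier_hz / ref_carrier),
                          math.log(pulse_rate_hz / ref_rate))
    return min(references, key=lambda name: distance(references[name]))


# Hypothetical detections from one microphone: (carrier Hz, pulse rate Hz).
detections = [(4900.0, 19.0), (4300.0, 8.5), (5100.0, 21.0)]
abundance = Counter(classify(c, r) for c, r in detections)
print(classify(4900.0, 19.0))  # ethospecies_1
print(abundance["ethospecies_1"])  # 2
```

A nearest-neighbour rule always returns some label; in practice a distance cut-off would be needed so that songs unlike any reference are flagged as unknown rather than forced into a class.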
Example #3: Enriching Databases with Geographic Information
• Adding lat-lon coordinates by georeferencing
• GIS analysis of complex geometries (shapes) by intersection with other GIS layers, and subsequent update of the database
Georeferencing is necessary to supplement place names with lat-lon data.
Geographic coordinates were added to place names using the Times Atlas or gazetteers (Getty, Alexandria Project).
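At its simplest, georeferencing with a gazetteer is a lookup from a normalised place name to coordinates, with unmatched localities left empty for manual work. A minimal sketch; the gazetteer excerpt and its approximate coordinates are illustrative assumptions, not data from the Times Atlas, Getty or Alexandria sources, and real gazetteer matching also has to handle spelling variants and ambiguous names.

```python
# Hypothetical gazetteer excerpt: place name -> (latitude, longitude) in decimal degrees.
# Coordinates are approximate and for illustration only.
GAZETTEER = {
    "bonn": (50.74, 7.10),
    "ulm": (48.40, 9.99),
}


def georeference(records, gazetteer=GAZETTEER):
    """Return copies of the records with lat/lon added; unmatched localities get None."""
    enriched = []
    for record in records:
        key = record["locality"].strip().lower()  # naive normalisation of the place name
        lat, lon = gazetteer.get(key, (None, None))
        enriched.append({**record, "lat": lat, "lon": lon})
    return enriched


specimens = [
    {"id": 1, "locality": "Bonn"},
    {"id": 2, "locality": "Ulm "},
    {"id": 3, "locality": "Unknown village"},
]
for row in georeference(specimens):
    print(row)
```

Records returning None are the ones that still need manual georeferencing before they can be mapped.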
Mapping requires specimen data enriched with geographic coordinates. The DORSA mapserver is available at www.dorsa.de.
Deutsche Orthopteren Sammlungen - www.dorsa.de: countries of origin of the type material in German museums
Example #3: Enriching Databases with Geographic Information, based on GIS calculation of range territories. Distribution maps (shapes) are available at www.groms.de.
Import of intersection results: intersecting the ranges of 1,000 mapped species with 2,522 administrative units gave 340,000 combinations (dbf attribute table: province – species). Queensland search results:
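The intersection step produces a (province, species) attribute table that can then be queried per administrative unit, as in the Queensland search. The sketch below shows the idea with a deliberate simplification: species ranges and administrative units are axis-aligned bounding boxes instead of real polygon shapes, which a GIS (or a geometry library such as Shapely with its intersects predicate) would handle properly. All unit names, species names and coordinates are hypothetical.

```python
from collections import namedtuple

# Simplification for this sketch: ranges and units are bounding boxes, not polygons.
Box = namedtuple("Box", "min_lon min_lat max_lon max_lat")


def intersects(a, b):
    """True if two boxes overlap (touching edges count as overlap)."""
    return (a.min_lon <= b.max_lon and b.min_lon <= a.max_lon and
            a.min_lat <= b.max_lat and b.min_lat <= a.max_lat)


def intersect_layers(ranges, units):
    """Return (unit, species) combinations, analogous to the imported dbf attribute table."""
    return [(unit, species)
            for species, range_box in ranges.items()
            for unit, unit_box in units.items()
            if intersects(range_box, unit_box)]


def species_in_unit(combinations, unit_name):
    """Per-unit search on the imported result table."""
    return sorted(species for unit, species in combinations if unit == unit_name)


# Hypothetical administrative units and species ranges (arbitrary coordinates).
units = {
    "Province A": Box(0, 0, 10, 10),
    "Province B": Box(10, 0, 20, 10),
    "Province C": Box(0, 10, 10, 20),
}
ranges = {
    "Species X": Box(5, 5, 8, 8),
    "Species Y": Box(8, 2, 15, 6),
    "Species Z": Box(12, 12, 18, 18),
}

combos = intersect_layers(ranges, units)
print(len(combos))  # 3
print(species_in_unit(combos, "Province A"))  # ['Species X', 'Species Y']
```

The nested loop tests every range against every unit; for layers the size of those above, a spatial index (or the GIS itself) would be used to avoid checking all pairs.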
Summary: Semantic enrichment of relational databases is possible through automatic annotation. The relational database is linked to an external data set (sounds, GIS); an annotation program is run on it (e.g. GIS intersection); and the resulting annotation table is imported back, yielding an enriched relational database table. Enrichment allows SQL retrieval of complex data parameters.