SPM from an SDD perspective: Generality and extensibility
Gregor Hagedorn, Federal Biological Research Center, Berlin, Germany
SDD Purpose was
(From SDD Charter:) Develop standard computer-based mechanisms for expressing and transferring descriptive information about biological organisms or taxa (as well as similar entities such as diseases), including terminologies, ontologies, descriptions, identification tools and associated resources.
Richness & Atomization
[Diagram contrasting SPM and SDD. Labels recoverable from the figure: Labels, Definitions, MediaObjects (multilingual); Scopes; SPM; aboutTaxon; SDD; Representation; Taxa, Specimens, Observ., Publications, Parts, Stage, Sex, etc.; InfoItem; Scopes; RevisionData; SummaryData; Character Data; SampleData.]
Naming differences
• Perhaps consider whether SPM “context” is a good paradigm:
  • A measurement can be made in the context of a study, and perhaps in the context of a season
  • But are “geographical location”, “frequently”, “sex”, “above 1000 m” contexts?
• SDD distinguishes between the Scope of a description (the criteria by which data have been aggregated: taxon, specimen, geolocation, season, publication source, etc.) and Modifiers that modify or qualify a statement
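The Scope/Modifier distinction can be sketched as a small data structure. This is only an illustration, not the SDD schema: the class names, the `scope_values` helper, and the sample values are hypothetical, and treating “above 1000 m” and “sex” as scopes and “frequently” as a modifier is an assumption made for the example.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Scope:
    """A criterion by which the described data were aggregated."""
    kind: str   # e.g. "taxon", "specimen", "geolocation", "season"
    value: str


@dataclass(frozen=True)
class Statement:
    """A single descriptive statement, optionally qualified by modifiers."""
    character: str
    value: str
    modifiers: Tuple[str, ...] = ()   # e.g. ("frequently",)


@dataclass(frozen=True)
class Description:
    """Scopes apply to the whole description; modifiers to one statement."""
    scopes: Tuple[Scope, ...]
    statements: Tuple[Statement, ...]

    def scope_values(self, kind):
        # Return the values of all scopes of the requested kind.
        return [s.value for s in self.scopes if s.kind == kind]


# Hypothetical sample data, for illustration only.
description = Description(
    scopes=(
        Scope("taxon", "Taxon 1"),
        Scope("geolocation", "above 1000 m"),
        Scope("sex", "female"),
    ),
    statements=(
        Statement("Green leaf: Length", "7 cm", modifiers=("frequently",)),
    ),
)
```

In this sketch the scopes answer "which data were aggregated here?", while the modifier qualifies only the one statement it is attached to.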
Naming differences
• “Value” for categorical measurements is OK in principle, but may affect extensibility to quantitative data.
• Publication references would be needed for the source of the information being aggregated, or for citations therein
Occurrence / Distribution
• Occurrence is used in contextOccurrence, with Distribution as the content term around it.
• Is the order of information reversed? spm:contextValue = “An indication of when this information is valid according to a controlled vocabulary.” → perhaps:
• Perhaps use a special type here?

  <spm:hasInformation><spmi:Distribution>
    <spm:hasValue rdf:resource="&tv;OccurrenceStatusTerm#Extinct"/>
    <spm:contextValue rdf:resource="&tv;GeographicRegion#ITA"/>
  </spmi:Distribution></spm:hasInformation>

  <spm:hasInformation><spmi:Distribution>
    <spm:hasValue rdf:resource="&tv;GeographicRegion#ITA"/>
    <spm:contextValue rdf:resource="&tv;OccurrenceStatusTerm#Extinct"/>
  </spmi:Distribution></spm:hasInformation>
Cardinality?

  <spm:hasInformation><spmi:Distribution>
    <spm:hasValue rdf:resource="&tv;OccurrenceStatusTerm#DoubtfullyNative"/>
    <spm:hasValue rdf:resource="&tv;OccurrenceStatusTerm#Extinct"/>
    <spm:contextValue rdf:resource="&tv;GeographicRegion#ITA"/>
  </spmi:Distribution></spm:hasInformation>

  <spm:hasInformation><spmi:Distribution>
    <spm:hasValue rdf:resource="&tv;OccurrenceStatusTerm#DoubtfullyNative"/>
    <spm:hasValue rdf:resource="&tv;OccurrenceStatusTerm#Extinct"/>
    <spm:contextValue rdf:resource="&tv;GeographicRegion#ITA"/>
    <spm:contextValue rdf:resource="&tv;GeographicRegion#ALB"/>
  </spmi:Distribution></spm:hasInformation>
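The cardinality question can be made concrete with a short sketch. It assumes, purely for illustration, that a consumer reads an item with several hasValue and several contextValue entries as "every value holds in every context"; the `expand` function and the `intended` pairing are hypothetical and not part of SPM.

```python
from itertools import product


def expand(values, contexts):
    """Naive reading: every value applies in every context (cross product)."""
    return sorted(product(values, contexts))


# The second example above, reduced to plain term names.
item = {
    "hasValue": ["DoubtfullyNative", "Extinct"],
    "contextValue": ["ITA", "ALB"],
}

pairs = expand(item["hasValue"], item["contextValue"])

# A hypothetical author intent that the flat structure cannot express:
# doubtfully native in ITA, extinct in ALB.
intended = {("DoubtfullyNative", "ITA"), ("Extinct", "ALB")}

# The naive reading yields extra pairs the author may not have meant.
unintended = set(pairs) - intended
```

Under this reading two values and two contexts produce four status/region pairs, so a pairing such as `intended` can only be expressed by splitting the data into one information item per region.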
SPM Concepts
[Diagram of concept terms: Biology, Cytology, MolecularBiology, Evolution, Ecology, Description, Physiology, Distribution, Size, Use, Conservation]
Overlap!
[Diagram of concept terms: Description, Biology, Biology, Size, MolecularBiology, Cytology, Physiology, Ecology, Ecology, Distribution, Distribution, Evolution, Evolution, Conservation, Use]
Conclusive?
[Diagram of concept terms: Description, Biology, Biology, Morphology, Size, Anatomy, MolecularBiology, Cytology, Secondary metabolites, Physiology, Biochemistry, Ecology, Ecology, Distribution, Distribution, Evolution, Evolution, Conservation, Use]
[Diagram of concept terms: LookAlikes, Description, Biology, Biology, Weight, Morphology, Size, Associations, Anatomy, LifeExpectancy, LifeCycle, MolecularBiology, DiagnosticDescription, Behavior, Cytology, Physiology, Secondary metabolites, Biochemistry, PopulationBiology, Ecology, Ecology, Distribution, Distribution, Evolution, Evolution, Conservation, Use. Legend: SPM Version 2007-08-15; Earlier terms used in SPM example files]
Number of characters
[Concept labels shown on the slide: Size, LifeExpectancy]
• LIAS has 987 “characters”, incl. ca. 30 “pseudo-characters”
• GrassBase has 1090 characters
• SDD decided to separate character standardization from structural standardization
• Waiting for an exchange of existing definitions and for patterns to arise, rather than a round table
SPM content vocabulary
• A concise “major concept headings” vocabulary like SPM is certainly desirable
• But definitions are needed!
  • Human-readable definitions should be developed
  • OWL/RDF currently provides only a single piece of semantic information: Size is a subclass of Description
• Provision of general abstract data structures (content, value, contextXXX) should perhaps be separated from the definition of biological concepts
Ontologies 1 (Descriptive Terms)
[Diagram of terms: Leaflike structure, Stem, Leaf, Cladode (= stem looking like leaf), Green leaf, Petal, Flower]
Coded Summary Descriptions
Taxon 1: Green leaf: Length 7 cm
Taxon 2: Green leaf: Length 5 cm
Taxon 3: Cladode: Length 8 cm
Taxon 4: Cladode: Length 2 cm
Identification: Which species have leaf-like structures on the stem between 7 and 10 cm long?
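The identification question only works if a query for the general term also reaches descriptions coded with more specific terms. A minimal sketch follows, assuming a subclass hierarchy that the slide's diagram may or may not show exactly (the `IS_A` table is a guess), treating the length range as inclusive, and ignoring the "on the stem" part of the question; all function names are hypothetical.

```python
# Assumed "is a" relations between descriptive terms (hypothetical).
IS_A = {
    "Green leaf": ["Leaf"],
    "Petal": ["Leaf"],
    "Leaf": ["Leaflike structure"],
    "Cladode": ["Stem", "Leaflike structure"],
}

# The coded summary descriptions from the slide: (taxon, structure, length in cm).
DESCRIPTIONS = [
    ("Taxon 1", "Green leaf", 7.0),
    ("Taxon 2", "Green leaf", 5.0),
    ("Taxon 3", "Cladode", 8.0),
    ("Taxon 4", "Cladode", 2.0),
]


def ancestors(term):
    """Collect every more general term reachable through IS_A."""
    seen = set()
    stack = [term]
    while stack:
        for parent in IS_A.get(stack.pop(), []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def taxa_with(term, min_cm, max_cm):
    """Taxa whose coded structure is `term` or a subclass of it, within range."""
    return [
        taxon
        for taxon, structure, length in DESCRIPTIONS
        if (structure == term or term in ancestors(structure))
        and min_cm <= length <= max_cm
    ]


matches = taxa_with("Leaflike structure", 7, 10)
```

Querying for "Leaf" instead of "Leaflike structure" would miss Taxon 3, because its description is coded as a cladode; the ontology is what connects the two.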
Ontologies 2 (Taxonomic Classes)
[Diagram: Taxonomic Rank classes (Family, Genus, Species) alongside taxon concepts (ThisFamily, Genus, Genus spec1, Genus spec2)]
Taxon concepts are a natural ontology with multiple inheritance from within taxon concept classes and Rank classes.
Identification: Which family has species with leaf-like structures on the stem between 7 and 10 cm long?
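Answering the family-level question means rolling species-level matches up through the taxon hierarchy while checking each taxon's rank. A sketch under stated assumptions: the `PARENT` and `RANK` tables mirror the names in the diagram, but the set of matching species is invented for illustration and the function names are hypothetical.

```python
# Each taxon concept inherits from its parent taxon and from a rank class.
PARENT = {
    "Genus spec1": "Genus",
    "Genus spec2": "Genus",
    "Genus": "ThisFamily",
}
RANK = {
    "Genus spec1": "Species",
    "Genus spec2": "Species",
    "Genus": "Genus",
    "ThisFamily": "Family",
}


def ancestor_at_rank(taxon, rank):
    """Walk up the taxon hierarchy until a taxon of the given rank is found."""
    current = taxon
    while current is not None:
        if RANK.get(current) == rank:
            return current
        current = PARENT.get(current)
    return None


def families_of(species):
    """Families containing at least one of the given species."""
    found = (ancestor_at_rank(s, "Family") for s in species)
    return {f for f in found if f is not None}


# Hypothetical result of a species-level identification query.
matching_species = ["Genus spec1"]
families = families_of(matching_species)
```

The rank table is what lets the query stop at "Family" rather than at the first parent, which is the practical use of the second inheritance line.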
Breakdown of communication?
• SDD was designed for the purpose SPM has been developed for
• SDD and SPM are strongly analogous
• SDD has invested much time in trying to find an application profile that supports rich editing applications in a way consistent with simple identification keys and taxon-page creation software
• SDD structures and terms have not been evaluated for SPM
Interest Group: “Structured Descriptive Data” → Interest Group: “Biological Descriptions” → TG SDD-Schema → TG SPM → TG SDD/RDF???
Thank you:
• For volunteering your personal time in discussions, implementation and testing!
• Projects and companies for testing and implementing!
• GBIF, TDWG, and BMBF for travel and workshop support!
• TDWG-IP for financing an SDD primer!