Explore various DBMS types such as File System-Based, Hierarchical, Network, Relational, and Object-Oriented for ecological databases. Learn about project datasets, network database projects, and object data structures. Understand the importance of data modeling and normalization in optimizing database efficiency. Discover the challenges faced in creating a perfect data model for ecological data.
Data Models for Ecological Databases John Porter Department of Environmental Sciences University of Virginia
DBMS Types • File system-based • Hierarchical • Network • Relational • Object-oriented. You’ve seen these before; now let’s go into more detail.
File-System Based • very simple and easy to set up • inefficient • few capabilities [Diagram: a directory containing files]
Hierarchical • efficient • not very general • e.g. phylogenetic structures • geographical images [Diagram labels: Project, Datasets, Investigators, Variables, Locations, Codes, Methods]
Network Database • very flexible • unwieldy to modify • not widely used. Links are hard-coded into the database; they are not a property of the data. [Diagram labels: Projects, Datasets, Locations]
Relational Database • widely used, mature • table-oriented • restricted range of structures. Linkages are through the properties of the data itself, not hard-coded. [Diagram labels: Projects, Location_id, Data_id, Datasets, Locations, Location_id]
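The relational linkage described above can be sketched with Python's built-in sqlite3 module. This is a minimal illustration, not the schema from the slide: only the names Datasets, Locations, Data_id and Location_id come from the diagram, while the column layout, the extra columns and the sample rows are assumptions made for the example.

```python
import sqlite3

# In-memory database; the table layout and all rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Locations (
    Location_id INTEGER PRIMARY KEY,
    name        TEXT
);
CREATE TABLE Datasets (
    Data_id     INTEGER PRIMARY KEY,
    title       TEXT,
    Location_id INTEGER REFERENCES Locations(Location_id)
);
INSERT INTO Locations VALUES (1, 'Marsh site'), (2, 'Island site');
INSERT INTO Datasets VALUES
    (10, 'Tide heights', 1),
    (11, 'Bird counts', 2),
    (12, 'Water temperature', 1);
""")

# The link between the tables is a shared value (Location_id),
# not a pointer stored in the database structure.
rows = conn.execute("""
    SELECT d.title, l.name
    FROM Datasets AS d
    JOIN Locations AS l ON d.Location_id = l.Location_id
    ORDER BY d.Data_id
""").fetchall()

for title, name in rows:
    print(title, "@", name)
```

Because the join is expressed on the data values, a new kind of linkage only needs a new query, not a change to stored links.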
Object Oriented • developing; few commercial implementations • diverse structures • extensible. Complex data structures, along with the methods to use the data, are in the database. [Diagram labels: Methods, Object Data Structure]
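The idea of keeping data together with the methods that use it can be illustrated, loosely, with an ordinary Python class. This is only an analogy for the object-oriented model, not an object database; the class name, fields and values are all hypothetical.

```python
from dataclasses import dataclass, field
from statistics import mean


@dataclass
class TemperatureSeries:
    """Hypothetical object type: the data and its methods travel together."""

    site: str
    readings_c: list[float] = field(default_factory=list)

    def add_reading(self, value_c: float) -> None:
        # Store one temperature reading in degrees Celsius.
        self.readings_c.append(value_c)

    def mean_c(self) -> float:
        # The summary method is part of the object, not of an external program.
        return mean(self.readings_c)


series = TemperatureSeries(site="Hypothetical site")
series.add_reading(18.0)
series.add_reading(22.0)
print(series.mean_c())
```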
Data Modeling • DBMSs are highly flexible • Good: they can do a lot! • Bad: they have to be told how to do it! • A Database Management System is the CANVAS; the DATA MODEL is the painting.
Data Modeling • Data modeling is used to develop the structures used in a database • Your data model affects: • the reliability of the data • the efficiency and speed of queries • the complexity of the database • Data modeling is an art, not a science!
Some Terminology: Tables contain attributes or fields (columns) and multiple observations or tuples (rows)
Flat-file • Table: Species Observation • Attributes: Genus, Species, Observer, CommonName, Date [Diagram notation: tables in boxes, attributes in ovals]
Normalization • One widely-used approach for reducing errors within a database is to normalize your data structures • Normalization is the process of eliminating duplicate or redundant information
Two-table Relational Database • Table: Observation; attributes: Spec_code, Date, Observer • Table: Species; attributes: Spec_code, Genus, Species, CommonName
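A minimal sketch of this normalization step, using Python's built-in sqlite3 module: flat-file rows are split into a Species table and an Observation table linked by Spec_code. The table and attribute names follow the slides; the sample rows, the observer labels, the dates and the use of an auto-assigned integer for Spec_code are assumptions made for the example.

```python
import sqlite3

# Hypothetical flat-file rows: Genus, Species, CommonName, Date, Observer.
flat_rows = [
    ("Quercus", "alba", "White oak", "2001-05-01", "Observer A"),
    ("Quercus", "alba", "White oak", "2001-05-02", "Observer B"),
    ("Acer", "rubrum", "Red maple", "2001-05-02", "Observer A"),
]

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Species (
    Spec_code  INTEGER PRIMARY KEY,
    Genus      TEXT,
    Species    TEXT,
    CommonName TEXT,
    UNIQUE (Genus, Species)
);
CREATE TABLE Observation (
    Spec_code INTEGER REFERENCES Species(Spec_code),
    Date      TEXT,
    Observer  TEXT
);
""")

for genus, species, common_name, date, observer in flat_rows:
    # Each species is stored once; repeated names are ignored.
    conn.execute(
        "INSERT OR IGNORE INTO Species (Genus, Species, CommonName) VALUES (?, ?, ?)",
        (genus, species, common_name),
    )
    (spec_code,) = conn.execute(
        "SELECT Spec_code FROM Species WHERE Genus = ? AND Species = ?",
        (genus, species),
    ).fetchone()
    # The observation keeps only the code that points at the species row.
    conn.execute(
        "INSERT INTO Observation VALUES (?, ?, ?)", (spec_code, date, observer)
    )

species_count = conn.execute("SELECT COUNT(*) FROM Species").fetchone()[0]
observation_count = conn.execute("SELECT COUNT(*) FROM Observation").fetchone()[0]
print(species_count, "species rows;", observation_count, "observation rows")
```

With the species names stored once, a misspelled genus or common name is corrected in one row rather than in every observation.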
Complex Data Model • Species • Observations • Specimens • Images • Locations • Observers • Internet Links [Diagram notation: one-to-one or one-to-many links]
Data Model for Metadata at the VCR/LTER • Personnel • Projects • Mailing Lists • Dataset • Dataset Locations • Variable • Variable Codes [Diagram notation: optional linkage, mandatory linkage]
“Beanstalk” & “String of Pearls” • Metadata • methods • units • Location Table • Lat/Lon
Beanstalk / String of Pearls • Highly normalized • Extremely flexible - capable of handling many different kinds of data • Inefficient • Queries can be very slow • Can require large amounts of space
Why is there no perfect data model for ecological data? • One of the reasons data modeling is an ART not a SCIENCE is that ecologists use data in many different ways • Data that is perfectly formed for one kind of analysis may be unusable for another • Different analytical software may be used
Why No Perfect Model? • Generally, ecologists want to use data in “flat file” formats that combine all the tables containing data into a single, denormalized “spreadsheet”-type format, but even that format can vary between researchers • ClimDB needed to support single-parameter and multiple-parameter formats to meet researcher needs
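One way to picture the two flat-file shapes is a "long" layout with one parameter per row and a "wide" layout with one column per parameter. The sketch below converts the first into the second in plain Python. The slides do not describe ClimDB's actual file layouts, so the column names, station code and values here are all hypothetical.

```python
from collections import defaultdict

# Hypothetical single-parameter records: (station, date, parameter, value).
long_rows = [
    ("STN1", "2001-05-01", "air_temp_c", 18.5),
    ("STN1", "2001-05-01", "precip_mm", 2.0),
    ("STN1", "2001-05-02", "air_temp_c", 20.1),
]


def to_wide(rows):
    """Pivot one-parameter-per-row records into one row per station and date."""
    parameters = sorted({parameter for _, _, parameter, _ in rows})
    by_key = defaultdict(dict)
    for station, date, parameter, value in rows:
        by_key[(station, date)][parameter] = value
    header = ["station", "date"] + parameters
    table = [
        # Parameters with no measurement on that date are left as None.
        [station, date] + [values.get(p) for p in parameters]
        for (station, date), values in sorted(by_key.items())
    ]
    return header, table


header, wide_rows = to_wide(long_rows)
print(header)
for row in wide_rows:
    print(row)
```

The same measurements fit either shape; which one is convenient depends on the analysis software a researcher uses, which is the point the slide makes.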