530 likes | 711 Views
Introduction to Ecoinformatics: Past, Present & Future. William Michener LTER Network Office, University of New Mexico January 2007. Outline. Ecoinformatics: a definition A science vision Information challenges Ecoinformatics “solutions”. Outline. Ecoinformatics: a definition
E N D
Introduction to Ecoinformatics: Past, Present & Future William Michener LTER Network Office, University of New Mexico January 2007
Outline • Ecoinformatics: a definition • A science vision • Information challenges • Ecoinformatics “solutions”
Outline • Ecoinformatics: a definition • A science vision • Information challenges • Ecoinformatics “solutions”
Ecoinformatics A broad S&T discipline that incorporates both concepts and practicaltools for the understanding, generation, processing, and propagation of ecological data, information and knowledge.
Outline • Ecoinformatics: a definition • A science vision • Information challenges • Ecoinformatics “solutions”
Many studies employ a restricted scale of observation -- Commonly 1 m2 The literature is biased toward single and small scale results
Thinking Outside the “Box” LTER Time Biocomplexity Parameters Space NEON, WATERS, OOI, …. Increase in breadth and depth of understanding.....
Grand environmental challenges 2000 1998 2001 2003 2004 2004
More and more of the ecological questions that confront society are national, continental and global in scope Source: CDC Drought Source: Drought Monitor
26 NSF LTER Sites in the U.S. and the Antarctic: > 1,600 Scientists; 6,000+ Data Sets—different themes, methods, units, structure, …. LTER
NEON Climate Domains 1 1 2 2 3 3 4 4 16 15 14 20 11 12 11 10 17 13 16 10 18 16 14 13 17 12 18 20 19 15 19 9 9 8 8 7 7 6 6 5 5
BioMesonet Tower and Sensor Arrays Aquatic Arrays
Micron-scale nitrate ISE Soil Sensor Arrays
Small-Organism Tracking: Mobile animals as bio-sentinels for environmental change, forecasting biological invasions, emerging disease spread
Outline • Ecoinformatics: a definition • A science vision • Information challenges • Ecoinformatics “solutions”
Satellite Images High GIS Weather Stations Most Ecological Data Data Volume (per dataset) Business Data Biodiversity Surveys Primary Productivity Most Software Population Data Gene Sequences Soil Cores Low High Complexity/Metadata Requirements Characteristics of Ecological Data
Data Entropy Time of publication Specific details General details Retirement or career change Information Content Accident Death Time (Michener et al. 1997)
Semantics A B B • Schema transform • Coding transform • Taxon Lookup • Semantic transform • Imagine scaling!! C C
Semantics—Linking Taxonomic Semantics to Ecological Data Elliot 1816 R. plumosa • Taxon concepts change over time (and space) • Multiple competing concepts coexist • Names are re-used for multiple concepts Gray 1834 R. plumosa Rhynchospora plumosa s.l. R. Plumosa v. intermedia R. plumosa v. plumosa Chapman 1860 R. Plumosa v. interrupta R. intermedia Kral 1998 R. pineticola R. plumosa Peet 2002? R. plumosa v. pinetcola R. plumosa v. plumosa R. sp. 1 A B C from R. Peet
Outline • Ecoinformatics: a definition • A science vision • Information challenges • Ecoinformatics “solutions”
Experimental Design Methods Data Design Data Forms Field Computer Entry Electronically Interfaced Field Equipment Electronically Interfaced Lab Equipment Research Program Investigators Studies Quality Control Raw Data File Data Entry Quality Assurance Checks Data verified? no • Standard Operating Procedures • Policies • Data sharing • Computer use • Archive storage Data Contamination yes Summary Analyses Archive Data File Metadata Data Validated Archival Mass Storage Off-site Storage Magnetic Tape / Optical Disk / Printouts Investigators Access Interface Secondary Users Publication Synthesis
Ecoinformatics solutions • Data design • Data acquisition • QA/QC • Data documentation (metadata) • Data archival
Ecoinformatics solutions • Data design • Data acquisition • QA/QC • Data documentation (metadata) • Data archival
Data Design • Conceptualize and implement a logical structure within and among data sets that will facilitate data acquisition, entry, storage, retrieval and manipulation.
Database Types • File-system based • Hierarchical • Relational • Object-oriented • Hybrid (e.g., combination of relational and object-oriented schema) Porter 2000
Data Design: 7 Best Practices • Assign descriptive file names • Use consistent and stable file formats • Define the parameters • Use consistent data organization • Perform basic quality assurance • Assign descriptive data set titles • Provide documentation (metadata) from Cook et al. 2000
1. Assign descriptive file names • File names should be unique and reflect the file contents • Bad file names • Mydata • 2001_data • A better file name • Sevilleta_LTER_NM_2001_NPP.asc • Sevilleta_LTER is the project name • NM is the state abbreviation • 2001 is the calendar year • NPP represents Net Primary Productivity data • asc stands for the file type--ASCII
2. Use consistent and stable file formats • Use ASCII file formats – avoid proprietary formats • Be consistent in formatting • don’t change or re-arrange columns • include header rows (first row should contain file name, data set title, author, date, and companion file names) • column headings should describe content of each column, including one row for parameter names and one for parameter units • within the ASCII file, delimit fields using commas, pipes (|), tabs, or semicolons (in order of preference)
3. Define the parameters • Use commonly accepted parameter names that describe the contents • e.g., precip for precipitation • Use consistent capitalization • e.g., not temp, Temp, and TEMP in same file • Explicitly state units of reported parameters in the data file and the metadata • SI units are recommended • Choose a format for each parameter, explain the format in the metadata, and use that format throughout the file • e.g., use yyyymmdd; January 2, 1999 is 19990102
4. Use consistent data organization (one good approach) Note: -9999 is a missing value code
4. Use consistent data organization (a second good approach)
5. Perform basic quality assurance • Assure that data are delimited and line up in proper columns • Check that there no missing values for key parameters • Scan for impossible and anomalous values • Perform and review statistical summaries • Map location data (lat/long) and assess errors • Verify automated data transfers • e.g. check-sum techniques • For manual data transfers, consider double keying data and comparing 2 data sets
6. Assign descriptive data set titles • Data set titles should ideally describe the type of data, time period, location, and instruments used (e.g., Landsat 7). • Data set title should be similar to names of data files • Good: “Shrub Net Primary Productivity at the Sevilleta LTER, New Mexico, 2000-2001” • Bad: “Productivity Data”
Ecoinformatics solutions • Data design • Data acquisition • QA/QC • Data documentation (metadata) • Data archival
High-quality data depend on: • Proficiency of the data collector(s) • Instrument precision and accuracy • Consistency (e.g., standard methods and approaches) • Design and ease of data entry • Sound QA/QC • Comprehensive metadata (e.g., documentation of anomalies, etc.)
What’s wrong with this data sheet? PlantLife Stage ______________ _______________ ______________ _______________ ______________ _______________ ______________ _______________ ______________ _______________ ______________ _______________ ______________ _______________ ______________ _______________
Important questions • How well does the data sheet reflect the data set design? • How well does the data entry screen (if available) reflect the data sheet?
PHENOLOGY DATA SHEET Rio Salado - Transect 1 Collectors:_________________________________ Date:___________________ Time:_________ Notes:_________________________________________ _______________________________________________ PlantLife Stage ardi P/G V B FL FR M S D NP arpu P/G V B FL FR M S D NP atca P/G V B FL FR M S D NP bamu P/G V B FL FR M S D NP zigr P/G V B FL FR M S D NP P/G V B FL FR M S D NP P/G V B FL FR M S D NP P/G = perennating or germinating M = dispersing V = vegetating S = senescing B = budding D = dead FL = flowering NP = not present FR = fruiting
ardi P/G V B FL FR M S D NP asbr P/G V B FL FR M S D NP deob P/G V B FL FR M S D NP arpu P/G V B FL FR M S D NP Y N Y N Y N Y N Y N Y N Y N Y N YN Y N Y N YN Y N Y N YN Y N Y N Y N Y N Y N Y N Y N Y N Y N Y N Y N Y N Y N Y N Y N Y N Y N Y N Y N Y N YN PHENOLOGY DATA SHEET Rio Salado - Transect 1 Collectors Troy Maddux Date: 16 May 1991 Time: 13:12 Notes:Cloudy day, 3 gopher burrows on transect
Ecoinformatics solutions • Data design • Data acquisition • QA/QC • Data documentation (metadata) • Data archival
Experimental Design Methods Data Design Data Forms Field Computer Entry Electronically Interfaced Field Equipment Electronically Interfaced Lab Equipment Generic Data Processing Research Program Investigators Studies Quality Control Raw Data File Data Entry Quality Assurance Checks Data verified? no Data Contamination yes Summary Analyses Archive Data File Metadata Data Validated Archival Mass Storage Off-site Storage Magnetic Tape / Optical Disk / Printouts Investigators Access Interface Secondary Users Publication Synthesis Brunt 2000
Ecoinformatics solutions • Project / experimental design • Data design • Data acquisition • QA/QC • Data documentation (metadata) – to be addressed • Data archival
Ecoinformatics solutions • Project / experimental design • Data design • Data acquisition • QA/QC • Data documentation (metadata) • Data archival
Cycles of Research “A Conventional View” Publications Data Analysis and modeling Problem Collection Planning
Cycles of Research“A New View” Archive of Data Publications Analysis and modeling Selection and extraction Collection Original Observations Secondary Observations Problem Definition (Research Objectives) Planning Planning
Data Archive • A collection of data sets, usually electronic, stored in such a way that a variety of users can locate, acquire, understand and use the data. • Examples: • ESA’s Ecological Archive • NASA’s DAACs (Distributed Active Archive Centers)