
knb.ecoinformatics seek.ecoinformatics

Ecological Informatics: Challenges and Benefits. Presentation to ESA Visions Committee, March 31, 2003. Mark Schildhauer, Ph.D., Director of Computing, NCEAS. http://knb.ecoinformatics.org http://seek.ecoinformatics.org


Presentation Transcript


  1. Ecological Informatics: Challenges and Benefits. Presentation to ESA Visions Committee, March 31, 2003. Mark Schildhauer, Ph.D., Director of Computing, NCEAS. http://knb.ecoinformatics.org http://seek.ecoinformatics.org

  2. Research Team and Collaborators • PISCO • LTER Network • San Diego Supercomputer Center • Arizona State University • University of Kansas • University of North Carolina • OBFS Network • UC NRS • Sandy Andelman • Chad Berkley • Matthew Brooke • John Harris • Dan Higgins • Matt Jones • Jim Reichman • Mark Schildhauer • Jing Tao

  3. What is Ecoinformatics? (diagram: Data Acquisition; Integration; Storage, archiving; Distributed Access; Results)

  4. Ecoinformatics • The Goal: to develop technology tools and services to enable more efficient acquisition, integration, and analysis of ecological data • Specific Challenges • An Approach to Technology Solutions (KNB) • Future Directions • a Science Environment for Ecological Knowledge, SEEK

  5. Status of Ecological Data • Highly dispersed • Different individuals, organizations, and locations • Extreme heterogeneity • in Form, Content, and Meaning • Lack of Documentation (metadata) • Lack of metadata overall • Many standards in use, many custom types • Implementations are not modular

  6. Data are Highly Dispersed… • Data are distributed among: • Independent researcher holdings • Research station collections • LTER Network (24 sites) • Org. of Biological Field Stations (160+ sites) • Univ. Cal Natural Reserve System (36 sites) • Agency databases • Museum databases

  7. Data are physically dispersed… (maps: Visitors to NCEAS; Field Stations in North America)

  8. Data are very heterogeneous… • Population survey • Experimental • Taxonomic survey • Behavioral • Meteorological • Oceanographic • Hydrology • … • Syntax (format) • Schema (organization) • Semantics (meaning/methods)

  9. Thematic heterogeneity due to the vast scope of ecology (figure: Biosphere, Abiotic, Biomes, Communities, Organisms, Genes)

  10. Classifying Data Heterogeneity • Syntax (format) • Schema (organization) • Semantics (knowledge/meaning/methods)

  11. Data Lacking in Documentation • Majority of ecological data undocumented • Lack information on syntax, structure and semantics of data • Impossible to understand data without contacting the original researchers; even then memories can fail, individuals retire or expire • Documentation conventions widely vary • Requires large time investment to understand each data set

  12. Summary of Technical Challenges • Because of: • Data dispersion • Data heterogeneity • Lack of documentation • Integration and synthesis are limited to a manual process • --difficult to scale integration efforts up to large numbers of data sets

  13. Solutions • Standardized measurements • Changes needed in culture, training • Technology development: metadata, data servers, desktop tools

  14. Ecoinformatics Research Objectives • Enhance access to ecological and environmental data • Promote data sharing & re-use • Enable national data discovery • Provide access to research stations’ data resources • Maintain local autonomy for data management • Synthesis and Analysis • Promote cross-cutting analysis • Taxonomic, Spatial, Temporal, Conceptual integration of data • Data preservation • Long term data description • Provide archiving capabilities

  15. Functional breakdown for Analysis • Data discovery • Data access • Data storage/archive • Data interpretation • Quality assessment • Data Conversion & Integration • Analysis & Modeling • Visualization

  16. KNB Development Projects (Knowledge Network for Biocomplexity) • Ecological Metadata Language (EML) • Prospective standard for ecological metadata • Metacat • A freely available database for storing metadata • Morpho • A freely available tool for creating metadata

  17. KNB Overview (diagram: Client: Morpho, Web Browser; Server: Metacat; Metadata (EML) and Data)

  18. KNB Development Projects • Ecological Metadata Language (EML) • Metacat • Morpho

  19. Why the big buzz about Metadata? • Metadata are the basis for the next generation of the Web: • “The Semantic Web is a web of data, in some ways like a global database… The driver for the Semantic Web is …metadata” --Tim Berners-Lee, father of the Web • Digital Library Community: “Era of Metadata 1998-200?” – Carol Mandel, Digital Librarian

  20. Central Role of Metadata • What are metadata? • Data documentation • Ownership, attribution, structure, contents, methods, quality, etc. • Critical for addressing data heterogeneity issues • Critical for developing extensible systems • Critical for long-term data preservation • Allows advanced services to be built

  21. Data – just numbers

      072998   29.5   17.0
      073098   29.7   6.1
      073198   29.1   0

  22. Data + Metadata = numbers + context

      Obs.   Date     Temp (C)   Precip. (mm)
      #1     072998   29.5       17.0
      #2     073098   29.7       6.1
      #3     073198   29.1       0
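
A minimal Python sketch of how attribute metadata turns the bare numbers of slide 21 into the annotated records of slide 22. The column names and units come from the slides; the MMDDYY date layout, the assumption that "98" means 1998, and the names `ATTRIBUTES`, `parse_mmddyy` and `interpret` are hypothetical.

```python
from datetime import date

# The raw rows from the "just numbers" slide.
RAW = """072998 29.5 17.0
073098 29.7 6.1
073198 29.1 0"""

def parse_mmddyy(s):
    """Assumed MMDDYY layout with a 1900s century (a guess, not stated on the slide)."""
    return date(1900 + int(s[4:6]), int(s[0:2]), int(s[2:4]))

# Hypothetical attribute metadata: name, unit, and how to parse each column.
ATTRIBUTES = [
    {"name": "Date", "unit": None, "parse": parse_mmddyy},
    {"name": "Temp", "unit": "C", "parse": float},
    {"name": "Precip", "unit": "mm", "parse": float},
]

def interpret(raw, attributes):
    """Turn whitespace-separated rows into records keyed by attribute name."""
    records = []
    for line in raw.splitlines():
        fields = line.split()
        if len(fields) != len(attributes):
            raise ValueError(f"expected {len(attributes)} fields, got {len(fields)}: {line!r}")
        records.append({a["name"]: a["parse"](f) for a, f in zip(attributes, fields)})
    return records

for i, rec in enumerate(interpret(RAW, ATTRIBUTES), start=1):
    print(f"Obs. #{i}: {rec['Date'].isoformat()}  {rec['Temp']} C  {rec['Precip']} mm")
```

Without the metadata list, the same three lines remain ambiguous: nothing in the numbers says which column is a date or what units apply.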

  23. Data Integration → synthesis (diagram: data sets A, B, C)

  24. Rules of Thumb (Michener 2000) • the more comprehensive the metadata, the greater the longevity (and value) of the data • structured metadata can greatly facilitate data discovery, encourage “best metadata practices” and support data and metadata use by others • metadata implementation takes time!!! • start implementing metadata for new data collection efforts and then prioritize “legacy” and ongoing data sets that are of greatest benefit to the broadest user community

  25. EML 2.0: a formal ecological metadata specification • eml-resource -- Basic resource info • eml-dataset -- Data set info • eml-literature -- Citation info • eml-software -- Software info • eml-party -- People and Organizations • eml-entity -- Data entity (table) info • eml-attribute -- Attribute (variable) info • eml-constraint -- Integrity constraints • eml-physical -- Physical format info • eml-access -- Access control • eml-distribution -- Distribution info • eml-project -- Research project info • eml-coverage -- Geographic, temporal and taxonomic coverage • eml-protocol -- Methods and QA/QC
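
To illustrate how a modular specification nests dataset, party, entity and attribute information, here is a small Python sketch using the standard library's `xml.etree.ElementTree`. The element names loosely echo the EML modules listed above, but the structure is abbreviated and is NOT schema-valid EML; the package id, title, surname and file name are hypothetical.

```python
import xml.etree.ElementTree as ET

def build_metadata(title, creator_surname, table_name, attributes):
    """Build a simplified, EML-inspired metadata document (not schema-valid EML)."""
    root = ET.Element("eml", packageId="example.1.1")   # hypothetical package id
    dataset = ET.SubElement(root, "dataset")             # cf. eml-dataset
    ET.SubElement(dataset, "title").text = title
    creator = ET.SubElement(dataset, "creator")          # cf. eml-party
    name = ET.SubElement(creator, "individualName")
    ET.SubElement(name, "surName").text = creator_surname
    table = ET.SubElement(dataset, "dataTable")          # cf. eml-entity
    ET.SubElement(table, "entityName").text = table_name
    attr_list = ET.SubElement(table, "attributeList")    # cf. eml-attribute
    for attr_name, unit in attributes:
        attr = ET.SubElement(attr_list, "attribute")
        ET.SubElement(attr, "attributeName").text = attr_name
        if unit is not None:
            # Real EML nests units under measurementScale; flattened here for brevity.
            ET.SubElement(attr, "unit").text = unit
    return root

doc = build_metadata("Daily weather observations", "Doe", "weather.txt",
                     [("Date", None), ("Temp", "celsius"), ("Precip", "millimeter")])
print(ET.tostring(doc, encoding="unicode"))
```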

  26. KNB Development Projects • Ecological Metadata Language (EML) • Metacat • Morpho

  27. Metacat – metadata storage • Metadata storage, search, presentation • Schema independent – supports arbitrary XML types • Multiple metadata standards • Ecological Metadata Language • NBII Biological Data Profile • Data storage + preservation • Replication • Flexible access control system • National distributed directory service • Strong version control • Configurable web interface (XSLT)
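
Two of the listed features, schema independence and version control, can be illustrated with a toy store. This is a hypothetical Python sketch, not Metacat's actual API or storage design: it accepts any well-formed XML, indexes element text by slash-separated path so different schemas are searched the same way, and keeps every revision under ids of the assumed form `scope.number.revision`.

```python
import xml.etree.ElementTree as ET

class XmlStore:
    """Toy schema-independent metadata store (hypothetical; not Metacat's API)."""

    def __init__(self):
        self.revisions = {}  # "scope.number" -> [xml text of rev 1, rev 2, ...]

    def put(self, base_id, xml_text):
        ET.fromstring(xml_text)  # reject malformed XML up front
        self.revisions.setdefault(base_id, []).append(xml_text)
        return f"{base_id}.{len(self.revisions[base_id])}"  # new revision id

    def get(self, docid):
        base_id, rev = docid.rsplit(".", 1)
        return self.revisions[base_id][int(rev) - 1]

    @staticmethod
    def _paths(elem, prefix=""):
        """Yield (path, text) for every element with non-blank text."""
        path = f"{prefix}/{elem.tag}"
        if elem.text and elem.text.strip():
            yield path, elem.text.strip()
        for child in elem:
            yield from XmlStore._paths(child, path)

    def search(self, path, value):
        """Docids whose latest revision contains `value` (case-insensitive) at `path`."""
        hits = []
        for base_id, revs in self.revisions.items():
            for p, text in self._paths(ET.fromstring(revs[-1])):
                if p == path and value.lower() in text.lower():
                    hits.append(f"{base_id}.{len(revs)}")
                    break
        return hits

store = XmlStore()
store.put("demo.1", "<eml><dataset><title>Kelp forest surveys</title></dataset></eml>")
store.put("demo.1", "<eml><dataset><title>Kelp forest surveys, 1999-2002</title></dataset></eml>")
store.put("demo.2", "<record><name>Barnacle counts</name></record>")  # a different schema
print(store.search("/eml/dataset/title", "kelp"))  # ['demo.1.2']
print(store.search("/record/name", "barnacle"))    # ['demo.2.1']
```

Updating a document appends a revision rather than overwriting it, so `get("demo.1.1")` still returns the original text.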

  28. Metacat network (diagram of linked Metacat catalogs at NCEAS, SDSC, CAP LTER, SEV, NRS, OBFS and AND; key: Metacat Catalog, Morpho clients, Web clients, Site metadata system, XML output filter)

  29. Web interface

  30. KNB Development Projects • Ecological Metadata Language (EML) • Metacat • Morpho

  31. Morpho – Window to the KNB

  32. Morpho Features • Guided Metadata creation • Wizards & editor • Automatically extract metadata during data import • Search all metadata – structured + free text • Contribute to KNB • Windows, Mac, Linux • Multiple metadata standards • EML • NBII Biological Data Profile • Extensible • Standalone (non-networked) mode
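
"Automatically extract metadata during data import" can be sketched as simple type and range inference over a delimited file. This hypothetical Python example is not Morpho's actual algorithm: it guesses integer, float or text per column, counts missing values (the missing-value codes are an assumption), and records numeric ranges a user could then confirm in a wizard.

```python
import csv
import io

def infer_type(values):
    """Guess a column's storage type from its non-missing values."""
    if not values:
        return "unknown"
    for name, cast in (("integer", int), ("float", float)):
        try:
            for v in values:
                cast(v)
            return name
        except ValueError:
            continue
    return "text"

def extract_metadata(csv_text, missing_codes=("", "NA")):
    """Derive simple attribute-level metadata from a CSV with a header row."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, body = rows[0], rows[1:]
    attributes = []
    for i, name in enumerate(header):
        column = [row[i] for row in body]
        present = [v for v in column if v not in missing_codes]
        kind = infer_type(present)
        attr = {"name": name, "type": kind, "missing": len(column) - len(present)}
        if kind in ("integer", "float"):
            nums = [float(v) for v in present]
            attr["min"], attr["max"] = min(nums), max(nums)
        attributes.append(attr)
    return attributes

sample = "site,year,urchins,notes\nA,2001,14,\nB,2001,NA,kelp dense\nA,2002,9.5,\n"
for attr in extract_metadata(sample):
    print(attr)
```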

  33. Objectives of the KNB & SEEK • National network for ecological data • Data discovery • Data access • Data interpretation • Enable advanced services • Quality management • Data integration thru advanced queries • Visualization and analysis

  34. Solutions • KNB • Ecological Metadata Language (EML) • Metacat -- flexible metadata database • Morpho -- data management for ecologists • SEEK (partners include NCEAS, KU, SDSC, LTER Network Office, CAP, Napier Univ., UVM, UNC) • Unified Portal to Ecological Data (ECOGRID) • Quality Assurance engine • Semantic Query Processor • Data integration and Analytical Pipelines

  35. SEEK (Science Environment for Ecological Knowledge) – addressing semantic integration • EcoGrid: One-stop access to ecological and environmental data • Ontologies • Semantic Mediation: Data integration using logic-based reasoning • Analysis and Modeling Pipelines: Analysis workflows using semantic mediation

  36. Quality Assessment • Integrity constraint checking • Data type checking • Metadata completeness • Data entry errors • Outlier detection • Check assertions about data • e.g., trees don’t shrink • e.g., sea urchins do
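
A few of these checks can be sketched in Python. This is a hypothetical illustration, not SEEK's quality assurance engine: a data type check, outlier detection using the median/MAD "modified z-score" (a swapped-in, standard robust technique chosen because it behaves sensibly on small samples), and the assertion "trees don't shrink" applied to repeated stem diameters. The sample values are made up.

```python
from statistics import median

def type_errors(records, field):
    """Indices of records whose `field` cannot be read as a number."""
    bad = []
    for i, rec in enumerate(records):
        try:
            float(rec[field])
        except (TypeError, ValueError):
            bad.append(i)
    return bad

def outliers(values, threshold=3.5):
    """Flag values whose modified z-score (median and MAD based) exceeds `threshold`."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        return []  # no spread to measure against
    return [v for v in values if 0.6745 * abs(v - med) / mad > threshold]

def shrinkage_violations(series):
    """Assertion 'trees don't shrink': flag any decrease between successive measurements."""
    flagged = []
    for individual, sizes in series.items():
        for earlier, later in zip(sizes, sizes[1:]):
            if later < earlier:
                flagged.append((individual, earlier, later))
    return flagged

records = [{"temp": "29.5"}, {"temp": "2 9.1"}, {"temp": None}]
print(type_errors(records, "temp"))            # [1, 2]
print(outliers([10, 11, 10, 12, 11, 10, 95]))  # [95]
dbh_cm = {"tree-1": [12.1, 12.4, 12.9], "tree-2": [30.2, 3.1, 30.9]}
print(shrinkage_violations(dbh_cm))            # [('tree-2', 30.2, 3.1)]
```

Because sea urchins can shrink, an assertion like `shrinkage_violations` would be attached only to taxa for which it holds, which is exactly the kind of knowledge the metadata has to carry.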

  37. Semantic metadata • Describes the relationship between measurements and ecologically relevant concepts • Drawn from a controlled vocabulary • Ontology for ecological measurements

  38. Representing ontologies • OWL –Web Ontology Language • CKML – Conceptual Knowledge Markup Language • RDF – Resource Description Framework
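
The core idea shared by these languages, statements as subject–predicate–object triples plus reasoning over class hierarchies, can be shown in plain Python. This sketch does not use any OWL or RDF library; the class names and the helper `related` are hypothetical, and it implements only transitive `subClassOf` reasoning.

```python
# Hypothetical mini-ontology as RDF-style (subject, predicate, object) triples.
TRIPLES = {
    ("Barnacle", "subClassOf", "SessileInvertebrate"),
    ("SessileInvertebrate", "subClassOf", "Invertebrate"),
    ("SeaUrchin", "subClassOf", "Invertebrate"),
    ("PopulationDensity", "subClassOf", "AbundanceMeasurement"),
    ("PercentCover", "subClassOf", "AbundanceMeasurement"),
}

def related(term, triples, upward=True):
    """Transitive closure of subClassOf: superclasses if `upward`, else subclasses."""
    found, frontier = set(), [term]
    while frontier:
        current = frontier.pop()
        for s, p, o in triples:
            if p != "subClassOf":
                continue
            src, dst = (s, o) if upward else (o, s)
            if src == current and dst not in found:
                found.add(dst)
                frontier.append(dst)
    return found

print(sorted(related("Barnacle", TRIPLES)))
print(sorted(related("Invertebrate", TRIPLES, upward=False)))
```

A query about "invertebrates" can then be widened automatically to data sets annotated only with "Barnacle" or "SeaUrchin".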

  39. Ecological Ontologies

  40. Semantic Data Discovery • Knowledge of SQL or database languages is a barrier to data access and re-use: SELECT dsname FROM dslist WHERE meas_type LIKE 'pop_den' AND location = 'GBNPP' AND common_name = 'barnacles'; • Semantic Queries: allow scientists to express data queries in familiar scientific terms: What data sets contain population density estimates for barnacles in Glacier Bay National Park and Preserve? • Functionality enabled through semantic metadata
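
A minimal sketch of how a semantic layer could translate the plain-language question into the SQL shown above, using Python's built-in `sqlite3`. The table and column names come from the slide; the vocabulary mappings, the extra measurement code `density_m2`, and all catalog rows are hypothetical. Mapping one scientific term to several stored codes is where this goes beyond a single hand-written `LIKE` clause.

```python
import sqlite3

# Hypothetical controlled vocabulary: scientific terms -> codes stored in the catalog.
MEASUREMENTS = {"population density": ["pop_den", "density_m2"]}
PLACES = {"glacier bay national park and preserve": "GBNPP"}

def find_datasets(conn, measurement, organism, place):
    """Answer a question phrased in scientific terms by translating it to SQL."""
    codes = MEASUREMENTS[measurement.lower()]
    placeholders = ",".join("?" for _ in codes)  # one "?" per code; no user text in SQL
    sql = (f"SELECT dsname FROM dslist WHERE meas_type IN ({placeholders}) "
           "AND location = ? AND common_name = ? ORDER BY dsname")
    rows = conn.execute(sql, [*codes, PLACES[place.lower()], organism])
    return [name for (name,) in rows]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dslist (dsname TEXT, meas_type TEXT, location TEXT, common_name TEXT)")
conn.executemany("INSERT INTO dslist VALUES (?, ?, ?, ?)", [
    ("intertidal_1999", "pop_den", "GBNPP", "barnacles"),
    ("quadrats_2001", "density_m2", "GBNPP", "barnacles"),
    ("cover_2001", "pct_cover", "GBNPP", "barnacles"),
    ("kelp_2000", "pop_den", "other_site", "kelp"),
])
print(find_datasets(conn, "population density", "barnacles",
                    "Glacier Bay National Park and Preserve"))
# ['intertidal_1999', 'quadrats_2001']
```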

  41. Data Integration: Data + Semantic Metadata + Researcher Decisions → Integrated Data Set
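
The formula on this slide can be sketched in Python: two hypothetical data sets use different column names and temperature units, semantic metadata maps each column to a shared concept and unit, and the researcher decides which concepts to keep and that temperatures are reported in Celsius. All names and values are made up for illustration; the sketch converts temperature units only.

```python
# Researcher decision: report temperatures in Celsius.
TO_CELSIUS = {"C": lambda t: t, "F": lambda t: (t - 32.0) * 5.0 / 9.0}

# Hypothetical sources; "semantics" maps column -> (concept, unit or None).
SOURCE_A = {"rows": [{"site": "A1", "T_F": 86.0}],
            "semantics": {"site": ("location", None), "T_F": ("air_temperature", "F")}}
SOURCE_B = {"rows": [{"plot": "B7", "temp_c": 29.7, "observer": "JH"}],
            "semantics": {"plot": ("location", None), "temp_c": ("air_temperature", "C"),
                          "observer": ("observer", None)}}

def integrate(sources, concepts):
    """Merge rows from differently structured sources onto shared concepts.

    `concepts` is the researcher's choice of which concepts to keep.
    """
    merged = []
    for source in sources:
        for row in source["rows"]:
            out = {}
            for column, value in row.items():
                concept, unit = source["semantics"][column]
                if concept not in concepts:
                    continue  # dropped by researcher decision
                out[concept] = TO_CELSIUS[unit](value) if unit else value
            merged.append(out)
    return merged

for row in integrate([SOURCE_A, SOURCE_B], {"location", "air_temperature"}):
    print(row)
```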

  42. Re-using data from the KNB • Goal – support visualization & analysis • Scalability-- • Efficiently process more data from investigators • Broader Spatial extent, longer temporal extent, robust taxonomic extent • Analytical Pipelines (Monarch prototype) • Flexible tool for exploratory analysis of data • Directly process data in the network • Utilize powerful analytical environments (SAS, Matlab, R, …) • Analysis audit trail • Reproduce analyses • Communicate about analyses • Automate new analyses based on earlier ones

  43. Analysis Pipelines (diagram: eight Analysis Steps, each with a Description And Code, Inputs and Outputs; Runtime Data Binding)
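
The pipeline idea (steps with a description, code, named inputs and outputs; data bound only at run time; an audit trail that lets an analysis be reproduced or rerun on new data) can be sketched in Python. This is a hypothetical illustration, not the Monarch prototype; `Step`, `run_pipeline` and the example steps are invented names.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Step:
    description: str
    func: Callable
    inputs: list
    output: str

def run_pipeline(steps, **bindings):
    """Bind data to named inputs at run time, run steps in order, and log each one."""
    env, audit = dict(bindings), []
    for step in steps:
        missing = [n for n in step.inputs if n not in env]
        if missing:
            raise KeyError(f"{step.description!r} needs unbound inputs {missing}")
        env[step.output] = step.func(*(env[n] for n in step.inputs))
        audit.append(f"{step.description}: {', '.join(step.inputs)} -> {step.output}")
    return env, audit

steps = [
    Step("drop missing counts", lambda xs: [x for x in xs if x is not None], ["counts"], "clean"),
    Step("mean count per quadrat", lambda xs: sum(xs) / len(xs), ["clean"], "mean_count"),
    Step("convert to density", lambda m, a: m / a, ["mean_count", "quadrat_area_m2"], "density"),
]
env, audit = run_pipeline(steps, counts=[4, None, 6, 8], quadrat_area_m2=2.0)
print(env["density"])  # 3.0
for line in audit:
    print(line)
```

Because the steps hold no data, the same list can be rerun on a new data set simply by binding different values to `counts` and `quadrat_area_m2`.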

  44. Scaling Analysis and Modeling

  45. Data Acquisition (Jalama prototype) • Application to assist in data collection • Capture relevant metadata (e.g., EML) during initial data collection • Encourage good informatics practice via automating design of field data forms • Integration with Metadata and Data storage frameworks (e.g., Metacat)
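
Designing field data forms from metadata can be sketched as follows: the same attribute descriptions drive both the printable form and the validation of transcribed entries. This is a hypothetical Python example, not Jalama's design; the attribute list, names and units are invented.

```python
import csv
import io

# Hypothetical attribute metadata captured before fieldwork (cf. EML attributes).
FORM_SPEC = [
    {"name": "date", "unit": "YYYY-MM-DD", "type": str},
    {"name": "quadrat", "unit": "id", "type": int},
    {"name": "barnacle_count", "unit": "individuals", "type": int, "min": 0},
]

def blank_form(spec, rows=3):
    """Render a CSV field sheet whose header carries attribute names and units."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow([f"{a['name']} ({a['unit']})" for a in spec])
    for _ in range(rows):
        writer.writerow([""] * len(spec))
    return buf.getvalue()

def check_entry(spec, entry):
    """Validate one transcribed row against the metadata used to build the form."""
    problems = []
    for attr, raw in zip(spec, entry):
        try:
            value = attr["type"](raw)
        except ValueError:
            problems.append(f"{attr['name']}: {raw!r} is not {attr['type'].__name__}")
            continue
        if "min" in attr and value < attr["min"]:
            problems.append(f"{attr['name']}: {value} below minimum {attr['min']}")
    return problems

print(blank_form(FORM_SPEC))
print(check_entry(FORM_SPEC, ["2003-03-31", "7", "-2"]))
```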

  46. Ecoinformatics Solutions! Integration: MORPHO Data Acquisition: JALAMA Storage, archiving: ECOGRID Distributed Access: METACAT Analysis & Viz: MONARCH

  47. Fin http://knb.ecoinformatics.org
