Data Management for CENS Stasa Milojevic Information Studies UCLA
CENS Data • CENS will generate massive amounts of heterogeneous scientific and technical data from the sensors. • The data need to be useful for CENS researchers • Real time • Archived • The data also need to be useful for other researchers in those problem domains (larger community).
Data Management: Goals • Data • Metadata • Share with community • Example metadata record (excerpt):
<dataset>
  <alternateIdentifier>PLT-GCEM-0311b.1.0</alternateIdentifier>
  <title>Fall 2003 plant monitoring survey -- biomass calculated from shoot height and flowering status of plants in permanent plots at GCE sampling sites 1-10</title>
  <creator>
    <organizationName>Georgia Coastal Ecosystems LTER Project</organizationName>
    <address>
      <deliveryPoint>Dept. of Marine Sciences</deliveryPoint>
      <deliveryPoint>University of Georgia</deliveryPoint>
      <city>Athens</city>
      <administrativeArea>Georgia</administrativeArea>
      <postalCode>30602-3636</postalCode>
      <country>USA</country>
    </address>
How to make data useful and usable? • One data model for all of CENS • Not likely: it presumes that all science problems are the same • One data model for each CENS research area • More promising approach • Various scientific communities have already agreed on common models
Seismology • Seismic data has been collected via digital instruments for over 30 years. • There are robust and stable standards for describing seismic data across systems and data formats (SEED – Standard for the Exchange of Earthquake Data) • Consortia to centralize and disseminate seismic datasets • IRIS (Incorporated Research Institutions for Seismology) • NEES (Network for Earthquake Engineering Simulation)
Habitat Monitoring Habitat monitoring research: • Draws upon multiple disciplines and technologies • Integrates data across a wide range of ecological scales (chemistry, physiology, ecology, and environment) • Available testbeds include: embedded microclimate sensor network and embedded phenology network (including wildlife and plant monitoring) • Habitat monitoring data: • Temperature, moisture, and barometric pressure • Video data
James Reserve and habitat monitoring community Why did we start with this community? • One of the initial CENS sensor deployments • The project is at an early stage of defining data and metadata requirements • Data from this project are being used as the basis for our initial inquiry learning research in CENS
Ecological Metadata Language (EML) • XML-based standard, developed by and for the ecological community • Divided into modules such as eml-access, eml-attribute, eml-project • Describes data, literature, software, products • Not well optimized for sensor data • Optimized for describing data, not the derivation of data • Uses the Morpho client, a cross-platform application for creating and organizing data and metadata, either locally or on a shared network server
Ecological Metadata Language (EML) • Example: geographic coverage (excerpt):
<coverage>
  <geographicCoverage>
    <geographicDescription>GCE Study Site GCE1 -- Eulonia, Georgia, USA. Transitional salt marsh/upland forest site at the upper reach of the Sapelo River near Eulonia, Georgia. The main marsh area is to the north of the channel where the upland is controlled by DNR. Several small creeks lie within the study area. Residential development is increasing on the upland areas south of the channel. A hydrographic sonde is deployed within this site attached to a private dock to the south of the main channel near the HW-17 bridge.</geographicDescription>
    <boundingCoordinates>
      <westBoundingCoordinate>-81.427321</westBoundingCoordinate>
      <eastBoundingCoordinate>-81.410390</eastBoundingCoordinate>
      <northBoundingCoordinate>31.546173</northBoundingCoordinate>
      <southBoundingCoordinate>31.535095</southBoundingCoordinate>
    </boundingCoordinates>
  </geographicCoverage>
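A minimal sketch of how a client might read the bounding box out of a coverage fragment like the one on this slide, using Python's standard xml.etree.ElementTree. The helper name bounding_box and the shortened, closed-off SAMPLE string are illustrative assumptions, not part of EML or Morpho. Full EML documents carry a namespace on the root element and may need namespace-aware lookups.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the slide's coverage excerpt, closed so it parses on its own.
SAMPLE = """
<coverage>
  <geographicCoverage>
    <geographicDescription>GCE Study Site GCE1 -- Eulonia, Georgia, USA.</geographicDescription>
    <boundingCoordinates>
      <westBoundingCoordinate>-81.427321</westBoundingCoordinate>
      <eastBoundingCoordinate>-81.410390</eastBoundingCoordinate>
      <northBoundingCoordinate>31.546173</northBoundingCoordinate>
      <southBoundingCoordinate>31.535095</southBoundingCoordinate>
    </boundingCoordinates>
  </geographicCoverage>
</coverage>
"""


def bounding_box(xml_text):
    """Return (west, east, north, south) as floats from a coverage fragment."""
    root = ET.fromstring(xml_text)
    box = root.find(".//boundingCoordinates")
    if box is None:
        raise ValueError("no boundingCoordinates element found")
    sides = ("west", "east", "north", "south")
    # Each side is stored in an element named <side>BoundingCoordinate.
    return tuple(float(box.findtext(f"{side}BoundingCoordinate")) for side in sides)


west, east, north, south = bounding_box(SAMPLE)
```

A check such as west < east and south < north is a cheap sanity test before the coordinates are used for data discovery.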
Describing Instruments Sensor Model Language (SensorML) • Emerging OpenGIS standard for describing sensors and sensor data • Developed to support data discovery, data processing and geolocation • Can be used for in-situ or remote sensors, dynamic or static platforms • Optimized for large sensors and large platforms • Describes resources for sensor management and discovery, but not sensor-derived data
Sensor Model Language (SensorML) [Diagram: a Sensor and its properties: identifiedAs, documentConstrainedBy, attachedTo, hasCRS, locatedUsing, measures, operatedBy, describedBy, documentedBy]
Science and Education • We need to make the science data useful for teaching science in grades 6-12. • This is a problem because scientific models describe the data, whereas education models describe lessons (grade level, instruments required for the lesson, time required to perform the lesson, educational standards, etc.)
Science and Education Data Models
METADATA FOR SENSOR DATA FOR HABITAT MONITORING
CENS Schema | SensorML | EML 2.0
CENS_Node.Node_Name (Name of Node) | Sml:IdentifiedAs (2.2.2) |
CENS_Node.Node_Desc (Description of Node) | AssetDescription: sml:description (2.2.12) |
CENS_Location.Location_ID (Unique location ID) | CrsID (2.2.5) | Eml-Coverage (2.4.4)
CENS_Location.X_Pos (Position on X axis) | HasCRS (2.2.5), ObjectState (3.3.6) | Eml-Coverage-GeographicCoverage (2.4.4)
CENS_Location.Time_Recorded (Time location was captured) | | Eml-Coverage-TemporalCoverage (2.4.4)
CENS_Location.Time_Type_ID (Refers to type of time of Time_Type ID table) | | Eml-Coverage (2.4.4)
METADATA FOR EDUCATION MODULES FOR HABITAT MONITORING
LOM | GEM | ADN
Educational-Typical Age Range (5.7) | Audience-Age | Audience
Life Cycle-Contribute (2.3) | Creator | Resource Creator
General-Coverage (1.6) | Coverage-Spatial, Temporal | Coverage (spatial and temporal)
Life Cycle-Date (2.3.3), DateTime (8) | Date | Creation date, Accession date
General-Description (1.4) | Description | Description
Educational (5) | Pedagogy | Educational
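One way to work with a mapping like the one on this slide is to hold it as a small crosswalk structure and ask which CENS fields have no counterpart in a given standard. The sketch below is only an illustration: the names CROSSWALK, targets, and unmapped are hypothetical, the element names are simplified, and the cell assignments reflect one reading of the slide rather than a confirmed CENS mapping.

```python
# Hypothetical crosswalk: CENS schema field -> elements in each community standard.
# Assignments follow one reading of the slide's table and are assumptions.
CROSSWALK = {
    "CENS_Node.Node_Name": {"SensorML": ["identifiedAs"]},
    "CENS_Node.Node_Desc": {"SensorML": ["description"]},
    "CENS_Location.Location_ID": {
        "SensorML": ["CrsID"],
        "EML": ["coverage"],
    },
    "CENS_Location.X_Pos": {
        "SensorML": ["hasCRS", "ObjectState"],
        "EML": ["coverage/geographicCoverage"],
    },
    "CENS_Location.Time_Recorded": {"EML": ["coverage/temporalCoverage"]},
}


def targets(field, standard):
    """Return the elements of `standard` that a CENS field maps to (may be empty)."""
    return CROSSWALK.get(field, {}).get(standard, [])


def unmapped(standard):
    """Return the CENS fields that have no counterpart in `standard`."""
    return [field for field in CROSSWALK if not targets(field, standard)]


sensorml_gaps = unmapped("SensorML")
eml_gaps = unmapped("EML")
```

Listing the gaps per standard gives a rough, quantitative view of how well each standard fits the existing data structures.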
Science and Education Data Models: Possible Solution • Manage scientific data with models appropriate to the scientific community • Construct filters and tools to make scientific data useful to K-12 students and teachers: • Reduce granularity of data (e.g., temperature at hourly rather than minute intervals) • Develop tools to display these data (e.g., simple charts and graphs) • Describe filters and tools using models appropriate to the educational community (e.g., LOM, SCORM, GEM)
Science and Education Data Models: Possible Solution • Sets of collected data run through filters and tools to produce understandable tables, charts, and graphs
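The granularity-reduction filter mentioned in the proposed solution (hourly rather than minute-level temperature) could be sketched as follows. This is a minimal illustration using only the Python standard library. The function name hourly_means and the sample readings are made up for the example and do not come from CENS data or tools.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean


def hourly_means(readings):
    """Average (timestamp, value) readings into one value per clock hour."""
    buckets = defaultdict(list)
    for timestamp, value in readings:
        # Truncate the timestamp to the start of its hour to pick the bucket.
        hour = timestamp.replace(minute=0, second=0, microsecond=0)
        buckets[hour].append(value)
    return {hour: mean(values) for hour, values in sorted(buckets.items())}


# Made-up temperature readings (degrees Celsius) at sub-hourly intervals.
readings = [
    (datetime(2004, 1, 1, 9, 0), 10.0),
    (datetime(2004, 1, 1, 9, 30), 12.0),
    (datetime(2004, 1, 1, 10, 15), 15.0),
]
summary = hourly_means(readings)
```

The resulting dictionary, keyed by hour, is small enough to feed directly into a simple table or chart for classroom use.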
Current accomplishments and next steps James Reserve: • Map current data structures to EML and SensorML to determine the fit • Analyze scientific papers and documents to determine required data elements • Create use scenarios • Interview scientists
Current accomplishments and next steps Education: • Work with inquiry module team to identify data requirements • Interview teachers
Discussion and Conclusions Ensuring accessibility and integrity of CENS data for multiple communities requires: • Understanding of the practices of each community • Understanding of relationships between those practices • Means to bridge the gaps
Acknowledgements • Christine Borgman • Andrew Wu • Bill Sandoval • Noel Enyedy • Joe Wise • Mike Wimbrow