Geo-Data Informatics (GDI) Workshop: Exploring the Life Cycle, Citation and Integration of Geo-Data
Summary Report from Thursday, 3 March 2011
Pine Room Data Integration Breakout Group
Discussion Prompt
In your view/experience, what parts of data integration implementations/applications or frameworks are well established (or not) in your discipline(s), and what are the common gaps?
Moderator: Cyndy Chandler (WHOI, BCO-DMO)
Rapporteur: Chris Mattmann (NASA JPL, USC)
Discussion notes kept at the TWC-hosted TitanPad site
Participants
• Bob Arko (Lamont-Doherty Earth Observatory)
• Joanne Luciano (TWC, RPI)
• Anna Milan (National Geophysical Data Center)
• Bob Simons (NOAA)
• Brian Wee (NEON, Inc.)
• Leslie Hsu (LDEO)
• Roland Viger (USGS)
• James Wilson (James Madison University)
• Tom Narock (NASA/GSFC)
• Cathy Constable (SIO, UCSD)
• Ruth Duerr (NSIDC)
• Yoori Choi (CUAHSI)
• Lee Allison (Arizona Geological Survey)
• Erin Robinson (ESIP)
• Kavitha Chandrasekar (Indiana University)
• Bob Detrick (NSF)
• Clifford Jacobs (NSF)
• Leonard Jonson (NSF)
Data Integration
• What does that mean?
Combining more than one data source into a single data object. This is different from displaying multiple data sources in a single view.
Example: a database join.
Time series data sets made up of data from a variety of sources often require data integration.
Data aggregation and interoperability are related concepts.
The group did not come to consensus.
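The database-join example above can be sketched as follows. This is a minimal illustration only: the table names, column names, and sample values are hypothetical and were not part of the breakout discussion. Two separate sources are combined into a single record set keyed on station and observation time, which is what distinguishes integration from showing the two sources side by side.

```python
import sqlite3


def integrate(temperature_rows, salinity_rows):
    """Join two hypothetical sources on (station, obs_time) into one record set."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE temperature (station TEXT, obs_time TEXT, temp_c REAL)")
    conn.execute("CREATE TABLE salinity (station TEXT, obs_time TEXT, psu REAL)")
    conn.executemany("INSERT INTO temperature VALUES (?, ?, ?)", temperature_rows)
    conn.executemany("INSERT INTO salinity VALUES (?, ?, ?)", salinity_rows)
    # The inner join keeps only observations present in both sources.
    rows = conn.execute(
        """
        SELECT t.station, t.obs_time, t.temp_c, s.psu
        FROM temperature AS t
        JOIN salinity AS s
          ON s.station = t.station AND s.obs_time = t.obs_time
        ORDER BY t.station, t.obs_time
        """
    ).fetchall()
    conn.close()
    return rows


# Made-up sample values, not real observations.
temperature = [
    ("A", "2011-03-01", 4.2),
    ("A", "2011-03-02", 4.5),
    ("B", "2011-03-01", 6.1),
]
salinity = [
    ("A", "2011-03-01", 34.9),
    ("B", "2011-03-01", 35.2),
]

integrated = integrate(temperature, salinity)
for row in integrated:
    print(row)
```

Each resulting row carries both measurements for one station and time. The unmatched temperature reading is dropped by the inner join; whether to keep such rows (for example with an outer join) is a design decision the integrating project would have to make.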
Geo Disciplines Represented
• Geology
• Hydrology
• Oceanography
• Geophysics
• Geography
• Marine geology and geophysics
• Space science
• Air quality
• Computational neuroscience
• Multi-disciplinary or discipline-agnostic: data management, computer science and archive
Geo-Data Integration
• What aspects are well established or not?
• What are the common gaps?
For many projects, two common themes emerged as being associated with some level of success in data integration:
• ‘Long-term’ commitment of funding support
• Active engagement of funding managers
Examples:
• Unidata (Atmospheric Sciences)
• CUAHSI (Hydrography)
• IRIS (Earthquake)
• US JGOFS, US GLOBEC, US WOCE (Ocean Sciences)
• ODP (Ocean Drilling)
• NEON
Support for Data Integration
Development of a community of practice
• Infrastructure to foster communication (workshops)
• Mentoring of students and early career PIs
• Development of tools (e.g. Unidata developed NetCDF, which has been adopted by many communities)
• Education and training
• The persistence and recognition of a ‘named’ community can enable funds to flow from some agencies to researchers
Support for Data Integration
• Some communities agreed on common data formats that facilitated data integration
• Pressures from funding agencies or community needs resulted in common software tools
• Some communities identified ‘primary’ or ‘core’ variables (e.g. common, essential measurements)
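As a rough sketch of why agreed ‘core’ variables ease integration, the snippet below renames each source's local variable names to a shared vocabulary before records are combined. The mapping, variable names, and values are all hypothetical and stand in for whatever a given community has actually agreed on.

```python
# Hypothetical mapping from each source's local names to agreed core names.
CORE_NAMES = {
    "temp": "temperature",
    "T_degC": "temperature",
    "sal": "salinity",
    "PSAL": "salinity",
}


def to_core(record):
    """Rename known local variables to core names; leave other keys unchanged."""
    return {CORE_NAMES.get(name, name): value for name, value in record.items()}


# Made-up records from two sources that label the same measurements differently.
source_a = {"station": "A", "temp": 4.2, "sal": 34.9}
source_b = {"station": "B", "T_degC": 6.1, "PSAL": 35.2}

harmonized = [to_core(source_a), to_core(source_b)]
for record in harmonized:
    print(record)
```

Once both records use the same keys, they can be combined or compared directly. A real vocabulary would also need to cover units and measurement methods, which this sketch ignores.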
Summary
• ‘Long-term’ funding support enables development of a community of practice that fosters communication, education and training, development and adoption of common tools, and identification of core measurements.
• Communities of practice can divide up the labor and work collaboratively to address shared challenges (economy of scale).
Additional Observations
• Tension between local and global (single PI to coordinated project to national to international). An awareness of global use of data could help with subsequent data integration.
• Early planning/specs for data management are important, but it has traditionally been difficult to obtain funding for them.
Gaps
• Lack of awareness/understanding that keeping data ‘alive’ (usable) is not free
• Many people think data stewardship and data preservation are "solved problems" (they are not).
• "Bit-level preservation" has been solved, but what is the useful lifespan of those files? What effort is required to make the archived data compatible with all the latest tools and technology? The ability to use a dataset declines over time without ongoing attention to ensure that it still meets current access requirements.
Gaps
• Historical or legacy data (originating PI is no longer active in the research community)
• No national policy for scientific preservation
• Different disciplines have different interpretations of features in a dataset
• Lack of guidelines for best practices regarding the metadata required to document model results: software, methodology, inputs, outputs, etc.
Gaps
• Misconception that you create metadata one time and it is good forever
• Not a true statement
• Somehow the metadata needs to be updated
• Systems and the infrastructure need to support this
• Metadata needs to evolve over time
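One way to picture metadata that evolves over time is a record that appends a dated revision instead of overwriting the previous description. The sketch below is an assumption-laden illustration: the `MetadataRecord` class, its field names, and the dates are hypothetical and do not correspond to any existing metadata standard or system mentioned by the group.

```python
from dataclasses import dataclass, field


@dataclass
class MetadataRecord:
    """A hypothetical dataset description that keeps every revision."""

    dataset_id: str
    revisions: list = field(default_factory=list)

    def latest(self):
        """Return the fields of the most recent revision (empty if none)."""
        return self.revisions[-1]["fields"] if self.revisions else {}

    def update(self, date, changes, note):
        """Append a new revision: the latest fields plus the given changes."""
        current = dict(self.latest())
        current.update(changes)
        self.revisions.append({"date": date, "fields": current, "note": note})


# Made-up history for an imaginary dataset.
record = MetadataRecord("example-dataset")
record.update("2005-06-01", {"format": "ASCII table", "units": "degC"}, "initial description")
record.update("2011-03-03", {"format": "NetCDF"}, "file converted to a newer format")

print(record.latest())
print(len(record.revisions))
```

Earlier revisions remain available, so a later user can see both the current description and how it changed. Supporting this in practice depends on the systems and infrastructure noted above.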
Suggestion
The group agreed that ESIP would be an appropriate community in which to continue these discussions and to begin the much-needed planning and development of cross-disciplinary solutions to address the gaps and improve infrastructure for geo-data integration.
Additional Comments
• NRC study done 7-8 years ago about the loss of data and samples in the geosciences: "Geoscience Data and Collections: National Resources in Peril" http://www.nap.edu/openbook.php?record_id=10348&page=R1
Additional Comments
• Marine Metadata Interoperability (MMI) http://marinemetadata.org/ Collection of ‘Guides’ on topics including Semantic Web technologies, controlled vocabularies, ontologies, standards, metadata best practices, and much more.
• MMI Ontology Registry and Repository (ORR) is a web application through which you can create, update, access, and map ontologies and their terms. http://mmisw.org/orr/#b
Additional Comments
• CUAHSI: Hydrologic Ontology System (funded by NSF) http://his.cuahsi.org/ontologyfiles.html http://water.sdsc.edu/hiscentral/startree.aspx
• "Data Management Plan" template available from CUAHSI (February 2011) at http://www.cuahsi.org/his-dmp.html. It includes data inventory, data and metadata standards, data management life cycle, etc.
Additional Comments
• ELIXIR http://www.bbsrc.ac.uk/science/international/elixir.aspx European life science infrastructure for biological information.
• Its mission: to construct and operate a sustainable infrastructure for biological information in Europe to support life science research and its translation to medicine and the environment, the bio-industries and society.