Data Sharing in IPY: Policy, Practice, and Services. Mark A. Parsons Co-Chair, IPY Data Policy and Management Subcommittee Manager, IPY Data and Information Service. Knowledge is power. – Francis Bacon.
The restriction of knowledge to an elite group destroys the spirit of society and leads to its intellectual impoverishment. – Albert Einstein
International Data Policies • World Meteorological Organization (WMO) • WMO World Data Centers • Intergovernmental Oceanographic Commission (IOC) of UNESCO • World Climate Program • Committee on Earth Observation Satellites (CEOS) • International Earth Observing System (IEOS) • Houston Economic Summit of the Group of Seven Most Industrialized Nations • Organization for Economic Co-operation and Development (OECD) • Inter-American Institute for Global Change Research • Agenda 21, UN Conference on the Environment and Development (UNCED) • Framework Convention on Climate Change • International Council for Science (ICSU) • ICSU World Data Centers (WDC) • International Geosphere-Biosphere Program (IGBP) • Global Climate Observing System (GCOS) • Second World Climate Conference (SWCC) • Scientific Committee on Solar-Terrestrial Physics (SCOSTEP) • International Social Science Council (ISSC) • International Union of Radio Science (URSI) • World Ocean Circulation Experiment (WOCE) • International Polar Year (IPY) • Global Earth Observing System of Systems (GEOSS) • etc.
Some IPY Objectives • IPY has an interdisciplinary emphasis, with active inclusion of the social sciences. • IPY will link researchers across different fields to address questions and issues lying beyond the scope of individual disciplines. • IPY will strengthen international coordination of research and enhance international collaboration and cooperation. • IPY will leave a legacy of observing sites, facilities, and networks, as well as individual data and data systems, to support ongoing polar research and monitoring.
IPY Data Policy: What are IPY Data? http://www.ipy.org/Subcommittees/final_ipy_data_policy.pdf • Data generated by IPY • Data used by IPY • Special Cases: • Human subjects • Intellectual property of local and traditional knowledge (LTK) • Where data release may cause harm
Data: The IPY Joint Committee requires that IPY data, including operational data delivered in real time, are made available fully, freely, openly, and on the shortest feasible timescale. The only exceptions to this policy of full, free, and open access are: • where human subjects are involved, confidentiality must be protected • where local and traditional knowledge is concerned, rights of the knowledge holders shall not be compromised • where data release may cause harm, specific aspects of the data may need to be kept protected – IPY Data Policy. IPY will set a new standard in scientific cooperation as rapid and unrestricted data exchange becomes an accepted and enabling factor in daily research. – IPY Science Plan
Metadata All IPY data must be accompanied by a full set of metadata that completely document and describe the data. In accordance with the ISO standard Reference Model for an Open Archival Information System (OAIS) (CCSDS 2002), complete metadata may be defined as all the information necessary for data to be independently understood by users and to ensure proper stewardship of the data. Regardless of any data access restrictions or delays in delivery of the data itself, all IPY projects must promptly provide basic descriptive metadata of collected data in an internationally recognized, standard format to an appropriate catalog or registry. —IPY Data Policy
IPY Metadata Profile (and crosswalk) • “All data registries and repositories collecting data and metadata from IPY projects are required to collect and share sufficient information to adhere to the IPY Metadata Profile” • Basic who, what, where, when in either FGDC, DIF, THREDDS (ISO coming, but could use some help), plus some information on metadata provenance. • Controlled vocabulary from GCMD for some fields. • The “bare minimum of information necessary to allow simple discovery across disciplines and to ensure we can track the heritage of the metadata in a broadly distributed data management environment.” • Details available at ipydis.org
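The crosswalk idea can be sketched as a small lookup that maps each standard's fields onto the shared "who, what, where, when" plus a provenance note. This is only an illustration: the field names in `CROSSWALK` are stand-ins chosen for the example, `to_common` is a hypothetical helper, and the authoritative mapping is the one published at ipydis.org.

```python
# Illustrative crosswalk only: these field names are assumptions made for
# this sketch, not the official IPY Metadata Profile mapping.
CROSSWALK = {
    "dif": {
        "who": "Data_Center",
        "what": "Entry_Title",
        "where": "Spatial_Coverage",
        "when": "Temporal_Coverage",
    },
    "fgdc": {
        "who": "origin",
        "what": "title",
        "where": "bounding",
        "when": "timeperd",
    },
}


def to_common(record: dict, standard: str) -> dict:
    """Translate a record from one standard into the shared discovery fields."""
    mapping = CROSSWALK[standard]  # raises KeyError for an unknown standard
    common = {key: record.get(field) for key, field in mapping.items()}
    # Keep track of where the metadata came from (its heritage).
    common["metadata_provenance"] = standard
    return common


# Two records describing the same (made-up) data set in different standards.
dif_record = {
    "Entry_Title": "Example sea ice transects",
    "Data_Center": "Example Data Center",
    "Spatial_Coverage": "Arctic",
    "Temporal_Coverage": "2007-2009",
}
fgdc_record = {
    "title": "Example sea ice transects",
    "origin": "Example Data Center",
    "bounding": "Arctic",
    "timeperd": "2007-2009",
}

common_from_dif = to_common(dif_record, "dif")
common_from_fgdc = to_common(fgdc_record, "fgdc")
```

Once both records are in the common form, a catalog can compare or search them on the same four keys while still knowing which standard each one came from.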
Attribution: …users of IPY data must formally acknowledge data authors (contributors) and sources. Where possible, this acknowledgment should take the form of a formal citation, such as when citing a book or journal article. Journals should require the formal citation of data used in articles they publish.
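As a rough illustration of what "formal citation" could mean in practice, the sketch below builds a book-style citation string for a data set. The layout (authors, year, title, archive, optional locator) is an assumption made for this example, not a format prescribed by the IPY Data Policy, and `format_data_citation` and all sample values are hypothetical.

```python
def format_data_citation(authors, year, title, archive, locator=None):
    """Build a simple book-style citation for a data set.

    The element order is an assumed convention for illustration only.
    """
    if not authors:
        raise ValueError("a data citation needs at least one author")
    if len(authors) == 1:
        author_text = authors[0]
    else:
        # "A, B and C" style for multiple contributors.
        author_text = ", ".join(authors[:-1]) + " and " + authors[-1]
    citation = f"{author_text} ({year}). {title}. {archive}."
    if locator:
        # Locator can be a URL or other persistent identifier.
        citation += f" {locator}"
    return citation


# Placeholder values, not a real data set.
example = format_data_citation(
    ["A. Researcher", "B. Observer"],
    2008,
    "Example snow depth transects",
    "Example Data Center",
)
print(example)
```

A data center could expose a string like this alongside every download so that users have the citation in hand at the moment they obtain the data.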
Preservation All IPY data must be archived in their simplest, useful form and be accompanied by a complete metadata description. An IPY Data and Information Service (IPYDIS) should help projects identify appropriate long-term archives and data centers, but it is the responsibility of individual IPY projects to make arrangements with long-term archives to ensure the preservation of their data. It must be recognized that data preservation and access should not be afterthoughts and need to be considered while data collection plans are developed.
A striking proportion of project difficulties stem from people in both customer and supplier organisations failing to implement known best practice. – Oxford University/Computer Weekly survey of public and private sector IT projects
Science and Data Management • Many have stated the need to involve scientists in data management, but… • It is also important to involve data managers in conducting science. • Field Experiments: • ~20% increase in data quality (Parsons, et al. 2004) • 70% of experiment cost is data collection (Longley, et al. 2001) • Observing systems • Define/clarify roles for data centers and investigators • QC (from file verification to scientific assessment) • Metadata and documentation development • Formatting, gridding, packaging (e.g. sharing protocols)
Documentation • Register basic discovery metadata in a portal • Use existing standards, e.g. FGDC metadata standard, OAIS Reference Model • Develop “Data Stories” • Describe uncertainty • Challenge your assumptions. “We must not … start from any and every accepted opinion, but only from those we have defined – those accepted by our judges or by those whose authority they recognize.” – Aristotle, c. 350 BC
The Data. Formats: • Negotiate common formats and conventions • ASCII is useful but not really a precise format • Avoid proprietary formats • Some suggestions: netCDF is popular for some, OGC (WMS/WCS/WFS) compatibility is nice • Archives and users may have different needs. Access: • Integrate with many systems to allow increased user discovery (register with the IPYDIS) • Use open source software when possible, use open standards everywhere. Preservation: • Open Archival Information System Reference Model • Attribute and provide info for attribution readily through all gateways
Other practices to consider • Annual meetings/sessions focused on data and initial scientific results • Reports made available online, e.g., addressing data as part of a general annual project progress report • “Updates from field” (e.g., tied to education & outreach) • Search/request facility for data in process (e.g., field catalog), to facilitate cross-disciplinary data discovery • Establish data tracking systems for projects, disciplines, countries • Build data sharing and data integration partnerships and then extend (I told two friends and they told two friends and …)
ELOKA: The Exchange for Local Observations and Knowledge of the Arctic works to provide data management and user support to facilitate the collection, preservation, exchange, and use of local observations and knowledge of the Arctic. http://nsidc.org/eloka PI: Shari Gearheard
Data Coordinators • Assist on compliance with standards, identification of archives, development of the union catalogue, and other data management requirements for IPY. • Visibly track the data flow for IPY. • In collaboration with the IPO, develop a data registry that will continue throughout the IPY. • Survey the planned projects and the data they intend to collect and identify existing archives, portals, experts, and significant gaps in the IPY data infrastructure. Coordinators: • Mark Parsons: Overall, US • Øystein Godøy: Operational Data, Norway • Canadian Coordinator: Overall, Canada • National coordinators in Netherlands, China, UK
Data Committee • Develops data policy • Develops data strategy • Determines data flow structure (consideration of procedures, real-time requirements, transmission and archival) • Advise JC Requirements and recommend actions for IPYDIS: what do we need? Members: • Mark Parsons, USA; co-chair • Taco de Bruin, Netherlands; co-chair • Nathan Bindoff, Australia (represented by Kim Finney) • Joan Eamer, Norway/UNEP • Eberhard Fahrbach, Germany; JC liaison • Hannes Grobe, Germany • Ray Harris, UK/GEOSS • Ellsworth LeDrew, Canada • Xin Li, China • Håkan Olsson, Sweden • Alexander Sterin, Russia/WMO • Vladimir Papitashvili, USA/eGY • Birger Poppel, Greenland
[Figure: The “Union” Catalog, drawn as a social network of nine linked catalogs (catalogs 1 to 9; catalogs 3 and 6 are mirrors). Courtesy P. Pulsifer]
We want your feedback! http://ipydis.org | ipydis@ipydis.org • Mark A. Parsons, parsonsm@nsidc.org • Ellsworth LeDrew, ells@watleo.uwaterloo.ca • Taco de Bruin, bruin@nioz.nl
IPY metadata profile elements • Entry ID (controlled) • Data set title • Data set progress • Data set summary • Data set citation information including Online Resource • Parameters • Locations • ISO topic categories • Temporal coverage • Spatial coverage • Data center contact information • Access restrictions • Use constraints • Data Set Language • Metadata contact information • Metadata authority • Metadata version • Last revision • IPY flag • IPY Project ID
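A registry could use the element list on this slide as a simple completeness check before accepting a record. The sketch below is a minimal illustration under two assumptions: that every listed element is treated as required, and that a record is a plain dictionary keyed by the slide's element labels. `missing_elements` is a hypothetical helper, not part of any IPYDIS tool.

```python
# Element names copied from the slide; treating all of them as mandatory
# is an assumption made for this sketch.
PROFILE_ELEMENTS = [
    "Entry ID",
    "Data set title",
    "Data set progress",
    "Data set summary",
    "Data set citation information",
    "Parameters",
    "Locations",
    "ISO topic categories",
    "Temporal coverage",
    "Spatial coverage",
    "Data center contact information",
    "Access restrictions",
    "Use constraints",
    "Data Set Language",
    "Metadata contact information",
    "Metadata authority",
    "Metadata version",
    "Last revision",
    "IPY flag",
    "IPY Project ID",
]


def missing_elements(record: dict) -> list:
    """Return profile elements that are absent or blank in the record."""
    missing = []
    for element in PROFILE_ELEMENTS:
        value = record.get(element)
        if value is None or (isinstance(value, str) and not value.strip()):
            missing.append(element)
    return missing


# Placeholder record: every element filled, then two deliberately broken.
record = {element: "placeholder" for element in PROFILE_ELEMENTS}
record["Use constraints"] = ""
del record["IPY Project ID"]

print(missing_elements(record))  # ['Use constraints', 'IPY Project ID']
```

Reporting the missing elements by name, rather than simply rejecting the record, gives the contributing project a concrete list of what to supply.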