Data Citation and Linking of Big and Continuous Data An Experience from the U.S. Department of Energy’s Atmospheric Radiation Measurement (ARM) Program. Giri Palanisamy Oak Ridge National Laboratory & Lorrie Apple Johnson U.S. Department of Energy October 16, 2013.
U.S. Department of Energy’s Atmospheric Radiation Measurement (ARM) Data Center • Located at Oak Ridge National Laboratory (ORNL) • Part of Climate Change Science Institute • ARM – www.arm.gov
Office of Scientific and Technical Information (OSTI) OSTI has the corporate responsibility for ensuring appropriate access to the U.S. Department of Energy’s (DOE) R&D results. • DOE invests over $10 billion/year in basic sciences, clean energy technology, and nuclear research. • The immediate output from this investment is information and knowledge: R&D results in many formats, including digital data. • OSTI’s mission is to accelerate scientific progress by accelerating access to this information. • Energy Policy Act of 2005: “The Secretary, through the Office of Scientific and Technical Information, shall maintain within the Department publicly available collections of scientific and technical information resulting from research, development, demonstration, and commercial applications activities supported by the Department.”
• Type of Data: Atmospheric processes, cloud dynamics • Products: > 3,000 • Archive Size: > 300 TB • Users/year: ~1,500 • Year Started: 1991
ARM data collection: Consists of permanent, mobile, and aircraft sites • Southern Great Plains (1993) • North Slope of Alaska: Barrow (1998) and Atqasuk (1999) • Tropical Western Pacific: Manus (1996), Nauru (1998), and Darwin (2002) • First ARM Mobile Facility (2005); Second ARM Mobile Facility (2010) • ARM Aerial Facility (2007)
Challenges for Scientific Data • Hard to FIND • Hard to NAVIGATE • Hard to CITE
ARM Archive - Challenges • Millions of data files from over 3,000 data products. • Most of them are continuous data streams. • Large user community and complex use of data (climate change modeling). • Data is also published via other portals.
Why Cite Data? • Data should be cited in just the same way that other sources of information, such as articles and books, are cited. • Data citation can help by: enabling easy reuse and verification of data; allowing the impact of data to be tracked; creating a scholarly structure that recognizes and rewards data producers.
ARM Data Citation Service - Goals • To allow users to cite the exact ARM data used in their research publications • To allow future data users, and the project, to easily track the data used in various articles • Strategy: DOIs are assigned at the ARM data product level and presented in the ARM data stream pages and field campaign readme files; DOIs are also sent via Archive data notification emails
One Solution: DataCite What is DataCite? • A global consortium composed of local institutions focused on improving the scholarly infrastructure around datasets and other non-textual information. • A service for assigning Digital Object Identifiers (DOIs) and metadata to datasets. • DataCite (www.datacite.org) helps researchers find, access, and reuse data.
DOE Data ID Service • DOE/OSTI is the only U.S. federal member of DataCite. • Interagency agreement in place with NIH project; in discussions with eight agencies representing 15 projects. • OSTI partnered with Oak Ridge National Laboratory to pioneer the procedure. • First DOI for a DOE dataset was minted and registered with DataCite on 8/10/2011. • DOE Atmospheric Radiation Measurement (ARM) has now registered over 545 datastreams, each representing hundreds of subordinate data files. • Currently working with 6 DOE data centers, including ARM. Two are fully integrated; 4 others are in testing or planning phases.
Improving Access, Citation & Reuse of Data • Easier identification and access of datasets across the international community of researchers via DataCite’s resolving tools • Linkage between DOE’s R&D documents and the underlying datasets generated by the research • Standard format for including data in the accepted bibliographic citation framework • Aid researchers in locating exact datasets used in previous work, thus allowing verification of results or new uses for the data
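As a rough illustration of how a DOI leads a reader back to a dataset, the sketch below turns a DOI string into a resolvable URL. It assumes the public https://doi.org/ proxy rather than any DataCite-specific resolving tool, and the DOI in the usage line is a made-up placeholder, not a real ARM identifier. The function name `doi_to_url` is hypothetical.

```python
from urllib.parse import quote


def doi_to_url(doi: str) -> str:
    """Turn a DOI string into a URL on the public doi.org proxy (assumed resolver)."""
    doi = doi.strip()
    # Accept a few common ways of writing a DOI and strip them to the bare form.
    for prefix in ("doi:", "https://doi.org/", "http://dx.doi.org/"):
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
            break
    # Every DOI starts with the "10." directory indicator and has a prefix/suffix slash.
    if not doi.startswith("10.") or "/" not in doi:
        raise ValueError(f"not a DOI: {doi!r}")
    # Percent-encode unusual characters but keep the slash separating prefix and suffix.
    return "https://doi.org/" + quote(doi, safe="/")


# Placeholder DOI for illustration only.
print(doi_to_url("doi:10.1234/EXAMPLE"))
```

A link built this way could be placed in a reference list so that readers of a paper reach the landing page of the cited dataset.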
How Data Citation Works • Data Citation metadata submitted to DOE-OSTI (Web Service API or 241.6 AN) • DOI assigned by DOE-OSTI • DOE-OSTI updates metadata record with DOI, creating a full Data Citation • DOE-OSTI submits nightly feed of new DOIs to DataCite • DataCite validates DOI registration with DOE-OSTI • DataCite registers DOI • Creator/Author, Primary Investigator, or Submitter notified of Data Citation availability • Data Citation submitted to search engines for indexing
Required Metadata Elements • Originating Research Organization • Publication/ Issue Date • Sponsoring Organization • URL where the Dataset is posted for access • Contact information • Dataset Type • Dataset Title • Dataset Creator/Author or Principal Investigator • Dataset Product Number • DOE Contract/Award Number
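A data center could check a record against the required elements before submitting it. The sketch below is only an illustration: the class `DatasetRecord`, its field names, and the function `missing_elements` are hypothetical and do not reflect the actual DOE-OSTI web service schema.

```python
from dataclasses import dataclass, fields


@dataclass
class DatasetRecord:
    """Hypothetical container for the ten required elements listed above."""
    originating_research_org: str = ""
    publication_date: str = ""
    sponsoring_org: str = ""
    url: str = ""
    contact: str = ""
    dataset_type: str = ""
    title: str = ""
    creator: str = ""
    product_number: str = ""
    contract_number: str = ""


def missing_elements(record: DatasetRecord) -> list[str]:
    """Return the names of required elements that are empty or whitespace only."""
    return [f.name for f in fields(record) if not getattr(record, f.name).strip()]


# Placeholder values for illustration only.
draft = DatasetRecord(title="Example data stream", creator="A. Author")
print(missing_elements(draft))
```

Running such a check before submission would catch incomplete records early, before a DOI is requested for them.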
Facilitating Access to Scientific Data: Federated Searching Since science is not bound by agency, organization, or geography… • We integrate or aggregate multiple government R&D-related databases into single-search portals. • Innovative technology drills down to selected databases and websites in parallel, then presents ranked search results.
WorldWideScience.org: Enabling Access to Global R&D Results • Research results from 70+ countries are searchable via a single-query global science portal. • More than 400 million pages of scientific and technical information, including text, multimedia, and data. • Multilingual translation capability for 10 languages.
Citing ARM Data Several citation formats are possible using DOIs. ARM encourages users to include the following information when citing ARM data: • Author • Original publication date • Update period, if applicable (daily, monthly, etc.) • Dataset name • Dates used • Location (latitude/longitude, site name, and facility identifier) • Editor(s) or compiler(s) • Place of publication • Publisher • Date accessed • DOI
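The elements listed above can be assembled into a citation string in many ways. The sketch below shows one possible ordering; it is an assumption for illustration, not an official ARM citation format, and the function `format_citation` and all values in the usage example are placeholders.

```python
def format_citation(author, pub_year, dataset_name, dates_used, location,
                    place, publisher, date_accessed, doi,
                    update_period="", editors=""):
    """Join the recommended elements into one string (ordering is an assumption)."""
    title = dataset_name
    if update_period:
        title += f", updated {update_period}"
    parts = [
        f"{author} ({pub_year})",
        title,
        f"{dates_used}, {location}",
        f"Compiled by {editors}" if editors else "",  # optional element
        f"{place}: {publisher}",
        f"Accessed {date_accessed}",
        f"doi:{doi}",
    ]
    # Skip empty optional elements and end with a full stop.
    return ". ".join(p for p in parts if p) + "."


# Placeholder values for illustration only.
print(format_citation(
    author="A. Author", pub_year=2013, dataset_name="Example data stream",
    dates_used="2010-01-01 to 2010-12-31", location="Example Site (facility X1)",
    place="Example City", publisher="Example Archive",
    date_accessed="2013-10-16", doi="10.1234/EXAMPLE", update_period="daily",
))
```

Keeping the dates used and the location in the string is what lets a later reader identify the exact subset of a continuous data stream behind a result.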
Example of Scientific Impact • ORNL DAAC: Data products used in literature • ORNL DAAC requests that data be cited in the list of references; some authors instead “refer” to data in the text or acknowledgements
Thank you! • Giri Palanisamy, Oak Ridge National Laboratory, palanisamyg@ornl.gov • Lorrie Johnson, U.S. Department of Energy, Office of Scientific and Technical Information, JohnsonL@osti.gov