
Giri Palanisamy Oak Ridge National Laboratory & Lorrie Apple Johnson U.S. Department of Energy October 16, 2013

Data Citation and Linking of Big and Continuous Data: An Experience from the U.S. Department of Energy’s Atmospheric Radiation Measurement (ARM) Program.


Presentation Transcript


  1. Data Citation and Linking of Big and Continuous Data: An Experience from the U.S. Department of Energy’s Atmospheric Radiation Measurement (ARM) Program • Giri Palanisamy, Oak Ridge National Laboratory & Lorrie Apple Johnson, U.S. Department of Energy • October 16, 2013

  2. U.S. Department of Energy’s Atmospheric Radiation Measurement (ARM) Data Center • Located at Oak Ridge National Laboratory (ORNL) • Part of Climate Change Science Institute • ARM – www.arm.gov

  3. Office of Scientific and Technical Information (OSTI) • OSTI has the corporate responsibility for ensuring appropriate access to the U.S. Department of Energy’s (DOE) R&D results. • DOE invests over $10 billion/year in basic sciences, clean energy technology, and nuclear research. • The immediate output from this investment is information… knowledge… R&D results in many formats, including digital data. • OSTI’s mission is to accelerate scientific progress by accelerating access to this information. • Energy Policy Act of 2005: “The Secretary, through the Office of Scientific and Technical Information, shall maintain within the Department publicly available collections of scientific and technical information resulting from research, development, demonstration, and commercial applications activities supported by the Department.”

  4. Type of Data – Atmospheric processes, cloud dynamics • Products - > 3,000 • Archive Size - > 300 TB • Users/year - ~ 1,500 • Year Started - 1991

  5. ARM data collection: Consists of permanent, mobile, and aircraft sites • Southern Great Plains (1993) • North Slope of Alaska: Barrow (1998) and Atqasuk (1999) • Tropical Western Pacific: Manus (1996), Nauru (1998), and Darwin (2002) • First ARM Mobile Facility (2005); Second ARM Mobile Facility (2010) • ARM Aerial Facility (2007)

  6. Challenges for Scientific Data • Hard to FIND • Hard to NAVIGATE • Hard to CITE

  7. ARM Archive - Challenges • Millions of data files from over 3,000 data products. • Most of them are continuous data streams. • Large user community and complex use of data (climate change modeling). • Data is also published via other portals.

  8. Why Cite Data? • Data should be cited in just the same way that other sources of information, such as articles and books, are cited. • Data citation can help by: enabling easy reuse and verification of data; allowing the impact of data to be tracked; creating a scholarly structure that recognizes and rewards data producers.

  9. ARM Data Citation Service - Goals • To allow users to cite the exact ARM data used in their research publications • To allow future data users, and the project, to easily track the data used in various articles • Strategy: • DOIs assigned at the ARM data product level, and presented in the ARM data stream pages and field campaign readme files • DOIs also sent via Archive data notification emails

  10. One Solution: DataCite • What is DataCite? • A global consortium composed of local institutions focused on improving the scholarly infrastructure around datasets and other non-textual information. • A service for assigning Digital Object Identifiers (DOIs) and metadata to datasets. • DataCite (www.datacite.org) helps researchers find, access, and reuse data.

  11. DOE Data ID Service • DOE/OSTI is the only U.S. federal member of DataCite. • Interagency agreement in place with NIH project; in discussions with eight agencies representing 15 projects. • OSTI partnered with Oak Ridge National Laboratory to pioneer the procedure. • First DOI for a DOE dataset was minted and registered with DataCite on 8/10/2011. • DOE Atmospheric Radiation Measurement (ARM) has now registered over 545 datastreams, each representing hundreds of subordinate data files. • Currently working with 6 DOE data centers, including ARM. Two are fully integrated; 4 others in testing or planning phases.

  12. Improving Access, Citation & Reuse of Data • Easier identification and access of datasets across the international community of researchers via DataCite’s resolving tools • Linkage between DOE’s R&D documents and the underlying datasets generated by the research • Standard format for including data in the accepted bibliographic citation framework • Aid researchers in locating exact datasets used in previous work, thus allowing verification of results or new uses for the data
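
The resolving step mentioned on this slide can be illustrated with a minimal sketch. It assumes the public `https://doi.org/` resolver and a common DOI pattern (`10.` followed by a registrant code and a suffix); the function name `doi_to_url` and the sample DOI under the `10.5555` test prefix are hypothetical and do not refer to any real ARM datastream.

```python
import re
from urllib.parse import quote

# Loose DOI shape check (an assumption, not an official validator):
# "10.", a 4-9 digit registrant code, a slash, and a non-empty suffix.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")


def doi_to_url(doi: str) -> str:
    """Turn a DOI string into a resolver URL, stripping common prefixes."""
    cleaned = doi.strip()
    for prefix in ("https://doi.org/", "http://dx.doi.org/", "doi:"):
        if cleaned.lower().startswith(prefix):
            cleaned = cleaned[len(prefix):]
            break
    if not DOI_PATTERN.match(cleaned):
        raise ValueError(f"not a DOI: {doi!r}")
    # Percent-encode unusual characters but keep the prefix/suffix slash.
    return "https://doi.org/" + quote(cleaned, safe="/")


if __name__ == "__main__":
    # Hypothetical DOI used only for illustration.
    print(doi_to_url("doi:10.5555/example-datastream"))
```

Opening the resulting URL in a browser should redirect to the landing page registered for that DOI, which is what lets a reader of a paper reach the dataset that was cited.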

  13. How Data Citation Works • Data Citation metadata submitted to DOE-OSTI (Web Service API, 241.6 AN) = Originating Research Organization; Publication/Issue Date; Sponsoring Organization; URL where the Dataset is posted for access; Contact information; Dataset Type; Dataset Title; Dataset Creator/Author or Principal Investigator; Dataset Product Number; DOE Contract/Award Number • DOI assigned by DOE-OSTI • DOE-OSTI updates metadata record with DOI, creating a full Data Citation • DOE-OSTI submits nightly feed of new DOIs to DataCite • DataCite validates DOI registration with DOE-OSTI • DataCite registers DOI • Creator/Author, Primary Investigator, or Submitter notified of Data Citation availability • Data Citation submitted to search engines for indexing

  14. Required Metadata Elements • Originating Research Organization • Publication/ Issue Date • Sponsoring Organization • URL where the Dataset is posted for access • Contact information • Dataset Type • Dataset Title • Dataset Creator/Author or Principal Investigator • Dataset Product Number • DOE Contract/Award Number
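
As an illustration of the ten required elements, here is a minimal sketch of a record type with a completeness check. The class name `DatasetRecord`, the field names, and the sample values are assumptions made for this example; they are not the field names used by the DOE-OSTI submission interface.

```python
from dataclasses import dataclass, fields


@dataclass
class DatasetRecord:
    """One field per required element on the slide (names are hypothetical)."""
    originating_research_organization: str = ""
    publication_date: str = ""
    sponsoring_organization: str = ""
    url: str = ""
    contact_information: str = ""
    dataset_type: str = ""
    title: str = ""
    creator: str = ""
    product_number: str = ""
    contract_number: str = ""


def missing_elements(record: DatasetRecord) -> list[str]:
    """Return the names of required elements that are empty or whitespace."""
    return [f.name for f in fields(record) if not getattr(record, f.name).strip()]


if __name__ == "__main__":
    # Placeholder values for illustration only.
    draft = DatasetRecord(
        title="Example datastream",
        creator="Example Investigator",
        url="https://example.org/dataset",
    )
    print(missing_elements(draft))
```

A check like this would run before submission, so that a record lacking, for example, a contract number is caught before a DOI is requested.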

  15. Facilitating Access to Scientific Data: Federated Searching Since science is not bound by agency, organization, or geography… • We integrate or aggregate multiple government R&D-related databases into single-search portals. • Innovative technology drills down to selected databases and websites in parallel, then presents ranked search results.
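
The "query in parallel, then rank" idea can be sketched as follows. This is a toy illustration, not the technology OSTI uses: the sources are hypothetical in-memory lists, `make_source` and `federated_search` are names invented here, and relevance is a simple count of matching query terms.

```python
from concurrent.futures import ThreadPoolExecutor


def make_source(name, documents):
    """Build a hypothetical searchable source over a list of titles."""
    def search(query):
        terms = query.lower().split()
        hits = []
        for title in documents:
            words = title.lower().split()
            score = sum(words.count(term) for term in terms)
            if score > 0:
                hits.append((score, name, title))
        return hits
    return search


def federated_search(query, sources):
    """Query every source concurrently, then merge and rank the hits."""
    with ThreadPoolExecutor(max_workers=max(1, len(sources))) as pool:
        per_source = list(pool.map(lambda source: source(query), sources))
    merged = [hit for hits in per_source for hit in hits]
    # Highest score first; ties broken by source name, then title.
    return sorted(merged, key=lambda hit: (-hit[0], hit[1], hit[2]))


if __name__ == "__main__":
    sources = [
        make_source("A", ["cloud radar data", "soil moisture"]),
        make_source("B", ["cloud cloud dynamics", "radar notes"]),
    ]
    for score, source, title in federated_search("cloud radar", sources):
        print(score, source, title)
```

In a real portal each source would be a remote database with its own latency, which is why the queries are issued concurrently instead of one after another.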

  16. WorldWideScience.org: Enabling Access to Global R&D Results • Research results from 70+ countries are searchable via a single-query global science portal. • Multilingual translation capability for 10 languages. • More than 400 million pages of scientific and technical information, including text, multimedia, and data.

  17. Citing ARM Data • Several citation formats are possible using DOIs. ARM encourages users to include the following information when citing ARM data: • Author • Original publication date • Update period, if applicable (daily, monthly, etc.) • Dataset name • Dates used • Location (latitude/longitude, site name, and facility identifier) • Editor(s) or compiler(s) • Place of publication • Publisher • Date accessed • DOI
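
One way to assemble these elements into a citation string is sketched below. The ordering and punctuation are an assumption chosen for illustration, not an official ARM citation format, and every value in the usage example (author, publisher, DOI) is a placeholder. Editor(s) and place of publication are left out to keep the sketch short.

```python
def format_citation(author, publication_year, dataset_name, dates_used,
                    location, publisher, date_accessed, doi, update_period=""):
    """Join the citation elements into one string, skipping empty parts."""
    title = dataset_name
    if update_period:
        title += f", updated {update_period}"
    parts = [
        f"{author} ({publication_year})",
        title,
        dates_used,
        location,
        publisher,
        f"Accessed {date_accessed}",
        f"doi:{doi}",
    ]
    return ". ".join(part for part in parts if part) + "."


if __name__ == "__main__":
    # All values below are hypothetical placeholders.
    print(format_citation(
        author="Example Author",
        publication_year="1993",
        dataset_name="Example datastream",
        dates_used="2010-01-01 to 2010-12-31",
        location="Southern Great Plains",
        publisher="Example Publisher",
        date_accessed="2013-10-16",
        doi="10.5555/example",
        update_period="daily",
    ))
```

Recording the dates used and the date accessed matters for continuous data streams, because the DOI identifies the data product while those two fields narrow the citation to the portion that was analyzed.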

  18. Example of Scientific Impact • ORNL DAAC: Data Products used in literature • ORNL DAAC requests that data be cited in the list of references; some authors “refer” to data in text or acknowledgements

  19. Thank you! • Giri Palanisamy, Oak Ridge National Laboratory, palanisamyg@ornl.gov • Lorrie Johnson, U.S. Department of Energy Office of Scientific and Technical Information, JohnsonL@osti.gov
