
Introduction to Data Citation

Monica Duke (M.Duke@ukoln.ac.uk) and Alex Ball (A.Ball@ukoln.ac.uk), Digital Curation Centre. ... because good research needs good data. http://www.dcc.ac.uk/


Presentation Transcript


  1. Introduction to Data Citation. Monica Duke, Digital Curation Centre, M.Duke@ukoln.ac.uk; Alex Ball, Digital Curation Centre, A.Ball@ukoln.ac.uk. ... because good research needs good data. http://www.dcc.ac.uk/ https://ipres.ischool.utoronto.ca/node/59ext. Except where otherwise stated, this work is licensed under a Creative Commons Attribution 2.5 UK: Scotland License. DCC Data Citation Tutorial, 2 October 2012 #ukdcc

  2. Overview of the afternoon • Introductions • Reasons and requirements for data citation • Elements of data citation • Examples • Exercise: metadata • Comfort break • Technologies • Infrastructure • Exercise: identifier services • Community activity • Sources of information • How to support data citation • Discussion: issues and questions

  3. Introductions

  4. The Digital Curation Centre (DCC) • A consortium comprising units from the Universities of Bath (UKOLN), Edinburgh (DCC Centre) and Glasgow (HATII) • Launched on 1 March 2004 as a national centre for solving challenges in digital curation that could not be tackled by any single institution or discipline • Funded by JISC, with additional HEFCE funding and project funding • Range of outputs: publications, events, website, helpdesk, tools ...

  5. Monica Duke, Digital Curation Centre, M.Duke@ukoln.ac.uk • Alex Ball, Digital Curation Centre, A.Ball@ukoln.ac.uk

  6. My colleague and co-author of this tutorial, Alex Ball, has been called to other duties.

  7. From http://a5.sphotos.ak.fbcdn.net/hphotos-ak-ash3/557059_461031593914561_1041760901_n.jpg

  8. From van de Sompel (2011), http://sites.nationalacademies.org/PGA/brdi/PGA_064019

  9. UKOLN, University of Bath

  10. Data Citation

  11. Overview of the afternoon (agenda repeated from slide 2)

  12. Slide: Christine Borgman, based on http://www.guzer.com/pictures/suprise_suprise.jpg (via Ian Foster)

  13. Identity • Discoverability • Social practice • Usability • Relationships • Digital Object • Infrastructure • Provenance • Intellectual property • Persistence • Policy. Slide adapted from Borgman (2011)

  14. Where is data in research? (Figure: data-centric phases of the research lifecycle)

  15. Why cite data? • To complete the scholarly knowledge cycle: datasets are first-class records of research; readers can judge the strength of conclusions, premises and methods; citations enable further investigation and permanent access • To assign credit: link contributor to contribution, so that contributions can be attributed, metrics can be computed, impact can be measured and translated into rewards

  16. Data citations provide ... • Visibility for data • Protection from plagiarism • Possibility of verification of results • Data on which to base future research • Possibility for reward models • Access

  17. In practice, a citation needs to: • Identify a dataset • Identify subsets • Locate and access data, preferably using web infrastructure • Provide (or lead to) contextual information • Support further services (metrics) • Cater for both human users and automated agents

  18. Elements of data citation

  19. Data citation styles: which elements do they use?
  Ball (2012) examined four data citation styles described in:
  • Altman, M. & King, G. (2007). A proposed standard for the scholarly citation of quantitative data. D-Lib Magazine, 13(3/4). doi:10.1045/march2007-altman
  • Lawrence, B.N., Jones, C.M., Matthews, B.M. & Pepler, S.J. (2008, February 1). Data publication (Claddier Project Report No. 3). BADC. Retrieved from http://purl.org/oai/oai:epubs.cclrc.ac.uk:work/43641
  • Green, T. (2010, February). We need publishing standards for datasets and data tables. OECD Publishing. doi:10.1787/787355886123
  • Starr, J. & Gastl, A. (2011). isCitedBy: A metadata schema for DataCite. D-Lib Magazine, 17(1/2). doi:10.1045/january2011-starr

  20. Four example citations, highlighting: Author
  • Altman and King (2007), Dataverse: Sidney Verba. 1998. “U.S. and Russian Social and Political Participation Data,” hdl:1902.4/00754 UNF:3:ZNQRI14053UZq389x0Bffg?== NORC [Producer]; data set [Type (DC)] ICPSR [Distributor].
  • Lawrence et al. (2008), BADC: Iwi, A. and B. N. Lawrence (2004). A 500 year control run of HadCM3. [GridSeries, http://ndg.nerc.ac.uk/csml2/GridSeries] Version 1. BADC. urn:badc.nerc.ac.uk_coapec500yr [Available from http://badc.nerc.ac.uk/data/coapec500yr].
  • Green (2010), OECD: OECD (2009), “Key short-term indicators”, Main Economic Indicators (database). doi: 10.1787/data-00039-en http://dx.doi.org/10.1787/data-00039-en (Accessed on 14 September 2009)
  • Starr and Gastl (2011), DataCite: Irino, T; Tada, R (2009): Chemical and mineral compositions of sediments from ODP Site 127-797. V.2. Geological Institute, University of Tokyo. Dataset. doi:10.1594/PANGAEA.726855. http://dx.doi.org/10.1594/PANGAEA.726855

  21. As slide 20, highlighting: Publication date

  22. As slide 20, highlighting: Title

  23. As slide 20, highlighting: Version

  24. As slide 20, highlighting: Feature

  25. As slide 20, highlighting: Resource Type

  26. As slide 20, highlighting: Publisher

  27. As slide 20, highlighting: Identifier

  28. As slide 20, highlighting: Location

  29. As slide 20, highlighting: Unique Numeric Fingerprint

  30. Key citation elements • Author • Publication date • Title • Location (= identifier)
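As a rough illustration of how these elements combine, the sketch below assembles a citation in the element order of the DataCite example from slide 20 (Creator (PublicationYear): Title. Version. Publisher. ResourceType. Identifier). The class and field names are hypothetical, chosen for this example; they are not part of any DataCite tool.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DatasetCitation:
    """Hypothetical container for the elements of a data citation."""

    # Key elements: author, publication date (year), title, identifier/location
    authors: List[str]
    year: int
    title: str
    identifier: str
    # Optional elements used by some of the styles compared in slides 20-29
    version: Optional[str] = None
    publisher: Optional[str] = None
    resource_type: Optional[str] = None

    def render(self) -> str:
        """Join the elements in the order used by the DataCite example."""
        parts = [f"{'; '.join(self.authors)} ({self.year}): {self.title}."]
        for optional in (self.version, self.publisher, self.resource_type):
            if optional:
                parts.append(f"{optional}.")
        parts.append(self.identifier)
        return " ".join(parts)


if __name__ == "__main__":
    citation = DatasetCitation(
        authors=["Irino, T", "Tada, R"],
        year=2009,
        title="Chemical and mineral compositions of sediments from ODP Site 127-797",
        identifier="doi:10.1594/PANGAEA.726855",
        version="V.2",
        publisher="Geological Institute, University of Tokyo",
        resource_type="Dataset",
    )
    print(citation.render())
```

Leaving out the optional fields gives a citation built from the four key elements alone, which is the minimum the comparison above suggests every style shares.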

  31. Further examples of data citation

  32. Example: ESIP/NSIDC (Mark Parsons, Ruth Duerr and the Federation of Earth Science Information Partners (ESIP)). Cline, D., R. Armstrong, R. Davis, K. Elder, and G. Liston. 2002, Updated July 2004. CLPX-Ground: ISA snow pit measurements. Edited by M. Parsons and M. J. Brodzik. Boulder, CO: National Snow and Ice Data Center. Data set accessed 2008-05-14 at http://nsidc.org/data/nsidc-0176.html. Slide derived from Jean Bernard Minster (2011)

  33. Example: GBIF (biodiversity). Three publishing cases • Publisher (individual), one-time data release • Publisher (group of individuals), frequent updates • Institute / Research Group / Consortium, frequent updates. Slide based on Chavan (2011)

  34. Individual publisher, one-time release. Template: Publisher (YEAR), <Title of the data resource>, <total nos. of records>, published <modes of publishing>, <Primary access point>, released on <release date>, <Persistent Identifier>.
  Example: Rumble KJ (1998). Cephalopods of North America. 10023 records, published online, http://www.rumblejk.org/CephNA/, released on 31/12/1998, doi:10.4000/iisc.0.00.36.
  Group of publishers, frequent updates. Template: Publisher 1, ..... and Publisher n (YEAR). <Title of the data resource>, <total nos. of records>, published <modes of publishing>, <Primary access point>, first released on <release date>, <current version no. or last updated/released on (date)>, <Persistent Identifier>.
  Example: Remsen D, Bello J, Sheldon S, Raymond M, and AJK Arino (2005 -). Fishes of the Cape Cod Region, MA, USA. 70089 records, published online, http://www.remsen.net/capecodfishes/, first released on 17/05/2005, last updated on 10/10/2010, doi: 11.3389/mbl.1.11.131.
  Slide based on Chavan (2011)
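The two templates on this slide differ mainly in how dates are expressed: a single year and release date for a one-time release, versus an open-ended year range with first-release and last-update dates for frequently updated resources. The sketch below fills either one. It is a hypothetical helper (not a GBIF tool) and follows the punctuation of the worked examples, with full stops after the year and title, and dates written as DD/MM/YYYY.

```python
from datetime import date
from typing import Optional


def gbif_style_citation(
    publisher: str,
    year: int,
    title: str,
    n_records: int,
    mode: str,
    access_point: str,
    released: date,
    identifier: str,
    last_updated: Optional[date] = None,
) -> str:
    """Fill the one-time or frequently-updated template (hypothetical helper)."""
    if last_updated is None:
        # One-time release: a single year and a single release date
        year_part = str(year)
        release_part = f"released on {released:%d/%m/%Y}"
    else:
        # Frequent updates: open-ended year range plus first and latest dates
        year_part = f"{year} -"
        release_part = (
            f"first released on {released:%d/%m/%Y}, "
            f"last updated on {last_updated:%d/%m/%Y}"
        )
    return (
        f"{publisher} ({year_part}). {title}. {n_records} records, "
        f"published {mode}, {access_point}, {release_part}, {identifier}."
    )


if __name__ == "__main__":
    print(gbif_style_citation(
        "Rumble KJ", 1998, "Cephalopods of North America", 10023, "online",
        "http://www.rumblejk.org/CephNA/", date(1998, 12, 31),
        "doi:10.4000/iisc.0.00.36",
    ))
```

Passing `last_updated` switches to the group/frequent-updates form, which reproduces the Remsen et al. example.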

  35. Institution, research group or consortium with contributor roles, frequent updates. Template: <Publisher as Institution / Research Group / Consortium> <YEAR (Year first published / released -)>, <Title of the data resource>, <total nos. of records>, <Contributed by contributor 1 (role), contributor 2 (role) ..... contributor n (role)>, <published (modes of publishing)>, <Primary access point>, <Version no., or last updated/released on (date)>, <Persistent Identifier>.
  Example: Smithsonian National Museum of Natural History (2002 -), Museum Collection Records: Mammals. 579257 records. Contributed by Helgen KM (Principal Investigator, curator, author), Gordon LK (manager, author, curator), Peurach SC (author, manager), Potter CW (manager, author), Carleton MD (curator), Maldonado JE (author, developer), Wilson DE (curator, author), Thorington Jr RW (curator, author, validator), Ludwig CA (manager, developer, author), Lunde DP (author). Published online, http://collections.nmnh.si.edu/search/mammals/, first released on 12/02/2002, last updated on 15/09/2010, doi:17.3377/smi.8.57.965.
  Slide based on Chavan (2011)

  36. Example: ISIS facility. DOIs issued by ISIS are in the form 10.5286/ISIS.E.1234567. The recommended format for citation is: Author, A N. et al; (2010): RB123456, STFC ISIS Facility, doi:10.5286/ISIS.E.1234567. Via Sarah Callaghan (2011)
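A facility that issues DOIs in a fixed form can check them before building a citation. The sketch below does that for the ISIS example. The regular expression is an assumption inferred from the single example on the slide (prefix 10.5286, then "ISIS.E." and a run of digits); real ISIS DOIs may vary. The function name `isis_citation` is hypothetical, and its output copies the slide's recommended punctuation exactly.

```python
import re

# Pattern inferred from the one example on the slide; an assumption, not a spec.
ISIS_DOI = re.compile(r"10\.5286/ISIS\.E\.\d+")


def isis_citation(authors: str, year: int, rb_number: str, doi: str) -> str:
    """Build a citation in the recommended ISIS format (hypothetical helper)."""
    if not ISIS_DOI.fullmatch(doi):
        raise ValueError(f"not in the ISIS DOI form: {doi!r}")
    return f"{authors}; ({year}): {rb_number}, STFC ISIS Facility, doi:{doi}"


if __name__ == "__main__":
    print(isis_citation("Author, A N. et al", 2010, "RB123456",
                        "10.5286/ISIS.E.1234567"))
```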

  37. Exercise: metadata

  38. Citation metadata exercise • Links to resources at web page: http://homes.ukoln.ac.uk/~lismab/downloads/citationExercises.html • Survey Monkey form (to fill in metadata): http://www.surveymonkey.com/s/RFHPYHW • Datasets: http://www.esds.ac.uk/findingData/snDescription.asp?sn=4351, http://archaeologydataservice.ac.uk/archsearch/record.jsf?titleId=1699491, http://researchdata.ands.org.au/diffuse-reflectance-spectroscopy-of-u-np-and-pu-in-tho2, http://www.icpsr.umich.edu/icpsrweb/ICPSR/studies/6152, http://purl.org/phylo/treebase/phylows/study/TB2:S1514 • For each dataset, record: Author, Title, Date, Publisher, Identifier and/or location

  39. Coffee break (15 mins), back at 3.45

  40. Overview of the afternoon (agenda repeated from slide 2)

  41. Technologies

  42. Identifiers • The primary function of an identifier is to identify a resource uniquely • Increasingly, however, the trend is towards identifiers that can also locate resources • Handles, Archival Resource Keys (ARKs) and Persistent URLs (PURLs) all resolve to Internet locations • DOI: has a business model and conventions, has gained traction, and is built on the Handle System

  43. Identifiers. From van de Sompel (2011), http://sites.nationalacademies.org/PGA/brdi/PGA_064019, slide 8

  44. Digital Object Identifier (DOI) • Administered by the International DOI Foundation • Structure: prefix, slash, suffix • Prefixes begin with "10" • Resolved using http://dx.doi.org/ • Example: in http://dx.doi.org/10.5284/1000389, http://dx.doi.org/ is the resolver service, 10.5284 is the prefix (assigning body) and 1000389 is the suffix (resource)
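The structure on this slide (prefix starting with "10", a slash, then a suffix, made actionable by prepending a resolver) can be sketched as below. This is an illustrative parser under simple assumptions, not the DOI Foundation's validation rules: it strips a few common leading forms, splits at the first slash (the suffix may itself contain slashes), and checks only the "10." prefix. `RESOLVER` uses http://dx.doi.org/ as on the slide; https://doi.org/ is the form the DOI Foundation recommends today.

```python
from typing import Tuple

RESOLVER = "http://dx.doi.org/"  # resolver named on the slide

# Leading forms in which a DOI commonly appears in citations
_DOI_PREFIXES = ("doi:", "http://dx.doi.org/", "https://dx.doi.org/",
                 "http://doi.org/", "https://doi.org/")


def split_doi(doi: str) -> Tuple[str, str]:
    """Split a DOI into (prefix, suffix) at the first slash."""
    s = doi.strip()
    for p in _DOI_PREFIXES:
        if s.lower().startswith(p):
            s = s[len(p):].strip()
            break
    prefix, slash, suffix = s.partition("/")
    if not slash or not suffix or not prefix.startswith("10."):
        raise ValueError(f"not a DOI: {doi!r}")
    return prefix, suffix


def resolver_url(doi: str) -> str:
    """Turn a DOI into an actionable link via the resolver service."""
    prefix, suffix = split_doi(doi)
    return f"{RESOLVER}{prefix}/{suffix}"


if __name__ == "__main__":
    print(split_doi("10.5284/1000389"))
    print(resolver_url("doi:10.1594/PANGAEA.726855"))
```

Handles such as hdl:1902.4/00754 are rejected, since their prefix does not begin with "10.".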

  45. ARKs • Focus on actionable identifiers • Link to metadata and a commitment statement • Championed by the California Digital Library (CDL) • CDL offers services including EZID. From https://wiki.ucop.edu/display/Curation/ARK
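The link from an ARK to its metadata and commitment statement can be sketched as follows. The example assumes the inflections described in the original ARK specification, where appending "?" to an ARK URL requests brief metadata and "??" requests the commitment statement; later revisions of the specification changed these conventions, so treat this as illustrative. The resolver host and the sample ARK are hypothetical placeholders.

```python
import re
from typing import Dict, Tuple

# "ark:/" + NAAN (Name Assigning Authority Number) + "/" + name; the later
# "ark:NAAN/..." spelling without the first slash is also accepted.
ARK_RE = re.compile(r"ark:/?(\d+)/(\S+)")


def parse_ark(ark: str) -> Tuple[str, str]:
    """Split an ARK into (NAAN, name)."""
    m = ARK_RE.fullmatch(ark.strip())
    if not m:
        raise ValueError(f"not an ARK: {ark!r}")
    return m.group(1), m.group(2)


def ark_links(ark: str, host: str) -> Dict[str, str]:
    """Actionable links for an ARK served from `host` (a hypothetical resolver)."""
    naan, name = parse_ark(ark)
    base = f"{host.rstrip('/')}/ark:/{naan}/{name}"
    return {
        "object": base,
        "metadata": base + "?",     # '?' inflection: brief metadata
        "commitment": base + "??",  # '??' inflection: commitment statement
    }


if __name__ == "__main__":
    for kind, url in ark_links("ark:/13030/tf5p30086k", "http://example.org/").items():
        print(kind, url)
```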

  46. PURLs • URLs that offer an indirection service • Service mainly offered by OCLC • Main advantage: simplicity
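The indirection that PURLs offer can be modelled in a few lines. The toy below is not OCLC's software and the paths and hosts are hypothetical. It keeps a table from persistent paths to current target URLs and answers lookups the way a redirecting web server would, with an HTTP 302 status and a Location value. When data moves, only the table entry changes; the cited PURL stays the same.

```python
from typing import Dict, Optional, Tuple


class PurlResolver:
    """Toy, in-memory model of PURL-style indirection (not OCLC's software)."""

    def __init__(self) -> None:
        self._targets: Dict[str, str] = {}

    def register(self, path: str, target: str) -> None:
        self._targets[path] = target

    def update(self, path: str, new_target: str) -> None:
        # Only the mapping changes; the persistent path that people cite does not.
        if path not in self._targets:
            raise KeyError(path)
        self._targets[path] = new_target

    def resolve(self, path: str) -> Tuple[int, Optional[str]]:
        """Return (HTTP status, Location header) as a redirecting server would."""
        target = self._targets.get(path)
        if target is None:
            return 404, None
        return 302, target


if __name__ == "__main__":
    r = PurlResolver()
    r.register("/net/example/dataset1", "http://old-host.example.org/data/1")
    r.update("/net/example/dataset1", "http://new-host.example.org/data/1")
    print(r.resolve("/net/example/dataset1"))
```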

  47. Infrastructure

  48. DRYAD • Basic and applied biosciences • Linked to journals • File checking • DOIs assigned • Joint Data Archiving Policy • Citation guidelines • Works with citation packages

  49. PANGAEA: similar to DRYAD, but in the Earth and environmental sciences

  50. Dataverse • A software application • Authors can contribute to an existing instance or create a new one • Developed mainly by IQSS at Harvard • Uses the idea of a Unique Numeric Fingerprint (UNF)
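The real UNF is a separately specified algorithm with its own normalisation rules, version numbers (the "UNF:3:" in the Dataverse citation on slide 20) and encoding; the sketch below is NOT that algorithm. It only illustrates the underlying idea: fingerprint the normalised content rather than the file bytes, so the same data stored with trivially different number formatting yields the same fingerprint. The normalisation and hashing choices here (7 significant digits, SHA-256 truncated to 16 bytes, Base64) and the function names are assumptions for illustration.

```python
import base64
import hashlib
from typing import Iterable, Union

Value = Union[int, float, str]


def normalise(value: Value, digits: int = 7) -> str:
    """Canonical text form: numbers rounded to `digits` significant digits."""
    if isinstance(value, (int, float)):
        # 1, 1.0 and 1.0000000001 all become "1.000000e+00" when digits == 7
        return format(float(value), f".{digits - 1}e")
    return str(value)


def unf_like_fingerprint(column: Iterable[Value], digits: int = 7) -> str:
    """Hash the normalised values, so the result ignores storage format."""
    canonical = "\n".join(normalise(v, digits) for v in column).encode("utf-8")
    digest = hashlib.sha256(canonical).digest()
    return base64.b64encode(digest[:16]).decode("ascii")


if __name__ == "__main__":
    print(unf_like_fingerprint([1, 2.5, 3]) ==
          unf_like_fingerprint([1.0, 2.5000000001, 3.0]))
```

Because the fingerprint depends on content rather than format, it can be quoted in a citation to let readers check that the data they retrieve is the data that was cited.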
