Preserving Research Data: The Canadian Experience. Charles Humphrey, University of Alberta, February 2005
Outline • Two national consultations in Canada • National Data Archive Consultation (NDAC), October 2000 to June 2002 • National Consultation on Access to Scientific Research Data (NCASRD), June 2004 to spring 2005 • Findings from these consultations • Future directions
Support Received This presentation draws upon research supported by the Social Sciences and Humanities Research Council of Canada (Grant Nos. 421-2000-0011 and 421-2000-0017) and upon the work of the National Data Archive Consultation and the National Consultation on Access to Scientific Research Data.
National Data Archive Consultation • Investigation on behalf of the National Archives and the Social Sciences and Humanities Research Council • Two phases • Year 1: Demonstrate the need for national data archiving services • Year 2: Recommend one or more models to provide these services
The NDAC: Phase I The definition of research data employed by the NDAC consists of three parts: • outputs of the research process that exist between raw research materials and published results; • digital information structured through methodology for the purpose of producing new knowledge; • digital information produced by researchers and of interest to researchers.
National Data Archive Consultation. Phase One: Needs Assessment Report. May 2001.
The Risk Level for Research Data in Canada Three studies were conducted in conjunction with the NDAC that provided evidence about the level to which Canadian research data are at risk.
The Risk Level for Research Data in Canada • A gap-analysis of existing mandates and practices of national institutions; • A follow-up study to an investigation first conducted twenty years ago by the now defunct Machine-Readable Archives; and • A survey of researchers receiving a standard research grant from the SSHRC between 1998 and 2000.
Gap Analysis • The mandates and practices of Canadian institutions with responsibilities for the preservation of heritage were examined to determine the types of digital objects that are currently protected.
Gap Analysis Findings: • The vast majority of academic and non-academic research data fall outside the current interpretation and execution of the mandates of the National Library and National Archives (now the Library & Archives of Canada). • No other Canadian institution has a national mandate or the resources to address the current level of need for preserving research data.
Revisiting the MRA Study • An administrative investigation twenty years ago identified a population of 150 SSHRC-funded studies utilizing research data. • Twenty years later, can the data from any of these research projects be located?
Revisiting the MRA Study Findings: • Data from 3 out of 110 studies could be found without contacting the original principal investigators directly for further details. • The 3 studies for which data were found were all deposited in the United States with the Inter-university Consortium for Political and Social Research (ICPSR).
Revisiting the MRA Study Conclusion: The risk of data loss is very high without an institution with the specific mandate to preserve research data.
A Survey of SSHRC-funded Researchers • Researchers who received a standard research grant from the SSHRC between 1998 and 2000 were asked about their plans to preserve the data from their projects. • Only 7 percent said that they had deposited the data from their funded project, while another 18 percent said that they intended to deposit the data.
A Survey of SSHRC-funded Researchers When asked to identify where they had deposited or would deposit data, almost all named a destination that is not an archive. • They mistakenly listed university library data services, the Web, and a Statistics Canada Research Data Centre. • A couple of researchers indicated that they would deposit the data from their projects if only they knew where and how to do this.
A Survey of SSHRC-funded Researchers Conclusion: Without a recognized institution responsible for preserving research data, researchers do not know where or how to archive the data from their research, even if they would like to see the data preserved.
A Survey of SSHRC-funded Researchers Conclusion: For the vast majority of researchers in this study, archiving data is an unknown activity in conducting research.
Size of the Problem We know that research data are at risk, but how big a problem is this? • The survey of researchers who received a standard research grant from the SSHRC provides evidence that around 550 out of every 1,000 projects result in the creation or use of data files and/or databases.
Size of the Problem • This is just the tip of the iceberg! • There is no estimate for other SSHRC-funded projects, other granting agencies, or other agencies and departments creating research data.
Does It Matter? This question has been asked in other countries with answers that apply equally in Canada. • Protecting the financial investment in data; • Stewardship and custodial responsibilities; • Legal and ethical obligations; and • Knowledge-generation opportunities.
The NDAC: Phase II • A primary objective of the second phase was to recommend the institutional form that national data archiving services should take in Canada. • First, research was conducted to identify the types of existing institutional models for data archives.
The Results • A typology of organizational models was developed from the results of a survey of 36 international organizations in data archiving and data services in the social sciences and humanities. • Three generalized models were identified that summarized groupings of the characteristics from the survey.
The Results • While no single existing institution is necessarily described completely by one of these three models, the typology offers a fair summary of the current mix of organizations.
The Three Models • The Topical Data Archive • The Agency-based Data Archive • The Comprehensive Research Data Archive
A Proposed Canadian Model • Establish by legislative mandate an agency reporting to Parliament through the Ministry of Industry or Heritage or a combination of both; • Fund centrally through Parliament; • Grant authority to act on behalf of the Government of Canada in international negotiations related to research data and its management standards and practices; • Structure as a network of distributed service points with a central service facility;
A Proposed Canadian Model • The central facility would be responsible for data management, standards development, and data preservation; • The service points would be responsible for assisting with the deposit of data, accessing data, and training and user consultation; • Service points would be located in universities and other institutions interested in providing access to preserved research data (a model similar to the Depository Service Program between government publishing and Canadian libraries);
A Proposed Canadian Model • A management board would oversee the operation of this National Data Archive Network and consist of representatives from the regions in Canada as well as various stakeholders that manage, use, and produce research data; • Furthermore, this agency would enter into formal co-operative working relationships with other national institutions, such as the Library and Archives of Canada and Statistics Canada.
National Consultation on Access to Scientific Research Data • A Task Force of experts was assembled to organize a two-day National Forum to investigate issues regarding access to research data in Canada and to formulate recommendations. • Experts from the natural and medical sciences were engaged to complement the work of the NDAC. • The Task Force developed a ‘mind map’ of ideal achievements to be reached by 2010 as a result of improved data access.
The NCASRD National Forum • A document structured around the main entries of this ‘mind map’ was prepared and distributed to an assembly of 70 researchers who attended a National Forum held on November 22 & 23, 2004. • This body generated its own ‘mind map’ of achievements for the year 2010.
The NCASRD National Forum • Working backwards from the ideal achievements, sub-groups prepared vignettes of the steps needed to reach the end-states documented in the ‘mind map’. These steps were subsequently organized into recommendations. • A draft report with recommendations arising from the discussions at the National Forum has been written and circulated to members of the Task Force. Look for the final report to be available before the summer of 2005.
Future Directions • A field of funded research is needed to study issues about the preservation of and access to research data. • Areas for research: • The Data Economy: “who gets what data, when and how”; • The Life Cycle of Data: the course of data and its corresponding metadata from the earliest stages of planning through to the secondary uses of data;
Future Directions • Areas for research: • Metadata Standards: the development of standards that detail the life course of data; • Preservation Standards: the best practices in long-term preservation of data; • Data Stewardship: the ethics and best practices of sharing research data; • Inhibitors to Access: the legal and cultural barriers to access.
References
Hackett, Yvette. “A national research data management strategy for Canada: the work of the National Data Archive Consultation Working Group.” IASSIST Quarterly, vol. 25, no. 3 (2001): 13-16.
Humphrey, Charles. “On the advantages of freely accessible data [comment letter].” Epidemiology, vol. 14, no. 3 (2003): 381.
Humphrey, Charles. “Research for building a better data community.” IASSIST Quarterly, vol. 25, no. 1 (2001): 21-24.
Jacobs, James A. and Charles Humphrey. “Preserving research data.” Communications of the ACM, vol. 47, no. 9 (September 2004): 27-29.
References
National Data Archive Consultation. Phase One: Needs Assessment Report. May 2001. http://www.sshrc.ca/web/whatsnew/initiatives/da_phase1_e.pdf
National Data Archive Consultation. Final Report: Building Infrastructure for Access to and Preservation of Research Data. June 2002. http://www.sshrc.ca/web/whatsnew/initiatives/da_finalreport_e.pdf
National Consultation on Access to Scientific Research Data [website]. http://ncasrd-cnadrs.scitech.gc.ca/about_e.shtml