Slide 1: Data Management Subcommittee of the eResearch Committee
Research Data Management Policy Development – Monash University
In this talk, I will describe moves underway within the University to better manage the retention and use of data created by researchers across the institution. I will focus primarily on the management of electronic data, and on the creation of a policy and related measures to support this.
Slide 2: Presentation Today
- How much data is there?
- What are our current problems in research data management?
- What needs to be done?
- Policy development
- Creation of an effective research data management process
I’ll focus first on the data we’re trying to manage: how much there is now, how much there will be, and the problems we currently have in retaining and storing it. I will then describe the work under way, initially on the policy front, to address the shortcomings of the current state of play. Finally, I will discuss the creation of a research data management policy, the first step towards a research data management regime that will put Monash at the forefront of Australian universities in the effective storage and management of research data.
Slide 3: HOW MUCH DATA IS THERE?
- 300-page book = approx. 15 megabytes
- Library: 2 million books = approx. 28 terabytes
- Synchrotron = 1 terabyte per DAY
- USC Shoah Foundation visual archive = 1 petabyte over time
First, the size of the problem: how much research data is being created at Monash? One could ask how long is a piece of string, but it already runs into terabytes and is about to increase dramatically. To put this into perspective, a 300-page book contains approximately 15 megabytes of information. Monash University Library has 2 million books on its shelves, equivalent to approximately 28 terabytes of data. It is estimated that the Synchrotron, once online, will be capable of producing 1 terabyte of data per day, i.e. more than the Library’s entire 2-million-book collection every month. Monash’s micro-imaging and electron microscopy activities can also produce terabytes of data relatively quickly. And it is not only scientific research that produces large datasets: the School of Music’s archive already contains more than a terabyte of data, and the collection of Holocaust material from the Shoah Foundation held at Monash is expected to grow to a petabyte over time.
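The comparisons on this slide are simple unit arithmetic. A minimal sketch reproducing them, assuming a 30-day month (our assumption, not the talk’s) and letting the caller choose decimal or binary units: the slide’s 28 terabyte figure for the Library matches binary units (about 28.6 TiB), while decimal units give exactly 30 TB, so a month of Synchrotron output equals the collection in decimal units and slightly exceeds the slide’s 28 TB figure.

```python
# Back-of-envelope data-volume comparisons from the slide.
# Assumption (not from the talk): a 30-day month.
MB_PER_BOOK = 15              # 300-page book, per the slide
LIBRARY_BOOKS = 2_000_000     # Monash University Library holdings, per the slide
SYNCHROTRON_TB_PER_DAY = 1.0  # estimated Synchrotron output, per the slide
DAYS_PER_MONTH = 30           # assumption


def mb_per_tb(binary: bool = False) -> int:
    """Megabytes per terabyte: 1000**2 (decimal) or 1024**2 (MiB per TiB)."""
    return 1024 ** 2 if binary else 1000 ** 2


def library_size_tb(binary: bool = False) -> float:
    """Size of the Library's book collection expressed in terabytes."""
    return LIBRARY_BOOKS * MB_PER_BOOK / mb_per_tb(binary)


def books_equivalent(tb: float, binary: bool = False) -> float:
    """Number of 300-page books corresponding to `tb` terabytes."""
    return tb * mb_per_tb(binary) / MB_PER_BOOK


def synchrotron_output_tb(days: int = DAYS_PER_MONTH) -> float:
    """Estimated Synchrotron output over `days` days, in terabytes."""
    return SYNCHROTRON_TB_PER_DAY * days


if __name__ == "__main__":
    print(f"Library, decimal units: {library_size_tb():.1f} TB")
    print(f"Library, binary units: {library_size_tb(binary=True):.1f} TiB")
    month = synchrotron_output_tb()
    print(f"Synchrotron, one month: {month} TB = {books_equivalent(month):,.0f} books")
```

Keeping the unit convention as an explicit parameter makes it clear why published "books to terabytes" conversions often differ by a few percent.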
Slide 4: HOW MUCH DATA IS THERE?
Seen as a graph, as here, the revolution in data creation becomes clear: Monash University Library’s book collection, which took nearly 50 years to build, is rapidly becoming a very small part of the data universe here at Monash, without even considering the rest of the world.
Slide 5: CURRENT STATE OF PLAY
ad hoc v., to use ad hoc measures or contrivances, to improvise; so adhoc(k)ing vbl. n.; ad hoc-ery, the use of such measures; ad hocism (also as one word), the use of ad hoc measures, esp. as a deliberate means of avoiding long-term policy; ad-hoc-ness, the nature of, or devotion to, ad hoc principles or practice.
Proper management of such a large amount of data is in the best interests of both the researcher and the University. The current state of play at Monash is best described as ad hoc, with researchers primarily responsible for the storage and management of their own data. In practice this has produced arrangements ranging from faculty-controlled storage, with regular backup and some security features, to researchers keeping all their data on their own PC with no backup (or perhaps a DVD or USB drive somewhere), searchable only through Windows Explorer.
Slide 6: ISSUES WITH CURRENT ACTIVITIES
- Security
- Access Control / Confidentiality
- Hidden Costs
- Non-retrievable data due to poor categorizing or loss
- Regulatory and other risks
- Lack of knowledge of research activity
At the moment, the University plays no central role in managing research data. Leaving researchers to do it themselves carries hidden costs: storage devices are paid for from research funds, and researchers spend time arranging storage for their data, managing it, and re-creating data lost through poor storage and management. Duplicated effort across faculties trying to deal with the situation also costs the University time and money. Risk is poorly managed in this scenario: apart from the general risk of data being corrupted or lost through poor data management, the University leaves itself open to regulatory risk if data is not kept for mandated periods of time, or if unauthorised access to data occurs. The irretrievable loss of any research data also carries the unquantifiable cost of knowledge lost to the University and the wider world.
Slide 7: DATA MANAGEMENT SUBCOMMITTEE
Members from eResearch, ITS, Records & Archives and Library
ISSUES
- Centralised vs. Decentralised storage
- Preservation and Archiving of Data
- Amount of storage
- Funding model
In recognition of the University’s increasing need to provide storage and data management guidance, the eResearch Committee has convened a subcommittee to create a Research Data Management Policy for the University. The subcommittee consists of staff from the eResearch Centre, the Information Technology Services Division, Records and Archives, and the Library. Many issues have been raised in the course of developing the policy, many of them, to borrow a phrase from Kevin Rudd, “fork in the road” issues. These have included where research data should be stored in the University; what exactly we mean by storage, preservation and archiving of data; how much storage we need; and, of course, the important question of who pays for it. At the Information Online conference last week, Markus Buchhorn gave an interesting presentation entitled “The preservation and sustainability of research data in Australia”, in which he noted, among other points, that the technical capacity for storage and related tasks already exists; what is lacking is the policy framework around it.
Slide 8: ISSUES BEFORE SUBCOMMITTEE
- Data storage – should be centralised
- Preservation – who does it?
- Proposed Data Curation function (ITS and Library?): Preservation, Migration, Archiving
- Funding model: needs to be ongoing, increasing, affordable; LaRDS proposal from Neil Clarke
Centralised storage of research data has been a discussion point from early in the process, and has become part of what the policy advocates. While the committee sees a role for “on the spot” data storage for brief periods in the data lifecycle, the draft policy recommends that research data be stored centrally, to aid the security, retention and preservation of the data created. This need not mean one central computer, but rather centralised access and control. Information Technology Services has the capacity to store large amounts of data, but active “management” of data, such as reformatting, preservation and archiving, is not currently within its remit. As the subcommittee’s discussions have progressed, it has come to believe that the University does need to support data management activity, including preservation and archiving. At this stage the subcommittee is suggesting that the University provide a data curation function: this function would advise researchers on data management, provide active “curation” of research data (such as reformatting, restoration and migration), and give strategic advice to the University. The funding model for data storage is being developed by Neil Clarke, Manager of the eResearch Centre. His proposal envisages an up-front payment by faculties to access ongoing storage, with capacity increases built in.
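The shape of the proposed funding model, a one-off up-front payment that buys ongoing storage whose capacity grows over time, can be illustrated with a toy calculation. This is a hypothetical sketch: the payment, starting capacity and growth rate below are invented for illustration, and the talk does not give the actual terms of the LaRDS proposal.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StorageAllocation:
    """Hypothetical faculty entitlement bought with a one-off up-front payment.

    Every figure is an illustrative assumption, not a term of the LaRDS proposal.
    """
    upfront_payment: float       # one-off payment by the faculty
    initial_capacity_tb: float   # capacity available in year 1
    annual_growth: float         # built-in capacity increase per year (0.5 = +50%)

    def capacity_in_year(self, year: int) -> float:
        """Capacity in terabytes available in `year`, counting from year 1."""
        if year < 1:
            raise ValueError("year must be >= 1")
        return self.initial_capacity_tb * (1 + self.annual_growth) ** (year - 1)

    def schedule(self, years: int) -> list[float]:
        """Capacity for each of the first `years` years."""
        return [self.capacity_in_year(y) for y in range(1, years + 1)]

    def cost_per_tb_year(self, years: int) -> float:
        """Up-front payment divided by the total TB-years provided over `years` years."""
        return self.upfront_payment / sum(self.schedule(years))


if __name__ == "__main__":
    # Invented example: 10 TB in year 1, growing by 50% a year.
    alloc = StorageAllocation(upfront_payment=50_000.0,
                              initial_capacity_tb=10.0, annual_growth=0.5)
    for year, tb in enumerate(alloc.schedule(4), start=1):
        print(f"Year {year}: {tb:.2f} TB")
    print(f"Average cost per TB-year over 4 years: {alloc.cost_per_tb_year(4):.2f}")
```

Modelling growth as a fixed yearly percentage is only one way to "build in" capacity increases; a step schedule agreed in advance would fit the same structure.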
Slide 9: DATA MANAGEMENT PLAN
- Instrument to help researchers manage their data
- Complete at beginning of research project
- Captures some technical, access and descriptive metadata at the beginning of a research project
The core of the Data Management Policy is the Data Management Plan. The draft policy is structured around the plan, and it is envisaged that the plan will become the main instrument to help researchers manage their research data. The researcher would complete the plan at the beginning of the research project. The plan fulfils several roles: it makes the researcher aware of the issues involved in creating and storing research data; it alerts the University to the project and what it might require in terms of storage space and management; and it covers retention, access and the ongoing preservation of the data. Over time, the plans completed by researchers will themselves become a mini-archive of research completed at Monash.
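One practical consequence of the plan’s second role is that plans, taken together, tell the University how much storage is coming. A minimal sketch of that aggregation, assuming each plan records a hypothetical owning faculty and an estimated end-of-project volume in terabytes (the field names and example values are illustrative, not taken from the draft policy):

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class PlanSummary:
    """Hypothetical subset of a submitted Data Management Plan."""
    project_id: str
    faculty: str
    estimated_volume_tb: float  # volume expected by the end of the project


def storage_demand_by_faculty(plans: list[PlanSummary]) -> dict[str, float]:
    """Sum the estimated storage volumes declared in plans, grouped by faculty."""
    totals: dict[str, float] = defaultdict(float)
    for plan in plans:
        totals[plan.faculty] += plan.estimated_volume_tb
    return dict(totals)


if __name__ == "__main__":
    # Invented example plans, for illustration only.
    plans = [
        PlanSummary("P-001", "Science", 12.0),
        PlanSummary("P-002", "Science", 3.5),
        PlanSummary("P-003", "Arts", 1.2),
    ]
    print(storage_demand_by_faculty(plans))
```

Because the estimate is captured when the project starts, a summary like this can inform storage purchasing before the data actually arrives.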
Slide 10: DATA MANAGEMENT PLAN
A typical data management plan would encompass the following elements:
- Originators and owners of the data
- Description of project
- Metadata (schema / standards)
- Types of data to be collected
- Volume of data to be managed, disc and tape storage required
- Retention of research data and records
- Format/s of and software used in creation and use of the data
- Access policies and provisions
- Confidentiality requirements
- Storage, preservation and archiving of data
Once the data management plan is completed, the researcher (and the University) will know the following about a particular research project: who creates the data, who owns it, and who can (and can’t) access it, and when; how much data there is expected to be by the end of the project, what sort of data it is (experimental, interviews, etc.), what formats it will be in, and what software was used to create it; how long the data must be kept to satisfy regulatory authorities, how long it should be kept to further research, and whether it should be preserved in perpetuity for the wider research and general community. Most importantly, the plan records a description of the project as a whole, and the metadata schema that organises the data in a coherent manner.
The importance of metadata in managing research data cannot be overstated. The data management plan will itself be an important piece of metadata in the data management process, providing a description at a whole-of-project level. Many of the larger research projects will create large caches of data that must be properly organised, with metadata describing each data element (experiment, interview, etc.). Some areas, especially scientific and medical fields, have well-developed and recognised schemas that can be used in this process; others less so. Throughout the data management process, the emphasis will be on using recognised metadata schemas where possible, as this saves time and allows interoperability between projects further down the track.
More policy work required within disciplines to organise effective schema – govt role?
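The elements listed above map naturally onto a structured record, which is what would let completed plans double as project-level metadata. A minimal sketch, assuming one possible field layout: the field names and the completeness check are illustrative choices, not a schema prescribed by the draft policy.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class DataManagementPlan:
    """Hypothetical record mirroring the plan elements listed on the slide.

    Field names are illustrative; the draft policy does not prescribe a schema.
    """
    originators: list[str]              # who creates the data
    owners: list[str]                   # who owns the data
    project_description: str
    metadata_schema: str                # ideally a recognised discipline standard
    data_types: list[str]               # e.g. experimental results, interviews
    estimated_volume_tb: float          # disc and tape storage required
    formats_and_software: list[str]
    access_policy: str                  # who can (and can't) access the data, and when
    confidentiality: str
    storage_and_preservation: str
    retention_years: Optional[int] = None   # None: not yet decided
    preserve_in_perpetuity: bool = False

    def missing_elements(self) -> list[str]:
        """Names of text or list elements that have been left empty."""
        return [
            f.name
            for f in fields(self)
            if isinstance(getattr(self, f.name), (str, list))
            and not getattr(self, f.name)
        ]


if __name__ == "__main__":
    # Invented example plan with no metadata schema recorded yet.
    plan = DataManagementPlan(
        originators=["Project researcher"],
        owners=["Research group"],
        project_description="Oral history interviews",
        metadata_schema="",
        data_types=["interviews"],
        estimated_volume_tb=0.5,
        formats_and_software=["WAV audio"],
        access_policy="Project team only until publication",
        confidentiality="Participants identifiable",
        storage_and_preservation="Central storage",
    )
    print(plan.missing_elements())
```

Flagging an empty metadata schema, rather than rejecting the plan outright, fits the aim that compliance should not be too onerous: the data curation function could follow up on the gap.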
Slide 11: DATA MANAGEMENT PLAN
- Critical to engage researchers in the process
- Can’t be too onerous
- Must have visible benefits
- Must be able to provide complete research data solution
Implicit in the whole process is engaging researchers in data management. Part of that engagement is ensuring that complying with the policy is not too difficult or onerous, and that researchers can see a benefit: their data is securely held, and is quickly and easily accessible when needed.
Slide 12: THE FUTURE?
What might the future look like?
- LaRDS storing / preserving unpublished and “working” research data
- ARROW providing access to published work and related data / datasets
- Data Curation function providing strategic direction and practical advice to researchers
Digital research data storage and management is a fast-developing field, and Monash University, along with many other universities, is at the forefront of devising policies and methods to manage the massively increasing rate of data generation from research. The following “look into the future” sketches how research data management might look at Monash, though developments may overtake it at any moment. When researchers begin a project, they will be asked to complete a data management plan, and a data curation function will be on hand to provide advice and assistance if required. Once the research is under way, the data generated by the project will be stored in the LaRDS repository, which will hold unpublished and “working” data. Based on information provided in the data management plan, data in the LaRDS store will have access controls, preservation and migration schedules, and disposal criteria applied as appropriate. The University’s existing ARROW repository will be the home of published research and any related data or datasets; in essence, research that can be accessed by a wider public. Strategic and policy advice on research data management will be provided as part of the data curation function (ITS and Library), which will also support researchers with one-on-one advice on data management issues and provide more general education such as workshops, guides, and procedures.
Slide 13: Context
- National Collaborative Research Infrastructure Strategy (NCRIS) – e-Research Coordinating Committee of NCRIS
- Accessibility Framework
- Assistance in creating and linking repositories and storage
Monash’s work on research data management is taking place within a rapidly developing national response to the issues identified. NCRIS’s e-Research Coordinating Committee has produced a report, now with the Minister, that among other things discusses research data management. On 1 May 2004 the Prime Minister announced that the Australian Government would establish Quality and Accessibility Frameworks for Publicly Funded Research. The quality part of that statement is addressed by the dread acronym RQF (Research Quality Framework), and we are beginning to understand what that means, but the RAF, the accessibility framework, has not yet been fully fleshed out. We may assume that the RAF will require research data to be discoverable, accessible and shareable, in order to improve the quality of research outcomes, reduce duplication and better manage research activities and reporting, as that is the definition provided on the NCRIS page. NCRIS is also supporting the building of technical infrastructure to underpin the RAF, such as ARROW, the Australian Partnership for Sustainable Repositories, and assistance with developing middleware to make repositories more effective in providing access to research data.
Slide 14: WHERE TO FROM HERE?
- Draft Research Data Management Policy to go to eResearch committee for approval
- Data Management Plans begin to be filled out by researchers
- Storage and data management solutions put in place
All this is the beginning of a process at Monash. The draft policy will be presented to the eResearch Committee in the next couple of weeks, after which the process of refinement will begin. The policy is one part of a suite of activities addressing the growing amount of research data produced throughout the University. The data storage solutions will soon be in place (LaRDS) or already exist (ARROW), but work is still needed on how to help researchers manage the data they create; the proposed Data Curation Function needs to take shape and find its place in the overall picture. Policy here at Monash will also develop within a slowly bubbling soup of national activity in this area, most of which is pea soup rather than consommé, so this vision may change considerably. And all this needs to happen yesterday!
Slide 15: ANY QUESTIONS? Contact – roger.clark@lib.monash.edu.au