Data Management Principles - Planning UniMelb Cluster - Research Symposium Lyle Winton 24 Oct 2008
Who am I? • Dr Lyle Winton • Background: • Researcher/Scientist • experimental high energy physics, distributed systems, Grid • Technical Consultant • education and research for gov. infrastructure projects • Software Engineer • industry, higher education (web development, information systems, enterprise systems) • Currently: • eScholarship Research Centre (eSRC) & Research Computing Services, Information Services • Senior Research Support Officer (eResearch) • provide ICT support for research workers, supply expertise & strategic advice • develop plans for eResearch infrastructure • be active in local & national eResearch co-ordination groups eScholarship Research Centre
Data Management • What are we doing…(eSRC & eR – myself, Joanne Evans, Simon Porter, Gavan McCarthy, Leon Sterling) • Policy • Planning (focus) • Tools (focus) • Services • Infrastructure • Training (focus) • Consultancy (focus)
Nationally… ANDS Vision: “The development of ANDS is intended to provide the essential meeting place where the Australian path forward for research data management can evolve and where a vision can be achieved.” Towards an Australian Data Commons, ANDS – Oct 2007 “ – institutions will be expected to have and support data management plans, and any researcher seeking support through a number of government funding agencies will be expected to describe how the data generated through the project will be managed throughout its lifecycle.” ANDS Interim Business Plan – Sept 2008 “Enabling Components… Data Storage: … This investment will extend to research organisations for the development of institutional nodes of the storage grid, on the condition that the storage is used exclusively for research data; the institutes co-invest in the infrastructure; each institute publishes and adopts a data management plan; and each institute ensures its researchers use and abide by the data management plan.” Strategic Roadmap for Research Infrastructure, NCRIS – July 2008.
Known problems… “A mature data stewardship system, interlinking policy and infrastructure could address the needs of researchers and improve the quality and efficiency of Australian innovation and research.” “The survey found that individual researchers and research groups do not include data management as an element when planning research projects.” “Grants do not fund the creation of datasets as an end in itself, nor are funds provided explicitly for the management of data.” “The survey found that research groups and organisations rarely have formal policies for the management of data. They usually have a set of practices that may or may not be adhered to at the project level.” “Researchers… see research data as belonging to them. … Experienced researchers have been managing data all their careers.” AERES report – Oct 2006
Some UniMelb goals… • Information Futures Commission • Excerpts from final report… • We will know we're on track if: • “Management and dissemination of research data and digital collections is painless.” • We propose that we will: • “Develop and adopt standards, guidelines and processes for the management, access and preservation of research data” • “Implement a program for targeted curation of collections…” • “Implement a digitisation and profiling strategy for works in collections (including 'born digital')…” • Numerous references to services surrounding data: • “Adequate physical and digital collections support research, learning and teaching, and knowledge transfer … Cataloguing and search tools make it easy to discover, cite and manage information.”
Where are we heading? [Diagram label: Data Management Plan (DMP)] • Formal Research Data Management Infrastructure/Plans/Policies are emerging! • Globally, researchers are beginning to adopt this as good practice • The University is moving towards this as standard practice • We need to start implementing and/or improving… • Professional Data/Info Management Practice • ensuring quality research data • enables (appropriate) access • enables reuse of data • Policy, Intellectual Property & Licensing, Contracts, Legislation, Process … • not just paperwork and hurdles • ensuring research has integrity, repeatability • enables (appropriate) access • enables reuse of data
Why now? [Image: Burroughs 1977 – B 9495 Magnetic Tape Subsystem] • Research Data is increasing in size • Research Collaborations are increasing • Data is increasingly digital • Wonderful opportunities for reuse, sharing, collaboration, analysis • However: • while microfilm and non-acidic paper can last for 100+ years • magnetic media lasts 10+ years • optical media lasts 20+ years (with proper handling) • 2-10% of hard drives fail every year • software & hardware can become outdated • And much info is still only hardcopy • Lab books, notes, primary data, samples
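As a rough illustration of what the 2-10% annual drive failure figure means over a project's lifetime, the sketch below compounds it over several years. It is a simplification, not a reliability model: it assumes a constant annual failure rate, independent drives, and no replacement of failed drives. The function name `loss_probability` and the 5-year horizon are illustrative choices.

```python
def loss_probability(annual_failure_rate: float, years: int, copies: int = 1) -> float:
    """Chance that every copy is lost within `years`.

    Assumes independent drives, a constant annual failure rate,
    and that failed drives are never replaced or re-copied.
    """
    survive_one = (1.0 - annual_failure_rate) ** years  # one drive lasts the whole period
    lose_one = 1.0 - survive_one                        # one drive fails at some point
    return lose_one ** copies                           # all copies fail


# Compare the low and high ends of the quoted 2-10% range over 5 years.
for rate in (0.02, 0.10):
    single = loss_probability(rate, years=5)
    mirrored = loss_probability(rate, years=5, copies=2)
    print(f"rate {rate:.0%}: one copy {single:.1%}, two copies {mirrored:.1%}")
```

Under these assumptions a single unreplicated drive has roughly a 10% to 41% chance of failing within 5 years, which is why a second independent copy matters.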
Parts of the elephant… • Researchers & Departments • are at varying levels of maturity • are experiencing different pain-points • Infrastructure Providers • are focused on specific problems • are experts in different aspects/solutions • are getting varying requirements
Framing the elephant…
Training for post-grads • UpSkills eResearch Stream – “Data Management Workshop” • run 3 so far • Influences and References • The University of Melbourne Policy (Research Office, Records Services) • Australian Code for Responsible Conduct of Research (NHMRC, ARC, Universities Australia) • OAK Law Project, QUT • Belinda Weaver presentations, UQ • PILIN Project (ANDS/ARROW) • A few examples! • Review of material • By eScholarship Research Centre • By local eResearch social network (eCoffee) • By a small group of department research/IT managers • By School of Graduate Research
Training for post-grads • Workshop Covers: • Components of a “Data Management Plan” • Recommended reading list • Information Modelling, Good Practice Guidance • Technologies • Feedback has been very positive!!! • Development of a web site (ongoing) • Resources, References, Examples, Q&A • A Research DMP Template (ongoing) • Drafting guidelines to support the implementation and compliance (underway) • Future developments: • Training materials for supervisors? • Discussing undergraduate data management training across Uni • Possible DMP registry
Why Manage Research Data • IT IMPROVES YOUR RESEARCH BOTH NOW AND LATER… • Data is often valuable for a long time!!! • Results of your research may outlast the project, your degree, your position, your career, your institution • historical value, predictable or unforeseen • Maximise usefulness of data to fellow researchers • Context for the research, how data was collected, quality controls, how people can and should use it (access and licensing), how you then attribute people/projects • can help lead to subsequent research papers • Good Practice, Better Research • DMPs state the parameters within which you MUST do research, then follow them! (being a Professional Researcher) • document for newcomers, your group, project, externals • Ensure research integrity (and repeatability) • through keeping better records • can trace your outcomes right from data collection, through research method, through to results • promotes awareness of responsibilities, policies, ethics, legislation
Why Manage Research Data • IT MAY SAVE WASTED TIME… • You need to properly… • Collect research data • Manage research data • Archive research data • …otherwise there is a risk you cannot use your data, wasting years of effort. • From a study of 500 charges of “research misconduct”, 40% could have been avoided by good data management practice! • “Student submits her PhD thesis for examination then leaves country taking the data with them. An examiner questions the integrity of the research data. A reanalysis of the data and original questionnaire is required.” • “Participant in a research project lodges a claim for compensation, alleging that he was not adequately informed about the effects of the study, does not recall giving consent, and the raw data he provided has become public. Where are the records?” • “Ten years after a patent has been granted a patent infringement action is lodged. The laboratory notebook is required.” • “At completion of a research project the data and records are boxed and stored in a departmental storeroom. Sometime later the researcher needs to access the original records to refute a claim of falsification. He finds that the storeroom has since been converted into a laboratory/coffee-shop/learning-hub.” Defending Integrity >>> getting to Data
Why Manage Research Data • AND YOU NEED TO PLAN AHEAD… • University of Melbourne Policy • research methods and results open to scrutiny • data should be retained in a durable and appropriately referenced form • for at least 5 years from any publication • minimum of 15 years for clinical trials • minimum of 7 years for adult psychological files (for minors 7 years after reaching 18) • or longer if external/funding/regulatory/archival requirements • research units & departments have formally documented procedures for retention • researchers must comply • ensure research data and records are accurate, complete, authentic and reliable • data and records formed for verification and include sufficient detail (authenticity and validity of conclusions)
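The retention minimums above can be turned into a simple date calculation. The sketch below is only one reading of the slide, not the University's official rule. It assumes all periods are counted from the last publication date (the slide states this only for the 5-year case), and that for psychological files the later of "7 years after publication" and "7 years after the subject turns 18" applies. All names (`MIN_YEARS`, `earliest_disposal`, `external_years`) are hypothetical.

```python
from datetime import date
from typing import Optional

# Minimum retention periods in years, as listed on the slide.
MIN_YEARS = {
    "general": 5,          # at least 5 years from any publication
    "clinical_trial": 15,  # minimum of 15 years
    "psychological": 7,    # adult psychological files
}
ADULT_AGE = 18


def add_years(d: date, years: int) -> date:
    """Shift a date by whole years; 29 Feb falls back to 28 Feb."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return d.replace(year=d.year + years, day=28)


def earliest_disposal(kind: str, last_publication: date,
                      subject_birth: Optional[date] = None,
                      external_years: int = 0) -> date:
    """Earliest date the data could be considered for disposal.

    `external_years` stands in for any longer funder, regulatory or
    archival requirement (assumed to run from the same start date).
    """
    years = max(MIN_YEARS[kind], external_years)
    result = add_years(last_publication, years)
    if kind == "psychological" and subject_birth is not None:
        # For minors: 7 years after reaching 18.
        minor_rule = add_years(subject_birth, ADULT_AGE + MIN_YEARS[kind])
        result = max(result, minor_rule)
    return result


print(earliest_disposal("general", date(2008, 10, 24)))
print(earliest_disposal("psychological", date(2008, 10, 24),
                        subject_birth=date(2000, 1, 1)))
```

Recording the calculated date in the DMP at the start of a project makes the "or longer" cases visible early, instead of at disposal time.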
What’s in a DMP? • A Possible Template: • Context (Outline, Pre-planning, Decisions) • Responsibilities (ethics, consent, licensing, legislation, funding requirements, reporting) • Process & Policies • Data Collection and QC Process • Access Policy • Appropriate Use and Access Patterns • Data Maintenance, Persistence and Archival Practice • Decommissioning/Destruction/Sanitisation • Technical Requirements (policy for system developers/implementers/admins) • Current Infrastructure and Requirements • Future Infrastructure Requirements • Interoperability • Data Security • Availability, Reliability, Support and Response (full template found at http://www.esrc.unimelb.edu.au/dmp)
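One way to make the template actionable is to treat it as a checklist and report which headings a draft plan has not yet filled in. The sketch below is illustrative only. The nesting of sub-headings under "Process & Policies" and "Technical Requirements" is an assumed reading of the flattened slide, and `TEMPLATE`, `required_headings` and `missing_headings` are hypothetical names, not part of the eSRC template.

```python
# Section headings from the slide; the grouping into sub-headings is an assumption.
TEMPLATE = {
    "Context": [],
    "Responsibilities": [],
    "Process & Policies": [
        "Data Collection and QC Process",
        "Access Policy",
        "Appropriate Use and Access Patterns",
        "Data Maintenance, Persistence and Archival Practice",
        "Decommissioning/Destruction/Sanitisation",
    ],
    "Technical Requirements": [
        "Current Infrastructure and Requirements",
        "Future Infrastructure Requirements",
        "Interoperability",
        "Data Security",
        "Availability, Reliability, Support and Response",
    ],
}


def required_headings(template=TEMPLATE):
    """Flatten the template: sub-headings where present, else the section itself."""
    headings = []
    for section, subsections in template.items():
        headings.extend(subsections if subsections else [section])
    return headings


def missing_headings(plan):
    """Return template headings with no (or only blank) text in the draft plan."""
    return [h for h in required_headings() if not plan.get(h, "").strip()]


draft = {
    "Context": "Three-year survey project, two partner institutions.",
    "Access Policy": "Project members only until publication.",
}
for heading in missing_headings(draft):
    print("TODO:", heading)
```

A plain dictionary keeps the sketch short; a real plan would more likely live in a document or form, with a check like this run over an export.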
Why Plan? • Making the most of Infrastructure • ARCS Data Fabric (NCRIS) • University Infrastructure • National Compute Infrastructure (VLSCI, ANUSF, VPAC) • Advanced Technology (imaging, sequencing, synchrotron)
Why Plan? • Making the most of Research Networks • ANDS Data Commons • BioGrid Australia • Protein Data Bank • Increasingly you need to ensure • Research integrity, traceability • Data and Result quality • Data reusability • Data security (misuse/damage, unintended/intended)
Communication • 2-way Communication is important • Administration/ICT and Research Community • Good Practice will emerge from both Research and ICT expertise • National Infrastructure • Opportunities and Trade-offs • 3+-way communication? • Vision: a local community of practice • to provide and review guidelines and policies • to share data management plans • to drive development of shared infrastructure • to advocate for and steer national infrastructure
What you can do… • http://www.esrc.unimelb.edu.au/dmp • Provide general feedback • Ask questions, we’ll seek answers • Work with us on guidance & good practice • Encourage students to attend future UpSkills • Talk with your students/group/department about formally documenting a DMP • Feed back your DMP