Andrew Treloar, ARCHER Project Director Cathrine Harboe-Ree, University Librarian Alan McMeekin, Executive Director ITS. Dancing with data down under CNI Winter 2007 Project Briefing. O is for Overview. Drivers for what we are presenting Research case study overview Challenges and solutions
Andrew Treloar, ARCHER Project Director • Cathrine Harboe-Ree, University Librarian • Alan McMeekin, Executive Director ITS • Dancing with data down under • CNI Winter 2007 Project Briefing
O is for Overview • Drivers for what we are presenting • Research case study overview • Challenges and solutions • Australian national developments
D: Monash – a distinctive and internationalised university • Established 1960 • Research intensive, doctoral granting • 55,000 students from more than 100 countries • 6.1% of student load is graduate • 3,500 academic staff (6,800 total EFT staff) • 10 faculties • Campuses in Australia (six), Malaysia and South Africa; centre in Prato • Partnerships – India, Hong Kong, Singapore, China • Total research income $186 million (2006)
D: Information Management Strategy • Two-year initiative to develop an overarching strategy for the whole university • Took a holistic view of information • Informed by the views of a range of information management professionals and stakeholders • Report available at: www.monash.edu.au/staff/information-management/ • Based on a set of ten principles that have been extended into the research data domain
D: Monash data management environment • High level support • DVC (Research), Prof Edwina Cornish • Establishment of E-Research Centre • Need to manage growing deluge • Leading E-researchers in some disciplines • Synchrotron (1 TB per day) • Shoah Archives (12 TB) • And others • Need to respond to Australian Code for the Responsible Conduct of Research • www.nhmrc.gov.au/publications/synopses/r39syn.htm
D: Three inter-related national projects • ARROW • DART • ARCHER • [Diagram: the entire e-research life cycle, encompassing experimentation, analysis, publication, research and learning. Labelled elements include e-experimentation, the Grid, data, metadata and ontologies, certified experimental results and analyses, preprints and metadata, technical reports, peer-reviewed journal and conference papers, reprints, the local web, the institutional archive, publisher holdings, the digital library and the virtual learning environment, serving e-researchers, graduate students and undergraduate students.] • Source: Adapted from Liz Lyon, eBank UK Presentation
R: Structure determines function • Sequence → Structure → Function • Unfolded protein is a chain of amino acids • Highly mobile • Inactive • Folded protein • Precise shape • Stable • Highly ordered • Active • Specific associations • Precise reactions • Function depends on protein shape
R: How to solve a structure • Diffraction intensities + Phases → Fourier synthesis → Electron density → 3D structure • Sources of phases • Experimental methods (= back to lab) • Use known structures (molecular replacement)
R: Access Statistics: 23/8/2007 to 1/12/2007 • Views: 918 total • 257 from library staff • 152 from other Monash addresses • 509 from non-Monash addresses • Downloads: 498 total • 87 from library staff • 62 from other Monash addresses • 349 from non-Monash addresses
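The breakdown on this slide can be checked with a few lines of arithmetic. The sketch below uses only the figures quoted above; the dictionary keys and the `external_share` helper are illustrative names, not part of the original presentation.

```python
# Figures copied from the access statistics slide (23/8/2007 to 1/12/2007).
views = {"library staff": 257, "other Monash": 152, "non-Monash": 509}
downloads = {"library staff": 87, "other Monash": 62, "non-Monash": 349}


def external_share(counts):
    """Fraction of accesses that came from non-Monash addresses."""
    return counts["non-Monash"] / sum(counts.values())


# The per-source counts add up to the totals quoted on the slide.
assert sum(views.values()) == 918
assert sum(downloads.values()) == 498

print(f"external views: {external_share(views):.1%}")
print(f"external downloads: {external_share(downloads):.1%}")
```

On these numbers, roughly 55% of views and 70% of downloads came from outside Monash.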
R: Why he cares about data • Raw data are sacred • Data validation for reviewers and by peers • His data are now safe and secure • Store of examples for those doing methods development • Some data cannot be processed by him; why not let others have a go?
C is for Challenges and Solutions • Laboratory data management practice • Institutional data management planning • Sustainable storage provision • Data curation across data stores • Data in institutional repositories
C: Laboratory data management practice • Challenge • Infrequent and deficient backup • No commitment to long-term preservation • Poor recording of metadata (descriptive/provenance) • Solution • Embed IM professionals with research teams • Provide sustainable storage for backup • Improve laboratory data capture systems
C: Institutional data management planning • Challenge • No systematic organisation-wide approach • No way of engaging with researchers
S: Institutional forum to discuss issues • Membership • Library • ITS • Records and Archives • Research Office • e-Research Centre • Outputs • Policy and Plan (print trial, web production) • Outreach activities
S: Data Management Plan – objectives • Assists both researcher and institution • Is completed at beginning of research project, updated as necessary • May become mandatory in future • Captures some technical, access and descriptive metadata at the beginning of research project • Is not onerous • Delivers visible benefits • Assists in providing complete research data solutions
S: Data Management Plan – components • Originators and owners of the data • Description of project • Metadata used (schema, standards) • Types of data to be collected • Volume of data (initial estimate) • Retention requirements (guidelines provided) • Format/s of and software used in creation and use of the data • Access policies and provisions • IP constraints • Confidentiality requirements • Storage, preservation and archiving of data
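One way to picture the components listed above is as a simple record that a researcher fills in at the start of a project. This is a minimal sketch only: the class name, field names, types and the `missing_components` helper are assumptions made for illustration, not the actual Monash plan template.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DataManagementPlan:
    """Hypothetical record mirroring the components on the slide."""
    originators_and_owners: List[str]
    project_description: str
    metadata_standards: List[str]
    data_types: List[str]
    estimated_volume_gb: float  # initial estimate only
    retention_requirements: str
    formats_and_software: List[str]
    access_policies: str
    ip_constraints: str = ""
    confidentiality_requirements: str = ""
    storage_and_archiving: str = ""

    def missing_components(self) -> List[str]:
        """Names of components that are still empty."""
        return [name for name, value in vars(self).items()
                if value in ("", [], None)]


# Example values are invented for illustration.
plan = DataManagementPlan(
    originators_and_owners=["Example Lab"],
    project_description="Protein crystallography study",
    metadata_standards=[],
    data_types=["diffraction images"],
    estimated_volume_gb=36.0,
    retention_requirements="",
    formats_and_software=["image files"],
    access_policies="open after publication",
)
print(plan.missing_components())
```

Keeping the optional fields empty by default fits the objective that the plan "is not onerous": it can be started with a handful of entries and updated as the project proceeds.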
C: Sustainable storage provision • Challenge • Need sustainable way to provide large (terabyte) amounts of storage for researchers • Make this more financially attractive than JBOD under desk • Solution • Large Research Data Storage (LaRDS)
C: LaRDS requirements • Addresses institutional and researcher needs • Formulates a set of principles to guide cost modelling and sustainable funding options • Assumes commitment to storage in perpetuity • or “as long as required”, whichever comes first ;-) • Adopts a central storage model … • Centrally funded basic allowance, plus • Directly charged excess allowance • … in parallel with decentralised storage • 700 TB and growing
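The central storage model above (a centrally funded basic allowance plus a directly charged excess) can be sketched as a one-line calculation. The function name, the allowance size and the rate per terabyte are all hypothetical; the slides do not give actual LaRDS figures for either.

```python
def storage_charge(used_tb: float, basic_allowance_tb: float,
                   rate_per_tb: float) -> float:
    """Charge for storage above the centrally funded basic allowance.

    All parameters are illustrative assumptions, not LaRDS pricing.
    """
    excess_tb = max(0.0, used_tb - basic_allowance_tb)
    return excess_tb * rate_per_tb


# A group within its allowance pays nothing directly.
print(storage_charge(used_tb=5, basic_allowance_tb=10, rate_per_tb=100))
# A group 2 TB over its allowance pays only for the excess.
print(storage_charge(used_tb=12, basic_allowance_tb=10, rate_per_tb=100))
```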
C: Data in institutional repositories • Challenge • Most IRs are designed for document objects • Many data objects are large • 2QP2 produced 36GB of image data • HTTP download metaphor doesn’t scale • Solution • Trialling both managed content and externally referenced content at present • Investigating custom disseminators on server
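A rough calculation shows why a single HTTP download is a poor fit for objects of this size. The sketch assumes decimal gigabytes and an ideal, fully utilised link; the link speeds are illustrative choices, not measurements from the project.

```python
def transfer_hours(size_gb: float, link_mbit_per_s: float) -> float:
    """Ideal transfer time in hours, ignoring protocol overhead and retries."""
    bits = size_gb * 1e9 * 8  # assumes 1 GB = 10**9 bytes
    seconds = bits / (link_mbit_per_s * 1e6)
    return seconds / 3600


# 36 GB is the image data volume quoted for 2QP2 on the slide.
for speed in (10, 100):  # hypothetical link speeds in Mbit/s
    print(f"{speed} Mbit/s: {transfer_hours(36, speed):.1f} hours")
```

Even under these optimistic assumptions the 36 GB dataset takes about 8 hours at 10 Mbit/s and about 0.8 hours at 100 Mbit/s, and an interrupted download has to recover somehow.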
A: Australian e-Research Infrastructure • Term ≈ Cyberinfrastructure • National Collaborative Research Infrastructure Strategy (A$555M, 5 yrs) • 15 research capabilities • and Platforms for Collaboration • Platforms for Collaboration (A$75M, 4.5 yrs) • National Computation Infrastructure • Interoperation and Collaboration Infrastructure • Australian National Data Service
A: Australian National Data Service • Monash University is leading a project to establish ANDS • ANU and CSIRO to be other members of collaborative partnership • Tasks to be distributed more widely • Four platforms: • Frameworks (policy) • Utilities • Repositories • Researcher Practice • http://www.pfc.org.au/twiki/bin/view/Main/Data
Q is for Questions! • andrew.treloar@its.monash.edu.au • cathrine.harboe-ree@lib.monash.edu.au • alan.mcmeekin@its.monash.edu.au • http://arrow.edu.au/ • http://dart.edu.au/ • http://archer.edu.au/ * Thanks to Dr Ashley Buckle and colleagues at Monash for the use of the protein crystallography slides and movies
Federating Data • The Australian Repository for Diffraction ImageS • http://www.tardis.edu.au/ • National activity to support communities of protein crystallographers • Ideal place to hook into the eCrystals Federation • http://wiki.ecrystals.chem.soton.ac.uk/