Curation in the Cloud
Hull’s Fedora and Hydra perspective
Richard Green
Curation in the Cloud, London, 7/8 March 2012
Institutional repository background
• Hull has been running a Fedora-based institutional repository for several years
  • Originally based on Fedora + Muradora UI
  • More recently (the last 6 months) based on Fedora + Hydra
• The repository covers a wide range of content – not just OA articles…
Curation in the Cloud | London | 7/8 March 2012 | 2
Wide range of content to deal with
- Exam papers
- e-Theses & dissertations (ETDs)
- Journal articles
- Meeting papers or minutes
- Policies or procedures
- Dissertations (undergraduate)
- Photographs
- Presentations
- Books
- Book chapters
- Regulations
- Reports
- Conference papers or abstracts
- Learning materials
- Handbooks
- Internet publications
- Newsletter articles
- Datasets
- Sound
- Moving images
- Guidance documents
- Licences
- Posters
- Events
- Letters
- Artwork
- Diagrams
- Maps
- Software
- etc (!!!)
Affiliations
• Hull was instrumental in founding the Fedora UK & Ireland User Group…
  • 20 or so informal members
Affiliations [2]
• … and is a founder member of the Hydra partnership (with the University of Virginia, Stanford University and Fedora Commons)
  • Fedora does not have an ‘out-of-the-box’ UI. Hydra set out to provide building blocks from which highly functional (full-CRUD) UIs could be built over it
  • Growing number of Hydra-using institutions in the US, two or three so far in the UK
  • Hydra “content modelling” is proving useful to non-Hydra Fedora users
At the moment?
• Just starting to think seriously about opportunities in the cloud
• This meeting is a timely chance to clarify what is still somewhat fuzzy thinking
• At the moment, we in Hull are considering the use of cloud storage in addition to local storage for our Hydra repository
At the moment? [2]
• Why the cloud?
  • Could be used to provide near-line capability for rarely used assets which are individually ‘small’ but numerous
  • Potential to store very large, but rarely accessed, assets (TB range) ‘cheaply’ (cf high-performance SAN storage)
  • Possibility of leveraging ‘above campus’ services (Image manipulation? Video streaming? Format migration?)
At the moment? [3]
• WE’RE NOT
  • considering a complete repository infrastructure in the cloud
    • Happier with the software stack locally
  • considering local software with all-cloud storage
    • There are known problems with latency etc
• WE ARE
  • considering a hybrid of the two
At the moment? [4]
• How?
  • In principle, Fedora (and therefore Hydra) allows for a mix and match of storage: Fedora managed (local file system), external (http accessible), redirected (redirects user to appropriate URL)
• So:
  • use “managed content” for straightforward, small and/or high access materials;
  • use “external content” for low access materials or where there is a value-added service.
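The “So:” rule above can be sketched as a small placement function. This is only an illustration under stated assumptions: the `Asset` fields, the `LOW_ACCESS_PER_MONTH` threshold and the function name are hypothetical, and size is left out of the decision because the slide treats low access and value-added services as the deciding factors. The letters M, E and R are the datastream control-group codes Fedora 3 uses for managed, external and redirected content.

```python
from dataclasses import dataclass

# Fedora 3 datastream control groups named on the slide.
MANAGED = "M"     # Fedora managed (local file system)
EXTERNAL = "E"    # external (http accessible), e.g. cloud storage
REDIRECT = "R"    # redirected (user is sent to the appropriate URL)

# Hypothetical threshold: what counts as "low access" is a local policy choice.
LOW_ACCESS_PER_MONTH = 1.0


@dataclass
class Asset:
    accesses_per_month: float
    needs_value_added_service: bool = False  # e.g. streaming, format migration


def choose_control_group(asset: Asset) -> str:
    """Pick a storage type following the slide's rule of thumb."""
    if asset.needs_value_added_service:
        return EXTERNAL   # the service lives with the external copy
    if asset.accesses_per_month < LOW_ACCESS_PER_MONTH:
        return EXTERNAL   # low access: download charges stay small
    return MANAGED        # straightforward and/or high access: keep local
```

For example, `choose_control_group(Asset(accesses_per_month=200))` keeps a frequently downloaded item in managed storage, while a rarely viewed image with `accesses_per_month=0.1` would be placed externally.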
Scale of problem
• Bulk of repository content is “small” – megabytes
• Multimedia content is larger (10s to 100s of megabytes) and our current offering is “download” – we cannot (yet) stream
• We know there are multi-TB datasets on campus to be dealt with
  • eg Biology have one of 6TB, growing at 2TB per quarter
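To get a feel for where the Biology example leads, the figures on the slide can be projected forward. The sketch below assumes the growth stays linear at 2TB per quarter, which the slide does not claim; the function name is illustrative.

```python
def projected_size_tb(quarters: int,
                      start_tb: float = 6.0,
                      growth_tb_per_quarter: float = 2.0) -> float:
    """Dataset size after a number of quarters, assuming linear growth."""
    return start_tb + growth_tb_per_quarter * quarters


# Size after one, two and three years (4, 8 and 12 quarters).
for years in (1, 2, 3):
    print(years, "year(s):", projected_size_tb(4 * years), "TB")
```

Under that assumption the single dataset reaches 14TB after one year and 30TB after three.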
Potential practical problems
• High-access materials could generate large download charges
  • Better suited to low access objects or to get ‘value added’ services
  • Need a way of predicting costs over long periods (using the LIFE model?)
• Getting large objects/volumes into the cloud
  • Transfer times for TBs of content are considerable. Use UPS to send a hard drive (or several?)
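Both problems on this slide come down to simple arithmetic, sketched below. The link speed, the efficiency factor and the per-GB price are all hypothetical inputs rather than figures from the talk or from any provider's price list; decimal units (1TB = 10^12 bytes) are assumed.

```python
SECONDS_PER_DAY = 86_400


def transfer_days(size_tb: float, link_mbit_per_s: float,
                  efficiency: float = 0.8) -> float:
    """Days needed to upload size_tb terabytes over a network link.

    efficiency is an assumed fraction of the nominal link speed that is
    actually achieved (protocol overhead, contention).
    """
    bits = size_tb * 1e12 * 8
    seconds = bits / (link_mbit_per_s * 1e6 * efficiency)
    return seconds / SECONDS_PER_DAY


def monthly_download_charge(object_gb: float, downloads_per_month: float,
                            price_per_gb: float) -> float:
    """Monthly egress charge for one object at an assumed per-GB price."""
    return object_gb * downloads_per_month * price_per_gb


# A 6TB dataset over a fully used 100 Mbit/s link takes roughly 5.6 days.
print(round(transfer_days(6, 100, efficiency=1.0), 1))
```

At more realistic efficiencies the upload stretches towards a week or more, which is why shipping a hard drive remains a serious option for multi-TB content.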
Potential practical problems [2]
• Security
  • Hull’s IR has very granular security (categories [public/staff/student], groups [eg student modules], individuals)
  • Need to be able to restrict access to cloud-based materials accordingly
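The three levels of granularity named above (categories, groups, individuals) could be modelled roughly as follows. This is a hypothetical sketch, not how Hull's repository or Fedora actually enforces security; the class and field names are invented for illustration. Whatever fronts the cloud-held copy would need to run an equivalent check before releasing content.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class User:
    username: str
    category: str                      # "public", "staff" or "student"
    groups: frozenset = frozenset()    # e.g. student module codes


@dataclass(frozen=True)
class AccessPolicy:
    categories: frozenset = frozenset()
    groups: frozenset = frozenset()
    individuals: frozenset = frozenset()


def may_access(user: User, policy: AccessPolicy) -> bool:
    """Grant access if any of the three levels matches."""
    if "public" in policy.categories:
        return True                    # open to everyone
    return (user.category in policy.categories
            or bool(user.groups & policy.groups)
            or user.username in policy.individuals)
```

For instance, a policy with `groups=frozenset({"module-x"})` admits a student enrolled on that (made-up) module and refuses other students and staff unless they are listed individually.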
Potential practical problems [3]
• Durability
  • “Designed to provide 99.999999999% durability” (Amazon S3 SLA). And the other 0.000000001%? Not a lot, but…
  • Could that mean: for every terabyte you send us, we promise not to corrupt more than ten or so bytes?!?
  • Or that we might lose 1 in 10^11 files, which might not be quite so bad providing it’s not one of your files
  • LOCKSS type approach across several providers?
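The “ten or so bytes” and “1 in 10^11” figures follow directly from the quoted percentage, as the short check below shows. It takes a terabyte as 10^12 bytes and, like the slide, simply applies the loss fraction to bytes or to files; how the provider actually defines durability is a separate question that this arithmetic does not settle.

```python
from fractions import Fraction

DURABILITY = Fraction("0.99999999999")   # 99.999999999% as a fraction
LOSS_FRACTION = 1 - DURABILITY           # exactly 1 / 10**11

BYTES_PER_TB = 10**12

# Reading 1: the fraction applies to bytes.
bytes_at_risk_per_tb = LOSS_FRACTION * BYTES_PER_TB   # 10 bytes per terabyte

# Reading 2: the fraction applies to whole files.
files_per_expected_loss = 1 / LOSS_FRACTION           # 10**11 files

print(bytes_at_risk_per_tb, files_per_expected_loss)
```

Exact fractions are used rather than floats so that the eleven nines are not blurred by rounding.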
Potential practical problems [4]
• Management of an institutional cloud
  • Can an institution realistically manage its own cloud space(s)?
    • Managing just the data
    • Maybe managing cloud-based services
  • Is the idea of third-party management (à la DuraSpace) a more appropriate model?
So, in summary…
• Hull is potentially interested in cloud solutions for:
  • Low access materials which individually are not big but taken together are (eg 000s of images)
  • TB+, low-access objects
  • ‘Above campus’, value-added services (Image manipulation, media streaming, format migration, LOCKSS-in-the-Cloud?)
• Maybe sounds like a job for a UK HE oriented, brokered service akin to DuraCloud’s model?
Contacts and links
IR Service owner: Chris Awre (c.awre@hull.ac.uk)
Hydra Project Manager for Hull: Richard Green (r.green@hull.ac.uk)
Hull Institutional Repository: hydra.hull.ac.uk
Fedora website: fedora-commons.org
Hydra website: projecthydra.org
Fedora UK&I User group: fedora-uki.org.uk