Explore the VI-SEEM data repository, learn about the underlying technology, hardware implementation, benefits, features, types of data, and information model.
VI-SEEM Data Repository
Vladimir Dimitov, IICT-BAS
Acknowledgements to Vladimir Slavnić, IPB
The VI-SEEM project initiative is co-funded by the European Commission under the H2020 Research Infrastructures contract no. 675121
Agenda
• VI-SEEM Data Repository
• Underlying Software Technology
• Hardware Implementation
• Benefits of VI-SEEM Repo
• Features
• Types of data
• Information Model
• The DSpace (VI-SEEM Repo) Architecture
• Repository Organization
• Examples
Total number of slides: 25
VI-SEEM Regional Climate Training event - Belgrade, Serbia, 11-13 Oct 2017
VI-SEEM Repository
• The VI-SEEM Repository provides long-term data preservation, suitable for data set sharing: https://repo.vi-seem.eu/
• Use cases
  • To store curated data sets for long-term preservation
  • To share those data sets with selected collaborators, or open them up to whole communities, via a web interface
  • To make such data sets searchable by associating metadata with them and then harvesting that metadata
• Enables scientific communities to capture and describe digital works using a custom submission workflow module
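The harvesting use case can be illustrated with a short sketch. DSpace repositories commonly expose their metadata through the OAI-PMH protocol, but the slides do not say which endpoint the VI-SEEM Repo offers, so `BASE_URL` below is a hypothetical placeholder. The function only builds a `ListRecords` request URL and does not contact any server.

```python
from urllib.parse import urlencode

# Hypothetical endpoint: the slides do not state the harvesting URL.
BASE_URL = "https://repo.example.org/oai/request"


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None):
    """Build an OAI-PMH ListRecords request URL (no network access)."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec is not None:
        # An OAI-PMH "set" restricts harvesting to a subset of the repository.
        params["set"] = set_spec
    return base_url + "?" + urlencode(params)


url = list_records_url(BASE_URL)
print(url)
```

A harvester would fetch this URL and parse the returned XML; `oai_dc` is the unqualified Dublin Core format that every OAI-PMH server is required to support.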
Underlying Software Technology
• Based on DSpace (http://dspace.org)
• DSpace is a platform that allows you to capture items in any format: text, video, audio, and data. It distributes them over the web. It indexes your work, so users can search for and retrieve your items. It preserves your digital work over the long term.
• Developed by the MIT Libraries with support from the HP-MIT Alliance
• A platform for building an Institutional Repository
Hardware Implementation
• The VI-SEEM Repo is installed on a virtual machine with 8 GB RAM and 4 virtual cores.
• The physical hosting server is an IBM 3650 M4, with two eight-core CPUs and 128 GB RAM.
• The storage array is formatted with GPFS and is connected over InfiniBand (56 Gbit/s), using IBM GSS and ESS storage servers. Failover is handled automatically by GPFS.
• The storage capacity dedicated to the Repo is 50 TB. Currently around 16 TB are occupied with useful data.
• Hosted and maintained by GRNET.
Benefits of VI-SEEM Repo
• Some example benefits:
  • Getting your research results out quickly, to a worldwide audience
  • Reaching a worldwide audience through exposure to search engines such as Google
  • Storing reusable teaching materials that you can use with course management systems
  • Archiving and distributing material you would currently put on your personal website
  • Storing examples of students’ projects (with the students’ permission)
  • Showcasing students’ theses (again with permission)
  • Keeping track of your own publications/bibliography
  • Having a persistent network identifier for your work that never changes or breaks
  • No more page charges for images. You can point to your images’ persistent identifiers in your published articles.
Features
• User Interface
  • Web-based interfaces for submission, end users and system administrators
  • Search and retrieval of items by browsing or searching the metadata
• Workflow
  • Enables differing submission workflows for communities
  • Models "e-people" who have "roles" in the workflow of a particular community, in the context of a given collection
• Persistent Identifiers (Handles)
  • Implements CNRI Handles as the persistent identifier associated with each item
  • Soon to be integrated with the VI-SEEM PID service
• Access Control
  • Allows contributors to limit access to items in the repository, at both the collection and the individual item level
  • Integrated with the VI-SEEM Login Service
• Metadata Schema
  • Utilises Qualified Dublin Core
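To make the Handle and Qualified Dublin Core features more concrete, here is a minimal sketch. It assumes the usual `schema.element.qualifier` naming that DSpace uses for Qualified Dublin Core fields, and the public Handle System proxy at `hdl.handle.net`. The Handle prefix `12345`, the suffix `678` and all metadata values are invented for illustration and do not refer to a real VI-SEEM item.

```python
HANDLE_PROXY = "https://hdl.handle.net"  # public proxy of the Handle System


def handle_url(prefix, suffix):
    """Return the resolvable URL for a Handle of the form prefix/suffix."""
    return f"{HANDLE_PROXY}/{prefix}/{suffix}"


def split_field(name):
    """Split 'schema.element[.qualifier]' into its three parts."""
    parts = name.split(".")
    if len(parts) not in (2, 3):
        raise ValueError(f"not a qualified Dublin Core field: {name!r}")
    schema, element = parts[0], parts[1]
    qualifier = parts[2] if len(parts) == 3 else None
    return schema, element, qualifier


# Hypothetical item: prefix "12345", suffix "678" and all values are invented.
item_metadata = {
    "dc.title": "Example regional climate simulation output",
    "dc.contributor.author": "Doe, Jane",
    "dc.date.issued": "2017-10-11",
    "dc.identifier.uri": handle_url("12345", "678"),
}

for field, value in item_metadata.items():
    print(split_field(field), value)
```

The qualifier is what distinguishes Qualified from simple Dublin Core: `dc.date.issued` narrows the generic `date` element to the date of issue.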
Types of data in the VI-SEEM Repository
• Articles
• Preprints, e-prints
• Technical Reports
• Working Papers
• Conference Papers
• E-theses
• Audio/Video
  • Lecture notes, visualizations, simulations
• Datasets in various formats
  • Experimental
  • Simulation
  • Input
  • Output
• Images
  • Visual, scientific
• Teaching material
• Digitized library collections
Information Model
• Communities
  • Departments, Labs, Research Centers, Schools…
• Collections (in communities)
  • Distinct groupings of like items
• Items (in collections)
  • Logical content objects
  • Receive persistent identifier
• Bitstreams (in items)
  • Individual files
  • Receive preservation treatment
• Versioning: item “versions” can be
  • All instances of a work in different formats
    • E.g. the XML, PDF, and PostScript versions
  • All editions of a work over time
• Metadata lists all available versions of items
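The containment hierarchy above (communities, collections, items, bitstreams) can be sketched with plain Python dataclasses. This is only an illustration of the model as described on the slide, not DSpace's actual implementation; the class fields, the sample titles and the Handle `12345/1` are assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Bitstream:
    """An individual file stored inside an item."""
    name: str
    mime_type: str


@dataclass
class Item:
    """A logical content object; it receives the persistent identifier."""
    title: str
    handle: Optional[str] = None
    bitstreams: List[Bitstream] = field(default_factory=list)


@dataclass
class Collection:
    """A distinct grouping of like items."""
    name: str
    items: List[Item] = field(default_factory=list)


@dataclass
class Community:
    """A top-level unit such as a department, lab or school."""
    name: str
    collections: List[Collection] = field(default_factory=list)


def count_bitstreams(community):
    """Count all files reachable from a community."""
    return sum(
        len(item.bitstreams)
        for collection in community.collections
        for item in collection.items
    )


# Hypothetical content: one work available in two formats.
report = Item(
    title="Example technical report",
    handle="12345/1",  # invented Handle
    bitstreams=[
        Bitstream("report.pdf", "application/pdf"),
        Bitstream("report.xml", "application/xml"),
    ],
)
climate = Community("Climate", [Collection("Reports", [report])])
print(count_bitstreams(climate))
```

Here the two bitstreams play the role of format "versions" of the same work: one item, one persistent identifier, several files.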
The DSpace (VI-SEEM Repo) Architecture
Repository Organization
• Each DSpace service comprises Communities, the highest level of the DSpace content hierarchy
• Communities may be:
  • Departments
  • Labs
  • Research Centres
  • Schools
• Each community contains descriptive metadata about itself and the collections contained within it
Collections
• Each community in turn has collections, which contain items or files
• Collections can belong to a single community or to multiple communities (collaboration between communities may result in a shared collection)
• As with communities, each collection contains descriptive metadata about itself and the items contained within it
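Because a collection may belong to more than one community, the relationship is many-to-many rather than a strict tree. A small sketch, using invented community and collection names, shows how shared collections could be identified from a community-to-collections mapping:

```python
from collections import defaultdict

# Hypothetical community and collection names, for illustration only.
membership = {
    "Climate": ["Model outputs", "Shared datasets"],
    "Life Sciences": ["Shared datasets", "Sequencing data"],
}


def communities_by_collection(membership):
    """Invert community -> collections into collection -> communities."""
    owners = defaultdict(list)
    for community, collections in membership.items():
        for collection in collections:
            owners[collection].append(community)
    return dict(owners)


owners = communities_by_collection(membership)
# A collection is "shared" when more than one community lists it.
shared = sorted(name for name, comms in owners.items() if len(comms) > 1)
print(shared)
```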
Example Structures • Structures may be based around organizational units: • Structures are hierarchical:
Example: Home screen
Example: Climate Sciences community
Example: Browsing by title
Example: Submissions and Workflow tasks
Example: Item submission, first step
Example: Item submission, description
Example: Item submission, third step
Example: File upload
Example: Item review
Example: Add Creative Commons (CC) license
Example: Distribution license
Conclusion
• The VI-SEEM Repository is the main place for long-term data preservation, suitable for dataset sharing.
• The implementation is based on DSpace, a popular open-source technology for building large data repositories.
• The VI-SEEM Repository is hosted on a high-performance infrastructure.
• Types of data may include:
  • Measurements, visualizations, simulations, audio/video, images,
  • Digitized library collections, articles, technical reports, training materials, raw data, etc.
• The dataset items are described with detailed metadata records, which must be filled in carefully and patiently by the submitters.
• A carefully selected license must be assigned to each dataset item.
• Thank you for your attention. Questions?