
DuraSpace, Fedora and DuraCloud

DuraSpace, Fedora and DuraCloud. Triangle Research Libraries Network, September 2009. DuraSpace, Inc.: combined Fedora Commons, Inc. and DSpace Foundation; 501(c)(3) private, non-profit company; 4-year project funded by the Moore Foundation to become self-sustaining; continuing software development.

leola

Presentation Transcript


  1. DuraSpace, Fedora and DuraCloud • Triangle Research Libraries Network • September 2009

  2. DuraSpace, Inc. • Combined Fedora Commons, Inc. and DSpace Foundation • 501(c)(3) private, non-profit company • 4-year project funded by the Moore Foundation to become self-sustaining • Continuing software development • Moving towards community-based software development • Ensuring durability for a web in the clouds!

  3. The world we work in… • Data Curation, Linking, Publishing • Scholarly and Scientific Collections • Preservation and Archiving • Education, Knowledge Spaces • Blog and wiki • And more…

  4. DuraSpace Products • Fedora • DSpace • Akubra – storage plug-in module with transactional file system • Mulgara – RDF indexing engine • Topaz – core semantic knowledgebase components • DuraCloud

  5. Solution Communities • Community group that creates and maintains the vision for software solutions in an area • Trying to create the conditions for collaboration • Gather resources to create software for solution • Coordinates development with DuraSpace technical staff

  6. Solution Areas • Data Curation • Open Access Publishing • Preservation and Archiving • Small Archives • Scholars’ Workbench

  7. The Flexible Extensible Digital Object Repository Architecture • A set of abstractions that can be used to represent different kinds of data • A repository management system • A foundation for many information management applications • Designed to make data “durable” over the long term

  8. 165 Current Known Users • Broadcasting and media – 1 • Consortia – 9 • Corporations – 14 • Government agencies – 8 • IT-Related Institutions – 10 • Medical Centers and Libraries – 4 • Museums and Cultural Organizations – 5 • National Libraries and Archives – 16 • Professional Societies – 2 • Publishing – 4 • Research Groups and Projects – 18 • Semantic and Virtual Library Projects – 6 • University Libraries and Archives – 68

  9. Making complex digital information “durable” is a very hard problem • The existence and meaning of content needs to be verifiable as technologies change • A history of the changes to the encoding and state of content must be reliably provided • A meaningful context for any unit of content may be one of many and must be sustained • Complex resources will increasingly be dispersed across institutional boundaries.

  10. The Fedora abstractions provide a durability framework. • Content is “unitized” as information objects that combine data, metadata, policies, relationships and the history of the object. • Complex digital resources are formally defined graphs of related objects. • The public view of the content is presented as abstract behaviors. • The web services orientation of Fedora provides the basis for repository federation.

  11. A data object is one unit of content • Persistent ID • Reserved Datastreams: DC, RELS-EXT, AUDIT, POLICY • Custom Datastreams 1, 2, … n (any type, any number)

  12. Object Properties • PID: “namespace:name” • Object Type: “Data”, “Service Definition”, or “Service Deployment” • Created Date: “2007-04-30T19:59:03.000Z” (UTC, ISO 8601 format) • Last Modified Date: “2007-04-30T19:59:03.000Z” (UTC, ISO 8601 format) • State: “A”, “I”, or “D” (Active, Inactive, Deleted) • Label: “Any string” • Content Model: “Any string” • Owner ID: “Any string” • Legend: client provides value; system generates value; either way
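
The object properties listed on this slide can be pictured as a small record type. The following is a minimal sketch, not Fedora's own API: the class name `ObjectProperties`, its field names, and the helper `utc_now_iso` are hypothetical, and the sketch assumes the two timestamps are filled in by the system.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# State codes as listed on the slide.
VALID_STATES = {"A": "Active", "I": "Inactive", "D": "Deleted"}


def utc_now_iso() -> str:
    """Current UTC time in the slide's format, e.g. 2007-04-30T19:59:03.000Z."""
    now = datetime.now(timezone.utc)
    return now.strftime("%Y-%m-%dT%H:%M:%S.") + f"{now.microsecond // 1000:03d}Z"


@dataclass
class ObjectProperties:
    """Hypothetical record of a digital object's properties (illustration only)."""
    pid: str                       # "namespace:name"
    label: str = ""                # any string
    state: str = "A"               # "A", "I", or "D"
    owner_id: str = ""             # any string
    created: str = field(default_factory=utc_now_iso)        # assumed system-generated
    last_modified: str = field(default_factory=utc_now_iso)  # assumed system-generated

    def __post_init__(self):
        # A PID must have a non-empty namespace and name separated by a colon.
        namespace, sep, name = self.pid.partition(":")
        if not (sep and namespace and name):
            raise ValueError(f"PID must look like 'namespace:name', got {self.pid!r}")
        if self.state not in VALID_STATES:
            raise ValueError(f"state must be one of {sorted(VALID_STATES)}")
```

For example, `ObjectProperties(pid="demo:1", label="Example")` creates an active object, while a PID without a colon is rejected.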

  13. Datastream Properties • Datastream ID: any XML “NCName”, unique within the object • Control Group: “X”, “M”, “E”, or “R” (Inline XML, Managed, Externally Referenced, or Redirected) • State: “A”, “I”, or “D” (Active, Inactive, Deleted) • Versionable: “true” or “false” • Versions: 1 or more • Legend: client provides value; system generates value; either way
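
A datastream's control group and versioning flag can be sketched in the same spirit. This is an illustration under stated assumptions, not Fedora code: the names `ControlGroup`, `Datastream`, and `update` are hypothetical, the NCName check is a simplified ASCII-only approximation, and the sketch assumes that a non-versionable datastream overwrites its current version instead of adding a new one.

```python
import re
from dataclasses import dataclass, field
from enum import Enum


class ControlGroup(Enum):
    """The four control group codes listed on the slide."""
    INLINE_XML = "X"
    MANAGED = "M"
    EXTERNALLY_REFERENCED = "E"
    REDIRECTED = "R"


# Simplified, ASCII-only approximation of an XML NCName (no colons allowed).
NCNAME = re.compile(r"[A-Za-z_][A-Za-z0-9._-]*")


@dataclass
class Datastream:
    """Hypothetical datastream record (illustration only)."""
    ds_id: str
    control_group: ControlGroup
    state: str = "A"
    versionable: bool = True
    versions: list = field(default_factory=list)

    def __post_init__(self):
        if not NCNAME.fullmatch(self.ds_id):
            raise ValueError(f"not a valid datastream ID: {self.ds_id!r}")
        if self.state not in ("A", "I", "D"):
            raise ValueError("state must be 'A', 'I', or 'D'")

    def update(self, content):
        """Append a new version if versionable, otherwise overwrite the latest one."""
        if self.versionable or not self.versions:
            self.versions.append(content)
        else:
            self.versions[-1] = content
```

Calling `update` twice on a versionable datastream leaves two versions; on a non-versionable one it leaves only the latest.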

  14. Policies • Machine enforceable expressions of rules, what they are applied to and who they affect. • Who is affected can be defined in different authorization sources, such as LDAP services • Rules can be as simple as “allow” or “deny”. • Rules are applied to objects as a whole, any datastream, or a dissemination, as well as each API call and more.
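
As a rough illustration of "allow" and "deny" rules applied to whole objects or to individual datastreams, here is a minimal sketch. It does not reproduce Fedora's policy language: the `Rule` type, the role names, the `PID/datastreamID` target notation, and the choice that a matching "deny" overrides any "allow" (with no match meaning no access) are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """Hypothetical policy rule (illustration only)."""
    effect: str        # "allow" or "deny"
    roles: frozenset   # who the rule affects, e.g. roles looked up in LDAP
    target: str        # an object PID, "PID/datastreamID", or "*" for everything


def is_permitted(rules, role, target):
    """Assumed combining logic: any matching deny wins; otherwise an allow is needed."""
    matching = [r for r in rules if role in r.roles and r.target in ("*", target)]
    if any(r.effect == "deny" for r in matching):
        return False
    return any(r.effect == "allow" for r in matching)


rules = [
    Rule("allow", frozenset({"curator", "public"}), "*"),
    Rule("deny", frozenset({"public"}), "demo:1/MASTER"),
]
```

With these rules, `is_permitted(rules, "public", "demo:1/DC")` is true, while the same role is refused access to `demo:1/MASTER`.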

  15. Relationships Among Objects • Describes adjacency relationships among objects, among units of content • RDF data of the form: PID – typeOfRelationship – relatedObjectPID • Can be used to assemble aggregations of objects • Can build graphs of relationships to feed into user interfaces
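
The triple form PID – typeOfRelationship – relatedObjectPID can be sketched with plain tuples. The PIDs, the predicate names, and the helper functions below are illustrative assumptions; a real repository would store these statements as RDF and query them through an index.

```python
from collections import defaultdict

# Each relationship is a (subject PID, relationship type, related object PID) triple.
triples = [
    ("demo:page1", "isMemberOf", "demo:book1"),
    ("demo:page2", "isMemberOf", "demo:book1"),
    ("demo:book1", "isMemberOfCollection", "demo:collection1"),
]


def members_of(triples, parent_pid, predicate="isMemberOf"):
    """Assemble an aggregation: all objects that point at parent_pid via predicate."""
    return sorted(s for s, p, o in triples if p == predicate and o == parent_pid)


def adjacency(triples):
    """Build a graph (subject -> list of (predicate, object)) for a user interface."""
    graph = defaultdict(list)
    for s, p, o in triples:
        graph[s].append((p, o))
    return dict(graph)
```

Here `members_of(triples, "demo:book1")` gathers the two page objects into one aggregation, and `adjacency(triples)` gives a graph that a browsing interface could walk.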

  16. Optional Object Behaviors • Data objects can have different views or transformations • Sets of abstract behaviors that different kinds of objects can subscribe to • Corresponding sets of services that specific objects can execute • The business logic is hidden behind an abstraction

  17. Content Access Content Management

  18. Content Models • Create classes of data objects • Expressed as Cmodel objects • A Cmodel object defines the number and types of datastreams for objects of that class • A Cmodel object binds to service objects to enable appropriate behaviors to be inherited by data objects
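
The idea that a Cmodel object defines which datastreams a class of objects must carry can be sketched as a simple conformance check. The dictionary layout, the datastream IDs, and the function `conformance_problems` are hypothetical; the sketch only illustrates the data contract, not how Fedora encodes or validates content models.

```python
# Hypothetical content model: required datastream ID -> required MIME type.
IMAGE_CMODEL = {
    "DC": "text/xml",
    "MASTER": "image/tiff",
}


def conformance_problems(datastreams, cmodel):
    """Compare an object's datastreams (ID -> MIME type) with a content model.

    Returns a list of problems; an empty list means the object conforms.
    Datastreams not mentioned by the model are allowed.
    """
    problems = []
    for ds_id, mime_type in cmodel.items():
        if ds_id not in datastreams:
            problems.append(f"missing datastream {ds_id}")
        elif datastreams[ds_id] != mime_type:
            problems.append(f"{ds_id} is {datastreams[ds_id]}, expected {mime_type}")
    return problems
```

An object with `DC` and a TIFF `MASTER` conforms to `IMAGE_CMODEL`; one that lacks `MASTER` is reported as missing that datastream.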

  19. [Diagram of object types and their links. Labels: Data Objects, Cmodel Object, Service Definition Object, Service Mechanism Object, Web Service. Each object has a Persistent ID (PID), System Metadata, and Datastreams. Other labels: RDF data, Service Definition Metadata, Service Binding Metadata (WSDL), data contract, service contract, service subscription.]

  20. A behavior call has the form: Object PID + SDef Name + Method Name • Other components include: • Parameter values used by the method • Datetime stamp for earlier version
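
One way to picture a behavior call is as a URL assembled from the object PID, the SDef name, and the method name, with parameters and an optional datetime stamp in the query string. The sketch below is an assumption-laden illustration: the `/objects/{pid}/methods/{sdef}/{method}` path layout, the `asOfDateTime` parameter name, the base URL, and the function `behavior_call_url` are not taken from the slides and should be checked against the repository's API documentation.

```python
from urllib.parse import quote, urlencode


def behavior_call_url(base_url, pid, sdef, method, params=None, as_of=None):
    """Assemble Object PID + SDef Name + Method Name into a URL (assumed layout)."""
    # Keep the colon in PIDs readable; escape anything else that is unsafe in a path.
    path = "/".join(quote(part, safe=":") for part in (pid, "methods", sdef, method))
    query = dict(params or {})
    if as_of:
        # Datetime stamp selecting an earlier version (assumed parameter name).
        query["asOfDateTime"] = as_of
    url = f"{base_url.rstrip('/')}/objects/{path}"
    return f"{url}?{urlencode(query)}" if query else url
```

For example, `behavior_call_url("http://localhost:8080/fedora", "demo:1", "demo:imageSDef", "getThumbnail")` yields a single address that hides which service actually produces the thumbnail.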

  21. Objects Representing Aggregations • Creating parent objects for complex resources • Representing explicit collections • Representing implicit collections • Creating digital surrogates for physical entities

  22. Text Collections

  23. The Rossetti Archive

  24. A Research Project

  25. Fedora Framework Service Integration • Repository publishes events • Services listen and consume events or other messages • [Diagram labels: Fedora Repository Service, GSearch, Simple JMS, OAI, Ingest, More…]

  26. DuraCloud • Trusted management of and access to durable digital assets in the cloud • [Diagram labels: DuraSpace Mediating Service, Microsoft]

  27. DuraCloud – basics • Replicate to multiple storage providers • Replicate to multiple geographic areas • Monitor and audit digital assets • Compute services in cloud next to content • Hosted by DuraSpace not-for-profit org • Partnerships with cloud providers • “Pay for use” for services and storage • Available to run internally (open source)
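
The first and third bullets, replicating to several storage providers and then auditing the copies, can be sketched with in-memory stand-ins. Nothing here is DuraCloud's API: the provider names, the dictionaries that play the role of cloud stores, the functions `replicate` and `audit`, and the use of SHA-256 checksums are assumptions chosen for the illustration.

```python
import hashlib


def replicate(content_id, data, providers):
    """Copy the bytes to every provider and return the checksum to audit against."""
    digest = hashlib.sha256(data).hexdigest()
    for store in providers.values():
        store[content_id] = data
    return digest


def audit(content_id, expected_digest, providers):
    """Report, per provider, whether the stored copy still matches the checksum."""
    report = {}
    for name, store in providers.items():
        copy = store.get(content_id)
        report[name] = (
            copy is not None
            and hashlib.sha256(copy).hexdigest() == expected_digest
        )
    return report


# Two hypothetical providers, each modeled as a plain dictionary.
providers = {"provider_a": {}, "provider_b": {}}
checksum = replicate("image-001", b"example bytes", providers)
```

Running `audit("image-001", checksum, providers)` right after replication reports every copy as intact; a copy that is later altered or lost shows up as `False` for that provider only.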

  28. Use Cases: DuraCloud with Cloud Storage • Online backup for text, images, datasets, video, audio • Enable preservation via multiple copies, geographies, administrations • Elastic provisioning of temporary or permanent storage for projects or jobs

  29. Use Cases: DuraCloud with Cloud Compute • Streaming service for video • Hosting JPEG2000 image engine • Indexing and other processing-heavy jobs • Repositories in cloud • Data and text mining over open data • Optional aggregation of open data

  30. Partners and Pilots • Selected initial cloud providers • Selected 2 initial pilot partners

  31. Pilot use cases • Ingest large quantity of material • Replicate to multiple cloud platforms • Manage replication and monitoring • Run services

  32. DuraCloud Timeline • Initial open source release – Summer 2009 • Begin pilots – September 2009 • Pilot data loading and testing – Fall 2009 • Plug-ins for repository platforms – Q4 2009 • Beta for repository community – Q1 2010 • Pilot testing with compute services – Q1 2010 • Report pilot results – Q1 2010 • Launch production service – Q2 2010

  33. http://www.duraspace.org/ http://www.fedora-commons.org/
