Evolution of Content Management Interoperability Standards

How did we get here? (CMIS v0.5) F2F, January 2009

First, many thanks to all who helped us get here • My fellow contributors from Alfresco, EMC, IBM, Microsoft, Open Text, Oracle, and SAP • All were • Constructive • Motivated to find common ground • Thank you!

Version 0.5 certainly has room for improvement. Nevertheless, there were a number of considerations that led to the current design.

The Interoperability Challenge for CMIS • Many enterprises already have a large amount of content accumulated in existing repositories. • Most cannot afford to move their content to a new repository and re-implement their applications. • Existing investments should be leveraged and protected. • CMIS should provide interoperability for existing content as well as new content. • Interoperation with existing content is a big challenge. • A major objective for CMIS 1.0 is to find a common design that accommodates existing repositories. • A bootstrap to get CMIS off the ground

Interoperability for Existing Repositories Make CMIS easily and naturally mappable to most repositories • Layer-able on top of a repository’s native interface without behavioral change to the repository server • Without implementing replacement “server logic” on top of a repository • It should utilize a repository’s corresponding capability

Was Consumer interest neglected? • Interoperability for existing content is actually for the Consumer’s benefit. • Broader Consumer interests are covered by use cases. • The difference is that Consumer interests are open-ended, and we need to be selective for 1.0. • Will CMIS be forever limited by legacy technology, and by the “least capable repository”? • No. Once the standard is widely adopted, its direction and scope will be driven by market dynamics and technology trends. Repository vendors will have to keep up.

Another Goal for 1.0: Keep it simple! • Successful standards often started simple • Learn from initial use and gradually evolve • Simplicity helps it get off the ground • Easier to reach agreement • Easier to understand, adopt, implement, and use • Fewer constraints for v2.0 (when we know better) • Intended approach: • Focus on basic, run-time functions • Defer non-essential functions • Get it out quickly

Object Types • Most repositories support typed documents, folders, and relationships, with fairly typical behaviors. • Their behaviors should be exposed for ease of use. • Many repositories do not support generic objects. • Originally, the model included a single, abstract root type. The other types were its subtypes. • This abstract root type was later dropped for simplicity, leaving multiple root types. • It can be re-introduced in the future if there is a need for it. • The Policy object was added at a late stage. Its use has not been fleshed out.

Object Identity • Object identity is assigned by the repository. • Unique, permanent, location-independent • Opaque (e.g., it can be a “pathname”) • Object name is merely a property. • A repository may enforce implementation-specific constraints (such as uniqueness within a certain scope), and use the name to construct a “path”. • Objects may not have a meaningful name in some use cases (e.g., production imaging, e-mail archiving) • Path is not supported. • Not a good way to identify an object • Problematic for multi-filing, unfiling, and move() • Potential security issues: an “id” revealing the existence of an object at a certain location; access to a target object vs. access to every object on the path; etc.
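To make the identity rules above concrete, here is a minimal in-memory sketch. Everything in it is hypothetical (the `ToyRepository` class, its method names, and the use of a UUID as the opaque id); it is not a CMIS binding. It only illustrates that an id assigned by the repository survives rename, move, multi-filing, and unfiling, while the name is just another property.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class RepoObject:
    object_id: str  # opaque, assigned by the repository
    properties: dict = field(default_factory=dict)
    parent_ids: set = field(default_factory=set)  # zero, one, or many folders


class ToyRepository:
    """Hypothetical in-memory repository used only for illustration."""

    def __init__(self):
        self._objects = {}

    def create(self, name, parent_id=None):
        # The caller never chooses the id; the repository assigns it.
        object_id = uuid.uuid4().hex
        parents = set() if parent_id is None else {parent_id}
        self._objects[object_id] = RepoObject(object_id, {"name": name}, parents)
        return object_id

    def get(self, object_id):
        return self._objects[object_id]

    def rename(self, object_id, new_name):
        # The name is merely a property, so renaming leaves the id alone.
        self._objects[object_id].properties["name"] = new_name

    def move(self, object_id, source_folder_id, target_folder_id):
        parents = self._objects[object_id].parent_ids
        parents.discard(source_folder_id)
        parents.add(target_folder_id)

    def add_to_folder(self, object_id, folder_id):
        # Multi-filing: the same object appears in more than one folder.
        self._objects[object_id].parent_ids.add(folder_id)

    def unfile(self, object_id):
        # Unfiled objects have no location at all, yet remain addressable.
        self._objects[object_id].parent_ids.clear()


repo = ToyRepository()
inbox = repo.create("Inbox")
archive = repo.create("Archive")
doc_id = repo.create("scan-0001.tif", parent_id=inbox)

repo.rename(doc_id, "invoice.tif")
repo.move(doc_id, inbox, archive)
repo.add_to_folder(doc_id, inbox)
repo.unfile(doc_id)

same_object = repo.get(doc_id)
```

A path-based identifier would have changed at the rename and at the move, been ambiguous after multi-filing, and had no value after unfiling, which is the difficulty the slide points to.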

Content Stream • Currently only one content-stream per document object is supported. • An application may create an object for each content-stream and use a relationship to link it to a document. • To support multiple content-streams per object explicitly, the challenge is in the handling of content-stream metadata. • Increases the complexity of object representation, versioning, and query • Also needs additional CRUD methods
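The workaround described above (one object per content-stream, linked by a relationship) can be sketched as follows. The `Document` and `Relationship` classes, the `related_streams` helper, and the `"extra-stream"` relationship type are all assumptions made for this illustration; the slides do not define them.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Document:
    object_id: str
    name: str
    content: Optional[bytes] = None  # at most one content-stream per document


@dataclass
class Relationship:
    source_id: str
    target_id: str
    relationship_type: str  # hypothetical type name chosen by the application


def related_streams(document_id, documents, relationships, relationship_type):
    """Return the content of every document linked from document_id."""
    return [
        documents[rel.target_id].content
        for rel in relationships
        if rel.source_id == document_id
        and rel.relationship_type == relationship_type
    ]


documents = {
    "doc-1": Document("doc-1", "report.doc", b"main body"),
    "doc-2": Document("doc-2", "report-thumbnail.png", b"thumbnail bytes"),
    "doc-3": Document("doc-3", "report.pdf", b"pdf bytes"),
}
relationships = [
    Relationship("doc-1", "doc-2", "extra-stream"),
    Relationship("doc-1", "doc-3", "extra-stream"),
]

extra = related_streams("doc-1", documents, relationships, "extra-stream")
```

Each extra stream carries its own metadata as an ordinary document object, so no new per-stream metadata model or CRUD methods are needed, at the cost of the application managing the links itself.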

Query • Most repositories use an RDBMS to handle query. • Need a schema-based query language and a flat schema • A subset of the SQL standard is adopted, both syntax and semantics • Language extensions are isolated (by using separate production rules and terminals) • Reason for preserving SQL semantics and isolating extensions: • Leverage the user’s SQL knowledge, avoid confusion • Safe to expand the query language to a larger subset of SQL in the future
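As a rough illustration of why a flat schema and an SQL subset map easily onto an RDBMS-backed repository, the sketch below uses Python's built-in `sqlite3` module. The `document` table and its column names are invented for this example and are not the property names defined by CMIS; the point is only that an object type exposed as one flat table can be queried with plain SQL that users already know.

```python
import sqlite3


def build_demo_repository():
    """Create an in-memory database with one flat table per object type."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE document ("
        "object_id TEXT PRIMARY KEY, "
        "name TEXT, "
        "created_by TEXT, "
        "content_length INTEGER)"
    )
    conn.executemany(
        "INSERT INTO document VALUES (?, ?, ?, ?)",
        [
            ("id-1", "plan.doc", "alice", 1024),
            ("id-2", "budget.xls", "alice", 2048),
            ("id-3", "memo.txt", "bob", 512),
        ],
    )
    return conn


def run_query(conn, statement, params=()):
    """Execute a read-only statement and return all rows."""
    return conn.execute(statement, params).fetchall()


# Only standard SQL constructs are used here: SELECT, FROM, WHERE, ORDER BY.
QUERY = (
    "SELECT name, content_length FROM document "
    "WHERE created_by = ? ORDER BY name"
)

conn = build_demo_repository()
rows = run_query(conn, QUERY, ("alice",))
```

Because the statement stays inside standard SQL with unchanged semantics, a repository could pass it to its database largely as-is; any content-management extension would be a separate, clearly marked construct rather than a reinterpretation of existing SQL.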

Versioning • Most repositories support linear (chronological) versioning only. • Repositories differ in the way versions are linked. • So, version semantics is encapsulated in methods (current version, latest version, all versions). Explicit version navigation is not supported. • Private Working Copy (PWC) supports on-line (server-side) construction of a new version. • For repositories that support off-line (client-side) editing, PWC is not updatable and the entire new version is supplied at check-in.
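The two check-in styles above can be sketched with a small, purely illustrative class. `VersionSeries`, its method names, and the `pwc_updatable` flag are assumptions for this example, not the CMIS 0.5 method signatures. The list that holds the versions is private, so callers reach versions only through methods and never navigate version links directly.

```python
class VersionSeries:
    """Hypothetical linear version series with a Private Working Copy."""

    def __init__(self, initial_content, pwc_updatable=True):
        self._versions = [initial_content]  # oldest first; hidden from callers
        self._pwc = None
        self._checked_out = False
        self.pwc_updatable = pwc_updatable

    def check_out(self):
        if self._checked_out:
            raise RuntimeError("version series is already checked out")
        self._checked_out = True
        self._pwc = self._versions[-1]  # the PWC starts as a copy of the latest

    def update_pwc(self, content):
        # On-line (server-side) construction of the new version.
        if not self._checked_out:
            raise RuntimeError("no private working copy exists")
        if not self.pwc_updatable:
            raise PermissionError("this repository's PWC is not updatable")
        self._pwc = content

    def check_in(self, content=None):
        if not self._checked_out:
            raise RuntimeError("nothing is checked out")
        if content is None and not self.pwc_updatable:
            # Off-line editing: the whole new version must arrive at check-in.
            raise ValueError("content must be supplied at check-in")
        self._versions.append(self._pwc if content is None else content)
        self._pwc = None
        self._checked_out = False

    def latest_version(self):
        return self._versions[-1]

    def all_versions(self):
        return list(reversed(self._versions))  # newest first


# On-line style: build the new version in the PWC, then check in.
online = VersionSeries(b"v1")
online.check_out()
online.update_pwc(b"v2 draft")
online.check_in()

# Off-line style: the PWC cannot be updated, so content comes with check-in.
offline = VersionSeries(b"v1", pwc_updatable=False)
offline.check_out()
offline.check_in(content=b"v2 edited on the client")
```

Both styles end with the same observable result, a new latest version, which is why the difference can stay hidden behind the check-out and check-in methods.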
