Measurement Data Archive – Integration Effort
http://mda.doregistry.org/
GEC11, July 2011
Giridhar Manepalli, Corporation for National Research Initiatives
http://www.cnri.reston.va.us/
Measurement Data Archive: Status
• Deployed a prototype measurement data archive that includes a temporary storage space, aka workspace
• Provides a hierarchical storage system that allows objects to be grouped into collections
• Mints a persistent identifier that resolves to the data
• Indexes metadata to support queries and data discovery
• Supports SFTP, SCP, SMB, REST, and a Web-based interface into the system
• Early adopters in GENI:
  • OnTimeMeasure - Ohio State University
  • INSTOOLS - University of Kentucky
Success Criteria for an Archive
• An archive cannot be just a store-and-retrieve service. An ecosystem surrounding the archive is needed to motivate communities to use it.
• Visualization, policy enforcement, dissemination, etc. are examples of services an archive could provide.
• To build such an ecosystem, a basic understanding of what we store is necessary:
  • #1: Data Model. How do you define a data object? (Not how it is serialized, e.g., databases, file-systems, etc.) Do we need a data-agnostic archive? Do we manage relationships across data objects?
    • Too many storage systems have failed because they lacked a proper data model.
  • #2: Metadata. What constitutes a metadata record? How is it associated with a data object?
    • Without metadata, an archive is just a pile of bytes. Building an ecosystem of services on a pile of bytes is impossible.
  • #3: API. How is data (and metadata) pushed into an archive? What are the end-point definitions and data structures?
• #1 and #2 are more important than #3.
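
To make the three questions concrete, here is a minimal sketch of one possible data model. It is not the archive's actual model: the class names, the `related` field, the element name `project`, and the identifiers `example/1` to `example/3` are all hypothetical. It assumes a data object is an opaque payload plus a metadata record and optional links to other objects.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class MetadataRecord:
    """Descriptive elements attached to one data object (element names are hypothetical)."""
    elements: Dict[str, str] = field(default_factory=dict)


@dataclass
class DataObject:
    """A data object defined independently of how it is serialized or stored."""
    identifier: str                  # stand-in for a persistent identifier
    payload: bytes                   # the data itself, opaque to the archive
    metadata: MetadataRecord = field(default_factory=MetadataRecord)
    related: List[str] = field(default_factory=list)  # identifiers of related objects


def find_by_element(objects, name, value):
    """Return identifiers of objects whose metadata element `name` equals `value`."""
    return [o.identifier for o in objects if o.metadata.elements.get(name) == value]


objects = [
    DataObject("example/1", b"\x00\x01", MetadataRecord({"project": "OnTimeMeasure"})),
    DataObject("example/2", b"\x02\x03", MetadataRecord({"project": "INSTOOLS"}),
               related=["example/1"]),
    DataObject("example/3", b"\x04\x05"),  # no metadata: discoverable by nothing
]

print(find_by_element(objects, "project", "INSTOOLS"))
```

The third object illustrates the "pile of bytes" problem: with no metadata record, no query on metadata elements can ever return it.
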
Integration: Next Steps
• Step #1: Define a data object.
  • Is data just a series of bytes? Or do we pack X, Y, & Z into it?
  • Are relationships across objects required or not? (Not nice-to-have, but are they required?)
  • Do we have data visibility criteria? Permissions, etc.
• Step #2: Validate the metadata recommendation.
  • Projects should generate a few metadata records with these goals:
    • To identify which elements are required, which are optional, and which are not needed.
    • To capture different profiles of data. Perhaps some elements are needed for one class of data, and other elements are needed for other classes of data.
      • This may result in a few profiles. Although unlimited profiles are hard to manage, a limited number of profiles will result in fewer optional fields.
    • To validate the suggested controlled vocabulary for some of the elements, and to identify where vocabulary is missing. Controlled vocabulary brings some order to metadata and discovery.
• Step #3: Identify the API.
  • What end-points and data structures are reasonable for a given project? REST+XML, XML-RPC, etc.
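
Step #2 can be sketched as a small validation routine. This is only an illustration under stated assumptions: the `Profile` structure, the `validate` function, the profile name, the element names (`title`, `creator`, `data_type`, `description`), and the vocabulary terms are all hypothetical and are not taken from the actual metadata recommendation.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Profile:
    """One metadata profile: which elements a class of data must or may carry."""
    name: str
    required: Set[str]
    optional: Set[str] = field(default_factory=set)
    # Controlled vocabulary: element name -> set of allowed terms
    vocabulary: Dict[str, Set[str]] = field(default_factory=dict)


def validate(record: Dict[str, str], profile: Profile) -> List[str]:
    """Return a list of problems found when checking `record` against `profile`."""
    problems = []
    present = set(record)
    for element in sorted(profile.required - present):
        problems.append(f"missing required element: {element}")
    for element in sorted(present - (profile.required | profile.optional)):
        problems.append(f"element not in profile: {element}")
    for element in sorted(profile.vocabulary):
        if element in record and record[element] not in profile.vocabulary[element]:
            problems.append(
                f"value outside controlled vocabulary for {element}: {record[element]}"
            )
    return problems


# Hypothetical profile and record, for illustration only
measurement = Profile(
    name="measurement",
    required={"title", "creator", "data_type"},
    optional={"description"},
    vocabulary={"data_type": {"active", "passive"}},
)
record = {"title": "Delay samples", "data_type": "synthetic", "color": "blue"}

for problem in validate(record, measurement):
    print(problem)
```

Running sample records from each project through a check of this kind shows which elements are routinely missing, which unexpected elements appear, and where the vocabulary needs new terms, which is the feedback Step #2 asks for.
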