The Metadata Perspective
Peter Kunszt, CERN
GGF10 PNPA Workshop, Berlin
Overview
• Metadata: what is it? An overview.
• Lessons learned
• Requirements
• Suggestions
Definition of Metadata
Metadata is data too! ~ Descriptive data
• Describe the data itself: what is the data about, parameters, characteristics, statistics, ..
• Describe methods: algorithms, input/output parameters, ..
• Describe middleware: service data, parameters, configuration, versions, owner, ..
• Describe authentication and authorization: user lists, passwords, access control lists, tokens, ..
• Describe modeling: UML diagrams, database schemata, ..
• Describe history: provenance data, who has generated what data using what method, ..
• Describe virtualization: virtual data generation parameters, pipelining
• Describe operation: logging and monitoring, ..
Aspects of Metadata
Bound to a context.
• Semantics specific to the context
• Usage patterns specific to the semantics
• Requirements specific to the context and semantics

What is the context?
• Application data, e.g. metadata on HEP events
• Middleware specific, e.g. service description (storage, computing…)
• Virtual Organization, resource provider, e.g. security policies
• Logging and monitoring, e.g. LDAP, MDS, R-GMA, ..
Where is it?
• Explicitly in the context (e.g. the job description language, input/output files, etc.). Not the topic here; this talk is about metadata (MD) catalogs.
• Dedicated catalog in application space. Examples:
    • CMS RefDB
    • ATLAS Metadata Interface (AMI)
    • BaBar Metadata Catalog
• Dedicated catalog in middleware space
    • MCAT (virtual data catalog)
    • EDG Replica Metadata Catalog
    • VO management service
• Metadata Grid Interface provisioning to existing catalogs
    • OGSA-DAI
    • Spitfire
How was it used to date?
Lessons learned
• Dedicated services like CMS RefDB work well.
• Generic one-size-fits-all metadata catalogs are not used as much. (RMC)
• Frameworks are hard to adopt and to use (Spitfire)
• Lack of dedicated catalogs may lead to the abuse of monitoring and information services.
• The boundary between application and middleware layer is blurred

Conclusions
• The narrower the context the better
• Everyone doing their own metadata is good, BUT
• Everyone defining a proprietary interface is bad
• User controllable metadata is good
Requirements: ideas
• Metadata catalogs must have a clear context
• Differentiation between the grid middleware and application layer
• Commonalities to be standardized on:
    • Common security mechanisms
    • Common exposure of interfaces (WSDL)
    • Common mechanism of describing the data content (like common methods to expose the schema)
    • Common query mechanisms
    • Common error reporting (SOAP Faults)
• Catalogs should be able to call each other
• Users should be able to store their own metadata (e.g. big success of SDSS SkyServer MyDB)
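To make the list of commonalities concrete, here is a minimal Python sketch of what a shared catalog contract might look like. Every name in it (`MetadataCatalog`, `CatalogFault`, `describe_schema`, `query`, `InMemoryCatalog`) is hypothetical and is not taken from any catalog named in this talk. In a grid deployment the same operations would be exposed through WSDL and errors would travel as SOAP Faults; plain Python classes stand in for both here.

```python
from abc import ABC, abstractmethod


class CatalogFault(Exception):
    """Hypothetical common error type, standing in for a SOAP Fault."""

    def __init__(self, code, message):
        super().__init__(f"{code}: {message}")
        self.code = code
        self.message = message


class MetadataCatalog(ABC):
    """Hypothetical common contract that every catalog would implement."""

    @abstractmethod
    def describe_schema(self):
        """Return a mapping of attribute name to type name."""

    @abstractmethod
    def query(self, **conditions):
        """Return all entries whose attributes equal the given values."""


class InMemoryCatalog(MetadataCatalog):
    """Toy catalog used only to exercise the contract."""

    def __init__(self, schema, entries):
        self._schema = dict(schema)
        self._entries = list(entries)

    def describe_schema(self):
        return dict(self._schema)

    def query(self, **conditions):
        # Report attributes outside the schema through the common error type.
        unknown = sorted(set(conditions) - set(self._schema))
        if unknown:
            raise CatalogFault("UnknownAttribute", ", ".join(unknown))
        return [entry for entry in self._entries
                if all(entry.get(k) == v for k, v in conditions.items())]


# Made-up schema and entries, for illustration only.
catalog = InMemoryCatalog(
    schema={"run": "int", "detector": "str"},
    entries=[{"run": 1, "detector": "tracker"},
             {"run": 2, "detector": "calorimeter"}],
)
```

A client written against `MetadataCatalog` can discover the schema, query, and handle `CatalogFault` without knowing which community's catalog sits behind it, which is the kind of interoperability the requirements aim at.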
A Metadata Scenario
• Virtual files / virtual collection concept (from HEPCAL)
[Diagram labels: Metadata Catalog, Virtual MD Query, Result FileList, File Catalog. Annotation: Query interface needs standardization.]
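One possible reading of this scenario is a two-step resolution: a metadata query yields a file list, and the file catalog then resolves that list. The Python sketch below illustrates that reading only. The function name `resolve_virtual_collection`, the dictionary-based catalogs, and all file names and attributes are assumptions made for the example, not part of HEPCAL or of any catalog mentioned here.

```python
def resolve_virtual_collection(metadata_catalog, file_catalog, predicate):
    """Resolve a metadata query into replica locations in two steps."""
    # Step 1: the metadata catalog turns the query into a list of logical file names.
    file_list = [name for name, attrs in metadata_catalog.items()
                 if predicate(attrs)]
    # Step 2: the file catalog maps each logical name to its known replicas.
    return {name: file_catalog.get(name, []) for name in sorted(file_list)}


# Hypothetical catalog contents, for illustration only.
metadata_catalog = {
    "lfn:run1.dat": {"run": 1, "trigger": "muon"},
    "lfn:run2.dat": {"run": 2, "trigger": "electron"},
    "lfn:run3.dat": {"run": 3, "trigger": "muon"},
}
file_catalog = {
    "lfn:run1.dat": ["site-a/run1.dat"],
    "lfn:run3.dat": ["site-a/run3.dat", "site-b/run3.dat"],
}

muon_files = resolve_virtual_collection(
    metadata_catalog, file_catalog, lambda attrs: attrs["trigger"] == "muon"
)
```

In this sketch the `predicate` argument plays the role of the query interface. It is the only part the caller has to agree on with the metadata catalog, which is why that interface is the piece flagged for standardization.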
Suggestions on how to proceed
• Accept Web Service interfaces as the common base interface framework
• Define interfaces inside application- and middleware-specific domains based on existing services and the specific needs of the given community.
• Identify missing interfaces or required interfaces from clients and users
• Compare the interfaces at a common forum (like this)
• Define how to proceed: Factor out commonalities or standardize commonalities. The aim is to be interoperable. Propagate findings to groups in GGF wherever relevant, spawn new working group with a very specific focus!
• Iterative process..
Conclusion
• Metadata is closely tied to context and the semantics thereof.
• Generic metadata services vs. specialized services:
    • Generic service to store key-value pairs might be useful to users to store their own data (exploit DAIS)
    • Try to use common mechanisms for security, discovery, query and error reporting.
    • Suggestion to work on specialized services, solving a well-understood problem of a user community. Identify commonalities as a second step (bottom up approach)
• Maintain a good communication between metadata service providers. GGF can be the forum for this.
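The generic key-value service mentioned in the conclusion could be as small as the following Python sketch. It is an illustration under stated assumptions: the class `UserMetadataStore`, its methods, and the sample users and file names are all hypothetical, and the in-memory dictionary stands in for whatever data access layer (for example a DAIS-style service) a real implementation would use.

```python
from collections import defaultdict


class UserMetadataStore:
    """Hypothetical generic store: per-user key-value pairs attached to items."""

    def __init__(self):
        # Layout: owner -> item -> {key: value}
        self._data = defaultdict(lambda: defaultdict(dict))

    def put(self, owner, item, key, value):
        """Attach or overwrite one key-value pair on an item for this owner."""
        self._data[owner][item][key] = value

    def get(self, owner, item):
        """Return a copy of the owner's annotations for one item."""
        return dict(self._data.get(owner, {}).get(item, {}))

    def find(self, owner, key, value):
        """Return the owner's items whose annotation `key` equals `value`."""
        return sorted(item for item, pairs in self._data.get(owner, {}).items()
                      if pairs.get(key) == value)


# Made-up users and annotations, for illustration only.
store = UserMetadataStore()
store.put("alice", "lfn:run1.dat", "quality", "good")
store.put("alice", "lfn:run3.dat", "quality", "good")
store.put("bob", "lfn:run1.dat", "quality", "bad")
```

Keeping each owner's annotations separate means two users can tag the same file differently without coordinating on a schema, which is what makes such a store generic and leaves context-specific catalogs to the specialized services.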