Minimal Metadata for Data Services Through DIALOGUE
Neil Chue Hong, AHM2007
It’s Good To Talk
• Data Integration Applications: Linking Organisations to Gain Understanding and Experience (DIALOGUE)
• EPSRC-supported sister-project network grant, running from 2005 to 2007
• Aim: stimulate discussion between people involved in data access and integration
• http://www.datagrids.org
Minimum Requirements for Information Exchange
• Agreements are needed so that information can be exchanged effectively between data access and integration (DAI) technologies:
  • identification of data sources
  • description of data sources
  • identification of data
  • description of data representations
• What’s the least information I need?
Service Types
• Core Services / Baseline Services
  • essential to the infrastructure: security, registry, index, discovery
• Data Services / Data Access Services
  • expose a queryable data resource
• Analytical Services / Data Processing Services / Computational Services
  • provide operations that act on data
• Data Transfer Services
  • transfer data between endpoints
• Data Storage Services
  • manage data, including replication
Interoperability Points between Components
• Compatible naming for services
• Compatible naming for data objects
• Managed data transfer between any two endpoints
• Data formats
• Data discovery
Searching for Data
• A standard way of accessing a metadata catalogue
• A standard format for describing a data resource, containing enough information:
  • to access it: access protocols supported, security policies, transfer protocols supported, available replicas, QoS policies
  • to understand what’s in it: data quality, provenance
  • to choose the right source: quality of service (productivity, availability, responsiveness, reliability, accessibility)
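To make the three "enough information" groups concrete, here is a minimal sketch of a catalogue entry and a source-selection step. It assumes a hypothetical `ResourceDescription` record and `choose_source` function; the field names, protocol strings and the ranking rule (availability first, then response time) are illustrative choices, not an agreed format.

```python
from dataclasses import dataclass, field


@dataclass
class ResourceDescription:
    """Hypothetical catalogue entry for one data resource."""
    name: str
    # Enough to access it
    access_protocols: set[str] = field(default_factory=set)
    transfer_protocols: set[str] = field(default_factory=set)
    replicas: list[str] = field(default_factory=list)
    # Enough to understand what's in it
    provenance: str = ""
    # Enough to choose the right source (QoS); availability is a fraction 0.0-1.0
    availability: float = 0.0
    mean_response_ms: float = float("inf")


def choose_source(catalogue, access_protocol, transfer_protocol):
    """Return the resources this client can use, best QoS first."""
    usable = [
        r for r in catalogue
        if access_protocol in r.access_protocols
        and transfer_protocol in r.transfer_protocols
    ]
    # Rank by higher availability, then by lower mean response time.
    return sorted(usable, key=lambda r: (-r.availability, r.mean_response_ms))
```

Filtering on the "access" metadata before ranking on the "choose" metadata mirrors the slide's ordering: a source that cannot be reached is never a candidate, however good its QoS.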
Core Set of Metadata for All Our Data Services
• Service name
• Service ID? Does this evolve too quickly? Is it only useful for resolution? Should it be described elsewhere?
• Service version
• Service owner
• Service maintainer
• Service description: human-readable summary
• Service types implemented
• Minimal set of annotations on operations to allow discovery
• Minimal set of management information?
• Links to service policies (including security, QoS)?
• Should we use WSDL and WS-MetadataExchange?
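A minimal sketch of the core set as a record, assuming a hypothetical `CoreServiceMetadata` class. The items the slide marks as open questions (service ID, policy links) are left optional here; `missing_fields` is an illustrative completeness check, not part of WSDL or WS-MetadataExchange.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class CoreServiceMetadata:
    """Hypothetical core metadata shared by every data service."""
    name: str
    version: str
    owner: str
    maintainer: str
    description: str            # human-readable summary
    service_types: list[str]    # e.g. ["DataAccess", "DataTransfer"]
    # Operation name -> short annotation used for discovery
    operation_annotations: dict[str, str] = field(default_factory=dict)
    # Open questions on the slide, so optional in this sketch
    service_id: Optional[str] = None
    policy_links: list[str] = field(default_factory=list)  # security / QoS policies


REQUIRED_FIELDS = ("name", "version", "owner", "maintainer", "description", "service_types")


def missing_fields(md: CoreServiceMetadata) -> list[str]:
    """Names of required fields that are empty."""
    return [name for name in REQUIRED_FIELDS if not getattr(md, name)]
```

For example, a record created with an empty maintainer string would report `["maintainer"]` as missing.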
Extra Metadata for Data Access Services
• Access protocols supported
  • query languages supported
  • result representations supported
• Description of security policies
• Transfer protocols supported
• Available replicas
• QoS policies
• Is this DAIS?
• Extensions for each type: SQL data access services, DICOM, XPath, CQL, etc.
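The access-protocol items are what a client needs to decide whether it can talk to a service at all. The sketch below assumes a hypothetical `DataAccessMetadata` record and a `can_serve` check; the language and format strings are placeholders, and type-specific extensions (SQL, DICOM, XPath, CQL) are kept as an open map because no common schema for them is assumed.

```python
from dataclasses import dataclass, field


@dataclass
class DataAccessMetadata:
    """Hypothetical extra metadata for a data access service."""
    query_languages: set[str] = field(default_factory=set)
    result_formats: set[str] = field(default_factory=set)
    transfer_protocols: set[str] = field(default_factory=set)
    replicas: list[str] = field(default_factory=list)
    security_policy: str = ""
    qos_policy: str = ""
    # Type-specific extensions keyed by type name, e.g. "SQL", "DICOM"
    extensions: dict[str, dict] = field(default_factory=dict)


def can_serve(md: DataAccessMetadata, query_language: str, result_format: str) -> bool:
    """True if a client using query_language and wanting result_format can use the service."""
    return query_language in md.query_languages and result_format in md.result_formats
```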
Extra Metadata for Data Transformation Services
• Schema mapping
  • store schema maps (e.g. A->B) rather than schemas
  • but there is currently no agreed way of representing a schema map
  • allow schema maps to be discovered
  • allow optimisation over maps, e.g. replace A->B->C with A->C if such a map is present
• Quality and trust of third-party schema maps
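Treating stored schema maps as edges in a directed graph makes both discovery and the A->B->C => A->C optimisation a shortest-path search. This is a minimal sketch assuming a hypothetical representation (a dict from `(from_schema, to_schema)` to a trust score in [0, 1]), since the slide notes there is no agreed schema-map format. Breadth-first search returns the fewest hops, so a direct map is always preferred over a longer composition, and `min_trust` drops untrusted third-party maps.

```python
from collections import deque


def find_mapping_chain(maps, source, target, min_trust=0.0):
    """Return the shortest list of schemas from source to target, or None.

    maps: dict mapping (from_schema, to_schema) -> trust score in [0, 1].
    Only maps with trust >= min_trust are considered.
    """
    # Build an adjacency list from the trusted maps.
    graph = {}
    for (a, b), trust in maps.items():
        if trust >= min_trust:
            graph.setdefault(a, []).append(b)

    # Breadth-first search; prev records how each schema was reached.
    prev = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            path = []
            while node is not None:
                path.append(node)
                node = prev[node]
            return path[::-1]
        for nxt in graph.get(node, []):
            if nxt not in prev:
                prev[nxt] = node
                queue.append(nxt)
    return None
```

With maps A->B and B->C this returns `["A", "B", "C"]`; once a direct A->C map is added it returns `["A", "C"]`.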
Core Set of Metadata for Representing a Data Object / Record
• Identifier
• Structure / representational format
• Provenance data
• Human-readable description
• Extended sets for
  • relational data
  • XML data
  • file collections
• Feature-based processing, content-based processing
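The "core plus extended sets" idea can be sketched as a fixed set of core fields with extensions attached by kind. The `DataObjectRecord` class, the `extend` helper and the extension keys ("relational", "xml", "file_collection") are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class DataObjectRecord:
    """Hypothetical core metadata for a data object or record."""
    identifier: str
    data_format: str                      # structure / representational format, e.g. a MIME type
    description: str                      # human-readable description
    provenance_ref: Optional[str] = None  # pointer to a provenance record
    # Extended sets keyed by kind, e.g. "relational", "xml", "file_collection"
    extensions: dict[str, dict] = field(default_factory=dict)


def extend(record: DataObjectRecord, kind: str, **attrs) -> DataObjectRecord:
    """Add or update one extended set without touching the core fields."""
    record.extensions.setdefault(kind, {}).update(attrs)
    return record
```

Keeping extensions separate means a consumer that only understands the core set can still identify and describe any object.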
Core Set of Information to Store in a Provenance Record for Data Integration
• Who created it
• When it was created
• Which service created it
• Service configuration parameters (including inputs)
• But how do you evolve this information?
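A minimal sketch of such a provenance record, plus a lineage query that follows input links back through an integration chain. `ProvenanceRecord`, `lineage` and the `record_version` field are hypothetical; the version number is one possible hook for evolving the record format, not an answer the slide gives. Inputs are held in their own field here rather than inside the configuration parameters, purely so they can be traversed.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class ProvenanceRecord:
    """Hypothetical provenance record for one derived data object."""
    output_id: str
    created_by: str              # who created it
    created_at: datetime         # when it was created
    service: str                 # which service created it
    parameters: dict = field(default_factory=dict)  # service configuration
    inputs: tuple = ()           # identifiers of the input data objects
    record_version: int = 1      # assumption: lets the record format evolve


def lineage(records, object_id):
    """All ancestor identifiers of object_id, following input links."""
    by_output = {r.output_id: r for r in records}
    seen, stack = set(), [object_id]
    while stack:
        rec = by_output.get(stack.pop())
        if rec is None:
            continue  # an original source, or no provenance recorded
        for parent in rec.inputs:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen
```

For example, if "joined" was built from "a" and "b", and "b" was extracted from "raw", then `lineage(records, "joined")` returns `{"a", "b", "raw"}`.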
What Needs to Be Agreed
• What needs to be agreed to make different services interoperate?
• DIALOGUE enabled discussion amongst software engineers involved in data integration
• Paper to be published
• But look at Dublin Core…
• … and the many formats for addresses