DIALOGUE
Dialogue DataGrid
• Relational databases, files, XML databases, object stores
• Strongly typed
• Multi-tiered metadata management system
• Incorporates elements from OGSA-DAI, Mobius, caGrid, STORM, DataCutter, GT4 …
• Scales to very large data and high-end platforms
Requirements
• Support or interoperate with caGrid and the eScience infrastructure
• Interoperate with or replace SRB
• Well-defined relationship to the Globus Alliance
• Services to support high-end, large-scale data applications
• Design should include semantic metadata management
• Well-thought-out relationship to commercial products (e.g. Information Integrator, Oracle)
Topics
• Overview of products
• Identify standard interfaces, where things could be done the same way
• Ship results from one product to another
• Interface for a data service (DAIS mapping, etc.)
• What scenarios do we have?
• What patterns of queries will be run on a federated system?
• How do we make all our products look like they come from a single vendor?
  • All are fairly non-overlapping
  • Should client APIs look similar, as well as service interfaces?
  • Should security be handled in a similar way?
• Other products: how do they fit?
  • Root (from CERN) for data access (object-oriented data analysis framework)
  • II (Information Integrator), SRB, GridFTP, RFT
Distributed querying
• Is this part of DIALOGUE?
• Multi-product data integration
  • E.g. join across Mobius and OGSA-DAI
• What’s missing from the DAIS specs to let us do this?
  • Output format
  • Standard interfaces for, e.g., query planners, plugins, …
• Locating relevant data
  • How to do data discovery
• Common user tool at top level to aid installation, configuration and service development
  • E.g. the “Introduce” suite for deploying strongly typed GT4 services
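The “join across Mobius and OGSA-DAI” item can be made concrete with a minimal sketch of a client-side hash join, assuming each service has already returned its results as rows (here, Python dicts) that share a key column. It uses no Mobius or OGSA-DAI API; `hash_join` and the sample rows are hypothetical illustrations.

```python
from collections import defaultdict
from typing import Any, Iterable

Row = dict[str, Any]  # hypothetical row shape: column name -> value


def hash_join(left: Iterable[Row], right: Iterable[Row], key: str) -> list[Row]:
    """Inner equi-join of two result sets on a shared column name."""
    # Build phase: index every row of the left input by its join key.
    buckets: dict[Any, list[Row]] = defaultdict(list)
    for row in left:
        buckets[row[key]].append(row)
    # Probe phase: stream the right input and emit one merged row per match.
    # On a column-name clash, the right-hand value wins.
    joined: list[Row] = []
    for row in right:
        for match in buckets.get(row[key], []):
            joined.append({**match, **row})
    return joined


# Invented sample data standing in for results from two different services.
patients = [{"id": 1, "name": "A"}, {"id": 2, "name": "B"}]
images = [{"id": 2, "file": "scan.tif"}, {"id": 3, "file": "x.tif"}]
print(hash_join(patients, images, "id"))  # [{'id': 2, 'name': 'B', 'file': 'scan.tif'}]
```

Joining in the client like this pulls both result sets across the network; a federated system would rather plan where the join runs, which is why standard query-planner interfaces matter.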
Metadata
• Minimal amounts of metadata for
  • Discovery / registries
  • Integration
  • Querying
• Performance
• Hindering uptake
  • Is web services / SOAP / XML the right technology for data integration?
• Branding
  • How to do distributions outside the academic area
  • What’s the model for contributing to something like this?
  • Awareness of components
  • Common vision
  • Understanding how to cross-sell projects
Multi-project data integration
• Collaboration difficulties
  • Engaging the right people
  • Choosing the right process
• Quality assurance, brand assurance
  • Agreeing a model of QA together
• Integration and ease of use/install
  • Which product components are generic, and which are application/distribution specific?
• Getting more effort
  • Target joint funding
Strongly typed, strongly classified data
• Is this necessary for data integration?
  • The caBIG approach is probably not generic enough
  • Need programmatic interoperability, rather than user-instructed
  • Too much burden on the data provider?
  • What’s the minimum barrier to entry?
• Generic components and tools
  • Which could be leveraged between products?
• Are there other projects which use our products together?
  • If not, why not?
  • Are the problem spaces too far apart?
  • They are both generic, but they have a different focus
  • Not contradictory, but not aligned
• Does it make sense to develop a tightly integrated set of products?
  • Possibly, if funding allows: the DIALOGUE software
Common Vision
• Standard interfaces, and where DAIS is not enough
• Naming
• User tools that help in using products together
• Metadata
  • Binding metadata and data, internally and externally
• Collaboration
Common Vision
• Either:
  • A plug-and-play world where components fit together, with no restrictions on which sets are combined
  • A single generic data service powerful enough to satisfy all applications
  • Combinations of tightly integrated components which satisfy a targeted application area
• DIALOGUE should produce a convincing demonstration of how things should work
  • A portfolio of how our bits work, what needs to be changed, translated, etc.
  • Which could later be made robust
Standard Interfaces: Is DAIS enough?
• Data exploration tools, administration of sets of data resources, discovery of data resources
  • Can we do all of this with data integration technology on top of the DAIS interfaces?
• Does DAIS give us the minimal set of metadata we require?
• Don’t want to force a particular representation
  • But all operations on a given representation (say, XML) should compose well
• Also need to define transfer operators between representational models (structured binary, semi-structured textual, XML, relational, objects (tbd))
  • Is RDF different, or a special case of XML?
• Do we need to force a set of formats for each representation?
  • Assume a small set to allow proof of concept
• A standard way of specifying
  • Query languages
  • Representational models
  • Representational formats
  • Transfer mechanisms
  • Endpoints
• A way of binding data constraints and rules to data
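One way to read “a standard way of specifying” query languages, models, formats, transfer mechanisms and endpoints is as a capability descriptor that each data service publishes. The sketch below is hypothetical: `DataResourceDescriptor`, `can_ship` and all field names are illustrative, not taken from the DAIS specifications or any DIALOGUE product, and constraints are kept as opaque strings. It checks whether results can be shipped from one service to another by intersecting the formats and transfer mechanisms both support.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataResourceDescriptor:
    """Hypothetical capability description for a data service."""
    endpoint: str
    query_languages: frozenset[str]
    representational_model: str         # e.g. "relational", "XML", "structured binary"
    formats: frozenset[str]             # representational formats the service emits/accepts
    transfer_mechanisms: frozenset[str]
    constraints: tuple[str, ...] = ()   # rules bound to the data, opaque in this sketch


def can_ship(source: DataResourceDescriptor,
             sink: DataResourceDescriptor) -> set[tuple[str, str]]:
    """Return every (format, transfer mechanism) pair both services support."""
    return {(f, t)
            for f in source.formats & sink.formats
            for t in source.transfer_mechanisms & sink.transfer_mechanisms}


# Hypothetical services; endpoints and capability values are invented.
relational = DataResourceDescriptor(
    endpoint="https://example.org/relational",
    query_languages=frozenset({"SQL"}),
    representational_model="relational",
    formats=frozenset({"CSV", "XML"}),
    transfer_mechanisms=frozenset({"SOAP", "GridFTP"}),
)
xml_store = DataResourceDescriptor(
    endpoint="https://example.org/xml",
    query_languages=frozenset({"XPath", "XQuery"}),
    representational_model="XML",
    formats=frozenset({"CSV"}),
    transfer_mechanisms=frozenset({"GridFTP"}),
)
print(can_ship(relational, xml_store))  # {('CSV', 'GridFTP')}
```

An empty result would signal that a transfer operator between the two representations is needed before the services can be composed.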
An aside
• If data contains details of how it can be represented as a service, plus rules and constraints on it:
  • How do constraints change as you perform operations?
  • E.g. what happens when you derive or copy data?
  • Operations which change the rules
Friday breakout groups
• Lunchtime
  • Stocktaking of components / collaborative work / low-hanging fruit (Ally, Steve, Peter, Lucas) - Cramond
  • Metadata (Jessie, Mario, Scott, Alex, Leena, Larry, Elias) - Newhaven
  • Movement of data between components (Kostas, Neil, Umit, Shannon, Ivan) - Breakout
  • Beyond DIALOGUE (Joel, Malcolm, Peter) - Dean
• Afternoon
  • Metadata
  • Collaborative architecture
• Wrapup
  • Organising next meeting
• Unassigned
  • Mapping of components to scenarios
  • Schema federation / integration collaboration
  • Data warehousing needs
  • Interface to bulk data / metadata
Actions
• Share commonalities between toolkits
  • White paper on choke points common to models (editor Shannon), e.g. common data model: representing models (HDM and GME); schema mappings (?+IQL and Java->XPath?); query translation
  • What’s gained and lost by each combination / layering of components, expressed as use cases, maybe tied to application scenarios (publish on our web sites, Ally)
  • Cross-referencing of these between sites (each group to choose the 5 or so papers which describe them)
  • Later expand to include “external” components
• Define a glossary of agreed terminology (editor Neil)
  • E.g. data model, data integration, global schema
  • Informational document in DAIS/OGSA Data
• Are there common things needed from “the Grid”?
• Common schema format representation (across our projects) from data access services (Amy), e.g. XSD for XML, CIM for relational
• Component linkups
  • Explore integration of OGSA-DAI and DataCutter for image processing (Edinburgh MSc project?)
  • STORM and OGSA-DAI, with the MRC Human Genetics Unit application (Edinburgh summer intern?)
  • Send across grad students from Ohio
Actions
• Metadata
  • What added functionality would you get if you added semantics to the registry, as opposed to an external ontology?
  • Describe how to insert semantic annotations into the OGSA-DAI data resource configuration (Larry)
  • Can you uniformly present histograms and data required for optimisation? (Alex) Compare against Susan Malaika’s set of statistics
  • Send reference to survey of scalability techniques for reasoning with ontologies (Alex)
  • Produce strawman documents for the set of metadata a data service should provide for access, optimisation, discovery and integration (Mario to ask for examples)
  • How can we maintain metadata for access? (asked by Dave Berry)
• Proposals for future projects
• Send notes of discussions to participants and subscribe them to the mailing list
• Put it up on the datagrids.org site
DIALOGUE 3/4
• Venue: near GGF, Washington DC
• Date: 15–16 September 2006
• Focus:
  • Proposal generation
  • Update on documents
• Venue: Vienna
• Date: 28–30 March 2007 (PB to confirm)
• Focus:
  • Small group discussion and document production
  • Finish off deliverables