An Extension to XML Schema for Structured Data Processing Presented by: Jacky Ma Date: 10 April 2002
Presentation Outline • The Problems • Research Objectives • The Schema Extension: MMX • MMX Query System • Discussion • Conclusion
The Problems • Mapping XML data into relational tables • Not a natural fit for the XML structure • Efficient, but may not be an effective method • Legacy application-specific structured data • Similar modeling, but proprietary implementations • Not interoperable, and difficult to maintain • Lack modular design, and are thus difficult to combine into more complex data structures • Meta-data could serve a wide range of needs, yet XML Schema is currently used solely for physical data validation
Research Objectives • To facilitate more effective searching and storing of XML content by making use of meta-data (XML Schema) • To propose a data-oriented model that allows different storage mechanisms, processing models, and query models for XML content
Our Approach – MMX • Use meta-data to map XML data into structured data objects • Define the structured data models “conceptually” and link the models to the XML document structure “syntactically” • Meta-data is defined as an extension of XML Schema • The extension is called MMX (Multi Model XML)
Program Driven vs. Data Driven • Raw Data -> Structured Data (XML) -> Data with Modeling Information -> Data with Program Codes • Program Driven: information for processing is hard-coded in the program • Data Driven: processing instructions are hard-coded in the data?! • MMX! lies between the two extremes
Schema Extension • The extended schema is associated with a namespace • The extended schema goes within a schema element, like <tree:element> in the example • <tree:element> specifies a single structure object instance • Name association for elements and attributes • Class hierarchies: • <tree:element> -> <tree:internal> -> <tree:leafNode> • and finally to the structure specified in <tree:leafNodeValue> • Additional properties in <rootNodeAttr>, <internalNodeAttr> and <leafNodeAttr> • The schema writer has to know the structure model specification, while the XML writer only needs to know the given schema
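To make the extension mechanism concrete, here is a minimal Java sketch that scans an extended schema for its tree:* elements with the standard DOM API. The namespace URI http://example.org/mmx/tree, the placement of <tree:element> inside xs:annotation/xs:appinfo, and the model and ref attributes are assumptions for illustration only; the slides do not give the actual syntax.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class MmxSchemaScanner {
    // Assumed namespace URI for the "tree" prefix; the original extension's URI is not given.
    static final String TREE_NS = "http://example.org/mmx/tree";

    // Hypothetical extended schema: the tree:* hierarchy is placed in xs:appinfo (an assumption).
    static final String SAMPLE =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'"
      + "           xmlns:tree='" + TREE_NS + "'>"
      + "  <xs:element name='points'>"
      + "    <xs:annotation><xs:appinfo>"
      + "      <tree:element model='R-Tree'>"
      + "        <tree:internal>"
      + "          <tree:leafNode><tree:leafNodeValue ref='point'/></tree:leafNode>"
      + "        </tree:internal>"
      + "      </tree:element>"
      + "    </xs:appinfo></xs:annotation>"
      + "  </xs:element>"
      + "</xs:schema>";

    /** Returns the local names of all tree:* elements, in document order. */
    static List<String> treeElements(String schemaXml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // needed so getElementsByTagNameNS matches by URI
            Document doc = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(schemaXml)));
            NodeList nodes = doc.getElementsByTagNameNS(TREE_NS, "*");
            List<String> names = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                names.add(nodes.item(i).getLocalName());
            }
            return names;
        } catch (Exception e) {
            throw new IllegalArgumentException("cannot parse schema", e);
        }
    }

    public static void main(String[] args) {
        // Prints the structure hierarchy, outermost element first.
        System.out.println(treeElements(SAMPLE));
    }
}
```

Running it prints [element, internal, leafNode, leafNodeValue], i.e. the hierarchy from <tree:element> down to <tree:leafNodeValue>. An XML writer never sees these elements; only the schema writer does.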
Modeling • An instance of an “MMX data object” can be viewed: • As an encapsulated information object accessible only from the root, i.e. as a “single tree node” • As a mapping from (root node, query method, query parameters) to the value at a leaf node • Leaf nodes may contain any valid XML content, as long as it is defined in the Schema • I.e. a leaf may contain another “MMX data object” • A query is modeled as a 3-tuple: • [accessing-node, query-method, query-parameters] • The accessing-node is specified by XPath • The query-method is specified as a string value • The query-parameters are multi-dimensional, depending on the current model
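The query tuple can be sketched as a small Java value class whose accessing node is resolved with the standard javax.xml.xpath API. The class name MmxQuery, its parse helper, and the choice of Object[] for the model-dependent parameters are hypothetical; this is an illustration of the 3-tuple, not the prototype's own type.

```java
import java.io.StringReader;
import java.util.Arrays;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

/** Hypothetical query 3-tuple [accessing-node, query-method, query-parameters]. */
final class MmxQuery {
    final String accessingNode;     // XPath selecting the root of an MMX data object
    final String queryMethod;       // plain string, e.g. "spatial-search"
    final Object[] queryParameters; // arity and types depend on the model

    MmxQuery(String accessingNode, String queryMethod, Object... queryParameters) {
        this.accessingNode = accessingNode;
        this.queryMethod = queryMethod;
        this.queryParameters = queryParameters.clone();
    }

    /** Evaluates the accessing-node XPath; returns null if nothing matches. */
    Node resolve(Document doc) {
        try {
            return (Node) XPathFactory.newInstance().newXPath()
                .evaluate(accessingNode, doc, XPathConstants.NODE);
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("invalid XPath: " + accessingNode, e);
        }
    }

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("cannot parse XML", e);
        }
    }

    @Override public String toString() {
        return "[" + accessingNode + ", \"" + queryMethod + "\", "
            + Arrays.toString(queryParameters) + "]";
    }

    public static void main(String[] args) {
        Document doc = parse("<map><points id='tree1'/></map>");
        MmxQuery q = new MmxQuery("/map/points", "spatial-search", 3, 5);
        System.out.println(q + " -> accessing node <" + q.resolve(doc).getNodeName() + ">");
    }
}
```

Only the accessing node is interpreted generically here; what the method string and parameters mean is left to whichever model sits at that node.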
Modeling (2) Tree(1) is accessible from point A. A query (e.g. [A, “spatial-search”, (3, 5)], assuming Tree(1) accepts spatial-search with two coordinates) may return point B as the answer, either as the XPath of B or as the XML subtree of B. From point B, the user may drill down further by issuing another query on Tree(2). [Figure: Tree(1) -> B -> Tree(2) -> XML Elements]
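The drill-down can be sketched in Java with one interface for "an object reachable only through root queries" and two toy models. MmxObject, PointTree, KeywordTree and the "keyword-search" method are hypothetical names; for simplicity the answer B is returned as a Java object rather than as an XPath or an XML subtree.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical MMX data object: reachable only through its root, via (method, parameters). */
interface MmxObject {
    Object query(String method, Object... params);
}

/** Hypothetical Tree(1): exact-match "spatial-search" on integer (x, y) coordinates. */
final class PointTree implements MmxObject {
    private final Map<String, Object> leaves = new HashMap<>();

    PointTree put(int x, int y, Object leafValue) {
        leaves.put(x + "," + y, leafValue);
        return this;
    }

    @Override public Object query(String method, Object... params) {
        if (!"spatial-search".equals(method) || params.length != 2) {
            throw new UnsupportedOperationException(method);
        }
        // The leaf value may be plain XML content or another MmxObject (a nested model).
        return leaves.get(params[0] + "," + params[1]);
    }
}

/** Hypothetical Tree(2): "keyword-search" whose leaves are XML fragments. */
final class KeywordTree implements MmxObject {
    private final Map<String, String> leaves = new HashMap<>();

    KeywordTree put(String keyword, String xml) {
        leaves.put(keyword, xml);
        return this;
    }

    @Override public Object query(String method, Object... params) {
        if (!"keyword-search".equals(method) || params.length != 1) {
            throw new UnsupportedOperationException(method);
        }
        return leaves.get(params[0]);
    }
}

class DrillDownDemo {
    public static void main(String[] args) {
        MmxObject tree2 = new KeywordTree().put("cafe", "<shop name='Cafe'/>");
        MmxObject tree1 = new PointTree().put(3, 5, tree2); // point B holds Tree(2)

        Object b = tree1.query("spatial-search", 3, 5);     // [A, "spatial-search", (3, 5)]
        if (b instanceof MmxObject) {                       // drill down from B
            System.out.println(((MmxObject) b).query("keyword-search", "cafe"));
        }
    }
}
```

Because a leaf value is just "any valid XML content", nesting falls out naturally: the client checks whether the answer is itself queryable and, if so, issues the next query on it.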
Query with and without MMX • From the original XML data alone, we cannot assume the semantics of the data: • We can ONLY do XML-based queries such as XPath • We can do spatial queries ONLY IF we can map the data into an R-Tree • After mapping the data into an R-Tree • Spatial Queries • Give me the point at (2,7) • Give me the point nearest to (4,4) • Nearest Neighbor Search • Give me the point nearest to “Franklin”
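The three example queries can be illustrated with the Java sketch below. To stay short it swaps the R-Tree for a plain linear scan: the query semantics (point query, nearest neighbor by coordinates, nearest neighbor by name) are the same, but each query costs O(n) instead of a descent through bounding rectangles. The class names and the sample points, including the coordinates given to “Franklin”, are made up.

```java
import java.util.ArrayList;
import java.util.List;

final class NamedPoint {
    final String name;
    final double x, y;
    NamedPoint(String name, double x, double y) { this.name = name; this.x = x; this.y = y; }
    @Override public String toString() { return name + "(" + x + ", " + y + ")"; }
}

/** Linear-scan stand-in for an R-Tree: same queries, O(n) per query. */
final class SpatialIndex {
    private final List<NamedPoint> points = new ArrayList<>();

    SpatialIndex add(String name, double x, double y) {
        points.add(new NamedPoint(name, x, y));
        return this;
    }

    /** "Give me the point at (x, y)": exact-match point query. */
    NamedPoint pointAt(double x, double y) {
        for (NamedPoint p : points) {
            if (p.x == x && p.y == y) return p;
        }
        return null;
    }

    /** "Give me the point nearest to (x, y)", skipping exclude if it is non-null. */
    NamedPoint nearest(double x, double y, NamedPoint exclude) {
        NamedPoint best = null;
        double bestDist = Double.POSITIVE_INFINITY;
        for (NamedPoint p : points) {
            if (p == exclude) continue;
            double dx = p.x - x, dy = p.y - y;
            double dist = dx * dx + dy * dy; // squared distance is enough for ranking
            if (dist < bestDist) { bestDist = dist; best = p; }
        }
        return best;
    }

    /** "Give me the point nearest to <name>": look up the name, then search from there. */
    NamedPoint nearestTo(String name) {
        for (NamedPoint p : points) {
            if (p.name.equals(name)) return nearest(p.x, p.y, p);
        }
        return null;
    }

    public static void main(String[] args) {
        SpatialIndex idx = new SpatialIndex()   // made-up sample points
            .add("Franklin", 6, 1).add("Harbour", 2, 7)
            .add("Central", 5, 3).add("Peak", 7, 0);
        System.out.println(idx.pointAt(2, 7));          // Harbour(2.0, 7.0)
        System.out.println(idx.nearest(4, 4, null));    // Central(5.0, 3.0)
        System.out.println(idx.nearestTo("Franklin"));  // Peak(7.0, 0.0)
    }
}
```

The point of the slide holds either way: none of these queries can be phrased in plain XPath, because XPath has no notion that two elements represent nearby coordinates.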
Processing • Users might not know the “type” of a node (and need not know it); they are interested in what they can do with it • Users retrieve the list of possible operations by issuing a LIST-OPERATION method to the root element of an MMX object • Possible operations may include queries, updates, and other model-specific operations
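One way to realize LIST-OPERATION is sketched below in Java: the root element keeps a name-to-operation table, answers LIST-OPERATION from that table, and dispatches every other method by name. MmxRoot, register, invoke and the "insert"/"count" operations are hypothetical; only the LIST-OPERATION method name comes from the slides.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

/**
 * Hypothetical root element of an MMX object. Clients never ask for the node's
 * type; they ask which operations it supports and then invoke them by name.
 */
final class MmxRoot {
    static final String LIST_OPERATION = "LIST-OPERATION";
    private final Map<String, Function<Object[], Object>> operations = new LinkedHashMap<>();

    MmxRoot register(String name, Function<Object[], Object> op) {
        operations.put(name, op);
        return this;
    }

    Object invoke(String method, Object... params) {
        if (LIST_OPERATION.equals(method)) {
            return new ArrayList<>(operations.keySet()); // registration order (LinkedHashMap)
        }
        Function<Object[], Object> op = operations.get(method);
        if (op == null) throw new UnsupportedOperationException(method);
        return op.apply(params);
    }

    public static void main(String[] args) {
        List<String> store = new ArrayList<>();
        MmxRoot root = new MmxRoot()
            .register("insert", p -> { store.add((String) p[0]); return store.size(); }) // an update
            .register("count", p -> store.size());                                        // a query
        System.out.println(root.invoke(LIST_OPERATION)); // [insert, count]
        root.invoke("insert", "Franklin");
        System.out.println(root.invoke("count"));        // 1
    }
}
```

Keeping the operation table in the object itself means a client written against invoke works unchanged when a new model adds model-specific operations.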
MMX Query System • To show that the schema, modeling, and processing of the MMX extension are workable • To illustrate how it assists in querying XML data • To serve as a platform for testing implementations of arbitrary structured models • Implemented with JDK 1.4
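System Design [Figure: system architecture diagram] • Components: Clients, XML DOM, MMXDocument, Node, Data, Schema, MMX Element • ParseSchema and FetchClasses • AbstractMMXElement: the abstract class defines the common interface that each MMX Element has to implement, such as LIST-OPERATION, QUERY, BUILD, etc. • Concrete models (VP-Tree, X-Tree, R-Tree, …) extend the abstract class • The Schema (partly) defines each model; e.g. an R-Tree schema maps to the R-Tree implementation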
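The class structure in the diagram can be sketched in Java as follows. This is a minimal illustration under stated assumptions, not the prototype's code: the method signatures, the Double-valued data, the ModelRegistry class (standing in for the FetchClasses step) and the SortedKeysElement model are all hypothetical. A real VP-Tree, X-Tree or R-Tree would extend AbstractMMXElement in the same way.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;
import java.util.function.Supplier;

/** Common interface each MMX Element implements (LIST-OPERATION, BUILD, QUERY). */
abstract class AbstractMMXElement {
    abstract List<String> listOperations();
    abstract void build(List<Double> data);
    abstract List<Double> query(String method, double... params);
}

/** Hypothetical 1-D model used only to exercise the interface (not a VP-, X- or R-Tree). */
final class SortedKeysElement extends AbstractMMXElement {
    private final TreeSet<Double> keys = new TreeSet<>();

    @Override List<String> listOperations() { return Arrays.asList("range-search"); }

    @Override void build(List<Double> data) {
        keys.clear();
        keys.addAll(data);
    }

    @Override List<Double> query(String method, double... params) {
        if (!"range-search".equals(method) || params.length != 2) {
            throw new UnsupportedOperationException(method);
        }
        // Inclusive range [params[0], params[1]], in ascending order.
        return new ArrayList<>(keys.subSet(params[0], true, params[1], true));
    }
}

/** Hypothetical stand-in for FetchClasses: model name (read from the schema) -> implementation. */
final class ModelRegistry {
    private final Map<String, Supplier<AbstractMMXElement>> models = new HashMap<>();

    ModelRegistry register(String modelName, Supplier<AbstractMMXElement> factory) {
        models.put(modelName, factory);
        return this;
    }

    AbstractMMXElement fetch(String modelName) {
        Supplier<AbstractMMXElement> factory = models.get(modelName);
        if (factory == null) {
            throw new IllegalArgumentException("no implementation for model " + modelName);
        }
        return factory.get();
    }

    public static void main(String[] args) {
        ModelRegistry registry = new ModelRegistry().register("sorted-keys", SortedKeysElement::new);
        AbstractMMXElement element = registry.fetch("sorted-keys"); // name would come from the schema
        element.build(Arrays.asList(5.0, 1.0, 3.0, 9.0));
        System.out.println(element.listOperations());            // [range-search]
        System.out.println(element.query("range-search", 2, 6)); // [3.0, 5.0]
    }
}
```

Separating the registry from the models mirrors the diagram's split between the schema side (which names a model) and the class side (which implements it), so adding a model means one new subclass plus one registration.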
Discussion - Pros • Compatible with the relational approach, and supersedes it • Modular design promotes reusability and maintainability • XML “flattens” legacy structured data, making it text-editable and easy to transport and to process by different systems
Discussion - Cons • There is no generic syntax to precisely describe all kinds of structure models • XML files are often larger than the corresponding legacy data files • Each structure model needs additional implementation effort • Schema specifications grow long quickly as the number of supported models increases
Conclusion • Propose a representation to encapsulate data structures • Describe XML data with the Schema conceptually as well as syntactically • Map legacy structure models into Schema, and map XML data to the structure models by the Schema • Structured data repository with increased interoperability, reusability, and transportability