Coping with Semantics in XML Document Management
Thomas Kudrass
Leipzig University of Applied Sciences
Department of Computer Science and Mathematics

Overview
• Introduction
• Motivation
• XML: A Semantic Perspective
• XML Document Types
• XML Semantic Problems
• XML: A Database Perspective
• Common Mapping Problems
• RM-ODP Viewpoints on XML Documents
• Content View vs. Logical Layout View
• Example
• Realization of XML Document Management: Nesting of Viewpoints
• Conclusions

Motivation
• Aim: XML Document Management using Database Systems
• Problem: Map XML Documents to Databases
  • different approaches
  • no mapping rules
  • many open issues
• Reason: Semantics of XML not well understood
  • XML: only syntax, no predefined semantics

XML - A Semantic Perspective
• User-Defined Markup
  • structures the character data of a document
  • explains the document through the use of names
• Naming
  • RM-ODP: “A name is a term that refers to an entity in a given naming context.”
  • XML namespaces are no solution
  • possible improvement: shared ontologies
• No Standard Behavior of Tags
  • XSL processors: flexible presentation of an XML document
  • XML processor: checks well-formedness and validity of the XML document
  • open issue: document object semantics

XML Document Types
• Data-Centric Documents
  • designed for machine consumption (XML for data transport)
  • examples: sales orders, stock quotes, flight schedules
  • fairly regular structure
  • fine-grained data
• Document-Centric Documents
  • designed for human readers
  • examples: books, journal articles, emails
  • less regular structure
  • coarse-grained data
• Hybrid Documents
  • composition of documents of different types
  • example: medical documents = patient data + findings + prescriptions + procedures
• Requirements of the Document Type on the Document Management System

XML - A Database Perspective
• Round-Trip Problem
  • store an XML document in a database and retrieve the “same” document back again
  • vital to applications required by law to keep exact copies of documents
  • less important to data-centric documents
    • focus on the document content
    • ignore the order of sibling elements
  • many XML-to-DB algorithms don't preserve the whole document
    • CDATA sections
    • character entities
    • comments
    • processing instructions

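The round-trip problem can be illustrated with a minimal sketch. It assumes Python's standard xml.etree.ElementTree as a stand-in for a naive store/retrieve path; the sample document and the function name round_trip are hypothetical and not taken from the slides.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample containing a comment, a processing instruction,
# a CDATA section, and a character reference.
ORIGINAL = (
    '<doc><!-- reviewed by legal --><?render mode="draft"?>'
    '<p><![CDATA[a < b]]></p><q>&#228;</q></doc>'
)


def round_trip(xml_text: str) -> str:
    """Parse a document and serialize it again, as a naive store/retrieve might."""
    root = ET.fromstring(xml_text)  # default parser discards comments and PIs
    return ET.tostring(root, encoding="unicode")


restored = round_trip(ORIGINAL)
print(restored)
```

The content survives, but the retrieved text is no longer an exact copy: the comment and processing instruction are gone, the CDATA section comes back as escaped text, and the character reference comes back as the character itself.
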
Common Mapping Problems (1)
• Attributes vs. Element Text
  • where to store the data of a document?
  • both alternatives are possible; the choice is influenced by the implementation
• Meaning of Attributes
  • ambiguities when interpreting attributes
  • example: the order of a customer has an attribute expiry date = “11/2001”, which can have different meanings:
    • “The order will expire in Nov. 2001”
    • “The information about the order can be thrown away in Nov. 2001”
    • “The expiry date is information about the credit card used for the purchase”

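A small sketch of the attribute-versus-element choice, assuming Python's xml.etree.ElementTree. The element and attribute names (Order, expiryDate) and the helper read_field are hypothetical illustrations.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# The same value, encoded once as an attribute and once as element text.
AS_ATTRIBUTE = '<Order id="O1" expiryDate="11/2001"/>'
AS_ELEMENT = '<Order id="O1"><expiryDate>11/2001</expiryDate></Order>'


def read_field(elem: ET.Element, name: str) -> Optional[str]:
    """Return a field whether it was stored as an attribute or as a child element."""
    if name in elem.attrib:
        return elem.attrib[name]
    child = elem.find(name)
    return child.text if child is not None else None


from_attribute = read_field(ET.fromstring(AS_ATTRIBUTE), "expiryDate")
from_element = read_field(ET.fromstring(AS_ELEMENT), "expiryDate")
print(from_attribute, from_element)
```

A mapper can hide the encoding choice this way, but it cannot decide which of the three meanings of the expiry date is intended; that information is not in the document.
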
Common Mapping Problems (2)
• Null Values
  • different semantics of null values
  • database null values have to be reflected in XML documents
  • XML Schema:
    • null values in an element's text can be expressed
    • no concept of null for attributes
  • DTD: optional elements and attributes
• Comments, Processing Instructions
  • not considered content of the document in many algorithms
• Markup
  • visible in the logical document layout (e.g., character entities)
  • substituted in the physical representation of the document
  • Example:
    • <foo/> stored in a database
    • a non-XML-aware database doesn't recognize the markup <foo/>

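One way to reflect database nulls in element text is the xsi:nil attribute of XML Schema, which is only valid when the schema declares the element as nillable. The sketch below assumes Python's xml.etree.ElementTree; the row layout and the helpers row_to_xml and xml_to_row are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Optional

XSI = "http://www.w3.org/2001/XMLSchema-instance"
ET.register_namespace("xsi", XSI)

Row = Dict[str, Optional[str]]


def row_to_xml(tag: str, row: Row) -> ET.Element:
    """Map a database row to an element; None becomes xsi:nil="true"."""
    elem = ET.Element(tag)
    for column, value in row.items():
        child = ET.SubElement(elem, column)
        if value is None:
            child.set(f"{{{XSI}}}nil", "true")
        else:
            child.text = value
    return elem


def xml_to_row(elem: ET.Element) -> Row:
    """Inverse mapping: a nil element becomes None, an empty element becomes ''."""
    row: Row = {}
    for child in elem:
        is_nil = child.get(f"{{{XSI}}}nil") == "true"
        row[child.tag] = None if is_nil else (child.text or "")
    return row


row = {"name": "Smith", "fax": None, "note": ""}
text = ET.tostring(row_to_xml("Customer", row), encoding="unicode")
restored = xml_to_row(ET.fromstring(text))
print(text)
```

The sketch keeps null and empty string apart for elements. For attributes there is no equivalent marker, so a mapper can only omit the attribute, which conflates "null" with "not specified".
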
Common Mapping Problems (3)
• Links
  • links were originally designed for documents and document fragments, e.g., XPointers point to document subtrees using XPath
  • not adequate to express semantic relationships among document elements
    • e.g., ID: identifier value, comparable to a primary key; IDREF: comparable to a foreign key; behavioral semantics?
  • another language is more appropriate to specify the invariants
• Sibling Orders
  • particularly important for document-centric documents
  • can be arbitrary in data-centric documents
• Other Invariants (e.g., identity constraints)
  • specified on the level of instances, not of the schema
  • the set of all concerned objects has to be constructed first (using XPath)

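The two-step check described above (first construct the set of concerned objects, then test the invariant) can be sketched as follows. The example assumes Python's xml.etree.ElementTree and its limited XPath support; the document layout, the attribute names id and customer, and the function dangling_references are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import List

DOC = """
<Shop>
  <Customer id="C1"/>
  <Customer id="C2"/>
  <Order id="O1" customer="C1"/>
  <Order id="O2" customer="C9"/>
</Shop>
"""


def dangling_references(root: ET.Element) -> List[str]:
    """Return the ids of orders whose customer reference has no target."""
    # Step 1: construct the set of all concerned objects (the "primary keys").
    customer_ids = {c.get("id") for c in root.findall(".//Customer")}
    # Step 2: check every referencing element (the "foreign keys") against it.
    return [
        o.get("id")
        for o in root.findall(".//Order")
        if o.get("customer") not in customer_ids
    ]


broken = dangling_references(ET.fromstring(DOC))
print(broken)
```

The check detects the dangling reference, but it says nothing about behavior, for example what should happen to O1 when C1 is deleted. That part of the semantics has to be specified elsewhere.
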
RM-ODP Viewpoints on Documents
• Physical Presentation View
  • dependent on media, screen size / paper size
  • document = composition of characters with attributes (font, size, style)
  • XML character entities replaced
• Logical Layout View
  • composition of prose components (paragraphs, sections, lists, list items) and other objects (e.g., frames, code sections)
  • mostly ordered composition in document-centric documents
  • many possible physical presentation views
• Content View
  • composition of information objects (title, author, abstract, body, bibliography)
  • can be organized in a hierarchical structure or can be flat
  • mapped to several logical layouts

Content View vs. Logical Layout
• Content View
  • document-centric documents
    • information viewpoint in DTD or XML Schema
    • some constructs to specify structural constraints (e.g., cardinality constraints in XML Schema)
  • data-centric documents
    • structure not very relevant
    • many invariants among content elements cannot be adequately expressed in DTD / XML Schema
    • possible abuse of XLink / XPointers to specify relationships among content elements
• Logical Layout
  • document-centric documents
    • may follow the structure of the content
  • data-centric documents
    • often arbitrary

Example: Data-Centric Documents: Content View
• Integrity Constraints:
  • The overall value of an order must exceed a certain minimum.
  • A customer can submit at most 5 orders.
  • If a customer is deleted, all of his orders have to be cancelled.
[Figure: content-view diagram with the entities Customer, Order, Header, Line Item, and Product; cardinalities (1,1) and (1,N)]
• How to map to an XML document? OR
• How to map to the logical layout view?

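The three integrity constraints can be stated directly on a content model that is independent of any XML layout. This is a minimal sketch under stated assumptions: the class names, the field names, and the threshold MIN_ORDER_VALUE are hypothetical (the slide only speaks of "a certain minimum"); only the limit of 5 orders comes from the slide.

```python
from dataclasses import dataclass, field
from typing import Dict, List

MIN_ORDER_VALUE = 10.0       # assumed value; the slide gives no number
MAX_ORDERS_PER_CUSTOMER = 5  # from the slide


@dataclass
class Order:
    order_id: str
    item_values: List[float]
    cancelled: bool = False

    def total(self) -> float:
        return sum(self.item_values)


@dataclass
class Customer:
    customer_id: str
    orders: List[Order] = field(default_factory=list)


def violations(customer: Customer) -> List[str]:
    """Check the two static constraints for one customer."""
    problems = []
    if len(customer.orders) > MAX_ORDERS_PER_CUSTOMER:
        problems.append(f"{customer.customer_id}: too many orders")
    for order in customer.orders:
        if order.total() <= MIN_ORDER_VALUE:  # the value must exceed the minimum
            problems.append(f"{order.order_id}: value too low")
    return problems


def delete_customer(customers: Dict[str, Customer], customer_id: str) -> List[Order]:
    """Behavioral constraint: deleting a customer cancels all of his orders."""
    customer = customers.pop(customer_id)
    for order in customer.orders:
        order.cancelled = True
    return customer.orders


customers = {"C1": Customer("C1", [Order("O1", [4.0, 3.0]), Order("O2", [20.0])])}
found = violations(customers["C1"])
cancelled = delete_customer(customers, "C1")
print(found, [o.cancelled for o in cancelled])
```

None of the three rules depends on whether customers contain orders or orders contain customers in the XML document, which is why they belong to the content view.
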
Data-Centric Documents: Logical Layout View
• Alternative 1

    <Customer> C1 ...
      <Order> O1 ...
        <Item> ... </Item>
        <Item> ... </Item>
        ...
      </Order>
      <Order> O2 ...
        <Item> ... </Item>
        <Item> ... </Item>
        ...
      </Order>
      ...
    </Customer>
    <Customer> C2 ...
      <Order> O3 ...
        <Item> ... </Item>
        <Item> ... </Item>
        ...
      </Order>
      <Order> O4 ...
        <Item> ... </Item>
        <Item> ... </Item>
        ...
      </Order>
      ...
    </Customer>
    ...

• Alternative 2

    <Order> O1 ...
      <Customer> C1 ... </Customer>
      <Item> ... </Item>
      <Item> ... </Item>
      ...
    </Order>
    <Order> O2 ...
      <Customer> C1 ... </Customer>
      <Item> ... </Item>
      <Item> ... </Item>
      ...
    </Order>
    <Order> O3 ...
      <Customer> C2 ... </Customer>
      <Item> ... </Item>
      <Item> ... </Item>
      ...
    </Order>
    <Order> O4 ...
      <Customer> C2 ... </Customer>
      <Item> ... </Item>
      <Item> ... </Item>
      ...
    </Order>
    ...

• Alternative 3

    <Item> ...
      <Order> O1 ...
        <Customer> C1 ... </Customer>
      </Order>
    </Item>
    <Item> ...
      <Order> O2 ...
        <Customer> C1 ... </Customer>
      </Order>
    </Item>
    <Item> ...
      <Order> O3 ...
        <Customer> C2 ... </Customer>
      </Order>
    </Item>
    ...

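That the alternatives are different layouts of the same content can be checked by reducing each one to a set of (customer, order, item) facts. The sketch below assumes Python's xml.etree.ElementTree; the wrapper element Doc, the id attributes, and both extraction functions are hypothetical, since the slides only show element nesting.

```python
import xml.etree.ElementTree as ET
from typing import Set, Tuple

Fact = Tuple[str, str, str]  # (customer id, order id, item id)

# Customer-rooted layout (Alternative 1).
ALT1 = """<Doc>
<Customer id="C1"><Order id="O1"><Item id="I1"/><Item id="I2"/></Order></Customer>
<Customer id="C2"><Order id="O3"><Item id="I3"/></Order></Customer>
</Doc>"""

# Order-rooted layout (Alternative 2), deliberately in a different sibling order.
ALT2 = """<Doc>
<Order id="O3"><Customer id="C2"/><Item id="I3"/></Order>
<Order id="O1"><Customer id="C1"/><Item id="I2"/><Item id="I1"/></Order>
</Doc>"""


def content_from_alt1(root: ET.Element) -> Set[Fact]:
    return {
        (c.get("id"), o.get("id"), i.get("id"))
        for c in root.findall("Customer")
        for o in c.findall("Order")
        for i in o.findall("Item")
    }


def content_from_alt2(root: ET.Element) -> Set[Fact]:
    return {
        (o.find("Customer").get("id"), o.get("id"), i.get("id"))
        for o in root.findall("Order")
        for i in o.findall("Item")
    }


facts1 = content_from_alt1(ET.fromstring(ALT1))
facts2 = content_from_alt2(ET.fromstring(ALT2))
print(facts1 == facts2)
```

Each layout needs its own extraction code, yet both yield the same set. Nesting direction and sibling order are properties of the logical layout, not of the content.
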
Operations
• Operations are viewpoint-specific
• XML APIs: DOM / XPath
  • based on a tree model
  • although powerful, not appropriate for set-oriented operations
• Viewpoints vs. Operations
  • content view: set-oriented operations
  • logical layout view: navigating operations (on a tree)
• Need another language to express operations in the content view of data-centric documents!

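The contrast between navigating and set-oriented operations can be sketched by answering the same question twice. SQL (via Python's sqlite3 module) is used here only as a familiar set-oriented language; the slides do not name one. The document layout and the table orders are hypothetical.

```python
import sqlite3
import xml.etree.ElementTree as ET

DOC = """<Doc>
<Order id="O1"><Customer id="C1"/></Order>
<Order id="O2"><Customer id="C1"/></Order>
<Order id="O3"><Customer id="C2"/></Order>
</Doc>"""

root = ET.fromstring(DOC)

# Logical layout view: navigate the tree node by node and count by hand.
counts_by_navigation = {}
for order in root.findall("Order"):
    customer_id = order.find("Customer").get("id")
    counts_by_navigation[customer_id] = counts_by_navigation.get(customer_id, 0) + 1

# Content view: shred the document into a relation once, then state the
# question declaratively over the whole set.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id TEXT PRIMARY KEY, customer_id TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(o.get("id"), o.find("Customer").get("id")) for o in root.findall("Order")],
)
counts_by_query = dict(
    conn.execute("SELECT customer_id, COUNT(*) FROM orders GROUP BY customer_id")
)
conn.close()

print(counts_by_navigation, counts_by_query)
```

Both paths give the same counts, but the navigation code is tied to one particular nesting of Order and Customer, while the query would stay unchanged if the document layout were swapped.
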
Realization: Nesting of Viewpoints
[Figure: three nested document views, each annotated with the five RM-ODP viewpoints (Enterprise, Information, Computational, Engineering, Technology)]
• Presentation View
  • operations: “Browse”, “Store”
  • SVG, PDF
  • media: screen, paper
• Logical Layout View (XML document)
  • operations: “Store”, “Retrieve”
  • XML Schema, DTD
  • file (template), large object
• Content View
  • operations: “Store”, “Retrieve”
  • semantic model
  • RDBMS, native XML-DB

Conclusions
• Analyze the requirements first before building an XML system
  • data-centric vs. document-centric documents
  • huge impact on the choice of technology (storage platform)
• Think in viewpoints to understand the semantics
  • mixed occurrence of content view and logical layout in XML documents
  • expand viewpoints into the specification of a new system
• Use generic relationships for constraint modelling
• Beware of the difference between specification and realization