
Coping with Semantics in XML Document Management

Thomas Kudrass, Leipzig University of Applied Sciences, Department of Computer Science and Mathematics



  1. Coping with Semantics in XML Document Management
     Thomas Kudrass
     Leipzig University of Applied Sciences
     Department of Computer Science and Mathematics

  2. Overview
     • Introduction
     • Motivation
     • XML: A Semantic Perspective
     • XML Document Types
     • XML Semantic Problems
     • XML: A Database Perspective
     • Common Mapping Problems
     • RM-ODP Viewpoints on XML Documents
     • Content View vs. Logical Layout View
     • Example
     • Realization of XML Document Management: Nesting of Viewpoints
     • Conclusions

  3. Motivation
     • Aim: XML document management using database systems
     • Problem: map XML documents to databases
        • different approaches
        • no mapping rules
        • many open issues
     • Reason: semantics of XML not well understood
        • XML: only syntax, no predefined semantics
     Sections: Introduction • XML Semantic Problems • Viewpoints on XML Documents • Realization • Conclusions

  4. XML: A Semantic Perspective
     • User-Defined Markup
        • structures the character data of a document
        • explains the document through the use of names
     • Naming
        • RM-ODP: “A name is a term that refers to an entity in a given naming context.”
        • XML namespaces are no solution
        • possible improvement: shared ontologies
     • No Standard Behavior of Tags
        • XSL processors: flexible presentation of an XML document
        • XML processor: checks well-formedness and validity of the XML document
        • open issue: document object semantics

  5. XML Document Types
     • Data-Centric Documents
        • designed for machine consumption (XML for data transport)
        • examples: sales orders, stock quotes, flight schedules
        • fairly regular structure
        • fine-grained data
     • Document-Centric Documents
        • designed for human readers
        • examples: books, journal articles, emails
        • less regular structure
        • coarse-grained data
     • Hybrid Documents
        • composition of documents of different types
        • example: medical documents = patient data + findings + prescriptions + procedures
     • Document type → requirements on the document management system

  6. XML: A Database Perspective
     • Round-Trip Problem
        • store an XML document in a database and retrieve the “same” document back again
        • vital to applications required by law to keep exact copies of documents
        • less important to data-centric documents
           • focus on the document content
           • ignore the order of sibling elements
        • many XML-to-DB algorithms don't preserve the whole document
           • CDATA sections
           • character entities
           • comments
           • processing instructions
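The round-trip problem can be reproduced without any database. The following is a minimal sketch, assuming Python's standard `xml.etree.ElementTree` stands in for an XML-to-DB mapping layer; the sample document and its element names are invented for illustration. It parses a document and serializes it again, which keeps the content but not the exact text.

```python
import xml.etree.ElementTree as ET

# Hypothetical document containing a comment, a CDATA section
# and a numeric character reference.
original = (
    '<order id="O1">'
    "<!-- rush delivery -->"
    "<note><![CDATA[5 < 7]]></note>"
    "<code>&#65;BC</code>"
    "</order>"
)

# The default parser keeps element content but discards comments;
# CDATA sections and character references are resolved to plain text.
root = ET.fromstring(original)
round_tripped = ET.tostring(root, encoding="unicode")

print(round_tripped)
print("identical text:", round_tripped == original)
print("note content kept:", root.find("note").text == "5 < 7")
```

The content of `note` and `code` survives, but the comment, the CDATA delimiters and the character reference do not, so an application that must keep exact copies cannot rely on this kind of mapping alone.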

  7. Common Mapping Problems (1)
     • Attributes vs. Element Text
        • where to store the data of a document?
        • both alternatives are possible; the choice is influenced by the implementation
     • Meaning of Attributes
        • ambiguities when interpreting attributes
        • example: the order of a customer has an attribute expiry date = “11/2001” → different meanings:
           • “The order will expire in Nov. 2001”
           • “The information about the order can be thrown away in Nov. 2001”
           • “The expiry date is a piece of information about the credit card used for the purchase”
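The attribute-versus-element choice can be illustrated with a small sketch, assuming Python's `xml.etree.ElementTree`; the helper `field` and the names `id` and `expiryDate` are hypothetical. Both encodings map to the same row once the mapping layer treats them alike.

```python
import xml.etree.ElementTree as ET


def field(elem, name):
    """Return a property stored either as an attribute or as a child element."""
    if name in elem.attrib:
        return elem.attrib[name]
    child = elem.find(name)
    return child.text if child is not None else None


# The same order, once with attributes and once with element text.
as_attribute = ET.fromstring('<Order id="O1" expiryDate="11/2001"/>')
as_element = ET.fromstring(
    "<Order><id>O1</id><expiryDate>11/2001</expiryDate></Order>"
)

row_a = {name: field(as_attribute, name) for name in ("id", "expiryDate")}
row_b = {name: field(as_element, name) for name in ("id", "expiryDate")}

print(row_a == row_b)
```

The sketch only removes the syntactic difference. Which of the three meanings `expiryDate` carries is not recoverable from either encoding.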

  8. Common Mapping Problems (2)
     • Null Values
        • different semantics of null values
        • database null values have to be reflected in XML documents
        • XML Schema:
           • null values in an element's text can be expressed
           • no concept of null for attributes
        • DTD: optional elements and attributes
     • Comments, Processing Instructions
        • not considered content of the document in many algorithms
     • Markup
        • visible in the logical document layout (e.g., character entities)
        • substituted in the physical representation of the document
        • example:
           • &lt;foo/&gt; stored in a database
           • non-XML-aware databases don't recognize the markup <foo/>
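One way to reflect database nulls in element text is the `xsi:nil` attribute of XML Schema instances, which a schema permits only on elements declared nillable. The sketch below assumes Python's `xml.etree.ElementTree`; the functions `row_to_xml` and `xml_to_row` and the column names are hypothetical. It keeps a null value distinct from an empty string across a store and retrieve cycle.

```python
import xml.etree.ElementTree as ET

XSI = "http://www.w3.org/2001/XMLSchema-instance"
ET.register_namespace("xsi", XSI)
NIL = f"{{{XSI}}}nil"


def row_to_xml(tag, row):
    """Map a database row (dict) to an element; None becomes xsi:nil."""
    elem = ET.Element(tag)
    for column, value in row.items():
        child = ET.SubElement(elem, column)
        if value is None:
            child.set(NIL, "true")
        else:
            child.text = str(value)
    return elem


def xml_to_row(elem):
    """Inverse mapping: xsi:nil becomes None, an empty element becomes ''."""
    row = {}
    for child in elem:
        is_nil = child.get(NIL) == "true"
        row[child.tag] = None if is_nil else (child.text or "")
    return row


row = {"id": "O1", "comment": "", "discount": None}
xml_text = ET.tostring(row_to_xml("Order", row), encoding="unicode")
restored = xml_to_row(ET.fromstring(xml_text))

print(xml_text)
print(restored == row)
```

The same trick is not available for attributes, which is why a mapping that stores columns as attributes has to fall back on omitting the attribute or on a sentinel value.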

  9. Common Mapping Problems (3)
     • Links
        • links were originally designed for documents and document fragments, e.g., XPointers point to document subtrees using XPath
        • not adequate to express semantic relationships among document elements
           • e.g., ID (identifier value) as primary key, IDREF as foreign key: behavioral semantics?
        • another language is more appropriate to specify the invariants
     • Sibling Order
        • particularly important for document-centric documents
        • can be arbitrary in data-centric documents
     • Other Invariants (e.g., identity constraints)
        • specified on the level of instances, not of the schema
        • the set of all concerned objects has to be constructed first (using XPath)
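Checking an invariant on the instance level can be sketched as follows, assuming Python's `xml.etree.ElementTree` and its limited XPath support; the document and the attribute names `id` and `customer` are invented. The set of concerned objects is built first, and only then is the key and reference constraint checked.

```python
import xml.etree.ElementTree as ET

# Hypothetical instance document: orders reference customers by identifier.
doc = ET.fromstring(
    "<Shop>"
    '<Customer id="C1"/><Customer id="C2"/>'
    '<Order id="O1" customer="C1"/>'
    '<Order id="O2" customer="C9"/>'
    "</Shop>"
)

# Step 1: construct the set of all concerned objects with path expressions.
customers = doc.findall("./Customer")
orders = doc.findall("./Order")
customer_ids = {c.get("id") for c in customers}

# Step 2: check the invariants on these sets.
ids_unique = len(customer_ids) == len(customers)
dangling = [o.get("id") for o in orders if o.get("customer") not in customer_ids]

print("customer ids unique:", ids_unique)
print("orders with dangling reference:", dangling)
```

Nothing in the document itself says what should happen to order O2. That behavioral part of the constraint has to be specified elsewhere.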

  10. RM-ODP Viewpoints on Documents
     • Physical Presentation View
        • dependent on media, screen size / paper size
        • document = composition of characters with attributes (font, size, style)
        • XML character entities replaced
     • Logical Layout View
        • composition of prose components (paragraphs, sections, lists, list items) and other objects (e.g., frames, code sections)
        • mostly ordered composition in document-centric documents
        • many possible physical presentation views
     • Content View
        • composition of information objects (title, author, abstract, body, bibliography)
        • can be organized in a hierarchical structure or can be flat
        • mapped to several logical layouts

  11. Content View vs. Logical Layout
     • Content View
        • document-centric documents
           • information viewpoint in DTD or XML Schema
           • some constructs to specify structural constraints (e.g., cardinality constraints in XML Schema)
        • data-centric documents
           • structure not very relevant
           • many invariants among content elements cannot be adequately expressed in DTD / XML Schema
           • possible abuse of XLink / XPointers to specify relationships among content elements
     • Logical Layout
        • document-centric documents
           • may follow the structure of the content
        • data-centric documents
           • often arbitrary

  12. Example: Data-Centric Documents: Content View
     [Diagram: entities Customer, Order, Header, Line Item and Product, with the cardinality labels (1,1) and (1,N)]
     • Integrity Constraints
        • The overall value of an order must exceed a certain minimum.
        • A customer can submit at most 5 orders.
        • If a customer is deleted, all of his orders have to be cancelled.
     • How to map to an XML document? Or: how to map to the logical layout view?
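The three integrity constraints of the example can be written down independently of any XML layout. The sketch below uses plain Python; the class `Order`, the functions `violations` and `delete_customer`, the minimum of 10 and the rule that cancelled orders no longer count are all assumptions made for illustration, not part of the slides.

```python
from dataclasses import dataclass, field

MIN_ORDER_VALUE = 10          # hypothetical minimum, not given in the slides
MAX_ORDERS_PER_CUSTOMER = 5   # "at most 5 orders"


@dataclass
class Order:
    order_id: str
    customer_id: str
    item_values: list = field(default_factory=list)
    cancelled: bool = False

    def value(self):
        return sum(self.item_values)


def violations(orders):
    """Check the value and the cardinality constraint on all open orders."""
    problems = []
    open_orders = [o for o in orders if not o.cancelled]
    counts = {}
    for order in open_orders:
        if order.value() <= MIN_ORDER_VALUE:
            problems.append(f"{order.order_id}: value does not exceed the minimum")
        counts[order.customer_id] = counts.get(order.customer_id, 0) + 1
    for customer_id, count in counts.items():
        if count > MAX_ORDERS_PER_CUSTOMER:
            problems.append(f"{customer_id}: more than {MAX_ORDERS_PER_CUSTOMER} orders")
    return problems


def delete_customer(customers, orders, customer_id):
    """Deleting a customer cancels all of his orders."""
    customers.discard(customer_id)
    for order in orders:
        if order.customer_id == customer_id:
            order.cancelled = True


customers = {"C1", "C2"}
orders = [
    Order("O1", "C1", [20, 5]),
    Order("O2", "C1", [4]),
    Order("O3", "C2", [15]),
]
problems_before = violations(orders)
delete_customer(customers, orders, "C1")
problems_after = violations(orders)

print(problems_before)
print(problems_after)
```

The constraints range over sets of customers and orders, so they stay the same whichever element ends up as the root of the XML document.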

  13. Data-Centric Documents: Logical Layout View
     • Alternative 1
          <Customer> C1 ...
            <Order> O1 ...
              <Item> ... </Item>
              <Item> ... </Item>
              ...
            </Order>
            <Order> O2 ...
              <Item> ... </Item>
              <Item> ... </Item>
              ...
            </Order>
            ...
          </Customer>
          <Customer> C2 ...
            <Order> O3 ...
              <Item> ... </Item>
              <Item> ... </Item>
              ...
            </Order>
            <Order> O4 ...
              <Item> ... </Item>
              <Item> ... </Item>
              ...
            </Order>
            ...
          </Customer>
     • Alternative 2
          <Order> O1 ...
            <Customer> C1 ... </Customer>
            <Item> ... </Item>
            <Item> ... </Item>
            ...
          </Order>
          <Order> O2 ...
            <Customer> C1 ... </Customer>
            <Item> ... </Item>
            <Item> ... </Item>
            ...
          </Order>
          <Order> O3 ...
            <Customer> C2 ... </Customer>
            <Item> ... </Item>
            <Item> ... </Item>
            ...
          </Order>
          <Order> O4 ...
            <Customer> C2 ... </Customer>
            <Item> ... </Item>
            <Item> ... </Item>
            ...
          </Order>
          ...
     • Alternative 3
          <Item> ...
            <Order> O1 ...
              <Customer> C1 ... </Customer>
            </Order>
          </Item>
          <Item> ...
            <Order> O2 ...
              <Customer> C1 ... </Customer>
            </Order>
          </Item>
          <Item> ...
            <Order> O3 ...
              <Customer> C2 ... </Customer>
            </Order>
          </Item>
          ...
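That the alternatives are different logical layouts of one content view can be made concrete with a small sketch, assuming Python's `xml.etree.ElementTree`. The wrapper elements `Customers` and `Orders`, the `id` attributes, the item texts and all function names are hypothetical. A customer-rooted document (Alternative 1) is flattened into a set of facts and regrouped into an order-rooted document (Alternative 2).

```python
import xml.etree.ElementTree as ET

# Alternative 1: customers contain orders, orders contain items.
customer_rooted = ET.fromstring(
    "<Customers>"
    '<Customer id="C1">'
    '<Order id="O1"><Item>pen</Item><Item>ink</Item></Order>'
    '<Order id="O2"><Item>paper</Item></Order>'
    "</Customer>"
    '<Customer id="C2"><Order id="O3"><Item>pen</Item></Order></Customer>'
    "</Customers>"
)


def content_of_alternative_1(root):
    """Flatten the layout into (customer, order, item) facts."""
    return {
        (c.get("id"), o.get("id"), i.text)
        for c in root.findall("Customer")
        for o in c.findall("Order")
        for i in o.findall("Item")
    }


def to_alternative_2(facts):
    """Regroup the same facts with Order as the outer element."""
    root = ET.Element("Orders")
    orders = {}
    for customer_id, order_id, item in sorted(facts):
        order = orders.get(order_id)
        if order is None:
            order = ET.SubElement(root, "Order", id=order_id)
            ET.SubElement(order, "Customer", id=customer_id)
            orders[order_id] = order
        ET.SubElement(order, "Item").text = item
    return root


def content_of_alternative_2(root):
    return {
        (o.find("Customer").get("id"), o.get("id"), i.text)
        for o in root.findall("Order")
        for i in o.findall("Item")
    }


facts = content_of_alternative_1(customer_rooted)
order_rooted = to_alternative_2(facts)
same_content = content_of_alternative_2(order_rooted) == facts

print(ET.tostring(order_rooted, encoding="unicode"))
print("same content:", same_content)
```

Using a set of facts ignores sibling order and duplicate items, which is acceptable here only because the example is data-centric.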

  14. Operations
     • Operations are viewpoint-specific
     • XML APIs: DOM / XPath
        • based on a tree model
        • although powerful, not appropriate for set-oriented operations
     • Viewpoints vs. Operations
        • content view: set-oriented operations
        • logical layout view: navigating operations (on a tree)
     • Another language is needed to express operations in the content view of data-centric documents!
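The difference between navigating and set-oriented operations can be sketched as follows, assuming Python's `xml.etree.ElementTree` for the tree side and SQL via the standard `sqlite3` module as a stand-in for the set-oriented language; the document, the `value` attribute and the `orders` table are invented. Both compute the total order value per customer.

```python
import sqlite3
import xml.etree.ElementTree as ET

doc = ET.fromstring(
    "<Customers>"
    '<Customer id="C1"><Order id="O1" value="30"/><Order id="O2" value="12"/></Customer>'
    '<Customer id="C2"><Order id="O3" value="8"/></Customer>'
    "</Customers>"
)

# Logical layout view: navigate the tree, one customer subtree at a time.
totals_by_navigation = {}
for customer in doc.findall("Customer"):
    totals_by_navigation[customer.get("id")] = sum(
        int(order.get("value")) for order in customer.findall("Order")
    )

# Content view: shred the orders into a relation and state the query declaratively.
db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE orders (order_id TEXT PRIMARY KEY, customer_id TEXT, value INTEGER)"
)
db.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        (order.get("id"), customer.get("id"), int(order.get("value")))
        for customer in doc.findall("Customer")
        for order in customer.findall("Order")
    ],
)
totals_by_query = dict(
    db.execute("SELECT customer_id, SUM(value) FROM orders GROUP BY customer_id")
)

print(totals_by_navigation)
print(totals_by_query)
```

The navigating version depends on orders being nested inside customers. The query would stay unchanged if the document used one of the other layouts, because only the shredding step touches the tree.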

  15. Realization: Nesting of Viewpoints
     [Figure: three nested views, each annotated with the five RM-ODP viewpoints (Enterprise, Information, Computational, Engineering, Technology)]
     • Presentation View
        • operations: “Browse”, “Store”
        • SVG, PDF
        • media: screen, paper
     • Logical Layout View
        • XML document
        • operations: “Store”, “Retrieve”
        • XML Schema, DTD
        • file (template), large object
     • Content View
        • operations: “Store”, “Retrieve”
        • semantic model
        • RDBMS, native XML-DB

  16. Conclusions
     • Analyze the requirements first before building an XML system
        • data-centric vs. document-centric documents
        • huge impact on the choice of technology (storage platform)
     • Think in viewpoints to understand the semantics
        • mixed occurrence of content view and logical layout in XML documents
        • expand viewpoints into the specification of a new system
     • Use generic relationships for constraint modelling
     • Beware of the difference between specification and realization
