
Structured-Document Processing Languages Spring 2007


cathal

Presentation Transcript


  1. Structured-Document Processing Languages, Spring 2007: Course Review. Repetitio mater studiorum est! (Repetition is the mother of learning.)

  2. Goals of the Course
     • Learn about central models and languages for manipulating, representing, transforming, and querying structured documents (or XML)
     • "Generic XML processing technology"

  3. Methodological Goals
     • Central professional skills
         • consulting technical specifications
         • experimenting with software implementations
     • Ability to think…?
         • to find out relationships
         • to apply knowledge in new situations
     • ("Pidgin English" for scientific communication)

  4. XML?
     • Extensible Markup Language is not a markup language!
         • it does not fix a tag set or its semantics (as markup languages such as HTML do)
     • XML is
         • a way to use markup to represent information
         • a metalanguage
             • supports the definition of specific markup languages through XML DTDs or Schemas
             • e.g. XHTML, a reformulation of HTML using XML

  5. XML Encoding of Structure: Example
     Tree: S has three children: W (text "Hello"), E (attribute A=1, no content), and W (text "world!")
     Encoding:
       <S>
         <W> Hello </W>
         <E A='1'> </E>
         <W> world! </W>
       </S>

  6. Basics of XML DTDs
     • A Document Type Declaration provides a grammar (document type definition, DTD) for a class of documents
     • Syntax (in the prolog of a document instance):
         <!DOCTYPE rootElemType SYSTEM "ex.dtd" [ <!-- "internal subset" may come here --> ]>
       where the "external subset" is in the file ex.dtd
     • DTD = union of the external and internal subsets

  7. What Do Declarations Look Like?
       <!ELEMENT invoice (client, item+)>
       <!ATTLIST invoice num NMTOKEN #REQUIRED>
       <!ELEMENT client (name, email?)>
       <!ATTLIST client num NMTOKEN #REQUIRED>
       <!ELEMENT name (#PCDATA)>
       <!ELEMENT email (#PCDATA)>
       <!ELEMENT item (#PCDATA)>
       <!ATTLIST item price NMTOKEN #REQUIRED
                      unit (FIM | EUR) "EUR">

  8. Element type declarations
     • The general form is <!ELEMENT elementTypeName (E)>, where E is a content model
         • a regular expression over element names
     • Content model operators:
         E | F : alternation
         E, F  : concatenation
         E?    : optional
         E*    : zero or more
         E+    : one or more
         (E)   : grouping
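Because a content model is a regular expression over element names, the content model operators above map almost directly onto ordinary regex syntax. The following is a minimal sketch of that idea, not how a validating parser is implemented: the helper names `content_model_to_regex` and `children_match` are hypothetical, and mixed content such as (#PCDATA) is deliberately not handled.

```python
import re

def content_model_to_regex(model):
    """Translate a DTD content model over element names into a Python regex.

    Sketch only: handles element names and the operators | , ? * + ( ),
    but not #PCDATA or mixed content.
    """
    # Drop whitespace so that only names and operators remain.
    compact = re.sub(r"\s+", "", model)
    # Each element name becomes a group matching "name " (name plus a space).
    pattern = re.sub(
        r"[A-Za-z_][\w.\-]*",
        lambda m: "(?:" + re.escape(m.group(0)) + " )",
        compact,
    )
    # Concatenation is implicit in regex syntax, so the commas disappear;
    # | ? * + and parentheses keep their meaning.
    return re.compile(pattern.replace(",", ""))

def children_match(model, children):
    """Check a sequence of child element names against a content model."""
    sequence = "".join(name + " " for name in children)
    return content_model_to_regex(model).fullmatch(sequence) is not None

# Content models taken from the invoice DTD on the previous slide.
print(children_match("(client, item+)", ["client", "item", "item"]))
print(children_match("(client, item+)", ["client"]))
print(children_match("(name, email?)", ["name"]))
```

Terminating every name with a space keeps names that share a prefix (say, item and items) from being confused during matching.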

  9. XML Schema Definition Language
     • XML syntax
         • schema documents are easier for programs to manipulate (than the DTD syntax)
     • Compatibility with namespaces
         • can validate documents using declarations from multiple sources
     • Content datatypes
         • 44 built-in datatypes (including primitive Java datatypes, datatypes of SQL, and XML attribute types)
         • plus user-defined datatypes

  10. XML Namespaces
        <xsl:stylesheet version="1.0"
            xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
            xmlns="http://www.w3.org/TR/xhtml1/strict">
          <!-- XHTML is the 'default namespace' -->
          <xsl:template match="doc/title">
            <H1> <xsl:apply-templates /> </H1>
          </xsl:template>
        </xsl:stylesheet>
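To see what the namespace declarations do, it can help to parse the stylesheet with a namespace-aware parser and look at the expanded element names. The sketch below uses Python's standard xml.etree.ElementTree, which reports each name as {namespace-URI}local-name; the variable names are illustrative only.

```python
import xml.etree.ElementTree as ET

STYLESHEET = """<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns="http://www.w3.org/TR/xhtml1/strict">
  <xsl:template match="doc/title">
    <H1> <xsl:apply-templates /> </H1>
  </xsl:template>
</xsl:stylesheet>"""

root = ET.fromstring(STYLESHEET)
template = root[0]   # the xsl:template element
h1 = template[0]     # H1 carries no prefix, so it falls in the default namespace

print(root.tag)  # {http://www.w3.org/1999/XSL/Transform}stylesheet
print(h1.tag)    # {http://www.w3.org/TR/xhtml1/strict}H1
```

The prefix xsl itself does not survive parsing; only the namespace URI it was bound to identifies the element type.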

  11. Section 3: XML Processor APIs
      • How can applications manipulate structured documents?
      • Overview of document parser interfaces
          3.1 SAX: an event-based interface
          3.2 DOM: an object-based interface
          3.3 JAXP: Java API for XML Processing

  12. A SAX-based application
      • Input document: <?xml version='1.0'?> <A i="1"> Hi! </A>
      • The application's main routine calls parse()
      • The parser calls the application's callback routines, passing the parsed data:
          startDocument()
          startElement()   "A", [i="1"]
          characters()     "Hi!"
          endElement()     "A"
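The callback flow on this slide can be tried out with Python's standard xml.sax module, whose ContentHandler has the same method names as the SAX callbacks shown. This is a sketch for illustration (the course itself uses the Java SAX interface); the class name `EventLogger` is hypothetical.

```python
import xml.sax

class EventLogger(xml.sax.ContentHandler):
    """Records every callback the parser makes, in order."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startDocument(self):
        self.events.append(("startDocument",))

    def startElement(self, name, attrs):
        # attrs is a SAX attributes object; copy it into a plain dict.
        self.events.append(("startElement", name, dict(attrs.items())))

    def characters(self, content):
        # A parser may deliver one text run in several chunks.
        self.events.append(("characters", content))

    def endElement(self, name):
        self.events.append(("endElement", name))

# The application's "main routine": hand the document and handler to the parser.
handler = EventLogger()
xml.sax.parseString(b"""<?xml version='1.0'?><A i="1"> Hi! </A>""", handler)

for event in handler.events:
    print(event)
```

The application never sees a document tree here: it only reacts to events, which is why SAX suits large inputs that need a single pass.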

  13. DOM: What is it?
      • Object-based, language-neutral API for XML and HTML documents
      • Allows programs/scripts to
          • build,
          • navigate, and
          • modify documents
      • “Directly Obtainable in Memory” vs “Serial Access XML”

  14. DOM structure model
        <invoice form="00" type="estimated">
          <addressdata>
            <name>John Doe</name>
            <address>
              <streetaddress>Pyynpolku 1 </streetaddress>
              <postoffice>70460 KUOPIO </postoffice>
            </address>
          </addressdata>
          ...
      Node tree (node type in brackets):
        [Document]
          invoice [Element], attributes form="00", type="estimated" [NamedNodeMap]
            addressdata [Element]
              name [Element]
                "John Doe" [Text]
              address [Element]
                streetaddress [Element]
                  "Pyynpolku 1" [Text]
                postoffice [Element]
                  "70460 KUOPIO" [Text]
            ...
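Since DOM is language-neutral, the structure model above can be explored with any DOM implementation. The sketch below uses Python's standard xml.dom.minidom to navigate, build, and modify the invoice. Assumptions made for illustration: the part of the invoice elided by "..." is simply left out, and the added email element and the new attribute value "final" are hypothetical.

```python
from xml.dom import minidom, Node

# The invoice from the slide; the elided remainder ("...") is omitted.
INVOICE = """<invoice form="00" type="estimated">
  <addressdata>
    <name>John Doe</name>
    <address>
      <streetaddress>Pyynpolku 1</streetaddress>
      <postoffice>70460 KUOPIO</postoffice>
    </address>
  </addressdata>
</invoice>"""

doc = minidom.parseString(INVOICE)        # Document node
invoice = doc.documentElement             # Element node

# Navigate: attributes live in a NamedNodeMap, text in Text nodes.
attribute_names = sorted(invoice.attributes.keys())
name = doc.getElementsByTagName("name")[0]
name_text = name.firstChild
print(attribute_names, name_text.data)

# Build: create new nodes through the Document and attach them.
email = doc.createElement("email")        # hypothetical element
email.appendChild(doc.createTextNode("john.doe@example.org"))
addressdata = name.parentNode
addressdata.appendChild(email)

# Modify: change an attribute value in place.
invoice.setAttribute("type", "final")     # hypothetical value
print(invoice.getAttribute("type"))
```

In contrast to SAX, the whole document stays in memory as a tree, so it can be revisited and changed in any order.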

  15. Overview of XSLT Transformation

  16. JAXP (Java API for XML Processing)
      • An interface for “plugging in” and using XML processors in Java applications
      • Includes packages
          • org.xml.sax: SAX 2.0 interface
          • org.w3c.dom: DOM Level 2 interface
          • javax.xml.parsers: initialization and use of parsers
          • javax.xml.transform: initialization and use of transformers (XSLT processors)
      • Included in standard Java

  17. JAXP: Using a SAX parser (1)
      [Figure: call sequence .newSAXParser(), .getXMLReader(), .parse("f.xml"); input: XML file f.xml]

  18. JAXP: Using a DOM parser (1)
      [Figure: call sequence .newDocumentBuilder(), then .newDocument() or .parse("f.xml"); input: file f.xml]

  19. JAXP: Using Transformers (1)
      [Figure: call sequence .newTransformer(…), .transform(.,.); input: XSLT]

  20. Section 4: Introduction to Style Sheets
      • Specify and produce a visual representation for structured documents
          • by defining a mapping from document structure + content to formatting tasks, and
          • inserting/generating new text
          • numbering
          • rearranging
          • by rules based on contextual conditions

  21. Process of Transformation (Finnish: muunnos)
      [Figure labels: Style sheet (LaTeX style file, CSS, XSLT); Transformation; Formatter input (TeX, FOT (XSL formatting object tree))]

  22. Process of Formatting (Finnish: muotoilu)
      • Creates a detailed description of the presentation
          → the style sheet may not have complete control of the final formatted presentation!

  23. Process of Rendering (Finnish: hahmonnus)
      • Display/play the document on the output medium

  24. CSS - Cascading Style Sheets
      • A style sheet language
          • mainly used to specify the representation of web pages by attaching style (fonts, colours, margins, …) to HTML/XML documents
      • Example style rule: H1 {color: blue; font-weight: bold;}

  25. CSS Processing Model (simplified)
      0. Parse the document
      1. Match style rules to elements of the document tree
          • annotate each element with the values assigned to its properties
          • inheritance and elaborate "cascade" rules are applied to select which value is assigned
      2. Generate a formatting structure
          • of nested rectangular boxes
      3. Render the formatting structure
          • display, print, audio-synthesize, ...
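The four steps above can be mimicked in a few lines. The following is a toy sketch under strong assumptions, not a CSS implementation: selectors are plain element names, every property is inherited, and the "cascade" is reduced to "a later rule overrides an earlier one". The rule set, the sample document, and the function names `annotate` and `render` are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Style rules as (element-name selector, declarations); later rules win.
RULES = [
    ("body", {"color": "black"}),
    ("H1", {"color": "blue", "font-weight": "bold"}),
]

def annotate(element, inherited, rules, styles):
    """Step 1: assign property values to each element of the tree."""
    computed = dict(inherited)                 # inheritance from the parent
    for selector, declarations in rules:
        if selector == element.tag:            # rule matches this element
            computed.update(declarations)      # toy cascade: last rule wins
    styles[element] = computed
    for child in element:
        annotate(child, computed, rules, styles)
    return styles

def render(element, styles, depth=0):
    """Steps 2-3: one indented text line per nested box."""
    props = "; ".join(f"{k}: {v}" for k, v in sorted(styles[element].items()))
    lines = [f"{'  ' * depth}[{element.tag}] {props}"]
    for child in element:
        lines.extend(render(child, styles, depth + 1))
    return lines

doc = ET.fromstring("<body><H1>Title</H1><p>Text</p></body>")  # step 0: parse
styles = annotate(doc, {}, RULES, {})
print("\n".join(render(doc, styles)))
```

The p element has no rule of its own, so its colour comes purely from inheritance, while H1 overrides the inherited colour with its own declaration.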

  26. XSL: Transformation & Formatting
      [Figure labels: XSLT script; phases I and II]

  27. Page regions
      • A simple page can contain 1-5 regions, specified by child elements of the simple-page-master

  28. Top-level formatting objects
      • Slightly simplified:
          fo:root
            fo:layout-master-set
              (fo:simple-page-master | fo:page-sequence-master)+
                fo:simple-page-master: fo:region-body fo:region-start? fo:region-end? fo:region-before? fo:region-after?
                fo:page-sequence-master: specifies masters for page sequences, by referring to simple-page-masters
            fo:page-sequence+
              fo:flow: contents of pages

  29. XQuery in a Nutshell
      • Functional expression language
          • a query is a side-effect-free expression
      • Operates on sequences of items
          • atomic values or XML nodes
      • Strongly typed: (XML Schema) types may be assigned to expressions statically, and results can be validated
      • Extends XPath 2.0 (but not all axes are required)
          • common to XQuery 1.0 and XPath 2.0: Functions and Operators, W3C Rec. 01/2007
      • Roughly: XQuery ≈ XPath 2.0 + XSLT' + SQL'

  30. FLWOR ("flower") Expressions
      • for, let, where, order by, and return clauses (~SQL select-from-where)
      • Form: (ForClause | LetClause)+ WhereClause? OrderByClause? "return" Expr
      • Binds variables to values, and uses these bindings to construct a result (an ordered sequence of nodes)

  31. XQuery Example
        for $pn in distinct-values(doc("sp.xml")//pno)
        let $sp := doc("sp.xml")//sp_tuple[pno=$pn]
        where count($sp) >= 3
        order by $pn
        return
          <well_supplied_item>
            <pno>{$pn}</pno>
            <avgprice>{avg($sp/price)}</avgprice>
          </well_supplied_item>
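For readers more at home in a general-purpose language, the same query can be spelled out step by step. The sketch below mirrors each FLWOR clause with plain Python over xml.etree.ElementTree. It is an illustration, not an XQuery engine: the sample contents of sp.xml (including the root element sp and the sno field), the wrapper element results, and the function name `well_supplied_items` are assumptions.

```python
import xml.etree.ElementTree as ET
from statistics import mean

# Hypothetical contents of sp.xml: suppliers (sno) offering parts (pno).
SP_XML = """<sp>
  <sp_tuple><sno>s1</sno><pno>p1</pno><price>10</price></sp_tuple>
  <sp_tuple><sno>s2</sno><pno>p1</pno><price>20</price></sp_tuple>
  <sp_tuple><sno>s3</sno><pno>p1</pno><price>30</price></sp_tuple>
  <sp_tuple><sno>s1</sno><pno>p2</pno><price>5</price></sp_tuple>
  <sp_tuple><sno>s2</sno><pno>p2</pno><price>7</price></sp_tuple>
</sp>"""

def well_supplied_items(root, min_suppliers=3):
    tuples = list(root.iter("sp_tuple"))                  # doc("sp.xml")//sp_tuple
    results = ET.Element("results")                       # wrapper for the sequence
    # for $pn in distinct-values(...//pno) ... order by $pn
    for pn in sorted({t.findtext("pno") for t in tuples}):
        # let $sp := ...//sp_tuple[pno=$pn]
        sp = [t for t in tuples if t.findtext("pno") == pn]
        # where count($sp) >= 3
        if len(sp) >= min_suppliers:
            # return <well_supplied_item>...</well_supplied_item>
            item = ET.SubElement(results, "well_supplied_item")
            ET.SubElement(item, "pno").text = pn
            average = mean(float(t.findtext("price")) for t in sp)
            ET.SubElement(item, "avgprice").text = str(average)
    return results

result = well_supplied_items(ET.fromstring(SP_XML))
print(ET.tostring(result, encoding="unicode"))
```

With this sample data only p1 has at least three offers, so it is the only item returned; the XQuery version expresses the same thing declaratively in five clauses.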

  32. Course Main Message
      • XML is a universal way to represent information as tree-like data structures
      • Specialized and powerful technologies exist for processing it
      • The worst hype has settled
      • R&D is still active
