
Structured-Document Processing Languages

Structured-Document Processing Languages (5 cp), Spring 2011. Pekka Kilpeläinen, University of Eastern Finland, School of Computing, Pekka.T.Kilpelainen@uef.fi. Introduction. First: Overview and Arrangements.




  1. Structured-Document Processing Languages (5 cp), Spring 2011. Pekka Kilpeläinen, University of Eastern Finland, School of Computing, Pekka.T.Kilpelainen@uef.fi. Notes 1: Introduction

  2. Introduction
     First: Overview and Arrangements
     What is this course about?
     1.1 Review of Structured-Documents
     2.1 XML & XML Documents

  3. Goals of the Course
     • To get familiar with central models and languages for manipulating, transforming, and querying structured documents (or XML)
     • “Generic XML processing technology”
     • very little about specific XML applications or commercial systems

  4. NOT an Exhaustive Survey
     • Bias in selecting course topics:
       • estimated usefulness/value
       • centrality (→ longer-lasting value)
       • maturity: Standards, or stable specifications? Robust implementations?
       • Lecturer up-to-date?
     • Emphasis on programmatic access and manipulation of data in the form of documents, rather than describing/modelling it

  5. Motivation?
     “Programming for XML is very different from programming for other data models” (D. Florescu & D. Kossmann at SIGMOD’06)
     • Interest in models of information processing
     • Practical relevance of documents and XML
     [Figure labels: e.g. order; XML; e.g. invoice; Internet / intranet]

  6. Course Outline
     1 Introduction
       Overview and Arrangements
       1.1 Structured Documents
     2 XML Basics
       - XML documents, DTDs, Namespaces
     3 Programmatic Manipulation of Structured Documents (XML APIs)
       - SAX, DOM, StAX, JAXP

  7. Course Outline (2)
     4 Transforming Structured Documents
       4.1 Addressing: XPath 1.0
       4.2 XSLT 1.0
     5 Querying Structured Documents
       - W3C XML Query Language XQuery
     6 Review of the Course

  8. Methodological Goals
     • Central professional skills
       • consulting technical specifications
       • experimenting with SW implementations
     • Ability to think…?
       • to find out relationships, reason, explain, ...
       • to apply knowledge in new situations
     • ("Pidgin English" for scientific communication)

  9. Administration
     • An elective Master-level special course
       • suitable for all specialisation lines (esp. media technology and SWE)
     • 5 cp (133 h 20 min of work!)
     • Lectures March 9 – May 5, class MT3
       • Video broadcast to TB180 in Joensuu
     • Lecturer: Pekka.T.Kilpelainen@uef.fi
     NB!

  10. Administration: Exercises
      • Exercises, March 14 – May 9
        • essential for learning the technology
        • normal homework assignments, hands-on practice; solutions discussed in class
      • Grading: one prepared solution -> one activity point
      • Plan:
        • some solutions shall be returned to Moodle, for potential feedback and scaling of activity points by solution quality
        • these will be announced on a case-by-case basis

  11. Administration: Grading
      • Final exam on Friday, May 13
        • ≥ 50% of exam points required to pass
        • Grade = round(6*Exam/MaxExam + 2*HomeWork/MaxHomeWork - 2.5)
      • Retake exam on Friday, June 10
        • again ≥ 50% to pass
        • better of grades with/without homework credits
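
The grading formula on slide 11 can be tried out with a short Python sketch. This is only an illustration, not the lecturer's official calculation: the function name `course_grade` is hypothetical, `round` is assumed to mean round-half-up, and the result is assumed to be capped at 5 with 0 standing for a failed exam.

```python
import math


def course_grade(exam, max_exam, homework, max_homework):
    """Apply the slide's formula; return 0 when the exam is not passed."""
    # Pass threshold from the slide: at least 50% of the exam points.
    if exam < 0.5 * max_exam:
        return 0
    raw = 6 * exam / max_exam + 2 * homework / max_homework - 2.5
    # Assumption: "round" on the slide means round-half-up. Python's built-in
    # round() rounds halves to the nearest even integer, so it is avoided here.
    rounded = math.floor(raw + 0.5)
    # Assumption: the grade is capped at 5 (the formula itself can reach 6).
    return min(5, rounded)


if __name__ == "__main__":
    print(course_grade(50, 100, 0, 10))   # bare pass, no homework points
    print(course_grade(75, 100, 5, 10))   # 75% exam, half of the homework
```

Under these assumptions, exactly half of the exam points with no homework gives 0.5 before rounding, which is why the rounding rule decides whether the lowest passing grade is reached.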

  12. Material
      • No single textbook
        • Reports, specifications, articles
      • Course home page
        • http://www.cs.uku.fi/~kilpelai/RDK11/
        • slides, exercises, references, announcements
      • Possible background texts (see home page):
        • Deitel et al.; Møller & Schwartzbach; Key; Walmsley

  13. Background Check
      • Basic knowledge of structured documents and document standards
        • Course "Introduction to Document Standards"?
      • Programming languages and concepts
        • Java? OO programming?
        • Functional programming? SQL?
        • Unix/Linux vs. Windows?
      • Formal language theory
        • Theory of Computation
        • regular expressions, context-free grammars, parse trees?

  14. Students' Expectations?

  15. 1.1. Structured Documents
      • Document:
        • a structured representation of information on some medium (message)
        • normally for a human reader
          • memos, manuals, articles, books, …
        • also application-to-application messages
          • e.g., between client and server in Web Services
          • "prose-oriented XML" vs "data-oriented XML"
        • can be treated as a unit
          • e.g., a web page vs a web site?

  16. Text-Based Documents
      • We concentrate on textual or text-based documents
        • character data is the major constituent of the information content
        • as opposed to, say, multimedia documents
      • Next: Presentation vs Structure

  17. Presentation vs Structure
      • Presentation informs the human reader about the meaning of text and the role of its parts
      • Markup (Finnish: merkkaus) indicates the presentation or the meaning of different parts of text
        • originally hand-written annotations for the typesetter
        • nowadays primarily codes embedded in digital documents; <Tags>

  18. Markup
      • Procedural markup
        • formatting commands (start boldface, produce an empty line, indent 5 mm, ...)
        • proprietary word processor formats, nroff, TeX, ...
      • Descriptive or generic markup
        • indicating the logical structure of text using chosen names
        • LaTeX: \begin{abstract} ... \end{abstract}
        • HTML: <TITLE> .... </TITLE>
      • Markup language (Finnish: merkkauskieli)
        • a fixed set of markup notations (e.g. nroff, TeX, HTML, SVG, …)
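
One practical consequence of descriptive markup is that the same logical structure can be given several presentations. The Python sketch below illustrates this with the standard library's `xml.etree.ElementTree`; the sample document, its element names (`article`, `title`, `abstract`), and the two rendering functions are made up for illustration and do not come from the course material.

```python
import xml.etree.ElementTree as ET

# Hypothetical descriptively marked-up document: the element names describe
# the role of each piece of text, not how it should look.
SOURCE = """<article>
  <title>Structured Documents</title>
  <abstract>Markup can describe structure instead of formatting.</abstract>
</article>"""


def render_plain(article):
    """One presentation: upper-case title, indented abstract."""
    title = article.findtext("title")
    abstract = article.findtext("abstract")
    return f"{title.upper()}\n\n    {abstract}"


def render_html(article):
    """Another presentation of the same logical structure.

    No HTML escaping is done; the sample text contains no special characters.
    """
    title = article.findtext("title")
    abstract = article.findtext("abstract")
    return f"<h1>{title}</h1>\n<p><i>{abstract}</i></p>"


article = ET.fromstring(SOURCE)
print(render_plain(article))
print(render_html(article))
```

With procedural markup the formatting decisions would be embedded in the document itself, so producing the second presentation would mean editing the document rather than swapping the rendering function.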

  19. Structured Documents?
      Most liberally, any document is structured (punctuation, words, sentences, fields, …)
      but especially descriptively marked-up documents ... (e.g. well-formed XML)
      especially if they adhere to a rigorous specification of structure (e.g. XML+DTD)
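
The "well-formed XML" level of structure can be checked mechanically. A minimal Python sketch, assuming the standard library's `xml.etree.ElementTree` parser and two made-up `memo` documents; the helper name `is_well_formed` is hypothetical. Note that this parser is non-validating, so the stricter XML+DTD level is not checked here and would need a validating parser.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if the text parses as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        # Raised for problems such as mismatched or unclosed tags.
        return False


# Properly nested elements.
print(is_well_formed("<memo><to>Staff</to><body>Hello</body></memo>"))
# The <to> element is closed with the wrong end tag.
print(is_well_formed("<memo><to>Staff</body></memo>"))
```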

  20. Structure in Documents
      • Hierarchy or nesting is ubiquitous
        • chapters of books, warnings in maintenance manuals, ...
      • Linear order is essential in prose documents
        • less important in representations of data objects
      • Hypertext and cross-references
      • We'll be mainly dealing with manipulation of hierarchical, or tree-like, document structures
      Next: How are these modelled?
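
Hierarchy and linear order can both be seen by walking a parsed document as a tree. The following Python sketch uses `xml.etree.ElementTree` on a made-up `book` document; the function name `outline` and the element names are assumptions for illustration only.

```python
import xml.etree.ElementTree as ET

# Hypothetical document with nested chapters and sections.
BOOK = """<book>
  <chapter>
    <title>Introduction</title>
    <section><title>Goals</title></section>
  </chapter>
  <chapter>
    <title>XML Basics</title>
  </chapter>
</book>"""


def outline(element, depth=0):
    """Return (depth, tag) pairs in document order (pre-order traversal)."""
    pairs = [(depth, element.tag)]
    for child in element:  # children are iterated in document order
        pairs.extend(outline(child, depth + 1))
    return pairs


for depth, tag in outline(ET.fromstring(BOOK)):
    print("  " * depth + tag)
```

The depth value reflects the nesting, while the order of the pairs reflects the linear order of the text, which is the part that matters most in prose documents.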
