
XML Programming CSPP51038


heather-cox

Presentation Transcript


  1. XML Programming CSPP51038 Overview Simple Schema XML Sample Applications

  2. Course specifics

  3. Prerequisites • Since XML is about interoperability, I’m going to do my best to make the class language-agnostic. Mastery of any one of the following is required: • Python • Perl • Java • C • C++ • Visual Basic • I will base class lectures/examples on Java

  4. Format • Five homeworks: 70% • In-class quizzes (8 total): 30% • No midterm or final • Class participation can help grade

  5. Getting help • Mandatory: Register for discussion group via course website • To post to list, send mail to • cspp51038-su-06-1@cs.uchicago.edu • TA info, office hours/locations all on web-site starting Wed. • http://people.cs.uchicago.edu/~asiegel/cspp51038 • Consult website frequently for updates/announcements, homework, readings, etc.

  6. Policies • Late homework allowed up to three days – 10% penalty charged automatically. • If you need any special considerations, please see me in advance or I won’t be able to help you • Will turn over homework in 7 days

  7. Programming models

  8. Distributed programming models: Typical Web-based • Easy to deploy but slow, not great user experience • [Diagram labels: browser, http, WebServer, database, dynamically generated html (html plus optionally JavaScript to jazz up html)] • Many programming models • JSP • Servlets • PHP • CGI (python, perl, C) • Cold Fusion

  9. Distributed programming models: Typical Web-based • Better user experience. Heavier, less portable, requires socket programming to stream to server. • [Diagram labels: html + applet, http, socket, WebServer, database, dynamically generated html]

  10. [Diagram] • Direct Connections: Application client, sockets and ports, App1, App2, App3 • Remote Procedures: Application client, NDS, App1, App2, App3 • Examples: Java’s RMI, CORBA

  11. XML basics

  12. XML Basics, cont • Most modern languages have a method of representing structured data. • Typical flow of events in an application: read data (file, db, socket) → unmarshal into objects → manipulate in program → marshal back out (file, db, socket) • Many language-specific technologies reduce these steps: RMI, object serialization in any language, CORBA (actually somewhat language neutral), MPI, etc. • XML provides a very appealing alternative that hits the sweet spot for many applications
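
The flow on this slide (read data, turn it into objects, manipulate, write it back out) can be sketched in a few lines of Python with the standard-library `xml.etree.ElementTree` module. This is only an illustrative sketch: the `Student` dataclass and the helper names `from_xml` and `to_xml` are assumptions made for the example, not part of the course material.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class Student:
    # Hypothetical in-program representation of one record.
    name: str
    age: int
    gpa: float


def from_xml(xml_text: str) -> Student:
    """Read XML text and build a Student object from it."""
    root = ET.fromstring(xml_text)
    return Student(
        name=root.findtext("name"),
        age=int(root.findtext("age")),
        gpa=float(root.findtext("gpa")),
    )


def to_xml(student: Student) -> str:
    """Write a Student object back out as XML text."""
    root = ET.Element("Student")
    ET.SubElement(root, "name").text = student.name
    ET.SubElement(root, "age").text = str(student.age)
    ET.SubElement(root, "gpa").text = str(student.gpa)
    return ET.tostring(root, encoding="unicode")


incoming = "<Student><name>Andrew</name><age>36</age><gpa>2.0</gpa></Student>"
student = from_xml(incoming)   # read data into an object
student.age += 1               # manipulate in program
outgoing = to_xml(student)     # write the object back out
print(outgoing)
```

Because the exchanged format is plain text, the program on the other end of the file, database, or socket does not have to be written in Python.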

  13. User-defined types in programming languages • XML is a text-based, programming-language-neutral way of representing structured information. Compare:

          Fortran:
              type Student
                character(len=*) :: name
                character(len=*) :: ssn
                integer :: age
                real :: gpa
              end type Student

          Java:
              class Student {
                public String name;
                public String ssn;
                public int age;
                public float gpa;
              }

          C:
              struct Student {
                char* name;
                char* ssn;
                int age;
                float gpa;
              };

  14. Sample XML Schema • In XML, a (common) datatype description is called an XML schema. • DTD and Relax NG are other common alternatives • Below uses schema just for illustration purposes • (Slide callout: ignore this for now)

          <?xml version="1.0" encoding="UTF-8"?>
          <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
                     elementFormDefault="qualified" attributeFormDefault="unqualified">
            <xs:element name="student">
              <xs:complexType>
                <xs:sequence>
                  <xs:element name="name" type="xs:string"/>
                  <xs:element name="ssn" type="xs:string"/>
                  <xs:element name="age" type="xs:integer"/>
                  <xs:element name="gpa" type="xs:decimal"/>
                </xs:sequence>
              </xs:complexType>
            </xs:element>
          </xs:schema>

  15. Alternative schema • In this example studentType is defined separately rather than anonymously • (Slide callout: new type defined separately)

          <xs:schema>
            <xs:element name="student" type="studentType"/>
            <xs:complexType name="studentType">
              <xs:sequence>
                <xs:element name="name" type="xs:string"/>
                <xs:element name="ssn" type="xs:string"/>
                <xs:element name="age" type="xs:integer"/>
                <xs:element name="gpa" type="xs:decimal"/>
              </xs:sequence>
            </xs:complexType>
          </xs:schema>

  16. Alternative: DTD • Can also use a DTD (Document Type Definition), which is much simpler than a schema but also much less powerful (notice the lack of types)

          <!DOCTYPE Student [
          <!-- The DOCTYPE name must match the name of the root element -->
          <!ELEMENT Student (name,ssn,age,gpa)>
          <!-- Student has four child elements -->
          <!ELEMENT name (#PCDATA)>
          <!-- name is parsed character data -->
          <!ELEMENT ssn (#PCDATA)>
          <!ELEMENT age (#PCDATA)>
          <!ELEMENT gpa (#PCDATA)>
          ]>

  17. Another alternative: Relax NG • Gaining in popularity • Can be very simple to write but also contain many more features than DTD • Still much less common than schema

  18. Creating instances of types • In programming languages, we instantiate objects:

          C:
              struct Student s1, s2;
              s1.name = "Andrew";
              s1.ssn = "123-45-6789";

          Java:
              Student s1 = new Student();
              s1.name = "Andrew";
              s1.ssn = "123-45-6789";
              ...

          Fortran:
              type(Student) :: s1
              s1%name = 'Andrew'
              ...

  19. Creating XML documents • XML is not a programming language! • In XML we make a Student “object” in an xml file (Student.xml): • Think of this as a serialized object.

          <Student>
            <name>Andrew</name>
            <ssn>123-45-6789</ssn>
            <age>36</age>
            <gpa>2.0</gpa>
          </Student>

  20. XML and Schema • Note that there are two parts to what we did • Defining the “structure” layout • Defining an “instance” of the structure • The first is done with an appropriate Schema or DTD. • The second is the XML part • Both can go in the same file, or an XML file can refer to an external Schema or DTD (typical) • From this point on we use only Schema

  21. Exercise • Create an XML file that contains a list of two Car “objects” (pick five relevant fields). • Note that each XML file must have a single root node, so each car element must be under a common parent (e.g. cars).

  22. Exercise Solution

          <?xml version="1.0" encoding="UTF-8"?>
          <cars>
            <car>
              <make>dodge</make>
              <model>ram</model>
              <color>red</color>
              <year>2004</year>
              <mileage>22000</mileage>
            </car>
            <car>
              <make>Ford</make>
              <model>Pinto</model>
              <color>white</color>
              <year>1980</year>
              <mileage>100000</mileage>
            </car>
          </cars>

  23. ? • Question: What can we do with such a file? • Write corresponding Schema to define its content • Write XSL transformation to display • Parse into a programming language
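
As a rough illustration of the last option, parsing such a file into a programming language, here is a minimal Python sketch using the standard-library `xml.etree.ElementTree` module. The function name `parse_cars` and the choice to represent each car as a dictionary are assumptions made for this example; the course itself bases its examples on Java.

```python
import xml.etree.ElementTree as ET

# The cars document from the exercise, inlined so the sketch is self-contained.
CARS_XML = """<cars>
  <car>
    <make>dodge</make><model>ram</model><color>red</color>
    <year>2004</year><mileage>22000</mileage>
  </car>
  <car>
    <make>Ford</make><model>Pinto</model><color>white</color>
    <year>1980</year><mileage>100000</mileage>
  </car>
</cars>"""


def parse_cars(xml_text):
    """Turn each <car> element into a dict keyed by child element name."""
    root = ET.fromstring(xml_text)
    cars = []
    for car in root.findall("car"):
        record = {child.tag: child.text for child in car}
        # Everything in XML is text; convert the numeric fields explicitly.
        record["year"] = int(record["year"])
        record["mileage"] = int(record["mileage"])
        cars.append(record)
    return cars


cars = parse_cars(CARS_XML)
for car in cars:
    print(car["make"], car["model"], car["year"])
```

Note that the type conversions are done by hand here; a schema is what would let a tool know that `year` is a number.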

  24. Some sample XML documents

  25. Order / Whitespace • Note that element order is important, but whitespace used only for layout between elements is not. The document below means the same thing however it is indented:

          <Article >
            <Headline>Direct Marketer Offended by Term 'Junk Mail' </Headline>
            <authors>
              <author> Joe Garden</author>
              <author> Tim Harrod</author>
            </authors>
            <abstract>Dan Spengler, CEO of the direct-mail-marketing firm Mailbox of
              Savings, took umbrage Monday at the use of the term <it>junk mail</it>
            </abstract>
            <body type="url" > http://www.theonion.com/archive/3-11-01.html </body>
          </Article>
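
A small Python sketch can make the order/whitespace point concrete. It assumes that whitespace-only text between elements is insignificant to the application, so the hypothetical helper `canonical` strips it before comparing; the parser itself still reports that whitespace.

```python
import xml.etree.ElementTree as ET


def canonical(element):
    """Nested tuple of (tag, attributes, stripped text, children), in document order."""
    return (
        element.tag,
        tuple(sorted(element.attrib.items())),
        (element.text or "").strip(),
        tuple(canonical(child) for child in element),
    )


compact = "<authors><author>Joe Garden</author><author>Tim Harrod</author></authors>"
indented = """<authors>
    <author>Joe Garden</author>
    <author>Tim Harrod</author>
</authors>"""
reordered = "<authors><author>Tim Harrod</author><author>Joe Garden</author></authors>"

# Different layout, same content: compares equal.
same_layout = canonical(ET.fromstring(compact)) == canonical(ET.fromstring(indented))
# Same elements in a different order: compares unequal.
same_order = canonical(ET.fromstring(compact)) == canonical(ET.fromstring(reordered))
print(same_layout, same_order)
```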

  26. Molecule Example

          <?xml version="1.0" ?>
          <CML>
            <MOL TITLE="Water" >
              <ATOMS>
                <ARRAY BUILTIN="ELSYM" > H O H</ARRAY>
              </ATOMS>
              <BONDS>
                <ARRAY BUILTIN="ATID1" >1 2</ARRAY>
                <ARRAY BUILTIN="ATID2" >2 3</ARRAY>
                <ARRAY BUILTIN="ORDER" >1 1</ARRAY>
              </BONDS>
            </MOL>
          </CML>

  27. Rooms example

          <?xml version="1.0" ?>
          <rooms>
            <room name="Red">
              <capacity>10</capacity>
              <equipmentList>
                <equipment>Projector</equipment>
              </equipmentList>
            </room>
            <room name="Green">
              <capacity>5</capacity>
              <equipmentList />
              <features>
                <feature>No Roof</feature>
              </features>
            </room>
          </rooms>

  28. Suggestion • Try building each of those documents in XMLSpy, Oxygen, etc. • Note: it is not required to create a schema to do this. Just create new XML document and start building.

  29. Dissecting an XML Document

  30. Things that can appear in an XML document • ELEMENTS: simple, complex, empty, or mixed content; attributes. • The XML declaration • Processing instructions (PIs) <? …?> • Most common is <?xml-stylesheet …?> • <?xml-stylesheet type="text/css" href="mys.css"?> • Comments <!-- comment text -->
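
To see these pieces as a parser reports them, the following Python sketch uses the standard-library `xml.dom.minidom` module, which keeps processing instructions and comments as nodes. The sample document is made up for the illustration, reusing the stylesheet PI from the slide and the `rooms` element from the earlier example.

```python
from xml.dom import minidom

# A made-up document containing a declaration, a PI, a comment and an element.
DOCUMENT = (
    '<?xml version="1.0"?>'
    '<?xml-stylesheet type="text/css" href="mys.css"?>'
    '<!-- comment text -->'
    '<rooms><room name="Red"/></rooms>'
)

doc = minidom.parseString(DOCUMENT)
found = []
for node in doc.childNodes:
    if node.nodeType == node.PROCESSING_INSTRUCTION_NODE:
        found.append(("processing instruction", node.target, node.data))
    elif node.nodeType == node.COMMENT_NODE:
        found.append(("comment", node.data))
    elif node.nodeType == node.ELEMENT_NODE:
        found.append(("element", node.tagName))

# Attributes hang off elements rather than appearing as child nodes.
room_name = doc.documentElement.firstChild.getAttribute("name")

for item in found:
    print(item)
print("room name:", room_name)
```

The XML declaration does not show up as a node; the parser consumes it while reading the document.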

  31. Parts of an XML document • Labeled on the slide: declaration, tags (begin tags, end tags), attributes, attribute values • An XML element is everything from (including) the element's start tag to (including) the element's end tag.

          <?xml version="1.0"?>
          <CML>
            <MOL TITLE="Water" >
              <ATOMS>
                <ARRAY BUILTIN="ELSYM" > H O H</ARRAY>
              </ATOMS>
              <BONDS>
                <ARRAY BUILTIN="ATID1" >1 2</ARRAY>
                <ARRAY BUILTIN="ATID2" >2 3</ARRAY>
                <ARRAY BUILTIN="ORDER" >1 1</ARRAY>
              </BONDS>
            </MOL>
          </CML>

  32. XML and Trees • Tags give the structure of a document. They divide the document up into elements, starting at the topmost element, the root element. The stuff inside an element is its content. Content can include other elements along with ‘character data’.

          CML (root element)
            MOL
              ATOMS
                ARRAY: H O H (character data)
              BONDS
                ARRAY: 1 2
                ARRAY: 2 3
                ARRAY: 1 1

  33. XML and Trees • [Tree diagram: root element CML, then MOL, then ATOMS and BONDS, then the four ARRAY elements with data sections H O H, 1 2, 2 3, 1 1]

          <?xml version="1.0"?>
          <CML>
            <MOL TITLE="Water" >
              <ATOMS>
                <ARRAY BUILTIN="ELSYM" > H O H</ARRAY>
              </ATOMS>
              <BONDS>
                <ARRAY BUILTIN="ATID1" >1 2</ARRAY>
                <ARRAY BUILTIN="ATID2" >2 3</ARRAY>
                <ARRAY BUILTIN="ORDER" >1 1</ARRAY>
              </BONDS>
            </MOL>
          </CML>

  34. XML and Trees

          rooms
            room
              capacity: 10
              equipmentlist
                equipment: projector
            room
              capacity: 5
              equipmentlist
              features
                feature: No Roof
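
The tree view can also be produced mechanically. The sketch below, in Python with the standard-library `xml.etree.ElementTree` module, walks the rooms document recursively and prints one indented line per element; the helper name `outline` is an assumption made for this example.

```python
import xml.etree.ElementTree as ET

ROOMS_XML = """<rooms>
  <room name="Red">
    <capacity>10</capacity>
    <equipmentList>
      <equipment>Projector</equipment>
    </equipmentList>
  </room>
  <room name="Green">
    <capacity>5</capacity>
    <equipmentList />
    <features>
      <feature>No Roof</feature>
    </features>
  </room>
</rooms>"""


def outline(element, depth=0):
    """Return one indented line per element, depth-first, in document order."""
    text = (element.text or "").strip()
    label = element.tag + (": " + text if text else "")
    lines = ["  " * depth + label]
    for child in element:
        lines.extend(outline(child, depth + 1))
    return lines


tree_lines = outline(ET.fromstring(ROOMS_XML))
print("\n".join(tree_lines))
```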

  35. More detail on elements

  36. Element relationships • Book is the root element. • Title, prod, and chapter are child elements of book. • Book is the parent element of title, prod, and chapter. • Title, prod, and chapter are siblings (or sister elements) because they have the same parent.

          <book>
            <title>My First XML</title>
            <prod id="33-657" media="paper"></prod>
            <chapter>Introduction to XML
              <para>What is HTML</para>
              <para>What is XML</para>
            </chapter>
            <chapter>XML Syntax
              <para>Elements must have a closing tag</para>
              <para>Elements must be properly nested</para>
            </chapter>
          </book>

  37. Element content • Elements can have different content types. • An element is everything from (including) the element's start tag to (including) the element's end tag. • An element can have element content, mixed content, simple content, or empty content, and attributes. • Exercise • List the content type for each element in the previous example

  38. Exercise answer • In the previous example • book has element content, because it contains other elements. • chapter has mixed content because it contains both text and other elements. • para has simple content (or text content) because it contains only text. • prod has empty content, because there is nothing between its start and end tags (its information is carried in attributes).
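
The classification in this exercise can be checked with a short Python sketch. The function `content_type` is a hypothetical helper written for this illustration, and it treats whitespace-only text as insignificant, which is an assumption rather than something the parser decides. The book document is abbreviated to one chapter.

```python
import xml.etree.ElementTree as ET


def content_type(element):
    """Classify an element as 'element', 'mixed', 'simple' or 'empty' content."""
    has_children = len(element) > 0
    # Text can sit before the first child (.text) or after any child (.tail).
    pieces = [element.text] + [child.tail for child in element]
    has_text = any(piece and piece.strip() for piece in pieces)
    if has_children and has_text:
        return "mixed"
    if has_children:
        return "element"
    if has_text:
        return "simple"
    return "empty"


BOOK_XML = """<book>
  <title>My First XML</title>
  <prod id="33-657" media="paper"></prod>
  <chapter>Introduction to XML
    <para>What is HTML</para>
    <para>What is XML</para>
  </chapter>
</book>"""

book = ET.fromstring(BOOK_XML)
types = {tag: content_type(book.find(tag)) for tag in ("title", "prod", "chapter")}
types["book"] = content_type(book)
types["para"] = content_type(book.find("chapter/para"))
print(types)
```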

  39. Element naming • XML elements must follow these naming rules: • Names can contain letters, numbers, and other characters • Names must not start with a number or punctuation character (an underscore is allowed) • Names must not start with the letters xml (or XML or Xml ..), which are reserved • Names cannot contain spaces • Take care when you "invent" element names and follow these simple rules: • Any name can be used, no words are reserved, but the idea is to make names descriptive. Names with an underscore separator are nice. • Examples: <first_name>, <last_name>.
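
The naming rules above can be approximated with a regular expression. This Python sketch is deliberately simplified: it only accepts ASCII letters, digits, underscores, hyphens, and periods, and it rejects colons (which are used for namespaces), whereas the real XML name grammar allows many more Unicode characters. The function name `is_acceptable_name` is an assumption made for the example.

```python
import re

# Simplified, ASCII-only approximation of an XML name:
# first character is a letter or underscore, the rest may also be digits, '.', or '-'.
_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_.\-]*")


def is_acceptable_name(name):
    """Check a candidate element name against the simplified rules on the slide."""
    if name.lower().startswith("xml"):
        return False  # names beginning with "xml" (any case) are reserved
    return _NAME.fullmatch(name) is not None


for candidate in ["first_name", "1st_name", "first name", "xmlData", "_private"]:
    print(candidate, is_acceptable_name(candidate))
```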

  40. Well formed XML

  41. Well-formed vs Valid • An XML document is said to be well-formed if it obeys basic semantic and syntactic constraints. • This is different from a valid XML document, which (as we will see in more depth) properly matches a schema.

  42. Rules for Well-Formed XML • An XML document is considered well-formed if it obeys the following rules: • There must be one element that contains all others (root element) • All tags must be balanced • <BOOK>...</BOOK> • <BOOK /> • Tags must be nested properly: • <BOOK> <LINE> This is OK </LINE> </BOOK> • <LINE> <BOOK> This is </LINE> definitely NOT </BOOK> OK • Tag names are case-sensitive, so • <P>This is not ok, even though we do it all the time in HTML!</p>
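
Any conforming parser enforces these rules, so a quick way to test well-formedness is simply to try parsing. The Python sketch below uses the standard-library `xml.etree.ElementTree` module; the helper name `is_well_formed` and the sample strings are assumptions chosen to mirror the rules on this slide.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if the text parses as XML, False if the parser rejects it."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


samples = {
    "balanced": "<BOOK><LINE>This is OK</LINE></BOOK>",
    "empty element": "<BOOK />",
    "bad nesting": "<LINE><BOOK>This is</LINE> definitely NOT</BOOK>",
    "case mismatch": "<P>not ok</p>",
    "two roots": "<BOOK /><BOOK />",
}
results = {label: is_well_formed(text) for label, text in samples.items()}
for label, ok in results.items():
    print(label, ok)
```

This only checks well-formedness; checking validity against a schema or DTD is a separate step.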

  43. More Rules for Well-Formed XML • Attribute values must be in quotes • <ITEM CATEGORY="Home and Garden" Name="hoe-matic t500"> • Comments are allowed • <!-- They are done just as in HTML… --> • Should begin with the XML declaration (optional in XML 1.0, but recommended) • <?xml version='1.0' ?> • Special characters must be escaped: the most common are • < " ' > & • <formula> x &lt; y+2x </formula> • <cd title="&quot; mmusic"> • An XML document that obeys these rules is Well-Formed
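
When XML is generated from a program, escaping is best left to a library. The Python sketch below uses `escape` and `quoteattr` from the standard-library `xml.sax.saxutils` module on the two examples from this slide, then parses the results back to confirm the original text survives the round trip.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr

# Element content: '<' must be written as '&lt;'.
formula_xml = "<formula>" + escape("x < y+2x") + "</formula>"

# Attribute value: quoteattr adds the surrounding quotes and escapes as needed.
cd_xml = "<cd title=" + quoteattr('" mmusic') + "/>"

# Parsing turns the escapes back into the original characters.
formula_text = ET.fromstring(formula_xml).text
cd_title = ET.fromstring(cd_xml).get("title")

print(formula_xml)
print(cd_xml)
```

Libraries that build documents from objects, such as `xml.etree.ElementTree`, perform this escaping automatically when serializing.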

  44. Some aspects of XML syntax • It is illegal to omit closing tags • unlike e.g. html • XML tags are case-sensitive • XML elements must be properly nested • XML documents must have a root element • XML comments: <!-- This is a comment -->

  45. XML Tools • XML can be created with any text editor • Normally we use an XML-friendly editor • e.g. XMLSpy • nXML emacs extensions • Oxygen • Etc etc. • To check and validate XML, use these tools and/or xmllint on Unix systems.

  46. Another View • XML-as-data is one way to introduce XML • Another is as a markup language similar to html. • One typically says that html has a fixed tag set, whereas XML allows the definition of arbitrary tags • This analogy is particularly useful when the goal is to use XML for text presentation -- that is, when most of our data fields contain text • Note that mixed element/text fields are permissible in XML

  47. Article example

          <Article >
            <Headline>Direct Marketer Offended by Term 'Junk Mail' </Headline>
            <authors>
              <author> Joe Garden</author>
              <author> Tim Harrod</author>
            </authors>
            <abstract>Dan Spengler, CEO of the direct-mail-marketing firm Mailbox of
              Savings, took umbrage Monday at the use of the term <it>junk mail</it>.
            </abstract>
            <body type="url" > http://www.theonion.com/archive/3-11-01.html </body>
          </Article>

  48. XML Schema • There are many details of the schema specification to cover. • We will do this in detail next lecture • Now, we detour to study the usefulness of this simple model

  49. How is XML Useful? Part I: Simple Mortgage Calculator

  50. Mortgage payment calculator • Design a simple application which does the following: • Accepts user input • Loan amount • Loan term • Interest rate • Extras (assessments + taxes) • Returns per-month table of • total payment • interest • Principal • Some other fun stuff
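
The calculation behind such a table might look like the Python sketch below. The slide does not say which formula the course uses, so this assumes a standard fixed-rate loan with monthly compounding, and it treats the extras (assessments + taxes) as a flat monthly amount added to each payment. The function and field names are hypothetical.

```python
def amortization_table(loan_amount, years, annual_rate_percent, monthly_extras=0.0):
    """Return one row per month with payment, interest, principal and balance."""
    months = years * 12
    rate = annual_rate_percent / 100.0 / 12.0  # monthly interest rate
    if rate == 0:
        base = loan_amount / months
    else:
        # Standard fixed-rate payment: P * r / (1 - (1 + r)^-n)
        base = loan_amount * rate / (1.0 - (1.0 + rate) ** -months)

    balance = loan_amount
    rows = []
    for month in range(1, months + 1):
        interest = balance * rate
        principal = base - interest
        balance -= principal
        rows.append({
            "month": month,
            "payment": base + monthly_extras,  # total payment including extras
            "interest": interest,
            "principal": principal,
            "balance": max(balance, 0.0),
        })
    return rows


rows = amortization_table(100000, 30, 6.0)
first = rows[0]
print(round(first["payment"], 2), round(first["interest"], 2), round(first["principal"], 2))
```

Each row is a small structured record, which is the kind of data the earlier slides suggest exchanging as XML between the calculator and whatever displays the table.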
