XML Programming: CSPP51038 • Overview • Simple Schema • XML Sample Applications
Prerequisites • Since XML is about interoperability, I’m going to do my best to make the class language-agnostic. Mastery of any one of the following is required: • Python • Perl • Java • C • C++ • Visual Basic • I will base class lectures/examples on Java
Format • Five homeworks: 70% • In-class quizzes (8 total): 30% • No midterm or final • Class participation can help your grade
Getting help • Mandatory: Register for the discussion group via the course website • To post to the list, send mail to • cspp51038-su-06-1@cs.uchicago.edu • TA info and office hours/locations will all be on the website starting Wednesday • http://people.cs.uchicago.edu/~asiegel/cspp51038 • Consult the website frequently for updates/announcements, homework, readings, etc.
Policies • Late homework is accepted up to three days late; a 10% penalty is charged automatically. • If you need any special considerations, please see me in advance, or I won’t be able to help you • Homework will be graded and returned within 7 days
Distributed programming models: Typical Web-based • Easy to deploy but slow; not a great user experience • [Figure: browser ↔ (http) ↔ Web server ↔ database; the server returns dynamically generated HTML, optionally plus JavaScript to jazz up the HTML] • Many programming models • JSP • Servlets • PHP • CGI (Python, Perl, C) • ColdFusion
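Distributed programming models: Typical Web-based • Better user experience • Heavier, less portable; requires socket programming to stream to the server • [Figure: the browser receives dynamically generated HTML plus an applet from the Web server over http; the applet then talks to the server over a socket; the server is backed by a database]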
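[Figure, Direct Connections: an application client connects directly to App1, App2, and App3 through sockets/ports] • [Figure, Remote Procedures: an application client calls remote procedures on App1, App2, and App3, located through NDS] • Examples: Java RMI, CORBA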
XML Basics, cont. • Most modern languages have a method of representing structured data. • Typical flow of events in an application: read data (file, db, socket) → unmarshal it into objects → manipulate them in the program → marshal them back out (file, db, socket) • Many language-specific technologies reduce these steps: RMI, object serialization in any language, CORBA (actually somewhat language-neutral), MPI, etc. • XML provides a very appealing alternative that hits the sweet spot for many applications
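To make the unmarshal/manipulate/marshal cycle concrete, here is a minimal sketch using Java's built-in object serialization, one of the language-specific technologies listed above. The class names `Student` and `SerializationDemo` are illustrative assumptions. The byte stream it produces can only be read back by Java, which is exactly the limitation a text-based, language-neutral format like XML avoids.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Illustrative Student type; implementing Serializable lets Java's
// built-in (Java-only) serialization convert it to and from bytes.
class Student implements Serializable {
    private static final long serialVersionUID = 1L;
    public String name;
    public String ssn;
    public int age;
    public float gpa;
}

public class SerializationDemo {
    // Marshal: convert the in-memory object into a byte stream
    // (which could then be written to a file, db, or socket).
    static byte[] marshal(Student s) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(s);
        }
        return bytes.toByteArray();
    }

    // Unmarshal: rebuild an equivalent object from the byte stream.
    static Student unmarshal(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return (Student) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Student s = new Student();
        s.name = "Andrew";
        s.ssn = "123-45-6789";
        s.age = 36;
        s.gpa = 2.0f;

        Student copy = unmarshal(marshal(s));
        System.out.println(copy.name + " " + copy.age); // prints "Andrew 36"
    }
}
```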
User-defined types in programming languages
Fortran:
    type Student
      character(len=:), allocatable :: name
      character(len=:), allocatable :: ssn
      integer :: age
      real :: gpa
    end type Student
Java:
    class Student {
      public String name;
      public String ssn;
      public int age;
      public float gpa;
    }
C:
    struct Student {
      char* name;
      char* ssn;
      int age;
      float gpa;
    };
• XML is a text-based, programming-language-neutral way of representing structured information. Compare:
Sample XML Schema • In XML, a common way to describe a datatype is an XML Schema. • DTD and Relax NG are other common alternatives • The example below uses XML Schema just for illustration purposes
    <?xml version="1.0" encoding="UTF-8"?>
    <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
               elementFormDefault="qualified" attributeFormDefault="unqualified">
      <xs:element name="student">
        <xs:complexType>
          <xs:sequence>
            <xs:element name="name" type="xs:string"/>
            <xs:element name="ssn" type="xs:string"/>
            <xs:element name="age" type="xs:integer"/>
            <xs:element name="gpa" type="xs:decimal"/>
          </xs:sequence>
        </xs:complexType>
      </xs:element>
    </xs:schema>
(Ignore the attributes of the xs:schema element for now.)
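As a sketch of what such a schema is for, the JDK's standard `javax.xml.validation` API can check an instance document against it. The class name `ValidateStudent` and the two sample instances are illustrative assumptions; the schema text is the one from this slide. Element names are case-sensitive, so this schema expects a `<student>` root, not `<Student>`.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

public class ValidateStudent {
    // The schema from the slide, inlined so the example is self-contained.
    static final String SCHEMA =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'"
        + " elementFormDefault='qualified' attributeFormDefault='unqualified'>"
        + "<xs:element name='student'><xs:complexType><xs:sequence>"
        + "<xs:element name='name' type='xs:string'/>"
        + "<xs:element name='ssn' type='xs:string'/>"
        + "<xs:element name='age' type='xs:integer'/>"
        + "<xs:element name='gpa' type='xs:decimal'/>"
        + "</xs:sequence></xs:complexType></xs:element></xs:schema>";

    // Returns true if the instance document conforms to the schema.
    static boolean isValid(String schemaXml, String instanceXml) throws Exception {
        SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        Schema schema = factory.newSchema(new StreamSource(new StringReader(schemaXml)));
        Validator validator = schema.newValidator();
        try {
            // With no custom error handler, the validator throws on the first error.
            validator.validate(new StreamSource(new StringReader(instanceXml)));
            return true;
        } catch (SAXException e) {
            System.out.println("Invalid: " + e.getMessage());
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        String good = "<student><name>Andrew</name><ssn>123-45-6789</ssn>"
                    + "<age>36</age><gpa>2.0</gpa></student>";
        String bad = "<student><name>Andrew</name><age>thirty-six</age></student>";
        System.out.println(isValid(SCHEMA, good)); // true
        System.out.println(isValid(SCHEMA, bad));  // false (ssn missing)
    }
}
```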
Alternative schema • In this example, studentType is defined separately (as a named type) rather than anonymously
    <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
      <xs:element name="student" type="studentType"/>
      <xs:complexType name="studentType">
        <xs:sequence>
          <xs:element name="name" type="xs:string"/>
          <xs:element name="ssn" type="xs:string"/>
          <xs:element name="age" type="xs:integer"/>
          <xs:element name="gpa" type="xs:decimal"/>
        </xs:sequence>
      </xs:complexType>
    </xs:schema>
(The new type, studentType, is defined separately and referenced by name.)
Alternative: DTD • Can also use a DTD (Document Type Definition), which is much simpler than a schema but also much less powerful (notice the lack of types)
    <!DOCTYPE Student [
      <!-- The DOCTYPE name must match the document's root element -->
      <!ELEMENT Student (name,ssn,age,gpa)>
      <!-- Student has four child elements, in this order -->
      <!ELEMENT name (#PCDATA)>
      <!-- name is parsed character data -->
      <!ELEMENT ssn (#PCDATA)>
      <!ELEMENT age (#PCDATA)>
      <!ELEMENT gpa (#PCDATA)>
    ]>
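A DTD can also be checked from Java. This minimal sketch (class name `ValidateWithDtd` is an assumption) turns on DTD validation in the standard `DocumentBuilderFactory` and installs an error handler that throws, because by default validity errors are only reported and parsing continues. Note that the DTD only checks structure: `<age>thirty-six</age>` would still pass, since `#PCDATA` has no type.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

public class ValidateWithDtd {
    // The internal DTD subset from the slide (comments omitted).
    static final String DTD =
        "<!DOCTYPE Student ["
        + "<!ELEMENT Student (name,ssn,age,gpa)>"
        + "<!ELEMENT name (#PCDATA)>"
        + "<!ELEMENT ssn (#PCDATA)>"
        + "<!ELEMENT age (#PCDATA)>"
        + "<!ELEMENT gpa (#PCDATA)>"
        + "]>";

    // Returns true if the document (DOCTYPE + root element) is valid against its DTD.
    static boolean isValid(String doc) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setValidating(true); // check the document against its DTD
        DocumentBuilder builder = factory.newDocumentBuilder();
        // Make validity errors stop parsing instead of just being reported.
        builder.setErrorHandler(new ErrorHandler() {
            public void warning(SAXParseException e) { }
            public void error(SAXParseException e) throws SAXException { throw e; }
            public void fatalError(SAXParseException e) throws SAXException { throw e; }
        });
        try {
            builder.parse(new InputSource(new StringReader(doc)));
            return true;
        } catch (SAXException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        String good = DTD + "<Student><name>Andrew</name><ssn>123-45-6789</ssn>"
                          + "<age>36</age><gpa>2.0</gpa></Student>";
        String bad = DTD + "<Student><name>Andrew</name><age>36</age></Student>";
        System.out.println(isValid(good)); // true
        System.out.println(isValid(bad));  // false (ssn and gpa missing)
    }
}
```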
Another alternative: Relax NG • Gaining in popularity • Can be very simple to write but also contain many more features than DTD • Still much less common than schema
Creating instances of types • In programming languages, we instantiate objects:
C:
    struct Student s1, s2;
    s1.name = "Andrew";
    s1.ssn = "123-45-6789";
Java:
    Student s1 = new Student();
    s1.name = "Andrew";
    s1.ssn = "123-45-6789";
Fortran:
    type(Student) :: s1
    s1%name = 'Andrew'
Creating XML documents • XML is not a programming language! • In XML, we make a Student “object” in an XML file (Student.xml):
    <Student>
      <name>Andrew</name>
      <ssn>123-45-6789</ssn>
      <age>36</age>
      <gpa>2.0</gpa>
    </Student>
• Think of this as a serialized object.
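If the XML file is a serialized object, reading it back is "unmarshalling". A minimal sketch with the JDK's DOM parser, assuming a hypothetical `StudentReader` class whose nested `Student` type mirrors the elements above; the document is passed as a string rather than read from Student.xml to keep the example self-contained.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class StudentReader {
    // Plain Java type mirroring the <Student> element.
    static class Student {
        String name;
        String ssn;
        int age;
        double gpa;
    }

    // Trimmed text of the first descendant element with the given tag name.
    static String text(Element parent, String tag) {
        return parent.getElementsByTagName(tag).item(0).getTextContent().trim();
    }

    // Parse the XML and copy each field into a Student object.
    static Student fromXml(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        Element root = doc.getDocumentElement(); // the <Student> element
        Student s = new Student();
        s.name = text(root, "name");
        s.ssn = text(root, "ssn");
        s.age = Integer.parseInt(text(root, "age"));
        s.gpa = Double.parseDouble(text(root, "gpa"));
        return s;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<Student><name>Andrew</name><ssn>123-45-6789</ssn>"
                   + "<age>36</age><gpa>2.0</gpa></Student>";
        Student s = fromXml(xml);
        System.out.println(s.name + ", age " + s.age + ", gpa " + s.gpa); // Andrew, age 36, gpa 2.0
    }
}
```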
XML and Schema • Note that there are two parts to what we did: • Defining the “structure” layout • Defining an “instance” of the structure • The first is done with an appropriate Schema or DTD. • The second is the XML document itself • A DTD can go in the same file as the instance (as an internal subset), but typically an XML file refers to an external Schema or DTD • From this point on we use only Schema
Exercise • Create an XML file that contains a list of two Car “objects” (pick five relevant fields). • Note that each XML file must have a single root element, so both car elements must be under a common parent (e.g. cars).
Exercise Solution
    <?xml version="1.0" encoding="UTF-8"?>
    <cars>
      <car>
        <make>dodge</make>
        <model>ram</model>
        <color>red</color>
        <year>2004</year>
        <mileage>22000</mileage>
      </car>
      <car>
        <make>Ford</make>
        <model>Pinto</model>
        <color>white</color>
        <year>1980</year>
        <mileage>100000</mileage>
      </car>
    </cars>
? • Question: What can we do with such a file? • Write corresponding Schema to define its content • Write XSL transformation to display • Parse into a programming language
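For the last option, parsing into a programming language, a minimal DOM sketch: it walks every `<car>` in the exercise solution and builds a short description. The class name `CarsReader` and helper `field` are illustrative assumptions.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class CarsReader {
    // The exercise solution, inlined for a self-contained demo.
    static final String CARS = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
        + "<cars>"
        + "<car><make>dodge</make><model>ram</model><color>red</color>"
        + "<year>2004</year><mileage>22000</mileage></car>"
        + "<car><make>Ford</make><model>Pinto</model><color>white</color>"
        + "<year>1980</year><mileage>100000</mileage></car>"
        + "</cars>";

    // Trimmed text of the first descendant of car with the given tag.
    static String field(Element car, String tag) {
        return car.getElementsByTagName(tag).item(0).getTextContent().trim();
    }

    // Returns "make model (year)" for every <car> in the document, in order.
    static List<String> describe(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        NodeList cars = doc.getElementsByTagName("car");
        List<String> result = new ArrayList<>();
        for (int i = 0; i < cars.getLength(); i++) {
            Element car = (Element) cars.item(i);
            result.add(field(car, "make") + " " + field(car, "model")
                       + " (" + field(car, "year") + ")");
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(describe(CARS)); // [dodge ram (2004), Ford Pinto (1980)]
    }
}
```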
Order / Whitespace • Note that element order is significant, but whitespace between elements (indentation and line breaks) generally is not: changing the indentation and line breaks between the elements below leaves the data the document carries unchanged for most purposes.
    <Article >
      <Headline>Direct Marketer Offended by Term 'Junk Mail' </Headline>
      <authors>
        <author> Joe Garden</author>
        <author> Tim Harrod</author>
      </authors>
      <abstract>Dan Spengler, CEO of the direct-mail-marketing firm Mailbox of
      Savings, took umbrage Monday at the use of the term <it>junk mail</it>
      </abstract>
      <body type="url" > http://www.theonion.com/archive/3-11-01.html </body>
    </Article>
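One qualification: a DOM parser does report whitespace-only text nodes between elements; it is the application (or a DTD/schema combined with a parser option such as `DocumentBuilderFactory.setIgnoringElementContentWhitespace`) that treats them as insignificant. This sketch makes that choice explicitly: the hypothetical `signature` helper records element names and trimmed text in document order, skipping whitespace-only text and ignoring attributes. Trimming also discards leading spaces such as the one in `<author> Joe Garden</author>`, which is itself an assumption about what counts as data.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;

public class OrderVsWhitespace {
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    }

    // Element names and trimmed text, in document order. Whitespace-only
    // text nodes (indentation, line breaks) are skipped; attributes are ignored.
    static List<String> signature(String xml) throws Exception {
        List<String> out = new ArrayList<>();
        collect(parse(xml).getDocumentElement(), out);
        return out;
    }

    private static void collect(Node node, List<String> out) {
        if (node.getNodeType() == Node.TEXT_NODE) {
            String text = node.getNodeValue().trim();
            if (!text.isEmpty()) out.add(text);
            return;
        }
        if (node.getNodeType() == Node.ELEMENT_NODE) out.add("<" + node.getNodeName() + ">");
        for (Node c = node.getFirstChild(); c != null; c = c.getNextSibling()) collect(c, out);
    }

    public static void main(String[] args) throws Exception {
        String compact = "<authors><author>Joe Garden</author><author>Tim Harrod</author></authors>";
        String indented = "<authors>\n  <author> Joe Garden</author>\n  <author> Tim Harrod</author>\n</authors>";
        String reordered = "<authors><author>Tim Harrod</author><author>Joe Garden</author></authors>";
        System.out.println(signature(compact).equals(signature(indented)));  // true: only whitespace differs
        System.out.println(signature(compact).equals(signature(reordered))); // false: order differs
    }
}
```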
Molecule Example
    <?xml version="1.0" ?>
    <CML>
      <MOL TITLE="Water" >
        <ATOMS>
          <ARRAY BUILTIN="ELSYM" > H O H</ARRAY>
        </ATOMS>
        <BONDS>
          <ARRAY BUILTIN="ATID1" >1 2</ARRAY>
          <ARRAY BUILTIN="ATID2" >2 3</ARRAY>
          <ARRAY BUILTIN="ORDER" >1 1</ARRAY>
        </BONDS>
      </MOL>
    </CML>
Rooms example
    <?xml version="1.0" ?>
    <rooms>
      <room name="Red">
        <capacity>10</capacity>
        <equipmentList>
          <equipment>Projector</equipment>
        </equipmentList>
      </room>
      <room name="Green">
        <capacity>5</capacity>
        <equipmentList />
        <features>
          <feature>No Roof</feature>
        </features>
      </room>
    </rooms>
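Once data like this is in XML, standard query tools apply. A minimal sketch using the JDK's `javax.xml.xpath` API to ask two questions of the rooms document; the class name `RoomsQuery` and the helper `select` are illustrative assumptions.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

public class RoomsQuery {
    // The rooms example, inlined for a self-contained demo.
    static final String ROOMS = "<?xml version=\"1.0\" ?>"
        + "<rooms>"
        + "<room name=\"Red\"><capacity>10</capacity>"
        + "<equipmentList><equipment>Projector</equipment></equipmentList></room>"
        + "<room name=\"Green\"><capacity>5</capacity><equipmentList />"
        + "<features><feature>No Roof</feature></features></room>"
        + "</rooms>";

    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    }

    // Evaluates a node-selecting XPath expression and returns each node's text.
    static List<String> select(Document doc, String expr) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList nodes = (NodeList) xpath.evaluate(expr, doc, XPathConstants.NODESET);
        List<String> values = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            values.add(nodes.item(i).getTextContent());
        }
        return values;
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse(ROOMS);
        // Names of rooms that seat at least 10 people.
        System.out.println(select(doc, "/rooms/room[capacity >= 10]/@name"));
        // Names of rooms whose equipmentList contains no equipment.
        System.out.println(select(doc, "/rooms/room[not(equipmentList/equipment)]/@name"));
    }
}
```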
Suggestion • Try building each of those documents in XMLSpy, Oxygen, etc. • Note: it is not required to create a schema to do this. Just create new XML document and start building.
Things that can appear in an XML document • Elements: simple, complex, empty, or mixed content; attributes • The XML declaration • Processing instructions (PIs) <? … ?> • The most common is <?xml-stylesheet …?> • <?xml-stylesheet type="text/css" href="mys.css"?> • Comments <!-- comment text -->
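To see how a parser reports these items, this sketch (class name `PrologNodes` is an assumption) lists the top-level DOM nodes of a small document containing the `xml-stylesheet` PI from the slide and a comment. Although the XML declaration looks like a PI, DOM does not expose it as a node at all.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Comment;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.ProcessingInstruction;

public class PrologNodes {
    static final String DOC = "<?xml version=\"1.0\"?>"
        + "<?xml-stylesheet type=\"text/css\" href=\"mys.css\"?>"
        + "<!-- comment text -->"
        + "<rooms/>";

    // Describes each node directly under the Document, in order.
    static List<String> topLevelNodes(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        List<String> out = new ArrayList<>();
        for (Node n = doc.getFirstChild(); n != null; n = n.getNextSibling()) {
            switch (n.getNodeType()) {
                case Node.PROCESSING_INSTRUCTION_NODE:
                    ProcessingInstruction pi = (ProcessingInstruction) n;
                    out.add("PI " + pi.getTarget() + ": " + pi.getData());
                    break;
                case Node.COMMENT_NODE:
                    out.add("COMMENT:" + ((Comment) n).getData());
                    break;
                case Node.ELEMENT_NODE:
                    out.add("ELEMENT " + n.getNodeName());
                    break;
                default:
                    out.add("OTHER " + n.getNodeName());
            }
        }
        return out;
    }

    public static void main(String[] args) throws Exception {
        // Note: no entry appears for the <?xml version="1.0"?> declaration.
        for (String line : topLevelNodes(DOC)) System.out.println(line);
    }
}
```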
Parts of an XML document • [Figure: the molecule document below, annotated to label its declaration, tags (begin tags and end tags), attributes, and attribute values]
    <?xml version="1.0"?>
    <CML>
      <MOL TITLE="Water" >
        <ATOMS>
          <ARRAY BUILTIN="ELSYM" > H O H</ARRAY>
        </ATOMS>
        <BONDS>
          <ARRAY BUILTIN="ATID1" >1 2</ARRAY>
          <ARRAY BUILTIN="ATID2" >2 3</ARRAY>
          <ARRAY BUILTIN="ORDER" >1 1</ARRAY>
        </BONDS>
      </MOL>
    </CML>
• An XML element is everything from (and including) the element's start tag to (and including) the element's end tag.
XML and Trees • Tags give the structure of a document. They divide the document into elements, starting at the topmost element, the root element. The stuff inside an element is its content; content can include other elements along with character data. • [Figure: tree with root element CML → MOL → ATOMS and BONDS; ATOMS holds one ARRAY and BONDS holds three ARRAYs; their character-data leaves are "H O H", "1 2", "2 3", and "1 1"]
XML and Trees • Root element: CML
    <?xml version="1.0"?>
    <CML>
      <MOL TITLE="Water" >
        <ATOMS>
          <ARRAY BUILTIN="ELSYM" > H O H</ARRAY>
        </ATOMS>
        <BONDS>
          <ARRAY BUILTIN="ATID1" >1 2</ARRAY>
          <ARRAY BUILTIN="ATID2" >2 3</ARRAY>
          <ARRAY BUILTIN="ORDER" >1 1</ARRAY>
        </BONDS>
      </MOL>
    </CML>
[Figure: the corresponding tree: CML → MOL → ATOMS (one ARRAY) and BONDS (three ARRAYs); the leaves are the data sections "H O H", "1 2", "2 3", and "1 1"]
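The tree view can be reproduced programmatically. This minimal sketch (class `TreePrinter` and its helpers are illustrative assumptions) parses the molecule document with DOM and walks it depth-first, printing one indented line per element (with its attributes) and per non-blank text node.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;

public class TreePrinter {
    static final String CML = "<?xml version=\"1.0\"?>"
        + "<CML><MOL TITLE=\"Water\">"
        + "<ATOMS><ARRAY BUILTIN=\"ELSYM\"> H O H</ARRAY></ATOMS>"
        + "<BONDS><ARRAY BUILTIN=\"ATID1\">1 2</ARRAY>"
        + "<ARRAY BUILTIN=\"ATID2\">2 3</ARRAY>"
        + "<ARRAY BUILTIN=\"ORDER\">1 1</ARRAY></BONDS>"
        + "</MOL></CML>";

    // Depth-first walk, indenting two spaces per tree level.
    static void walk(Node node, int depth, List<String> lines) {
        StringBuilder indent = new StringBuilder();
        for (int i = 0; i < depth; i++) indent.append("  ");
        if (node.getNodeType() == Node.ELEMENT_NODE) {
            StringBuilder line = new StringBuilder(indent).append(node.getNodeName());
            NamedNodeMap attrs = node.getAttributes();
            for (int i = 0; i < attrs.getLength(); i++) {
                Node a = attrs.item(i);
                line.append(' ').append(a.getNodeName())
                    .append("=\"").append(a.getNodeValue()).append('"');
            }
            lines.add(line.toString());
            for (Node c = node.getFirstChild(); c != null; c = c.getNextSibling()) {
                walk(c, depth + 1, lines);
            }
        } else if (node.getNodeType() == Node.TEXT_NODE) {
            String text = node.getNodeValue().trim();
            if (!text.isEmpty()) lines.add(indent + "\"" + text + "\"");
        }
    }

    // Parses the document and returns the tree as indented lines, starting at the root.
    static List<String> tree(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        List<String> lines = new ArrayList<>();
        walk(doc.getDocumentElement(), 0, lines);
        return lines;
    }

    public static void main(String[] args) throws Exception {
        for (String line : tree(CML)) System.out.println(line);
    }
}
```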
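XML and Trees • [Figure: tree for the rooms example: root rooms with two room children; the first room has capacity (10) and equipmentList → equipment (Projector); the second room has capacity (5), an empty equipmentList, and features → feature (No Roof)]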
Element relationships • Book is the root element. • Title, prod, and chapter are child elements of book. • Book is the parent element of title, prod, and chapter. • Title, prod, and chapter are siblings (or sister elements) because they have the same parent.
    <book>
      <title>My First XML</title>
      <prod id="33-657" media="paper"></prod>
      <chapter>Introduction to XML
        <para>What is HTML</para>
        <para>What is XML</para>
      </chapter>
      <chapter>XML Syntax
        <para>Elements must have a closing tag</para>
        <para>Elements must be properly nested</para>
      </chapter>
    </book>
Element content • Elements can have different content types. • An element is everything from (and including) the element's start tag to (and including) the element's end tag. • An element can have element content, mixed content, simple content, or empty content, and can also have attributes. • Exercise • List the content type for each element in the previous example
Exercise answer • In the previous example • book has element content, because it contains other elements. • chapter has mixed content, because it contains both text and other elements. • para has simple content (or text content), because it contains only text. • prod has empty content, because it contains no text or child elements (its information is carried entirely in its attributes)
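The same classification can be done mechanically by inspecting each element's children. A minimal sketch, assuming a hypothetical `ContentTypes` class; it treats whitespace-only text as insignificant (a choice, as in indented documents), and attributes do not count as content because DOM does not list them as child nodes.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class ContentTypes {
    static final String BOOK = "<book><title>My First XML</title>"
        + "<prod id=\"33-657\" media=\"paper\"></prod>"
        + "<chapter>Introduction to XML<para>What is HTML</para><para>What is XML</para></chapter>"
        + "<chapter>XML Syntax<para>Elements must have a closing tag</para>"
        + "<para>Elements must be properly nested</para></chapter></book>";

    // Classifies one element by the kinds of child nodes it has.
    static String contentType(Element e) {
        boolean hasElements = false;
        boolean hasText = false;
        for (Node c = e.getFirstChild(); c != null; c = c.getNextSibling()) {
            if (c.getNodeType() == Node.ELEMENT_NODE) {
                hasElements = true;
            } else if (c.getNodeType() == Node.TEXT_NODE && !c.getNodeValue().trim().isEmpty()) {
                hasText = true;
            }
        }
        if (hasElements && hasText) return "mixed";
        if (hasElements) return "element";
        if (hasText) return "simple";
        return "empty";
    }

    // Maps each distinct element name (first occurrence, document order) to its content type.
    static Map<String, String> classify(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        NodeList all = doc.getElementsByTagName("*");
        Map<String, String> types = new LinkedHashMap<>();
        for (int i = 0; i < all.getLength(); i++) {
            Element e = (Element) all.item(i);
            types.putIfAbsent(e.getTagName(), contentType(e));
        }
        return types;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(classify(BOOK));
        // {book=element, title=simple, prod=empty, chapter=mixed, para=simple}
    }
}
```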
Element naming • XML elements must follow these naming rules: • Names can contain letters, digits, hyphens, underscores, and periods (colons are reserved for namespaces) • Names must start with a letter or underscore, not a digit or other punctuation character • Names starting with the letters xml (or XML, Xml, ...) are reserved and should not be used • Names cannot contain spaces • Take care when you "invent" element names and follow these simple rules: • Any name can be used; no words are reserved (apart from the xml prefix), but the idea is to make names descriptive. Names with an underscore separator are nice. • Examples: <first_name>, <last_name>.
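The rules above can be encoded as a quick check. This is a deliberately simplified, ASCII-only sketch (class `ElementNames` is hypothetical): the XML specification's actual Name production also admits many non-ASCII letters and the colon, so treat this as a style check rather than a full well-formedness test.

```java
import java.util.Locale;
import java.util.regex.Pattern;

public class ElementNames {
    // Simplified ASCII subset of the rules: start with a letter or underscore,
    // then letters, digits, '_', '.', or '-'. No spaces, no colons.
    private static final Pattern SIMPLE_NAME = Pattern.compile("[A-Za-z_][A-Za-z0-9_.-]*");

    static boolean isGoodElementName(String name) {
        if (!SIMPLE_NAME.matcher(name).matches()) return false;
        // Names beginning with "xml" in any case are reserved.
        return !name.toLowerCase(Locale.ROOT).startsWith("xml");
    }

    public static void main(String[] args) {
        String[] candidates = {"first_name", "last_name", "2nd_name", "last name", "xmlData", "_id"};
        for (String c : candidates) {
            System.out.println(c + " -> " + isGoodElementName(c));
        }
    }
}
```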
Well-formed vs Valid • An XML document is said to be well-formed if it obeys XML's basic syntactic rules. • This is different from a valid XML document, which is well-formed and also (as we will see in more depth) conforms to a schema or DTD.
Rules for Well-Formed XML • An XML document is considered well-formed if it obeys the following rules: • There must be one element that contains all others (root element) • All tags must be balanced (closed) • <BOOK>...</BOOK> • <BOOK /> • Tags must be nested properly: • <BOOK> <LINE> This is OK </LINE> </BOOK> • <LINE> <BOOK> This is </LINE> definitely NOT </BOOK> OK • Tag names are case-sensitive, so • <P>This is not ok, even though we do it all the time in HTML!</p>
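More Rules for Well-Formed XML • Attribute values must be in (straight) quotes • <ITEM CATEGORY="Home and Garden" Name="hoe-matic t500"> • Comments are allowed • <!-- They are done just as in HTML… --> • Should begin with the XML declaration (optional in XML 1.0, but if present it must come first) • <?xml version='1.0' ?> • Characters that would be mistaken for markup must be escaped; the five predefined entities are &lt; (<), &gt; (>), &quot; ("), &apos; ('), and &amp; (&), and < and & must always be escaped in text • <formula> x < y+2x </formula> is not well-formed; write <formula> x &lt; y+2x </formula> • <cd title="" music"> is not well-formed; write <cd title="&quot; music"> • An XML document that obeys these rules is well-formed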
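A non-validating parse is enough to test well-formedness, since no schema or DTD is consulted. A minimal sketch, assuming a hypothetical `WellFormedCheck` class, that runs a few of the slide's examples through the JDK parser and reports the first error it finds.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

public class WellFormedCheck {
    // Parses without validation, so only the well-formedness rules apply.
    static String check(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // Throw on problems instead of printing them to stderr.
            builder.setErrorHandler(new ErrorHandler() {
                public void warning(SAXParseException e) { }
                public void error(SAXParseException e) throws SAXException { throw e; }
                public void fatalError(SAXParseException e) throws SAXException { throw e; }
            });
            builder.parse(new InputSource(new StringReader(xml)));
            return "well-formed";
        } catch (SAXParseException e) {
            return "not well-formed (line " + e.getLineNumber() + "): " + e.getMessage();
        } catch (Exception e) {
            return "could not check: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        String[] docs = {
            "<formula> x &lt; y+2x </formula>", // escaped '<': fine
            "<formula> x < y+2x </formula>",    // raw '<' in text
            "<P>case mismatch</p>",             // tag names are case-sensitive
            "<BOOK><LINE>bad</BOOK></LINE>",    // improper nesting
        };
        for (String d : docs) System.out.println(check(d));
    }
}
```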
Some aspects of XML syntax • It is illegal to omit closing tags • unlike, e.g., HTML • XML tags are case-sensitive • XML elements must be properly nested • XML documents must have a root element • XML comments: <!-- This is a comment -->
XML Tools • XML can be created with any text editor • Normally we use an XML-friendly editor • e.g. XMLSpy • nXML emacs extensions • Oxygen • etc. • To check and validate XML, use these tools and/or xmllint on Unix systems.
Another View • XML-as-data is one way to introduce XML • Another is as a markup language similar to html. • One typically says that html has a fixed tag set, whereas XML allows the definition of arbitrary tags • This analogy is particularly useful when the goal is to use XML for text presentation -- that is, when most of our data fields contain text • Note that mixed element/text fields are permissible in XML
Article example
    <Article >
      <Headline>Direct Marketer Offended by Term 'Junk Mail' </Headline>
      <authors>
        <author> Joe Garden</author>
        <author> Tim Harrod</author>
      </authors>
      <abstract>Dan Spengler, CEO of the direct-mail-marketing firm Mailbox of
      Savings, took umbrage Monday at the use of the term <it>junk mail</it>.
      </abstract>
      <body type="url" > http://www.theonion.com/archive/3-11-01.html </body>
    </Article>
XML Schema • Schema specification involves many details. • We will cover them in detail next lecture • For now, we detour to study the usefulness of this simple model
How Is XML Useful? Part I: Simple Mortgage Calculator
Mortgage payment calculator • Design a simple application that does the following: • Accepts user input • Loan amount • Loan term • Interest rate • Extras (assessments + taxes) • Returns a per-month table of • total payment • interest • principal • Some other fun stuff
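The core computation is the standard fixed-rate amortization formula, payment = P·r / (1 - (1 + r)^(-n)), where P is the loan amount, r the monthly interest rate, and n the number of monthly payments. Each month's interest is the remaining balance times r, and the rest of the payment reduces the principal. A minimal sketch follows; the class and method names are hypothetical, and treating extras (assessments + taxes) as a flat monthly amount added to each payment is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.List;

public class MortgageCalculator {
    // One row of the per-month table.
    static class Row {
        final int month;
        final double totalPayment, interest, principal, balance;

        Row(int month, double totalPayment, double interest, double principal, double balance) {
            this.month = month;
            this.totalPayment = totalPayment;
            this.interest = interest;
            this.principal = principal;
            this.balance = balance;
        }
    }

    // Fixed-rate amortization: payment = P*r / (1 - (1+r)^-n).
    static double monthlyPayment(double loanAmount, double annualRatePercent, int years) {
        int n = years * 12;
        double r = annualRatePercent / 100.0 / 12.0;
        if (r == 0.0) return loanAmount / n; // zero-interest loan
        return loanAmount * r / (1.0 - Math.pow(1.0 + r, -n));
    }

    // monthlyExtras stands in for assessments + taxes (assumed flat per month).
    static List<Row> schedule(double loanAmount, double annualRatePercent, int years,
                              double monthlyExtras) {
        double payment = monthlyPayment(loanAmount, annualRatePercent, years);
        double r = annualRatePercent / 100.0 / 12.0;
        double balance = loanAmount;
        List<Row> rows = new ArrayList<>();
        for (int month = 1; month <= years * 12; month++) {
            double interest = balance * r;        // interest accrued this month
            double principal = payment - interest; // remainder pays down the loan
            balance -= principal;
            rows.add(new Row(month, payment + monthlyExtras, interest, principal, balance));
        }
        return rows;
    }

    public static void main(String[] args) {
        List<Row> rows = schedule(200000, 6.0, 30, 300);
        for (Row row : rows.subList(0, 3)) {
            System.out.printf("%3d  total %8.2f  interest %8.2f  principal %8.2f  balance %10.2f%n",
                row.month, row.totalPayment, row.interest, row.principal, row.balance);
        }
    }
}
```

With the 6% rate and 30-year term used in `main`, the first month's interest is 200000 × 0.005 = 1000.00, so the first row's principal is the remainder of the roughly 1199.10 base payment.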