XML: It’s a Good Thing Richard N. Taylor & Eric M. Dashofy ICS 123 S2002
Motivation • “I'll never go hungry again!” –Scarlett O’Hara • “I’ll never write a parser again!” – Anonymous XML User • Data encoding is a perpetual problem in computer applications • Lots of time is wasted writing parsers, lexers, marshalers, unmarshalers, data bindings, even meta-languages!
Existing Problems: File Exchange [Figure: three applications (App1, App2, App3) and three file formats (File Format 1, 2, 3), linked by import converters, export converters, and third-party converters.]
Why is this a problem? • Everybody has a proprietary format • Converters must be maintained by various parties • This is an n² problem: n formats need on the order of n² converters • Something is usually lost in the translation • Note: Same problems with data exchange across networked apps
Another Problem: Defining a File or Data Format [Figure: diagram relating Disk and Net, Parser and Serializer, the In-memory Representation, Data Bindings, and a Meta-Language; arrows are labeled "edits" and "Helps to generate".]
Why is this a problem? • Parsers, serializers, data bindings all have to be developed • This development takes time • Conflicting tools for assistance • How do you evolve the file format?
Potential Solution • For too many file formats: • Intermediate format • Even better: Common format • An agreed-upon meta-language • Ability to extend language and ignore unknown constructs • For tool-building: • Choose a suitable meta-language • Build tools surrounding that meta-language • Port those tools to different environments, but keep the APIs semi-standard
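The converter counts behind the n² claim, and the saving from an intermediate format, can be sketched with a little arithmetic. This assumes one converter per direction between formats; the function names are illustrative only.

```python
def pairwise_converters(n: int) -> int:
    """Converters needed when every format converts directly to every other one."""
    return n * (n - 1)


def hub_converters(n: int) -> int:
    """Converters needed when each format only imports from and exports to one common format."""
    return 2 * n


for n in (3, 10, 50):
    # prints: 3 6 6, then 10 90 20, then 50 2450 100
    print(n, pairwise_converters(n), hub_converters(n))
```

With only three formats the two approaches cost the same; the gap opens up quickly as more formats join.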
What is XML • Stolen from xml-computing.com: • eXtensible Markup Language • A way to represent structured data • a World Wide Web Consortium (W3C) standard • platform-independent • a way to create your own custom languages • license-free and well-supported • the future of computing? • Buzzword-compliant!
Origins of XML • From SGML • Standard Generalized Markup Language • cf. HTML • A document markup language • For annotating documents with metadata to make them easier to interpret Hi! My name is <NAME><FIRST>Eric</FIRST> <LAST>Dashofy</LAST></NAME>. You can email me at <EMAIL>edashofy@ics.uci.edu</EMAIL>.
The Times, They are a Changin’ • XML is arguably more useful to simply encode data, outside the strict context of a document <PERSON> <NAME> <FIRST>Eric</FIRST> <LAST>Dashofy</LAST> <DEPARTMENT>Information and Computer Science</DEPARTMENT> <EMAIL>edashofy@ics.uci.edu</EMAIL> </NAME> </PERSON>
Terminology • Tag • The markup of the document, enclosed in angle-brackets. • <foo> is the start tag • </foo> is the end tag • Tags may be nested, but may not cross • <A>foo<B>bar</B>baz</A> --OK! • <A>foo<B>bar</A>baz</B> --NO! • Hierarchical data structure
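The nesting rule can be checked with any XML parser. A minimal sketch using Python's standard `xml.etree.ElementTree` module; the helper name `is_well_formed` is an assumption made for this example.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


print(is_well_formed("<A>foo<B>bar</B>baz</A>"))  # nested tags: True
print(is_well_formed("<A>foo<B>bar</A>baz</B>"))  # crossed tags: False
```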
Terminology • Element • Stuff in between a start and end tag • Includes the tags • May contain nested elements • Ex: • <a>foo</a> • <a>foo<b>bar</b></a> • (nested)
Terminology • Attribute • A way of annotating tags with additional info • Simple name-value pairs • Ex: • <name lang="English">Henry</name> • <name lang="Spanish">Enrique</name>
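Reading attributes back is equally simple. A small sketch with Python's standard `xml.etree.ElementTree`; the wrapping `<names>` element is an assumption added so the two examples form a single document.

```python
import xml.etree.ElementTree as ET

names = ET.fromstring(
    "<names>"
    '<name lang="English">Henry</name>'
    '<name lang="Spanish">Enrique</name>'
    "</names>"
)

# Map each lang attribute value to the element's text content.
by_lang = {el.get("lang"): el.text for el in names.findall("name")}
print(by_lang)
```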
Document • A collection of elements, usually in a file • One top-level element • Called the “root” element or “document” element • Some header stuff <?xml version="1.0"?> <person> <name> <first>Eric</first> <last>Dashofy</last> </name> <department>Information and Computer Science</department> <email>edashofy@ics.uci.edu</email></person>
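As a rough illustration, the document above can be loaded and its root element inspected with Python's standard `xml.etree.ElementTree`. The indentation inside the string is added here for readability only.

```python
import xml.etree.ElementTree as ET

DOCUMENT = """<?xml version="1.0"?>
<person>
  <name>
    <first>Eric</first>
    <last>Dashofy</last>
  </name>
  <department>Information and Computer Science</department>
  <email>edashofy@ics.uci.edu</email>
</person>"""

root = ET.fromstring(DOCUMENT)
print(root.tag)                         # the root (document) element
print([child.tag for child in root])    # its direct children
print(root.findtext("name/first"), root.findtext("name/last"))
```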
Side-note: • “If you don’t understand it, ignore it.”
Kinds of Documents • “Well Formed” • Syntactically correct • All the start tags have end tags • All the start-quotes have end-quotes • etc. • “Valid” • Well-formed, and conforms to some language specification
Why a meta-language? • To define what elements, sub-elements, attributes are allowed • And in what order • So different organizations can agree on a real data format • Well-formed documents don’t restrict how you encode the data, so they’re not very valuable
DTDs • Document Type Definition • Part of XML 1.0 • The original XML meta-language • Doesn’t look like XML • Like production rules <!DOCTYPE Foo [ <!ELEMENT Foo (Bar*,Baz?,Booyah+)> <!ELEMENT Bar (#PCDATA)> <!ELEMENT Baz (#PCDATA)> <!ELEMENT Booyah (#PCDATA)> ]>
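To see why a content model reads like a production rule, the rule for Foo can be rewritten by hand as a regular expression over child element names. This is only a sketch of the idea, not a DTD validator: Python's standard library does not validate against DTDs, and the helper `foo_children_ok` is hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Foo (Bar*, Baz?, Booyah+) rewritten as a regex over space-terminated child names.
FOO_CONTENT_MODEL = re.compile(r"(Bar )*(Baz )?(Booyah )+")


def foo_children_ok(xml_text: str) -> bool:
    """Check only the order and count of Foo's children, not their content."""
    root = ET.fromstring(xml_text)
    if root.tag != "Foo":
        return False
    child_names = "".join(child.tag + " " for child in root)
    return FOO_CONTENT_MODEL.fullmatch(child_names) is not None


print(foo_children_ok("<Foo><Bar>a</Bar><Bar>b</Bar><Booyah>c</Booyah></Foo>"))  # True
print(foo_children_ok("<Foo><Baz>a</Baz><Bar>b</Bar><Booyah>c</Booyah></Foo>"))  # False: Bar after Baz
print(foo_children_ok("<Foo><Bar>a</Bar></Foo>"))                                # False: no Booyah
```

All three inputs are well-formed; only the first would also be valid against the rule.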
Namespaces • “You keep on using that word, I do not think it means what you think it means.” –Inigo Montoya • How can you make a document that draws elements from multiple DTDs? <usa:address xmlns:usa="http://www.dtds.com/usaddress.dtd"> <usa:street>1600 Pennsylvania Ave</usa:street> <usa:city>Washington</usa:city> <usa:state>DC</usa:state> <usa:zip>20509</usa:zip></usa:address> <uk:address xmlns:uk="http://www.dtds.com/ukaddress.dtd"> <uk:street>23B Baker Street</uk:street> <uk:city>London, England</uk:city> <uk:postcode>N22</uk:postcode></uk:address>
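A parser resolves each prefix to its namespace URI, so two elements both called "address" stay distinct. A short sketch with Python's standard `xml.etree.ElementTree`; the wrapping `<addresses>` element is an assumption added to combine both fragments in one document.

```python
import xml.etree.ElementTree as ET

USA = "http://www.dtds.com/usaddress.dtd"
UK = "http://www.dtds.com/ukaddress.dtd"

doc = ET.fromstring(f"""
<addresses xmlns:usa="{USA}" xmlns:uk="{UK}">
  <usa:address><usa:city>Washington</usa:city></usa:address>
  <uk:address><uk:city>London, England</uk:city></uk:address>
</addresses>
""")

# The prefix map used for lookups is independent of the prefixes in the document.
ns = {"usa": USA, "uk": UK}
print(doc.find("usa:address/usa:city", ns).text)
print(doc.find("uk:address/uk:city", ns).text)

# Internally the tag carries the full URI, not the prefix.
print(doc[0].tag)
```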
Why not DTDs? • “Uhm, DTDs are bad, mmkay?” –Mr. Mackey • DTDs are lacking in some areas • Don’t look like XML • Can’t specify at a level below elements • i.e. can’t specify regular expressions on content • Difficult to extend/add things to existing element definitions • Difficult to implement modular languages
XML Schemas • A DTD replacement from W3C • Look like XML / Easier to read • Contribute a type system to XML • Element, attribute definitions become types • Single-inheritance model in the type system • Better namespace management
Example <complexType name="Address"> <sequence> <element name="name" type="string"/> <element name="street" type="string"/> <element name="city" type="string"/> </sequence></complexType> <complexType name="USAddress"> <complexContent> <extension base="Address"> <sequence> <element name="state" type="USState"/> <element name="zip" type="positiveInteger"/> </sequence> </extension> </complexContent> </complexType>
Example, cont. <complexType name="UKAddress"> <complexContent> <extension base="Address"> <sequence> <element name="postcode" type="UKPostcode"/> </sequence> <attribute name="exportCode" type="positiveInteger" fixed="1"/> </extension> </complexContent> </complexType>
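The extension mechanism in these two examples resembles single inheritance in a programming language. A loose analogy in Python dataclasses, assuming plain `str` and `int` stand in for the schema's USState, UKPostcode, and positiveInteger types; the field name `zip_code` is a renaming made for this sketch.

```python
from dataclasses import dataclass, field


@dataclass
class Address:
    name: str
    street: str
    city: str


@dataclass
class USAddress(Address):
    # Extends Address with two more fields, like <extension base="Address">.
    state: str
    zip_code: int


@dataclass
class UKAddress(Address):
    postcode: str
    # Mirrors fixed="1": callers cannot pass a different value to the constructor.
    export_code: int = field(default=1, init=False)


us = USAddress("The President", "1600 Pennsylvania Ave", "Washington", "DC", 20509)
uk = UKAddress("S. Holmes", "23B Baker Street", "London, England", "N22")
print(isinstance(us, Address), isinstance(uk, Address))
print(uk.export_code)
```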
What do you get? • Lots of tools for free • Parsers • DOM and SAX • Serializers • Transformation • XSL(T) • A meta-language (two, actually: DTDs and XML Schemas) • Data Bindings • Syntax-directed editors
Spotlight: DOM & SAX • APIs for accessing XML documents • SAX: Lightweight, callback based • “I saw an element! Ooh, I saw an attribute!” • DOM: Parses entire document into an object tree in memory [Figure: XML Document → DOM Parser → In-memory Representation]
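The contrast between the two styles can be sketched with Python's standard `xml.sax` and `xml.dom.minidom` modules. The handler class `TagLogger` and the sample document are assumptions made for this example.

```python
import xml.sax
from xml.dom import minidom

XML = "<person><first>Eric</first><last>Dashofy</last></person>"


class TagLogger(xml.sax.ContentHandler):
    """SAX style: the parser calls back as it meets each start tag."""

    def __init__(self):
        super().__init__()
        self.seen = []

    def startElement(self, name, attrs):
        self.seen.append(name)


handler = TagLogger()
xml.sax.parseString(XML.encode("utf-8"), handler)
print(handler.seen)

# DOM style: the whole document becomes an object tree that can be walked afterwards.
dom = minidom.parseString(XML)
root = dom.documentElement
print(root.tagName, [child.tagName for child in root.childNodes])
print(root.firstChild.firstChild.data)
```

SAX keeps only what the handler chooses to store, which is why it suits large documents; DOM trades memory for random access to the tree.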
Spotlight: Data Bindings • DOM API is very, very generic • Example functions: • appendChild(Element n) • setAttribute(String name, String value) • No namespace management • Data bindings are APIs guided by the language definition • Example functions: • addComponent(Component c); • setIdentifier(String id); • Data bindings can be generated automatically
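A hand-written sketch of what a generated data binding might look like, layered over Python's standard `xml.etree.ElementTree`. The `Architecture` and `Component` classes, the element names, and the `id` attribute are all hypothetical; a real binding generator would derive them from the language definition.

```python
import xml.etree.ElementTree as ET


class Component:
    """Hypothetical binding for a <component> element."""

    def __init__(self):
        self.element = ET.Element("component")

    def set_identifier(self, identifier: str) -> None:
        # The generic call underneath is still "set an attribute".
        self.element.set("id", identifier)


class Architecture:
    """Hypothetical binding for an <architecture> element that holds components."""

    def __init__(self):
        self.element = ET.Element("architecture")

    def add_component(self, component: Component) -> None:
        # The generic call underneath is still "append a child".
        self.element.append(component.element)

    def to_xml(self) -> str:
        return ET.tostring(self.element, encoding="unicode")


arch = Architecture()
comp = Component()
comp.set_identifier("c1")
arch.add_component(comp)
print(arch.to_xml())
```

The caller works with domain terms (components, identifiers) while the binding keeps the underlying tree consistent with the language.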