XML Basics Overview IT380
What is XML? • Extensible Markup Language • A syntax for documents • A Meta-Markup Language • A Structural and Semantic language, not a formatting language • Not just for Web pages
XML is a Meta-Markup Language • Not like HTML, troff, LaTeX • Make up the tags you need as you need them • The tags you create can be documented in a Document Type Definition (DTD) • A meta-syntax for domain-specific markup languages like MusicML, MathML, and CML
XML describes structure and semantics, not formatting • XML documents form a tree • Element and attribute names reflect the kind of content they hold • Formatting can be added with a style sheet
A Song Description in HTML
<dt>Hot Cop
<dd> by Jacques Morali, Henri Belolo, and Victor Willis
<ul>
  <li>Producer: Jacques Morali
  <li>Publisher: PolyGram Records
  <li>Length: 6:20
  <li>Written: 1978
  <li>Artist: Village People
</ul>
A Song Description in XML
<SONG>
  <TITLE>Hot Cop</TITLE>
  <COMPOSER>Jacques Morali</COMPOSER>
  <COMPOSER>Henri Belolo</COMPOSER>
  <COMPOSER>Victor Willis</COMPOSER>
  <PRODUCER>Jacques Morali</PRODUCER>
  <PUBLISHER>PolyGram Records</PUBLISHER>
  <LENGTH>6:20</LENGTH>
  <YEAR>1978</YEAR>
  <ARTIST>Village People</ARTIST>
</SONG>
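Because the element names carry meaning, a program can pull specific facts out of the song description without guessing from layout. The following is a minimal sketch using Python's standard `xml.etree.ElementTree` module; the variable names are illustrative and not part of the slides.

```python
import xml.etree.ElementTree as ET

# The song description from the slide, embedded as a string.
SONG_XML = """<SONG>
  <TITLE>Hot Cop</TITLE>
  <COMPOSER>Jacques Morali</COMPOSER>
  <COMPOSER>Henri Belolo</COMPOSER>
  <COMPOSER>Victor Willis</COMPOSER>
  <PRODUCER>Jacques Morali</PRODUCER>
  <PUBLISHER>PolyGram Records</PUBLISHER>
  <LENGTH>6:20</LENGTH>
  <YEAR>1978</YEAR>
  <ARTIST>Village People</ARTIST>
</SONG>"""

song = ET.fromstring(SONG_XML)

# Each fact is addressed by the name of the element that holds it.
title = song.findtext("TITLE")
composers = [c.text for c in song.findall("COMPOSER")]
year = int(song.findtext("YEAR"))

print(title, composers, year)
```

Doing the same against the HTML version would mean parsing strings such as "Written: 1978" out of list items, which depends on presentation rather than structure.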
Style Sheets provide formatting
SONG {display: block}
TITLE {display: block; font-family: Helvetica, serif; font-size: 20pt; font-weight: bold}
COMPOSER {display: block; font-family: Times, Times New Roman, serif; font-size: 14pt; font-style: italic}
ARTIST {display: block; font-family: Times, Times New Roman, serif; font-size: 14pt; font-weight: bold; font-style: italic}
PUBLISHER {display: block; font-size: 14pt; font-family: Times, Times New Roman, serif}
LENGTH {display: block; font-family: Times, Times New Roman, serif; font-size: 14pt}
YEAR {display: block; font-family: Times, Times New Roman, serif; font-size: 14pt}
Attaching style sheets to documents • Processing Instruction • <?xml-stylesheet type="text/css" href="song.css"?> • Converter Program
What is XML used for? • Domain-Specific Markup Languages • Self-Describing Data • Interchange of Data Among Applications • Structured and Integrated Data
Domain-Specific Markup Languages • Non-proprietary format • Don’t pay for what you don’t use
Self-Describing Data • Much data is lost due to format problems • XML is very simple • XML is self-describing • XML is well documented
<PERSON ID="p1100" SEX="M">
  <NAME>
    <GIVEN>Judson</GIVEN>
    <SURNAME>McDaniel</SURNAME>
  </NAME>
  <BIRTH>
    <DATE>21 Feb 1834</DATE>
  </BIRTH>
  <DEATH>
    <DATE>9 Dec 1905</DATE>
  </DEATH>
</PERSON>
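One way to see what "self-describing" means in practice: a generic program that knows nothing about genealogy can still print a labeled outline of the PERSON record, because the names travel with the data. This is a sketch only; the `outline` helper is a hypothetical name introduced here.

```python
import xml.etree.ElementTree as ET

PERSON_XML = """<PERSON ID="p1100" SEX="M">
  <NAME>
    <GIVEN>Judson</GIVEN>
    <SURNAME>McDaniel</SURNAME>
  </NAME>
  <BIRTH>
    <DATE>21 Feb 1834</DATE>
  </BIRTH>
  <DEATH>
    <DATE>9 Dec 1905</DATE>
  </DEATH>
</PERSON>"""


def outline(elem, depth=0):
    """Return indented 'TAG [attrs]: text' lines for elem and its descendants."""
    text = (elem.text or "").strip()
    # Sort attributes so the output does not depend on parser ordering.
    attrs = " ".join(f"{k}={v}" for k, v in sorted(elem.attrib.items()))
    label = elem.tag + (f" [{attrs}]" if attrs else "")
    lines = ["  " * depth + label + (f": {text}" if text else "")]
    for child in elem:
        lines.extend(outline(child, depth + 1))
    return lines


lines = outline(ET.fromstring(PERSON_XML))
print("\n".join(lines))
```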
Interchange of Data Among Applications • E-commerce • Syndication
Structured and Integrated Data • Can specify relationships between elements • Can assemble data from multiple sources
XML Applications • A specific markup language that uses the XML meta-syntax is called an XML application • Different XML applications have their own more restricted syntaxes and vocabularies within the broader XML syntax • Further syntax can be layered on top of this, e.g. data typing through DCDs or other schemas
Example XML Applications • Web Pages • Mathematical Equations • Music Notation • Vector Graphics • Metadata • and more…
Mathematical Markup Language
Channel Definition Format
<?xml version="1.0"?>
<CHANNEL HREF="http://metalab.unc.edu/xml/index.html">
  <TITLE>Cafe con Leche</TITLE>
  <ITEM HREF="http://metalab.unc.edu/xml/books.html">
    <TITLE>Books about XML</TITLE>
  </ITEM>
  <ITEM HREF="http://metalab.unc.edu/xml/tradeshows.html">
    <TITLE>Trade shows and conferences about XML</TITLE>
  </ITEM>
  <ITEM HREF="http://metalab.unc.edu/xml/lists.htm">
    <TITLE>Mailing Lists dedicated to XML</TITLE>
  </ITEM>
</CHANNEL>
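A consumer of a channel file mostly needs the list of items and their links. The sketch below reads the CDF example from the slide with Python's standard library; it is an illustration of reading this one document, not a CDF client.

```python
import xml.etree.ElementTree as ET

CDF_XML = """<?xml version="1.0"?>
<CHANNEL HREF="http://metalab.unc.edu/xml/index.html">
  <TITLE>Cafe con Leche</TITLE>
  <ITEM HREF="http://metalab.unc.edu/xml/books.html">
    <TITLE>Books about XML</TITLE>
  </ITEM>
  <ITEM HREF="http://metalab.unc.edu/xml/tradeshows.html">
    <TITLE>Trade shows and conferences about XML</TITLE>
  </ITEM>
  <ITEM HREF="http://metalab.unc.edu/xml/lists.htm">
    <TITLE>Mailing Lists dedicated to XML</TITLE>
  </ITEM>
</CHANNEL>"""

channel = ET.fromstring(CDF_XML)

# "TITLE" matches only the direct child, so this is the channel's own title.
channel_title = channel.findtext("TITLE")

# Each ITEM carries its link in the HREF attribute and its label in TITLE.
items = [(item.get("HREF"), item.findtext("TITLE"))
         for item in channel.findall("ITEM")]

print(channel_title)
for href, title in items:
    print(f"  {title} -> {href}")
```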
Classic Literature • The Complete Plays of Shakespeare • The Bible • The Koran • The Book of Mormon
Vector Graphics • Vector Markup Language (VML) • Internet Explorer 5.0 • Microsoft Office 2000 • Scalable Vector Graphics (SVG)
The Resource Description Framework (RDF) • Meta-data • Dublin Core • Better Web searching
An Example of RDF
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/DC/">
  <rdf:Description about="http://metalab.unc.edu/xml/">
    <dc:CREATOR>Elliotte Rusty Harold</dc:CREATOR>
    <dc:TITLE>Cafe con Leche</dc:TITLE>
  </rdf:Description>
</rdf:RDF>
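The RDF example mixes two vocabularies, told apart by namespace prefixes. The sketch below reads it with Python's `xml.etree.ElementTree`, using the slide's markup with the attribute quotes closed so that it parses. The `NS` mapping is a local convenience: the prefixes in it only need to match the namespace URIs, not the prefixes used in the document.

```python
import xml.etree.ElementTree as ET

RDF_XML = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/DC/">
  <rdf:Description about="http://metalab.unc.edu/xml/">
    <dc:CREATOR>Elliotte Rusty Harold</dc:CREATOR>
    <dc:TITLE>Cafe con Leche</dc:TITLE>
  </rdf:Description>
</rdf:RDF>"""

# Prefix-to-URI mapping used only for the lookups below.
NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dc": "http://purl.org/DC/",
}

root = ET.fromstring(RDF_XML)
description = root.find("rdf:Description", NS)

resource = description.get("about")  # unprefixed attribute, no namespace
creator = description.findtext("dc:CREATOR", namespaces=NS)
title = description.findtext("dc:TITLE", namespaces=NS)

print(f"{resource}: {title!r} by {creator}")
```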
XML for XML • XSL: The Extensible Stylesheet Language • DCD: The Document Content Description Schema Language • XLL: The Extensible Linking Language
XSL: The Extensible Stylesheet Language • XSL Transformations • XSL Formatting Objects
DCD: The Document Content Description Schema Language • Data Typing in XML is Weak • <MONTH>9</MONTH>
<DCD>
  <ElementDef Type="MONTH"
              Model="Data" Datatype="i1"
              Min="1" Max="12" />
</DCD>
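To make the typing problem concrete: to a plain XML parser the content of MONTH is just the string "9". The constraint that the DCD expresses (an integer between 1 and 12) has to be enforced somewhere. The sketch below hand-codes that one check in Python; it is not a DCD processor, and `check_month` is a hypothetical helper introduced for illustration.

```python
import xml.etree.ElementTree as ET


def check_month(xml_text, low=1, high=12):
    """Return True if xml_text is a MONTH element holding an integer in [low, high]."""
    elem = ET.fromstring(xml_text)
    if elem.tag != "MONTH":
        raise ValueError(f"expected MONTH, got {elem.tag}")
    try:
        # The parser hands back text; converting it is the application's job.
        value = int((elem.text or "").strip())
    except ValueError:
        return False
    return low <= value <= high


print(check_month("<MONTH>9</MONTH>"))          # within range
print(check_month("<MONTH>13</MONTH>"))         # well-formed XML, bad value
print(check_month("<MONTH>September</MONTH>"))  # well-formed XML, wrong type
```

All three inputs are well-formed XML, which is the point of the slide: well-formedness alone says nothing about data types.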
XLL: The Extensible Linking Language • Any element can be a link • Links can be bi-directional • Links can be separated from the documents they connect
<footnote xlink:form="simple" href="footnote7.xml">7</footnote>
File Formats, In-house Applications, and Other Behind-the-Scenes Uses • Microsoft Office 2000 • Federal Express Web API • Netscape What’s Related
Hello XML • Plain ASCII or UTF-8 text • .xml is the standard file extension • Any standard text editor will work
<?xml version="1.0" standalone="yes"?>
<FOO>
Hello XML!
</FOO>
The XML Declaration
<?xml version="1.0" standalone="yes"?>
• version attribute: required, always has the value 1.0 • standalone attribute: yes or no • encoding attribute: UTF-8, ISO-8859-1, etc.
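The three parts of the declaration can be observed directly with Python's `xml.parsers.expat` module, whose `XmlDeclHandler` callback receives them when the declaration is parsed. This is a small sketch; `read_declaration` is a hypothetical helper name.

```python
import xml.parsers.expat


def read_declaration(document):
    """Return the version, encoding, and standalone values of an XML declaration."""
    found = {}

    def on_declaration(version, encoding, standalone):
        found["version"] = version
        found["encoding"] = encoding  # None when the attribute is omitted
        # expat reports standalone as 1, 0, or -1 (omitted).
        found["standalone"] = {1: "yes", 0: "no"}.get(standalone)

    parser = xml.parsers.expat.ParserCreate()
    parser.XmlDeclHandler = on_declaration
    parser.Parse(document, True)
    return found


hello = b'<?xml version="1.0" standalone="yes"?>\n<FOO>\nHello XML!\n</FOO>'
print(read_declaration(hello))
```

If a document has no declaration at all, the handler is never called and the returned dictionary is empty.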
The FOO element • Start tag <FOO> • Contents "Hello XML!" • End tag </FOO>
<FOO>
Hello XML!
</FOO>
greeting.xml
<?xml version="1.0" standalone="yes"?>
<GREETING>
Hello XML!
</GREETING>
Style sheets • Separate from the XML document • Several languages are available • Cascading Style Sheets Level 1 (CSS1): Internet Explorer 5.0, Mozilla 5.0 • Cascading Style Sheets Level 2 (CSS2): Internet Explorer 5 (partial), Mozilla 5.0 (partial) • Extensible Stylesheet Language (XSL): Internet Explorer 5.0 (older draft, buggy); LotusXSL, XT, and other non-browser converters • Document Style Semantics and Specification Language (DSSSL): Jade
xml-stylesheet • Style sheets are attached via an xml-stylesheet processing instruction in the prolog
<?xml version="1.0" standalone="yes"?>
<?xml-stylesheet type="text/css" href="greeting.css"?>
<GREETING>Hello XML!</GREETING>
• type attribute has the value text/css or text/xsl • href attribute is a URL to the stylesheet, possibly relative • Can also use non-browser converters like XT, LotusXSL, and Jade
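A program that wants to honor the processing instruction has to find it in the prolog and read its type and href values. The sketch below does this with Python's `xml.dom.minidom`; the `stylesheets` helper is a hypothetical name, and the regular expression is a simplification that only handles double-quoted values.

```python
import re
from xml.dom import minidom, Node

GREETING_XML = """<?xml version="1.0" standalone="yes"?>
<?xml-stylesheet type="text/css" href="greeting.css"?>
<GREETING>Hello XML!</GREETING>"""


def stylesheets(xml_text):
    """Return one dict of pseudo-attributes per xml-stylesheet instruction in the prolog."""
    doc = minidom.parseString(xml_text)
    result = []
    # Prolog processing instructions are children of the document node itself.
    for node in doc.childNodes:
        if (node.nodeType == Node.PROCESSING_INSTRUCTION_NODE
                and node.target == "xml-stylesheet"):
            # node.data is the raw text after the target; it looks like
            # attributes but must be parsed by hand.
            pairs = re.findall(r'([\w-]+)\s*=\s*"([^"]*)"', node.data)
            result.append(dict(pairs))
    return result


sheets = stylesheets(GREETING_XML)
print(sheets)
```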
greeting.css
GREETING {display: block; font-size: 24pt; font-weight: bold}
A larger example: Baseball statistics • Examine the data • Design a vocabulary for the data • Write a style sheet
Sample statistics http://cbs.sportsline.com/u/baseball/mlb/stats.htm
Organizing the Data • XML documents are trees • XML elements contain other elements as well as text • Within these limits there's more than one way to organize the data • Hierarchically • Relationally • Objects
What is the Root Element? • The League? • The Season? • A custom Document element?
The Root Element • Choose SEASON for the root element • Everything else will be a descendant of SEASON • This is not the only possible choice
<?xml version="1.0"?>
<SEASON>
</SEASON>
What are the Immediate Children of the Root? • Leagues? • Teams? • Players? • Games?
Child Elements
<?xml version="1.0"?>
<SEASON>
  <YEAR> 1998 </YEAR>
</SEASON>
White space in XML is not especially significant
<?xml version="1.0"?>
<SEASON><YEAR>1998</YEAR></SEASON>
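One way to read this claim: the parser itself passes white space through to the application, and it is the application that usually decides to ignore it. The sketch below compares a spaced-out SEASON document with the compact one using Python's `xml.etree.ElementTree`; the variable names are illustrative.

```python
import xml.etree.ElementTree as ET

spaced = ET.fromstring("<SEASON>\n  <YEAR> 1998 </YEAR>\n</SEASON>")
compact = ET.fromstring("<SEASON><YEAR>1998</YEAR></SEASON>")

# The raw text differs: the parser keeps the spaces around 1998.
raw_spaced = spaced.findtext("YEAR")
raw_compact = compact.findtext("YEAR")

# After the application trims it, both documents say the same thing.
same_year = raw_spaced.strip() == raw_compact.strip()

print(repr(raw_spaced), repr(raw_compact), same_year)
```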
Leagues • Major league baseball is divided into two leagues • Each league has • a name • three divisions
Divisions • Each division has • name • 4-6 teams
Teams • Each team has • Name • City • Players
Player Data • Each player has • First name • Last name • Position • Statistics
Player Batting Statistics • G: Games Played • GS: Games Started • AB: At Bats • R: Runs • H: Hits • 2B: Doubles • 3B: Triples • HR: Home Runs • RBI: Runs Batted In • SB: Stolen Bases • CS: Caught Stealing • SH: Sacrifice Hits • SF: Sacrifice Flies • Err: Errors • PB: Pitcher Balked • BB: Base on Balls (Walks) • SO: Strike Outs • HBP: Hit By Pitch
What does a player look like? • Long names vs. short names
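The trade-off between long and short names can be sketched as follows. Every element name here (PLAYER, GIVEN_NAME, SURNAME, POSITION, GAMES, HOME_RUNS and their short forms) is an assumption made for illustration, as are the placeholder player values; the slides do not show the actual vocabulary.

```python
import copy
import xml.etree.ElementTree as ET

# Hypothetical long-to-short name mapping, not taken from the slides.
SHORT = {
    "PLAYER": "P", "GIVEN_NAME": "F", "SURNAME": "L",
    "POSITION": "POS", "GAMES": "G", "HOME_RUNS": "HR",
}


def make_player(given, surname, position, **stats):
    """Build a PLAYER element with long, readable child names."""
    player = ET.Element("PLAYER")
    ET.SubElement(player, "GIVEN_NAME").text = given
    ET.SubElement(player, "SURNAME").text = surname
    ET.SubElement(player, "POSITION").text = position
    for name, value in stats.items():
        ET.SubElement(player, name).text = str(value)
    return player


def shorten(elem):
    """Return a copy of elem with each known tag replaced by its short form."""
    clone = copy.deepcopy(elem)
    for node in list(clone.iter()):
        node.tag = SHORT.get(node.tag, node.tag)
    return clone


# Placeholder data, not a real player.
long_player = make_player("Pat", "Example", "Shortstop", GAMES=0, HOME_RUNS=0)
long_xml = ET.tostring(long_player, encoding="unicode")
short_xml = ET.tostring(shorten(long_player), encoding="unicode")

print(long_xml)
print(short_xml)
```

Long names are easier to read without documentation, while short names produce smaller files; the data itself is identical in both forms.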
The Complete 1998 Major League • Long version
A Style Sheet • 1998shortstats.xml • baseballstats.css • <?xml-stylesheet type="text/css" href="baseballstats.css"?> • styled1998shortstats.xml