400 likes | 525 Views
XML:Managing data exchange. Words can have no single fixed meaning. Like wayward electrons, they can spin away from their initial orbit and enter a wider magnetic field. No one owns them or has a proprietary right to dictate how they will be used . David Lehman, End of the Word , 1991.
E N D
XML:Managing data exchange Words can have no single fixed meaning. Like wayward electrons, they can spin away from their initial orbit and enter a wider magnetic field. No one owns them or has a proprietary right to dictate how they will be used. David Lehman, End of the Word, 1991.
Central problems of data management • Capture • Storage • Retrieval • Exchange
EDI • Electronic exchange of standard documents • In use for some 20 years • Standards • ANSI X.12 (US and Canada) • EDIFACT (International)
EDI: Advantages • Paper handling is reduced, saving time and money • Data can be exchanged in real time • There are fewer errors since data are keyed only once • Enhanced data sharing enables greater coordination of activities between business partners • Money flows are accelerated and payments received sooner
EDI: Adoption • Much information flow is still on paper • Electronic exchange is the exception rather than the rule • The Internet is a lower cost solution than EDI using value added networks (VANs)
SGML • Document management consumes • 15% of company revenue • 25% of labor costs • 10 - 60% of an office worker’s time • Standard generalized markup language (SGML) was designed to reduce the cost of document management
Markup language • Embedded information within text about the meaning of the text <cdliner>This uniquely creative collaboration between Miles Davis and Gil Evans has already resulted in two extraordinary albums—<cdtitle>Miles Ahead</cdtitle><cdid>CL 1041></cdid> and <cdtitle>Porgy and Bess</cdtitle><cdid>CL 1274</cdid>.</cdliner>
SGML • A vendor independent standard for publication of all media • Cross system • Portable • Defines the structure of a document • The parent of HTML and XML
SGML: Advantages • Re-use • Same advantage as with word processing • Flexibility • Generate output for multiple media • Revision • Version control
SGML code <chapter> <no>16</no> <title>XML: Managing Data Exchange</title> <section> <quote><emph type = "2">Words can have no single fixed meaning. Like wayward electrons, they can spin away from their initial orbit and enter a wider magnetic field. No one owns them or has a proprietary right to dictate how they will be used.</emph></quote> … </section> … </chapter>
HTML code <html> <body> <h1><b>16</b></h1> <h1><b>XML: Managing Data Exchange</b></h1> <p> <i>Words can have no single fixed meaning. Like wayward electrons, they can spin away from their initial orbit and enter a wider magnetic field. No one owns them or has a proprietary right to dictate how they will be used.</i> </p> </body> </html>
The problem with HTML • Presentation not meaning • Reader has to infer meaning • Machines are not very good at inferring meaning
XML • Extensible markup language • SGML for commerce • A meta-language • A language to generate languages • Will steadily replace HTML
Structured text User-definable structure Context-sensitive retrieval Greater hypertext linkage Formatted text Pre-defined format Limited retrieval Limited hypertext linking XML vs. HTML
XML rules • Elements must have both an opening and closing tag • Elements must follow a strict hierarchy with only one root element • Elements may not overlap other elements • Element names must obey XML naming conventions • XML is case sensitive
Processing shift • From server to browser • Browser can ‘read’ meaning of the data • Less data transmitted
Searching • Search engines look for appropriate tags in the XML code • Faster • More precise
Expected gains • Store once and format many times • Hardware and software independence • Capture once and exchange many times • Accelerated targeted searching • Less network congestion
XML language design • Designers must define • Allowable tags • Rules for nesting tags • Which tagged elements can be processed
XML Schema • The schema defines • The names and contents of all elements that are permissible in a certain document • The structure of the document • How often an element might appear • The order in which the elements must appear • The type of data the element contains
DOM • Document object model • The data model for an XML document • A tree (1:m)
Schema (cdlib.xsd) • XML declaration and root of all schema documents <?xml version="1.0" encoding="UTF-8"?> <xsd:schemaxmlns:xsd='http://www.w3.org/2001/XMLSchema'>
Schema (cdlib.xsd) • CD library definition <xsd:elementname="cdlibrary"> <xsd:complexType> <xsd:sequence> <xsd:elementname="cd"type="cdType” minOccurs="1”maxOccurs="unbounded"/> </xsd:sequence> </xsd:complexType> </xsd:element>
Schema (cdlib.xsd) • CD definition <xsd:complexType name="cdType"> <xsd:sequence> <xsd:element name="cdid"type="xsd:string"/> <xsd:element name="cdlabel"type="xsd:string"/> <xsd:element name="cdtitle"type="xsd:string"/> <xsd:element name="cdyear"type="xsd:integer"/> <xsd:element name="track"type="trackType"minOccurs="1" maxOccurs="unbounded"/> </xsd:sequence> </xsd:complexType>
Schema (cdlib.xsd) • Track definition <xsd:complexTypename="trackType"> <xsd:sequence> <xsd:elementname="trknum"type="xsd:integer"/> <xsd:elementname="trktitle"type="xsd:string"/> <xsd:elementname="trklen"type="xsd:time"/> </xsd:sequence> </xsd:complexType>
Common datatypes • string • boolean • uriReference • decimal • float • integer • time • date
XML (cd.xml) <?xml version = "1.0” encoding=“UTF-8”?> <cdlibrary xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="cdlib.xsd"> <cd> <cdid>A2 1325</cdid> <cdlabel>Atlantic</cdlabel> <cdtitle>Pyramid</cdtitle> <cdyear>1960</cdyear> <track> <trknum>1</trknum> <trktitle>Vendome</trktitle> <trklen>2:30</trklen> </track> … </cd> </cdlibrary>
XSL • Extensible stylesheet language • Defines how an XML document is rendered • An XML file
XSL • Results of applying cd.xsl Pyramid, Atlantic, 1960 [A2 1325] 1 Vendome 00:02:30 2 Pyramid 00:10:46 Ella Fitzgerald, Verve, 2000 [D136705] 1 A tisket, a tasket 00:02:37 2 Vote for Mr. Rhythm 00:02:25 3 Betcha nickel 00:02:52
cd.xsl <?xml version="1.0" encoding="UTF-8”?> <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"> <xsl:output encoding="UTF-8" indent="yes" method="html" version="1.0" /> <xsl:template match="/"> <html> <head> <title>Complete List of Songs</title> </head> <body> <h2>Complete List of Songs</h2> <xsl:apply-templates select= "cdlibrary" /> <xsl:for-each select="cd"> <font color="maroon"> <xsl:value-of select="cdtitle" /> , <xsl:value-of select="cdlabel" /> , <xsl:value-of select="cdyear" /> [ <xsl:value-of select="cdid" />] </font> <br />
cd.xsl (continued) <table> <xsl:for-each select= "track"> <tr> <td align="left"> <xsl:value-of select= "trknum" /> </td> <td> <xsl:value-of select= "trktitle" /> </td> <td align= "center" > <xsl:value-of select= "trklen" /> </td> </tr> </xsl:for-each> </table> <br /> </xsl:for-each> </body> </html> </xsl:template> </xsl:stylesheet>
Converting XML • Transformation and manipulation • XSLT • One XML vocabulary to another • FPML to finML • Re-ordering, filtering, and sorting • Rendering • XSLT • e.g., XML to WAP
XML and databases • XML is a data management tool • XML documents will have to be stored for the long-term • Need a DBMS
DBMS requirements • Store a large number of documents; • Store large documents • Support access to portions of a document (e.g., the data for a single CD in a library of 20,000 CDs) • Concurrent access • Version control • Integrate data from other sources
RDBMS • Document-centric • Store as CLOB • Data-centric • Object-relational extensions to support element retrieval and update • Expect RDBMS vendors to offer extensions to support XML
Database to XML • A significant proportion of Web pages are generated from databases • Instead of converting to HTML these should be converted to XML • Render with XSL • Need tools for converting relational data to XML
OODBMS • A good fit to the DOM • Little development at this stage • Vendors have fewer resources
XML database • Special purpose XML database • Tamino • This is a new area and you will need to monitor developments
Conclusion • XML is a significant technological development • Its main purpose is to support data exchange • It will lower the cost of business transactions • It will be a critical data management technology