XML – a data sharing standard DSC340 Mike Pangburn
Data-sharing challenge • Companies need to “talk” across different locations/systems and software • Most database systems (and other software applications, e.g., Excel) can read text files formatted as XML.
The ol’ standby: tab (or comma) delimited data…
• Who: authored it? to contact about data?
• What: are contents of database?
• When: was it collected? processed? finalized?
• Where: was the study done?
• Why: was the data collected?
• How: were data collected? processed? verified?
… can be pretty useless!
Metadata • Literally, “data about data” • a set of data that describes and gives information about other data ― Oxford English Dictionary
Is HTML a good way to share data?
<h1> Bibliography </h1>
<p> <i> Foundations of Databases </i> Abiteboul, Hull, Vianu <br> Addison Wesley, 1995
<p> <i> Data on the Web </i> Abiteboul, Buneman, Suciu <br> Morgan Kaufmann, 1999
XML: Tags provide meaning (metadata)
<bibliography>
  <book>
    <title> Foundations… </title>
    <author> Abiteboul </author>
    <author> Hull </author>
    <author> Vianu </author>
    <publisher> Addison Wesley </publisher>
    <year> 1995 </year>
  </book>
  …
</bibliography>
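To see what "tags provide meaning" buys a program, here is a minimal Python sketch that reads the book record by tag name using the standard library's xml.etree.ElementTree. The full title is taken from the HTML version of the same bibliography; the function name summarize_books is made up for illustration.

```python
import xml.etree.ElementTree as ET

# Sample data modelled on the slide (title taken from the HTML version).
BIBLIOGRAPHY_XML = """<bibliography>
  <book>
    <title> Foundations of Databases </title>
    <author> Abiteboul </author>
    <author> Hull </author>
    <author> Vianu </author>
    <publisher> Addison Wesley </publisher>
    <year> 1995 </year>
  </book>
</bibliography>"""


def summarize_books(xml_text):
    """Return one dict per <book>, using tag names to pick out each field."""
    root = ET.fromstring(xml_text)
    books = []
    for book in root.findall("book"):
        books.append({
            "title": book.findtext("title").strip(),
            "authors": [a.text.strip() for a in book.findall("author")],
            "publisher": book.findtext("publisher").strip(),
            "year": int(book.findtext("year")),
        })
    return books


books = summarize_books(BIBLIOGRAPHY_XML)
print(books[0]["authors"])
```

With the HTML version, the program would have to guess which italic text is a title and which line holds the year; here the tags say so directly.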
Another example: iTunes Library is XML
<dict>
  <key>Track ID</key><integer>617</integer>
  <key>Name</key><string>Take Five</string>
  <key>Artist</key><string>Dave Brubeck Quartet</string>
  <key>Album</key><string>Time Out</string>
  <key>Genre</key><string>Jazz</string>
  <key>Kind</key><string>AAC audio file</string>
  <key>Size</key><integer>5892093</integer>
  <key>Total Time</key><integer>363578</integer>
  ….
</dict>
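This key/value layout is Apple's property-list ("plist") format, which Python's standard plistlib module can read. The sketch below is only an illustration: it wraps a few of the slide's keys in a <plist> root element (an assumption here, since the slide shows just the inner <dict>) and is not a full iTunes library file.

```python
import plistlib

# A fragment modelled on the slide, wrapped in the <plist> root element
# that plistlib expects. The wrapper is an assumption, not the real file.
TRACK_PLIST = b"""<?xml version="1.0" encoding="UTF-8"?>
<plist version="1.0">
<dict>
  <key>Track ID</key><integer>617</integer>
  <key>Name</key><string>Take Five</string>
  <key>Artist</key><string>Dave Brubeck Quartet</string>
  <key>Album</key><string>Time Out</string>
  <key>Genre</key><string>Jazz</string>
  <key>Size</key><integer>5892093</integer>
</dict>
</plist>
"""

# plistlib turns <dict> into a Python dict and <integer> into an int.
track = plistlib.loads(TRACK_PLIST)
print(track["Name"], "by", track["Artist"])
print(type(track["Track ID"]).__name__)
```

The <integer> and <string> tags carry type information, so the track ID arrives as a number rather than as text.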
Acct./Fin XML example • XBRL (eXtensible Business Reporting Language) is a language for the electronic communication of business and financial data. • For example, company net profit has its own unique tag. • Edgar Online (SEC company information) uses XBRL • The solution will allow users to request XBRL formatted data for all US equities from within Microsoft Excel, custom templates, and the web.
HTML vs. XML • HTML started with very few tags, but… • HTML now has many tags, and more keep being added over time • Messy, yet not customizable • XML has very few standard tags • You add custom tags that are particular to your data needs
More strict tagging rules than HTML
The following code is legal in HTML:
<p>This is a paragraph
<p>This is another paragraph
In XML all elements must have a closing tag, like this:
<p>This is a paragraph</p>
<p>This is another paragraph</p>
Opening and closing tags must have the same case:
<message>This is correct</message>
<Message>This is incorrect</message>
More strict tagging rules than HTML
In HTML, improperly nested tags like the following are frowned upon, but will not cause an error:
<b><i>This text is bold and italic</b></i>
In XML all elements must be properly nested within each other, like this:
<b><i>This text is bold and italic</i></b>
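The tagging rules on these two slides can be checked mechanically: an XML parser rejects any document that breaks them. A minimal Python sketch, using the standard xml.etree.ElementTree module (the helper name is_well_formed and the <doc> wrapper are made up for illustration):

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if the text parses as XML, False if the parser rejects it."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


samples = [
    "<message>This is correct</message>",
    "<Message>This is incorrect</message>",               # case mismatch
    "<doc><p>This is a paragraph<p>Another one</doc>",    # missing </p>
    "<b><i>This text is bold and italic</b></i>",         # bad nesting
    "<b><i>This text is bold and italic</i></b>",
]

for sample in samples:
    print(is_well_formed(sample), sample)
```

A web browser would quietly display all five as HTML; the XML parser accepts only the first and the last.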
Visualizing XML data: a “Tree”
<data>
  <person id="o555">
    <name> Mary </name>
    <address>
      <street> Maple </street>
      <no> 345 </no>
      <city> Seattle </city>
    </address>
  </person>
  <person>
    <name> John </name>
    <address> Thailand </address>
    <phone> 23456 </phone>
  </person>
</data>
[Figure: tree diagram of the XML above. The root element "data" has two "person" children; the legend distinguishes element nodes (e.g., name, address), attribute nodes (id), and value nodes (e.g., Mary, Maple, 345, Seattle).]
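The tree view can also be produced by a program. The following Python sketch walks the slide's XML with the standard xml.etree.ElementTree module and labels each node as an element, attribute, or value; the function name describe and the output wording are illustrative choices, not part of any standard.

```python
import xml.etree.ElementTree as ET

DATA_XML = """<data>
  <person id="o555">
    <name> Mary </name>
    <address>
      <street> Maple </street>
      <no> 345 </no>
      <city> Seattle </city>
    </address>
  </person>
  <person>
    <name> John </name>
    <address> Thailand </address>
    <phone> 23456 </phone>
  </person>
</data>"""


def describe(element, depth=0):
    """Return indented lines naming element, attribute, and value nodes."""
    indent = "  " * depth
    lines = [f"{indent}element: {element.tag}"]
    for name, value in element.attrib.items():
        lines.append(f"{indent}  attribute: {name} = {value}")
    text = (element.text or "").strip()
    if text:  # skip whitespace-only text between child elements
        lines.append(f"{indent}  value: {text}")
    for child in element:
        lines.extend(describe(child, depth + 1))
    return lines


tree_lines = describe(ET.fromstring(DATA_XML))
print("\n".join(tree_lines))
```

Each level of indentation in the output corresponds to one level of nesting in the XML, which is exactly what the tree diagram draws.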
XML: Database data vs. XML Data
Database table "persons":
name     phone
“John”   3634
“Sue”    6343
“Dick”   6363
The same data as XML:
<persons>
  <row> <name>John</name> <phone>3634</phone> </row>
  <row> <name>Sue</name> <phone>6343</phone> </row>
  <row> <name>Dick</name> <phone>6363</phone> </row>
</persons>
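Because each <row> has the same child tags, turning this XML back into table rows is mechanical. A minimal Python sketch under that assumption (the function name rows_from_xml is made up for illustration):

```python
import xml.etree.ElementTree as ET

PERSONS_XML = """<persons>
  <row> <name>John</name> <phone>3634</phone> </row>
  <row> <name>Sue</name> <phone>6343</phone> </row>
  <row> <name>Dick</name> <phone>6363</phone> </row>
</persons>"""


def rows_from_xml(xml_text):
    """Return one (name, phone) tuple per <row>, like rows of a table."""
    root = ET.fromstring(xml_text)
    return [
        (row.findtext("name").strip(), row.findtext("phone").strip())
        for row in root.findall("row")
    ]


rows = rows_from_xml(PERSONS_XML)
for name, phone in rows:
    print(name, phone)
```

The tag names play the role of column names, and each <row> element plays the role of one table row.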
XML data can usually fit a DBMS table…
<person> <name>John</name> <phone>1234</phone> </person>
<person> <name>Joe</name> </person>   (no phone!)
• Consider the case of missing attributes
• An acceptable fit with a table
• Even though blanks are deemed a bit undesirable in database tables
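One way to picture the "acceptable fit" is to load these records into a table and let the missing phone become a blank (NULL). The sketch below uses Python's built-in sqlite3 module with an in-memory database; the <people> wrapper element, the table name, and the helper text_or_none are assumptions made for the example.

```python
import sqlite3
import xml.etree.ElementTree as ET

# The <people> wrapper is added so the two records form one XML document.
PEOPLE_XML = """<people>
  <person> <name>John</name> <phone>1234</phone> </person>
  <person> <name>Joe</name> </person>
</people>"""


def text_or_none(parent, tag):
    """Return the stripped text of a child element, or None if it is absent."""
    value = parent.findtext(tag)
    return value.strip() if value is not None else None


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (name TEXT, phone TEXT)")
for person in ET.fromstring(PEOPLE_XML).findall("person"):
    conn.execute(
        "INSERT INTO person VALUES (?, ?)",
        (text_or_none(person, "name"), text_or_none(person, "phone")),
    )

stored = conn.execute("SELECT name, phone FROM person ORDER BY rowid").fetchall()
print(stored)
```

Joe's row is stored with an empty phone column (None in Python, NULL in SQL), which is the blank the slide calls a bit undesirable but workable.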
…but the fit is not always good
<person> <name>Mary</name> <phone>2345</phone> <phone>3456</phone> </person>   (two phones!)
• Problematic case: repeated attributes
• A poor fit with a table, because a database-design rule is that you should not have two columns with the same name
• If the XML data had distinct “home” and “cell” columns, the issue would go away
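The repeated <phone> can be seen directly in code: asking for "the" phone silently returns only the first one. A common workaround, sketched here in Python as one possible approach rather than the only one, is to emit one (name, phone) row per phone, as a separate phone table would hold them.

```python
import xml.etree.ElementTree as ET

PERSON_XML = (
    "<person> <name>Mary</name> "
    "<phone>2345</phone> <phone>3456</phone> </person>"
)

person = ET.fromstring(PERSON_XML)
name = person.findtext("name").strip()

# findtext returns only the first match, so a single "phone" column
# would lose Mary's second number.
first_phone_only = person.findtext("phone").strip()

# One row per phone keeps both numbers without duplicate column names.
phone_rows = [(name, phone.text.strip()) for phone in person.findall("phone")]

print(first_phone_only)
print(phone_rows)
```

Renaming the tags to distinct "home" and "cell" elements, as the slide suggests, would instead let both numbers sit in one row.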
Formatting (visualizing) XML data
• Formatting (visualizing in a web page) XML data requires a separate file
• What is that separate file? It’s called a “style sheet”
• There are a couple of different standards for style sheets
• XML-specific standard: XSL
• Flexible standard: CSS
• CSS can be used with both .xml and .html files
• We will look at CSS after XML