An Overview of XML: What It Is, How It Works, and How It’s Used for Library Metadata. Steven Bernstein, CLA Technical Services Section Fall Workshop, November 15, 2012.
A Few Things Before I Start
• Nothing I am about to cover is new. The technologies and standards in this presentation have been around for about a decade.
• That said, most technical services librarians (let alone public services librarians) don’t have an intimate knowledge of how XML works.
• This presentation assumes you fit into the category of “most technical services librarians”.
A Few Things Before I Start • I am a punster. You are forewarned.
What is XML? EXtensible Markup Language
EXtensible Markup Language
• Markup Languages
• What are Markup Languages? Markup languages provide context to electronic data, thereby transforming the data into information that both computers and humans can use more readily.
• Examples of Markup Languages
  - HTML
  - MARC
Sample Basic HTML Webpage
<html>
  <head>
    <title>My Website</title>
  </head>
  <body>
    <h1 id="banner">My Website</h1>
    <ul id="menu">
      <li><a href="index.html">Home</a></li>
      <li><a href="about.html">About Me</a></li>
      <li><a href="news.html">News</a></li>
    </ul>
    <p>Welcome to my website! I’m so glad you came for a visit. I don’t have any content yet so please come back again soon.</p>
  </body>
</html>
Sample Basic MARC Record
100 1_ $a DeWind, Dustin.
245 10 $a Mortality and the inevitability of dying / $c by Dustin DeWind.
260 $a Death Valley, Nev. : $b Heavenly Press, $c 2012.
300 $a 120 p. : $b ill. ; $c 24 cm.
650 0 $a Death.
EXtensible Markup Language
• Extensible
• What makes XML Extensible? XML itself does not prescribe a fixed set of tags. Anyone can develop their own XML schema with tags that they define themselves.
• Examples of XML Schemas
  - Really Simple Syndication (RSS)
  - Recipe Markup Language
  - MARCXML
Sample Really Simple Syndication (RSS) Feed
<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0">
  <channel>
    <title>The Onion</title>
    <description>America’s Finest News Source</description>
    <link>http://www.theonion.com</link>
    <item>
      <title>Netflix Switches Over To Convenient New Physical Locations</title>
      <link>http://www.theonion.com/articles/netflix-switches-over-to-convenient-new-physical-l,19271/</link>
      <pubDate>Mon, 25 Feb 2011 00:00:00 GMT</pubDate>
      <description>Officials at Netflix announced Thursday that the company has finally reached its long-term goal of constructing a chain of easily accessible stores. "Having actual physical locations was always our ultimate intent, and we are proud to provide our customers with the convenient option of driving to a nearby Netflix store and renting any available movie for just $3.99 per title," said Netflix spokesman Henry Regis...</description>
    </item>
    ...
  </channel>
  ...
</rss>
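Because an RSS feed is ordinary XML, any XML parser can pull the items back out of it. The following is a minimal sketch using Python's standard xml.etree.ElementTree module; the feed text, its titles, and its example.org links are invented for illustration and only mirror the shape of the sample above.

```python
import xml.etree.ElementTree as ET

# A trimmed, made-up feed with the same structure as the sample RSS feed.
RSS = """<rss version="2.0">
  <channel>
    <title>Example Library News</title>
    <link>http://www.example.org/</link>
    <item>
      <title>New books for November</title>
      <link>http://www.example.org/news/1</link>
    </item>
    <item>
      <title>Holiday hours</title>
      <link>http://www.example.org/news/2</link>
    </item>
  </channel>
</rss>"""

def list_items(rss_text):
    """Return a (title, link) pair for every <item> in an RSS 2.0 feed."""
    root = ET.fromstring(rss_text)
    return [
        (item.findtext("title"), item.findtext("link"))
        for item in root.iter("item")
    ]

for title, link in list_items(RSS):
    print(title, "->", link)
```

The tags themselves tell the program which piece of text is a title and which is a link, which is the "context" a markup language supplies.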
Sample RecipeML Record
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE recipeml PUBLIC "-//FormatData//DTD RecipeML 0.5//EN" "http://www.formatdata.com/recipeml/recipeml.dtd">
<?xml-stylesheet href="dessert1.css" type="text/css"?>
<recipeml version="0.5">
  <recipe>
    <head>
      <title>Ice</title>
    </head>
    <ingredients>
      <ing>
        <amt>
          <qty>2</qty>
          <unit>ounces</unit>
        </amt>
        <item>water</item>
      </ing>
    </ingredients>
    <directions>
      <step>Freeze the water.</step>
    </directions>
  </recipe>
</recipeml>
How are XML Schemas Created? Document Type Definitions (DTDs), XML Stylesheets, and Namespace
Schemas: Validated XML
• Immediately after the initial XML declaration, which states that the file contains XML, most XML files continue with a declaration that links* to an external file that defines:
  • All tags that may be used in the file
  • How the tags are nested
  • The permissible values of each tag and their format
  • The attributes of each tag; and
  • The permissible values of each attribute and their format
* Though it is also possible to include the definitions in the file itself.
Schemas: Validated XML • There are two methods of defining valid tags in an XML file: • Document Type Definitions (DTD); and • XML Schema Definitions (XSD)
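To make the idea of validation concrete, here is a toy sketch in Python. It is not a DTD or XSD validator: the RULES table is a hypothetical stand-in for a schema, covering only which tags exist and how they nest. Python's standard xml.etree.ElementTree module checks only that a file is well formed; validating against a real DTD or XSD requires a validating parser, such as the third-party lxml library.

```python
import xml.etree.ElementTree as ET

# Hypothetical rules standing in for a DTD or XSD:
# each tag maps to the set of child tags it may contain.
RULES = {
    "recipe": {"head", "ingredients", "directions"},
    "head": {"title"},
    "title": set(),
    "ingredients": {"ing"},
    "ing": {"item"},
    "item": set(),
    "directions": {"step"},
    "step": set(),
}

def find_violations(element, rules=RULES):
    """Return messages for tags that the rules do not permit."""
    if element.tag not in rules:
        return [f"unknown tag <{element.tag}>"]
    problems = []
    for child in element:
        if child.tag not in rules[element.tag]:
            problems.append(
                f"<{child.tag}> is not allowed inside <{element.tag}>"
            )
        else:
            problems.extend(find_violations(child, rules))
    return problems

GOOD = ("<recipe><head><title>Ice</title></head>"
        "<directions><step>Freeze the water.</step></directions></recipe>")
BAD = "<recipe><head><author>Me</author></head></recipe>"

print(find_violations(ET.fromstring(GOOD)))
print(find_violations(ET.fromstring(BAD)))
```

Both documents are well formed, but only the first follows the rules. A document that is not even well formed (for example, one with a missing closing tag) fails earlier, when ET.fromstring raises ET.ParseError.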
Namespace: Would the Real Tag Please Stand Up?
• Oftentimes, one parent tag in a schema has child tags and/or attributes with the same names as those of another parent tag. For example:
<member>
  <id>12345</id>
</member>
<item>
  <id>54321</id>
</item>
Namespace: Would the Real Tag Please Stand Up?
• To avoid confusion, xmlns attributes are included in the root element of the XML file, each suffixed with a unique name for a group of tags that appear in the file (i.e., the namespace name). The attribute’s value is a “link” to the body responsible for the schema, followed again by the namespace’s name.
<library xmlns:patronrecord="http://www.mylibrary.org/patronrecord"
         xmlns:itemrecord="http://www.mylibrary.org/itemrecord">
(In xmlns:patronrecord, “patronrecord” is the name as suffix; in the value, http://www.mylibrary.org/ is the “link” and “patronrecord” is the name.)
Namespace: Would the Real Tag Please Stand Up?
• Tags are defined as belonging to a particular namespace by appending the namespace name to them as a prefix.
<patronrecord:member>
  <patronrecord:id>12345</patronrecord:id>
</patronrecord:member>
<itemrecord:item>
  <itemrecord:id>54321</itemrecord:id>
</itemrecord:item>
• Note: XML schemas defined using XSDs support namespace; XML schemas defined using DTDs do not
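A short sketch of how software tells the two id tags apart, using Python's standard xml.etree.ElementTree module. The document is the library example from these slides; the short prefixes "p" and "i" in the NS dictionary are arbitrary choices made for this sketch, since a parser matches on the namespace value rather than on the prefix.

```python
import xml.etree.ElementTree as ET

DOC = """<library xmlns:patronrecord="http://www.mylibrary.org/patronrecord"
         xmlns:itemrecord="http://www.mylibrary.org/itemrecord">
  <patronrecord:member>
    <patronrecord:id>12345</patronrecord:id>
  </patronrecord:member>
  <itemrecord:item>
    <itemrecord:id>54321</itemrecord:id>
  </itemrecord:item>
</library>"""

# Prefixes used only inside this script; they map to the same namespace values.
NS = {
    "p": "http://www.mylibrary.org/patronrecord",
    "i": "http://www.mylibrary.org/itemrecord",
}

root = ET.fromstring(DOC)
patron_id = root.findtext("p:member/p:id", namespaces=NS)
item_id = root.findtext("i:item/i:id", namespaces=NS)

print("patron id:", patron_id)
print("item id:", item_id)

# Internally, the parser stores each tag together with its full namespace value.
print(root.find("p:member/p:id", NS).tag)
```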
The Magic of XML: XSLT EXtensible Stylesheet Language Transformations
Switching Schemas
• EXtensible Stylesheet Language Transformations (XSLT) allow one to input an XML file that uses one schema and output an XML file in another schema or format.
• XSLT includes functions to manipulate the values as part of the transformation
• Namespace comes in very handy when transforming an XML file from one schema to another.
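The idea of a transformation can be sketched without XSLT itself. The example below is plain Python (whose standard library has no XSLT processor), not an XSLT stylesheet; it only imitates what a stylesheet does: read a record in one vocabulary, write it out in another, and manipulate a value along the way. The target tags patron and cardNumber, and the zero-padding rule, are hypothetical.

```python
import xml.etree.ElementTree as ET

def member_to_patron(member):
    """Build a <patron> element (hypothetical target vocabulary)
    from a <member> element like the one in the namespace example."""
    patron = ET.Element("patron")
    number = ET.SubElement(patron, "cardNumber")
    # Value manipulation during the transformation:
    # zero-pad the ID to eight digits.
    number.text = member.findtext("id").strip().zfill(8)
    return patron

source = ET.fromstring("<member><id>12345</id></member>")
result = ET.tostring(member_to_patron(source), encoding="unicode")
print(result)
```

A real XSLT stylesheet expresses the same mapping declaratively, as templates that match source tags and emit target tags.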
XML for Library Metadata Available Schemas and Sharing Metadata
Library XML Schemas
• General Metadata
  • MARCXML: An XML schema containing all of the tags, indicators, and subfields of MARC21
• Authority Metadata
  • MADS (Metadata Authority Description Schema): A subset of MARC21 authority tags which uses linguistic tags rather than numerical tags
Library XML Schemas
• Bibliographic Metadata
  • MODS (Metadata Object Description Schema): A subset of MARC21 bibliographic tags which uses linguistic tags rather than numerical tags
  • METS (Metadata Encoding & Transmission Standard): An XML schema for descriptive, administrative, and structural metadata of digital objects
  • YANKEES (Yokel’s And Non-Knowledgable’s Extensible Encoding Standard): METS is used in Queens; YANKEES is used in the Bronx.
Library XML Schemas, etc.
• Archival Metadata
  • EAD (Encoded Archival Description)
• Graphical Metadata
  • VRA Core (Visual Resources Association Core)
  • MIX (Metadata for Images in XML)
• Etc. etc. etc.
Sample MARCXML Record
<?xml version="1.0" encoding="UTF-8"?>
<record xmlns:zs="http://www.loc.gov/zing/srw/"
        xmlns:cinclude="http://apache.org/cocoon/include/1.0"
        xmlns="http://www.loc.gov/MARC21/slim">
  <leader>01298cam a22003255a 4500</leader>
  <controlfield tag="001">14730252</controlfield>
  <controlfield tag="005">20090313105340.0</controlfield>
  <controlfield tag="008">070209r20121957nyua b 000 1 eng </controlfield>
  <datafield tag="100" ind2=" " ind1="1">
    <subfield code="a">Inay, Matthew</subfield>
  </datafield>
  <datafield tag="245" ind2="0" ind1="1">
    <subfield code="a">Afternoon performances /</subfield>
    <subfield code="c">by Matt Inay</subfield>
  </datafield>
</record>
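A minimal sketch of reading a record like the sample above with Python's standard xml.etree.ElementTree module. The record text is a shortened copy of the sample; the helper name get_subfields and the prefix "marc" are choices made for this sketch. Note that the default namespace declared with xmlns= still has to be named when searching.

```python
import xml.etree.ElementTree as ET

MARCXML = """<record xmlns="http://www.loc.gov/MARC21/slim">
  <leader>01298cam a22003255a 4500</leader>
  <controlfield tag="001">14730252</controlfield>
  <datafield tag="100" ind1="1" ind2=" ">
    <subfield code="a">Inay, Matthew</subfield>
  </datafield>
  <datafield tag="245" ind1="1" ind2="0">
    <subfield code="a">Afternoon performances /</subfield>
    <subfield code="c">by Matt Inay</subfield>
  </datafield>
</record>"""

# Prefix chosen for this script; the value is the namespace from the sample.
MARC = {"marc": "http://www.loc.gov/MARC21/slim"}

def get_subfields(record, tag):
    """Return {code: text} for the first datafield with the given MARC tag."""
    field = record.find(f"marc:datafield[@tag='{tag}']", MARC)
    if field is None:
        return {}
    return {
        sf.get("code"): sf.text
        for sf in field.findall("marc:subfield", MARC)
    }

record = ET.fromstring(MARCXML)
print(get_subfields(record, "245"))
print(get_subfields(record, "100"))
```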
Sample MODS Record
<?xml version="1.0" encoding="UTF-8"?>
<mods version="3.4"
      xsi:schemaLocation="http://www.loc.gov/mods/v3 http://www.loc.gov/standards/mods/v3/mods-3-4.xsd"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xmlns="http://www.loc.gov/mods/v3">
  <titleInfo>
    <title>Chicken soup for the vegetarian soul</title>
  </titleInfo>
  <name usage="primary" type="personal">
    <namePart>Soyhen, Aida</namePart>
  </name>
  <typeOfResource>text</typeOfResource>
  <originInfo>
    <place>
      <placeTerm type="code" authority="marccountry">nyu</placeTerm>
    </place>
    <place>
      <placeTerm type="text">New York</placeTerm>
    </place>
    ...
</mods>
Sample MADS Record
<?xml version="1.0" encoding="utf-8"?>
<madsCollection xsi:schemaLocation="http://www.loc.gov/mads http://www.loc.gov/standards/mads/mads.xsd"
                xmlns="http://www.loc.gov/mods/v3"
                xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
                xmlns:xlink="http://www.w3.org/1999/xlink">
  <mads version="beta">
    <authority>
      <name type="personal" authority="naf">
        <namePart>O'Shea, Rick</namePart>
      </name>
      <titleInfo authority="naf">
        <title />
      </titleInfo>
    </authority>
    <note type="source">Bouncing back, 1991: t.p. (Steven Bernstein)</note>
    ...
  </mads>
</madsCollection>
Sharing our Metadata
• When our metadata is encoded in an XML schema, we can share it more easily through XSLT stylesheets that convert our library metadata into other standards such as the Resource Description Framework (RDF), the foundation for the Semantic Web. Metadata encoded this way is also more clearly identifiable as metadata and can be more easily harvested.
• We can also more easily benefit from the metadata of others for our own use.
Sharing our Metadata
MARCXML (Markup of Markup…)
• Preserves all the detail of the metadata
• Despite using the universal language of numbers for tag names, is difficult for non-librarians to understand
• Uses complex structures to maintain robustness of MARC21
MODS (Round Hole; Square Peg)
• Many details lost to simplification
• Is easier for English-speaking non-librarians to understand
• Structured much more simply
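The trade-off can be seen in a toy conversion. The sketch below is plain Python, not XSLT and not the Library of Congress's official MARCXML-to-MODS mapping; it copies only the 245 $a title and the 100 $a name, and the function name marc_to_simple_mods is made up for this example. The indicators, the 245 $c statement of responsibility, and the trailing " /" punctuation do not survive, which is the kind of detail lost to simplification.

```python
import xml.etree.ElementTree as ET

MARC_NS = "http://www.loc.gov/MARC21/slim"
MODS_NS = "http://www.loc.gov/mods/v3"
NS = {"marc": MARC_NS}

MARCXML = f"""<record xmlns="{MARC_NS}">
  <datafield tag="100" ind1="1" ind2=" ">
    <subfield code="a">Inay, Matthew</subfield>
  </datafield>
  <datafield tag="245" ind1="1" ind2="0">
    <subfield code="a">Afternoon performances /</subfield>
    <subfield code="c">by Matt Inay</subfield>
  </datafield>
</record>"""

def marc_to_simple_mods(record):
    """Toy mapping: 245 $a -> titleInfo/title, 100 $a -> name/namePart."""
    mods = ET.Element(f"{{{MODS_NS}}}mods")

    title = record.findtext(
        "marc:datafield[@tag='245']/marc:subfield[@code='a']", namespaces=NS)
    if title is not None:
        title_info = ET.SubElement(mods, f"{{{MODS_NS}}}titleInfo")
        # Drop the trailing " /" that MARC uses before the $c subfield.
        ET.SubElement(title_info, f"{{{MODS_NS}}}title").text = title.rstrip(" /")

    author = record.findtext(
        "marc:datafield[@tag='100']/marc:subfield[@code='a']", namespaces=NS)
    if author is not None:
        name = ET.SubElement(mods, f"{{{MODS_NS}}}name", {"type": "personal"})
        ET.SubElement(name, f"{{{MODS_NS}}}namePart").text = author

    return mods

mods = marc_to_simple_mods(ET.fromstring(MARCXML))
print(ET.tostring(mods, encoding="unicode"))
```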
Essential Conversion Tool • MarcEdit by Terry Reese can convert your metadata between many Library XML schemas. • Version 5.8.4698.40412 was just released this week! http://people.oregonstate.edu/~reeset/marcedit/
XML: Where are We Going? Which Library Schemas will be the Future?
Some Resources
• World Wide Web Consortium (W3C): http://www.w3.org/
• W3Schools XML Tutorial: http://www.w3schools.com/xml/
• Library of Congress Metadata Standards: http://www.loc.gov/standards/
• The Official M.C. Escher Website: http://mcescher.com/