Millennium and XML: Repurposing and Customizing Metadata Lucas Mak and Dao Rong Gong Michigan State University May 17 - 20, 2009
Today’s Outline • Overview of Metadata • Millennium system and XML • Overview of XSLT • Case Studies • Sunday School Books Collection • New Book List • Conclusions and Observations
Metadata • Structured data or information about an information resource. • Types of metadata: • Descriptive • Administrative/Rights • Preservation • Technical • Structural
Descriptive Metadata • Popular descriptive metadata standards • Dublin Core (Simple & Qualified) • MODS • MARCXML • VRA Core • IEEE LOM • TEI Header • EAD
Innovative XML • XML records from Millennium • Retrieved through HTTP query • Data arrangement based on MARC fields • But MARC field and its subfields are siblings • Optimized for WebPAC display • Brief record (for search result index page display) • Contains data from MARC 245, Publication year, record ID • Full record (for both public and staff MARC display of individual record)
Public display Staff MARC display
Millennium System and XML • Diagram labels: Millennium MARC, Metadata Builder, /xrecord, XML Server, XML, OAI Harvester, Delimited Text, Content Pro
XML Server XML server query string (search for title “xslt”): http://magic.msu.edu/xmlopac/?xml=<WXREQ_ROOT><KEY>txslt</KEY></WXREQ_ROOT>
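The query string above can be assembled programmatically. The following is a minimal Python sketch, not the presenters' code: it assumes the leading "t" in the key is the title index tag (inferred from the slide's "search for title" example) and that the server accepts a percent-encoded request. The function name `build_xml_query` is hypothetical.

```python
from urllib.parse import quote
from xml.sax.saxutils import escape


def build_xml_query(base_url: str, index_tag: str, term: str) -> str:
    """Build a request URL of the shape shown on the slide (assumed format)."""
    # Wrap the index tag and search term in the WXREQ_ROOT/KEY request document.
    # escape() protects XML special characters (&, <, >) inside the search term.
    request = f"<WXREQ_ROOT><KEY>{index_tag}{escape(term)}</KEY></WXREQ_ROOT>"
    # Percent-encode the whole request so angle brackets and slashes are URL-safe.
    return f"{base_url}?xml={quote(request, safe='')}"


# "t" is assumed to mean "title index"; the base URL is the one on the slide.
url = build_xml_query("http://magic.msu.edu/xmlopac/", "t", "xslt")
print(url)
```

The sketch only builds the URL; fetching it and parsing the returned Innovative XML is left out because the response format is not shown in the slides.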
XSLT • Extensible Stylesheet Language Transformations • Current version: 2.0 • “Transformation” means: • Manipulation of XML documents by creating a new document based on the original document • Uses in library context: • Crosswalking • Data selection and manipulation • Web display • Example: converting EAD into HTML for web display
XSLT • Uses XPath expressions to select/filter data nodes • By name of “Element” • <xsl:for-each select="marc:leader"> • By value of “Element” and/or “Attribute” • <xsl:for-each select="marc:datafield[@tag=650 and @ind2='0']"> • <xsl:if test="$leader7='c'">
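The same two kinds of selection (by element name, and by attribute value) can be tried outside an XSLT processor. This is a rough Python illustration using the standard library's limited XPath support; the sample record is made up for the example and is not taken from the collection.

```python
import xml.etree.ElementTree as ET

# Namespace used by MARC21XML (MARCXML "slim" schema).
NS = {"marc": "http://www.loc.gov/MARC21/slim"}

# A tiny, invented record used only to exercise the selections.
SAMPLE = """<marc:record xmlns:marc="http://www.loc.gov/MARC21/slim">
  <marc:leader>00000cam a2200000 a 4500</marc:leader>
  <marc:datafield tag="650" ind1=" " ind2="0">
    <marc:subfield code="a">Sunday schools</marc:subfield>
  </marc:datafield>
  <marc:datafield tag="650" ind1=" " ind2="2">
    <marc:subfield code="a">Religion</marc:subfield>
  </marc:datafield>
</marc:record>"""

record = ET.fromstring(SAMPLE)

# Selection by element name: the leader.
leader = record.find("marc:leader", NS).text
# Leader position 7 (zero-based) is the value the slide tests as $leader7.
leader7 = leader[7]

# Selection by attribute values: 650 fields with second indicator 0.
lcsh_fields = record.findall("marc:datafield[@tag='650'][@ind2='0']", NS)
lcsh_terms = [f.find("marc:subfield[@code='a']", NS).text for f in lcsh_fields]

print(leader7, lcsh_terms)
```

ElementTree chains two predicates where the XSLT uses `and`; the result is the same for this kind of attribute test.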
Case Study One • Sunday School Books Collection • 19th century publications by religious societies • 170 titles digitized and cataloged • Data conversion needs • Source: Millennium • Target: Content Pro • Conversions in: • Format: .marc to XML • Schema and Data Structure: MARC to Qualified Dublin Core
Options for Data Migration • Diagram labels: Create Lists, MARC File, MARCXML, MARCEdit, Millennium, Content Pro (QDC), XSLT, MARCEdit, HTTP Query, Innovative XML
Segment of Innovative XML • Field indicator as value of element • MARC field/subfield as value of element • Siblings
Segment of MARC21XML • MARC field/subfield as value of element attribute • Field indicator as value of element attribute • Parent-Child
Segment of MARC21XML • Issues with Innovative XML data conversion needs • Data structured differently from MARC21XML • Availability of existing “Innovative XML to DC/QDC” XSLT? • Not optimized for data manipulation • Complications in data selection • Data nodes are selected by matching criteria against values in individual elements • A series of matches may be needed to select just one node • Efficiency in processing • Multiple upward, downward, and lateral movements involved in data selection
Final Path of Data Migration • Diagram labels: Create Lists, MARCEdit, Millennium (.marc), Content Pro (QDC), MARC File, MARCXML, XSLT, MARCEdit
Design of XSLT • Based on LC’s “MARC To Simple DC” XSLT • Customized mappings according to LC’s suggestions • Crosswalking strategies • Conditional processing (i.e. matching) • boolean(), contains(), starts-with() • <xsl:if>, <xsl:choose>, <xsl:when> • String manipulation • Used in both conditional processing and data selection for output • substring(), substring-before(), substring-after(), translate(), concat(), normalize-space()
Design of XSLT • Conditional Processing & String Manipulation in De-duplication
<xsl:for-each select="marc:datafield[@tag=246]/marc:subfield[@code='a']">
  <xsl:if test="not(contains($dataField245Lower, translate(substring(normalize-space(.),1,string-length()-1), $upperCase,$lowerCase)))">
    <xsl:element name="dcterms:alternative">
      <xsl:value-of select="normalize-space(substring(.,1,string-length()-1))"/>
    </xsl:element>
  </xsl:if>
</xsl:for-each>
• Compares MARC 246 against MARC 245 • Chops trailing period (.) • Converts 245 & 246 into lower case before comparing
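The de-duplication logic can be restated outside XSLT to make the steps easier to follow. This is a hedged Python sketch, not the presenters' stylesheet: the function names and sample titles are invented, and unlike the XSLT (which drops the last character unconditionally) it removes the final character only when it is a period.

```python
def chop_trailing_period(text: str) -> str:
    """Collapse whitespace (roughly normalize-space()) and drop one trailing period."""
    text = " ".join(text.split())
    return text[:-1] if text.endswith(".") else text


def alternative_titles(title_245: str, titles_246: list[str]) -> list[str]:
    """Return the 246 titles that are not already contained in the 245 title."""
    title_lower = title_245.lower()  # compare case-insensitively, as translate() does
    kept = []
    for candidate in titles_246:
        cleaned = chop_trailing_period(candidate)
        # Skip a variant title when its text already appears in the main title.
        if cleaned.lower() not in title_lower:
            kept.append(cleaned)
    return kept


# Invented sample values for illustration only.
result = alternative_titles(
    "The young cottager : a narrative.",
    ["Young cottager.", "Annie's   story."],
)
print(result)
```

Each surviving value would become one `<dcterms:alternative>` element in the output record.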
Design of XSLT No <dcterms:alternative> for MARC 246
Design of XSLT • Predicate • Used for data selection and de-duplication
<!-- Output MARC 650y as <dcterms:temporal> -->
<xsl:for-each select="marc:datafield[@tag=650 and @ind2='0'][not(marc:subfield[@code='y'] = preceding-sibling::marc:datafield[@tag=650 and @ind2='0']/marc:subfield[@code='y'])]/marc:subfield[@code='y']">
  <xsl:element name="dcterms:temporal">
    <xsl:value-of select="normalize-space(self::node())"/>
  </xsl:element>
</xsl:for-each>
• Selects unique 650$y only • Selects LCSH only
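For comparison, here is a sketch of the same "unique 650$y from LCSH fields only" selection in Python. It is an approximation under stated assumptions: the sample record is invented, and the code de-duplicates value by value with a set, whereas the XPath predicate compares each field against its preceding siblings.

```python
import xml.etree.ElementTree as ET

NS = {"marc": "http://www.loc.gov/MARC21/slim"}

# Invented record: three LCSH 650 fields (ind2="0") and one non-LCSH field.
SAMPLE = """<marc:record xmlns:marc="http://www.loc.gov/MARC21/slim">
  <marc:datafield tag="650" ind1=" " ind2="0">
    <marc:subfield code="a">Sunday schools</marc:subfield>
    <marc:subfield code="y">19th century.</marc:subfield>
  </marc:datafield>
  <marc:datafield tag="650" ind1=" " ind2="0">
    <marc:subfield code="a">Children</marc:subfield>
    <marc:subfield code="y">19th century.</marc:subfield>
  </marc:datafield>
  <marc:datafield tag="650" ind1=" " ind2="2">
    <marc:subfield code="a">Religion</marc:subfield>
    <marc:subfield code="y">20th century.</marc:subfield>
  </marc:datafield>
  <marc:datafield tag="650" ind1=" " ind2="0">
    <marc:subfield code="a">Hymns</marc:subfield>
    <marc:subfield code="y">18th century.</marc:subfield>
  </marc:datafield>
</marc:record>"""


def unique_temporal(record: ET.Element) -> list[str]:
    """Collect distinct 650$y values from LCSH fields, in document order."""
    seen = set()
    values = []
    for field in record.findall("marc:datafield[@tag='650'][@ind2='0']", NS):
        for sub in field.findall("marc:subfield[@code='y']", NS):
            value = " ".join((sub.text or "").split())  # like normalize-space()
            if value and value not in seen:
                seen.add(value)
                values.append(value)
    return values


temporal = unique_temporal(ET.fromstring(SAMPLE))
print(temporal)
```

The non-LCSH field (second indicator 2) is ignored, and the repeated "19th century." is emitted once.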
Design of XSLT • Hard-coding • Inserted elements that are global to all records
<!-- Output <dc:format>application/pdf</dc:format> -->
<xsl:element name="dc:format">
  <xsl:text>application/pdf</xsl:text>
</xsl:element>
Case Study Two • Library’s book lists • Issues with featured list
Case Study Two • Existing New Book List • Newly cataloged books for browse shelf • New approach using XML and XSLT • New features design • Sorting • RSS feed • Customization
New Book List Based on XML File • Millennium XML server outputs two files • Entire new book list over a rolling period of time • List of daily added books • New Book List program output • Book List in HTML format • RSS feed for daily added books
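The slides do not show the new book list code, which used XSLT with PHP. As a rough illustration of the two features named above (sorting and an RSS feed for daily added books), here is a Python sketch. The input structure (`title`, `link`, `added`), the function name `build_rss`, and the example.org URLs are all hypothetical.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from email.utils import format_datetime


def build_rss(books, feed_title, feed_link):
    """Build a minimal RSS 2.0 feed, with items sorted by title."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    # RSS 2.0 requires title, link, and description on the channel.
    ET.SubElement(channel, "title").text = feed_title
    ET.SubElement(channel, "link").text = feed_link
    ET.SubElement(channel, "description").text = "Books added to the catalog today"
    # Sort case-insensitively by title before emitting items.
    for book in sorted(books, key=lambda b: b["title"].lower()):
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = book["title"]
        ET.SubElement(item, "link").text = book["link"]
        # pubDate uses the RFC 822 style date format that RSS expects.
        ET.SubElement(item, "pubDate").text = format_datetime(book["added"])
    return ET.tostring(rss, encoding="unicode")


# Invented sample data standing in for the daily list from the XML server.
ADDED = datetime(2009, 5, 17, tzinfo=timezone.utc)
BOOKS = [
    {"title": "XSLT cookbook", "link": "http://example.org/record/2", "added": ADDED},
    {"title": "Metadata basics", "link": "http://example.org/record/1", "added": ADDED},
]

feed = build_rss(BOOKS, "New Books", "http://example.org/newbooks")
print(feed)
```

Sorting by another key, such as call number or author, would only change the `key` function.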
Path of Data Processing Internet EXPECT XSLT Web Server & php Millennium XML output
Observations and Challenges • Millennium System and XML • XSLT processor within Millennium and customizing Innovative XML output • Using XML as data source • Large XML file size • XSLT and data processing • XSLT data manipulation • Lack of built-in functions for conditional data looping etc.
Thank you! makw@mail.lib.msu.edu gongd@msu.edu