METS and TEI

Richard Gartner, Oxford University

  1. METS and TEI Richard Gartner Oxford University

  2. Introduction (verbal)
     • METS provides a framework within which any data or metadata can be referenced or embedded
     • This presentation shows how easily METS and TEI can be used in tandem
     • The context is an image database with the full OCR’d text encoded in TEI

  3. Cobbett’s Parliamentary History

  4. Incorporating TEI into METS
     <fileGrp ID="modhis006-aab-TEI">
       <file GROUPID="TEI" MIMETYPE="text/xml" ADMID="modhis006-aab-001-TEI">
         <FLocat LOCTYPE="URL" xlink:href="modhis006-aab.xml"/>
       </file>
     </fileGrp>

  5. Incorporating TEI into METS
     <div ID="modhis006-aab-div.1.1.1" LABEL="Half page">
       <fptr FILEID="modhis006-aab-fgrp-0001">
         <area FILEID="modhis006-aab-TEI" BEGIN="modhis006-aab-TEI.pb.1" END="modhis006-aab-TEI.pb.2"/>
       </fptr>
     </div>

  6. Incorporating TEI into METS

  7. OCR -> TEI
     • TEI in Libraries level 1 – the simplest level of encoding, designed for OCR texts
     • One <div> element enclosing the complete text
     • One <p> element within this
     • Page breaks marked with <pb>

  8. OCR -> TEI (verbal)
     • OCR’d text is put into a skeletal TEI file with a minimal header
     • Page breaks in the file are replaced with <pb>
     • A simple stylesheet assigns a sequential ID to each <pb>
     • Another stylesheet adds <area> elements to the METS structural map, pointing to the <pb> elements

  9. Put your OCR text here!
     <?xml version="1.0" encoding="utf-8"?>
     <tei.2>
       <teiHeader status="new" type="text">
         <fileDesc>
           <titleStmt>
             <title>modhis006-aab OCR text</title>
           </titleStmt>
           <publicationStmt>
             <publisher>Oxford Digital Library</publisher>
           </publicationStmt>
           <sourceDesc default="NO">
             <p>OCR text from modhis006-aab</p>
           </sourceDesc>
         </fileDesc>
       </teiHeader>
       <text>
         <body>
           <div0 id="modhis006-aab-aaa.div.1" part="N" sample="complete" org="uniform">
             <p> </p>
           </div0>
         </body>
       </text>
     </tei.2>

  10. □Parliamentary History. VOL. n. □
      <pb/>Parliamentary History. VOL. n. <pb/>
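
The replacement shown on slide 10 can be sketched in Python with the standard library's xml.etree.ElementTree. This is only an illustration, not the process used for the project: the function name `ocr_to_tei_body` is hypothetical, and it assumes that the page-break characters (the □ glyphs above) are form feeds.

```python
import xml.etree.ElementTree as ET


def ocr_to_tei_body(ocr_text: str, page_break: str = "\f") -> ET.Element:
    """Wrap OCR text in <div0><p>...</p></div0>, one <pb/> per page break.

    Assumption: page breaks in the OCR output are form-feed characters.
    """
    div = ET.Element("div0")
    p = ET.SubElement(div, "p")
    pages = ocr_text.split(page_break)
    # Text before the first page break belongs directly to <p>.
    p.text = pages[0]
    # Each later chunk follows a <pb/>, so it becomes that element's tail.
    for page in pages[1:]:
        pb = ET.SubElement(p, "pb")
        pb.tail = page
    return div


if __name__ == "__main__":
    body = ocr_to_tei_body("\fParliamentary History. VOL. n.\f")
    print(ET.tostring(body, encoding="unicode"))
```

The resulting element would still need to be placed inside the skeletal TEI file from slide 9.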

  11. <xsl:template match="//pb">
        <xsl:element name="pb">
          <xsl:attribute name="id">
            <xsl:value-of select="$idstem"/>.pb.<xsl:number count="pb" format="1" level="any"/>
          </xsl:attribute>
        </xsl:element>
      </xsl:template>
      <pb id="modhis006-aab-aaa.pb.1"/>Parliamentary History. VOL. n. <pb id="modhis006-aab-aaa.pb.2"/>
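
For readers without an XSLT processor to hand, the numbering step on slide 11 can be approximated in Python. This is a minimal sketch and not the stylesheet itself; `number_page_breaks` is a hypothetical name, and `idstem` plays the role of the `$idstem` parameter.

```python
import xml.etree.ElementTree as ET


def number_page_breaks(root: ET.Element, idstem: str) -> ET.Element:
    """Give every <pb> an id of the form '<idstem>.pb.<n>'.

    iter() walks the tree in document order, which mirrors
    <xsl:number count="pb" level="any"/> for a single document.
    """
    for n, pb in enumerate(root.iter("pb"), start=1):
        pb.set("id", f"{idstem}.pb.{n}")
    return root


if __name__ == "__main__":
    p = ET.fromstring("<p><pb/>Parliamentary History. VOL. n. <pb/></p>")
    number_page_breaks(p, "modhis006-aab-aaa")
    print(ET.tostring(p, encoding="unicode"))
```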

  12. <xsl:element name="fptr">
        <xsl:attribute name="FILEID">
          <xsl:value-of select="@FILEID"/>
        </xsl:attribute>
        <xsl:element name="area">
          <xsl:attribute name="FILEID">
            <xsl:value-of select="$idstem"/>
          </xsl:attribute>
          <xsl:attribute name="BEGIN">
            <xsl:value-of select="$idstem"/>.pb.<xsl:number count="mets:fptr" format="1" level="any"/>
          </xsl:attribute>
          <xsl:attribute name="END">
            <xsl:value-of select="$idstem"/>.pb.<xsl:value-of select="$currentcount+1"/>
          </xsl:attribute>
        </xsl:element>
      </xsl:element>

  13. <div ID="modhis006-aab-div.1.1.1" LABEL="Half page">
        <fptr FILEID="modhis006-aab-fgrp-0001">
          <area FILEID="modhis006-aab-TEI" BEGIN="modhis006-aab-TEI.pb.1" END="modhis006-aab-TEI.pb.2"/>
        </fptr>
      </div>
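
The effect of the stylesheet on slide 12 and its output on slide 13 could be approximated in Python as follows. This is a sketch under stated assumptions, not the stylesheet used in the project: `add_tei_areas` is a hypothetical name, the structural map is assumed to hold exactly one <fptr> per page in page order, and page n is assumed to run from the n-th <pb> to the (n+1)-th, as the XSLT implies.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
ET.register_namespace("mets", METS_NS)


def add_tei_areas(struct_map: ET.Element, tei_file_id: str) -> ET.Element:
    """Add an <area> to each <fptr>, pointing at consecutive <pb> ids.

    Assumption: one <fptr> per page, listed in page order.
    """
    # Take a snapshot first so the tree is not modified while being walked.
    fptrs = list(struct_map.iter(f"{{{METS_NS}}}fptr"))
    for n, fptr in enumerate(fptrs, start=1):
        area = ET.SubElement(fptr, f"{{{METS_NS}}}area")
        area.set("FILEID", tei_file_id)
        area.set("BEGIN", f"{tei_file_id}.pb.{n}")
        area.set("END", f"{tei_file_id}.pb.{n + 1}")
    return struct_map


if __name__ == "__main__":
    smap = ET.fromstring(
        f'<structMap xmlns="{METS_NS}">'
        '<div LABEL="Half page"><fptr FILEID="modhis006-aab-fgrp-0001"/></div>'
        "</structMap>"
    )
    add_tei_areas(smap, "modhis006-aab-TEI")
    print(ET.tostring(smap, encoding="unicode"))
```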

  14. Why use METS and TEI together?
      • Images
      • Overlapping hierarchies

  15. Verbal
      • Images
          • As far as P4, TEI’s image facilities are clumsy
          • Have to use entity references only: no URLs, URIs, etc.
          • No way to distinguish between inline images (which the facilities were designed for) and whole-page images
          • No scope for administrative metadata
      • Overlapping hierarchies
          • CONCUR was the SGML mechanism for this: clumsy to use and gone in XML
          • Various other approaches, all distinguished by notational complexity

  16. Images
      <figure entity="page1">
        <head>Page 1</head>
      </figure>
      <!ENTITY page1 SYSTEM "location_of_image_file" NDATA jpeg>

  17. Overlapping hierarchies
      • Some approaches used with TEI
          • CONCUR (SGML)
          • MECS (Wittgenstein archive)
          • Stand-off markup: XLink mechanisms to impose markup (varying hierarchies)
          • TexMECS
          • Witt: PROLOG

  18. Images in METS
      • List all variants of image files in <fileSec>
      • Each can have extensive administrative or descriptive metadata attached
      • Reference them by URLs, URIs, etc., or embed them in the METS file
      • The FILEID attribute in <structMap> indicates the exact correspondence of an image to part of the item

  19. Overlapping hierarchies
      <structMap TYPE="physical">
        <div LABEL="Page 1">
          <fptr FILEID="image_file_for_page_1">
            <area FILEID="teifile" BEGIN="page1" END="page2"/>
          </fptr>
        </div>
      </structMap>
      <structMap TYPE="logical">
        <div LABEL="Chapter 1">
          <fptr FILEID="image_file_for_page_1">
            <area FILEID="teifile" BEGIN="page1" END="page23"/>
          </fptr>
        </div>
      </structMap>
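
Because both structural maps address the same TEI file through BEGIN and END ids, an application can resolve any <area> back to the text it covers, whichever hierarchy it came from. The following Python sketch shows one way this might be done. It is not part of METS or TEI tooling: `text_between_pbs` is a hypothetical helper, and it assumes that BEGIN and END name the id attributes of <pb> elements in the TEI file.

```python
import xml.etree.ElementTree as ET


def text_between_pbs(root: ET.Element, begin_id: str, end_id: str) -> str:
    """Return the text after <pb id=begin_id> and before <pb id=end_id>."""
    parts = []
    state = {"on": False, "done": False}

    def visit(elem: ET.Element) -> None:
        if state["done"]:
            return
        if elem.tag == "pb":
            if elem.get("id") == end_id:
                state["done"] = True  # reached the closing page break
                return
            if elem.get("id") == begin_id:
                state["on"] = True  # start collecting from here
        if state["on"] and elem.text:
            parts.append(elem.text)
        for child in elem:
            visit(child)
            if state["done"]:
                return
            # Text that follows a child element is stored as its tail.
            if state["on"] and child.tail:
                parts.append(child.tail)

    visit(root)
    return "".join(parts)


if __name__ == "__main__":
    tei = ET.fromstring(
        '<p><pb id="page1"/>First page. <pb id="page2"/>Second page.</p>'
    )
    print(text_between_pbs(tei, "page1", "page2"))
```

A physical-map area (page1 to page2) and a logical-map area (page1 to page23) would be passed to the same function; only the id pair differs.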

  20. Overlapping hierarchies
      <structMap>
        <div LABEL="Chapter 1">
          <div LABEL="Page1">
            <fptr FILEID="image_file_for_page_1">
              <area FILEID="teifile" BEGIN="page1" END="page2"/>
            </fptr>
          </div>
        </div>
      </structMap>

  21. More information
      • http://www.loc.gov/standards/mets
      • http://www.jisc.ac.uk/index.cfm?name=techwatch_report_0205