
Working With Digital Archives at the Harry Ransom Center

Working With Digital Archives at the Harry Ransom Center: A Presentation About Processing the Digital Archives of British Playwright Arnold Wesker. Metadata and Digital Object Roundtable, Society of American Archivists Annual Meeting 2007. Catherine Stollar Peters, New York State Archives.


Presentation Transcript


  1. Working With Digital Archives at the Harry Ransom Center: A Presentation About Processing the Digital Archives of British Playwright Arnold Wesker. Metadata and Digital Object Roundtable, Society of American Archivists Annual Meeting 2007. Catherine Stollar Peters, New York State Archives

  2. Background Worked at Harry Ransom Center in Austin, Texas from 2004 to early 2007

  3. Austin

  4. Albany

  5. Background Now work at the New York State Archives Cultural Education Center (New York State Archives)

  6. In January 2007 the Ransom Center was
      • Processing collections with electronic records
      • Developing policies and procedures for processing electronic records
      • Evaluating options for a Trusted Digital Repository (TDR)
        • At the School of Information at the University of Texas at Austin
        • At the University Libraries at the University of Texas at Austin
        • Or develop an institutional TDR
      • Conducting a general electronic records survey and needs assessment (with a more thorough survey planned for the fall)

  7. HRC DSpace at School of Information https://pacer.ischool.utexas.edu/handle/2081/288

  8. About the Case Study

  9. In January 2007 at the School of Information
      • Dr. Patricia Galloway was offering the course Problems in Permanent Retention of Electronic Records
      • Dr. Galloway contacted the Ransom Center about potential support for group projects

  10. School of Information Course
      Three collections were processed by students during the Spring 2007 semester
      • Leon Uris Papers
        • Lessons in digital archeology
        • Limited migrated content
      • John Crowley Papers
        • Standard manual processing
      • Arnold Wesker Papers
        • Largely automated processing, migration, ingest procedures
        • Fragile media
        • Living author

  11. School of Information Course
      Three collections were processed by students during the Spring 2007 semester
      • Leon Uris Papers
        • Lessons in digital archeology
        • Limited migrated content
      • John Crowley Papers
        • Standard manual processing
      • Arnold Wesker Papers
        • Largely automated processing, migration, ingest procedures
        • Fragile media
        • Living author

  12. Arnold Wesker
      • British playwright and author
      • Born in London in 1932
      • The Four Seasons ran in March 2007 at Arcola Theatre
      • Ransom Center maintains paper archives
      • Works include
        • As Much as I Dare (autobiography)
        • Longitude (adaptation of Dava Sobel’s book)
        • Groupie
        • Chips with Everything

  13. Automated Processing
      Largely automated processing, migration and ingest procedures were possible because
      • One author
      • Similar content/materials (works, correspondence, diaries, personal files)
      • Mostly same format (Corel WordPerfect 5.0, 9.0 and Microsoft Word 97 and 2000)
      • Easily migrated (to RTF)
      • Well arranged
      • Manageable number of files (5,000 +)
      • Readable disks (75 3.5-inch floppies and 1 zip disk)

  14. Processing Issues
      • Some files were password restricted
      • Bank account numbers were included
      • Encoded date fields would automatically update

  15. Archival Theory Applied to Digital Materials

  16. Archival Theory Applied to Digital Materials

  17. Disk Catalog

  18. File Catalog

  19. Appraise for Duplicates
      • Files on zip disk contained some duplicates
      • Developed rules for removing duplicates so that files with duplicate names but different content were not deleted automatically
      • Erased duplicate files but recorded presence of duplicates in file catalog
      • Used Zizasoft’s comparison software zsCompare and zsDuplicate Hunter Standard 2.31
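The distinction on this slide between duplicate names and duplicate files can be illustrated with a short sketch. It is not the zsCompare or zsDuplicate Hunter workflow the project used; it is a minimal Python illustration, assuming that two files count as duplicates only when their contents hash identically. The function names `file_digest` and `find_duplicates` are hypothetical.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_digest(path, chunk_size=65536):
    """Return the MD5 hex digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(root):
    """Group files under root by content; keep groups with more than one file.

    Files that merely share a name but differ in content land in
    different groups, so they are never reported as duplicates.
    """
    groups = defaultdict(list)
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            groups[file_digest(path)].append(path)
    return {digest: paths for digest, paths in groups.items() if len(paths) > 1}
```

The returned groups could be written to the file catalog before any copy is erased, so the presence of duplicates stays on record.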

  20. Restricted Material
      • Bank account numbers
        • Investigate to see if the accounts were closed
      • Password protected diary entries
        • Remove password to migrate
        • Place restrictions on access through DSpace instead of word processing software
        • Paper copy already exists and is in restricted section of stacks

  21. Checksums
      • Command line utility automatically creates checksum
      • Jacksum is one Java checksum utility
      • Export results to spreadsheet
      • Compare to MD5 hash created by DSpace
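The checksum steps above can be sketched in a few lines. The project used a command line utility such as Jacksum; the Python below is only an illustrative stand-in, assuming MD5 digests, a CSV file as the "spreadsheet", and a dictionary of repository-reported hashes keyed by relative file name. The names `write_manifest` and `mismatches` are hypothetical.

```python
import csv
import hashlib
from pathlib import Path


def write_manifest(root, csv_path):
    """Hash every file under root, write file/md5 rows to csv_path.

    csv_path should live outside root so the manifest does not hash itself.
    Returns a dict mapping relative file name to MD5 hex digest.
    """
    rows = []
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            digest = hashlib.md5(path.read_bytes()).hexdigest()
            rows.append((path.relative_to(root).as_posix(), digest))
    with open(csv_path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["file", "md5"])
        writer.writerows(rows)
    return dict(rows)


def mismatches(local, repository):
    """Return names whose repository hash is missing or differs."""
    return sorted(
        name for name, digest in local.items() if repository.get(name) != digest
    )
```

An empty list from `mismatches` would suggest the files arrived in the repository unchanged; how the repository hashes are obtained from DSpace is left open here.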

  22. Migrate Text to More Stable Format
      • Chose RTF because it is widely accessible by multiple readers and it retains formatting
      • ODF is new and as yet untested
      • TXT loses formatting
      • Microsoft Word DOC and Corel WordPerfect WPD are proprietary and accessible by few readers
      • Used ABC Text Converter to migrate files from DOC or WPD into RTF
      • Used Perl script to add extensions to files to mitigate Wesker’s use of 3 digit extension
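The slide does not say how the Perl script chose which extension to add, so the following is only a guess at one workable approach, written in Python rather than Perl. It assumes the format can be recognized from the first bytes of the file: WordPerfect documents start with the bytes `FF 57 50 43`, and Word 97/2000 files use the OLE2 compound file signature. The helper names `guess_extension` and `add_extension` are hypothetical.

```python
from pathlib import Path

# Leading bytes ("magic numbers") mapped to the extension to append.
# The OLE2 signature is shared by other Office formats, so ".doc" is
# an assumption that only holds for a word-processing collection.
SIGNATURES = {
    b"\xffWPC": ".wpd",
    b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1": ".doc",
}


def guess_extension(path):
    """Return an extension based on the file's leading bytes, or None."""
    with open(path, "rb") as handle:
        head = handle.read(8)
    for magic, extension in SIGNATURES.items():
        if head.startswith(magic):
            return extension
    return None


def add_extension(path):
    """Append the guessed extension, keeping the author's original name intact."""
    path = Path(path)
    extension = guess_extension(path)
    if extension is None or path.name.lower().endswith(extension):
        return path  # unknown format or already labelled: leave untouched
    target = path.with_name(path.name + extension)
    if target.exists():
        raise FileExistsError(target)
    path.rename(target)
    return target
```

Appending rather than replacing keeps whatever the author stored in the original three-character extension, for example `LETTER.001` becomes `LETTER.001.wpd`.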

  23. Create Duplicate Physical Copy • Save files to CD, DVD or hard drive for an extra, short-term backup copy while processing (and before ingest into the institutional repository)

  24. Extract Metadata

  25. National Library of New Zealand XML

  26. National Library of New Zealand XML (cont.)

  27. Dublin Core XML

  28. Directory Arrangement for DSpace Bulk Ingest

  29. Automated Processes
      • Created Perl scripts to automate processing
      • Modified Perl scripts from Queen’s University Library in Ontario, Canada http://library.queensu.ca/webir/qspace-project/tutorials/qspace_bulk_upload.doc
      • Metadata conversion script (from National Library of New Zealand Metadata Extraction Tool v 3.0)
      • Script to move individual xml files into individual directories
      • Script to create contents file for each directory
      • Scripts to rename files for format transformation
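The directory arrangement these scripts produce can be sketched as follows. This is not the Queen's University or Ransom Center Perl code; it is a minimal Python illustration of the DSpace batch import layout, in which each item gets its own directory holding the file, a `dublin_core.xml` file and a `contents` file listing the files to ingest. The function name `build_item`, the `item_NNNN` directory naming and the sample metadata values are assumptions made for the example.

```python
import shutil
from pathlib import Path
from xml.etree import ElementTree as ET


def build_item(archive_root, index, source_file, metadata):
    """Create one item directory for DSpace bulk ingest.

    metadata maps (element, qualifier) pairs to values, for example
    {("title", "none"): "Letter to a director"}.
    """
    source_file = Path(source_file)
    item_dir = Path(archive_root) / f"item_{index:04d}"
    item_dir.mkdir(parents=True, exist_ok=False)

    # Copy the file to ingest into the item directory.
    shutil.copy2(source_file, item_dir / source_file.name)

    # Write the Dublin Core record as <dcvalue> elements.
    root = ET.Element("dublin_core")
    for (element, qualifier), value in metadata.items():
        dcvalue = ET.SubElement(root, "dcvalue", element=element, qualifier=qualifier)
        dcvalue.text = value
    ET.ElementTree(root).write(
        str(item_dir / "dublin_core.xml"), encoding="utf-8", xml_declaration=True
    )

    # The contents file lists one file name per line.
    (item_dir / "contents").write_text(source_file.name + "\n", encoding="utf-8")
    return item_dir
```

Calling `build_item` once per migrated file, with an increasing index, yields a directory tree that can be handed to the repository's bulk import tool.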

  30. Issues with Metadata Extraction
      • Author unreliable
        • Partially solved by adding code to Perl scripts to export standard author information
      • No subject metadata
      • Inaccurate dates
        • Date created sometimes newer than date modified due to Windows file system
      • Inaccurate titles
        • First line in document
        • Title from template
      • Format problems when extensions are used as part of name field
      • No recipient information (potential text mining project)
      • Path name derived from location of file on processing computer, not original author’s system
      • Sometimes NLNZ Metadata Extractor v 3.0 processes files with default adapter instead of actual suitable adapter
      • Dublin Core metadata is not robust enough for digital preservation needs
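One of the date problems listed above, a creation date later than the modification date, is easy to flag automatically. The sketch below is an illustration only, not part of the project's scripts. It assumes the extracted metadata has already been reduced to records with hypothetical `file`, `created` and `modified` fields holding ISO 8601 timestamps.

```python
from datetime import datetime


def flag_suspect_dates(records):
    """Return file names whose creation date is later than the modified date."""
    suspects = []
    for record in records:
        created = datetime.fromisoformat(record["created"])
        modified = datetime.fromisoformat(record["modified"])
        if created > modified:
            suspects.append(record["file"])
    return suspects


# Example records with made-up values for illustration.
sample = [
    {"file": "diary1.rtf", "created": "2007-02-01T10:00:00", "modified": "1999-05-12T09:30:00"},
    {"file": "play2.rtf", "created": "1998-03-04T08:00:00", "modified": "1998-03-05T17:45:00"},
]
suspect_files = flag_suspect_dates(sample)
```

Flagged files could then be reviewed by hand, since a creation date reset by copying says nothing about when the author actually wrote the document.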

  31. New Zealand XML Wrong Author

  32. Dublin Core XML

  33. Ingest
      Created detailed ingest procedures based on
      • Cornell’s ecommons@Cornell procedures as example
      • DSpace instructions

  34. Takeaways • More automated tools • Toolkit to aggregate tasks • Better metadata extraction potential • Support of more schemas

  35. MetaTools: Investigating Metadata Generation Tools
      • JISC-funded grant project undertaken by the Arts and Humanities Data Service, King’s College London
      • 18-month project, ends September 2008
      • Project goals
        • Develop a methodology for evaluating metadata generation tools
        • Compare the quality of currently available metadata generation tools (including NLNZ Metadata Extractor, DROID, JHOVE)
        • Develop, test and disseminate prototype web services that integrate metadata generation tools

  36. Student Publication: Lorraine Dong, Megan Durden and Sarah Kim presented "Silicon Chips with Everything: Preserving Arnold Wesker’s Digital Manuscripts" at SSA 2007 https://pacer.ischool.utexas.edu/handle/2081/2322 (Look for their forthcoming publication)

  37. Contact Information Catherine Stollar Peters New York State Archives Cultural Education Center Albany, New York 12230 cstollar@mail.nysed.gov (518)486-7820
