A Framework for Publishing Oral History Interviews to the Web
Stephen Paul Davis
Director, Libraries Digital Program
Columbia University
OCLC Western Digital Forum
August 2006, rev. 10/2011
The Players
• Columbia's Libraries Digital Program
• Columbia Center for Oral History (formerly: Oral History Research Office)
• Columbia's Digital Knowledge Ventures (ceased operations)
• Backstage Library Works (formerly: OCLC Preservation Services)
• George Blood, L.P. (formerly: Safe Sound Archive)
• OCLC Digital Archive
The Characters
• Bennett Cerf – publisher
• Kenneth Clark – psychologist, social activist
• Mamie Clark – psychologist, social activist
• Moe Foner – labor activist
• Andrew Heiskell – publisher
• Edward I. Koch – political figure
• Mary Lasker – philanthropist
• John B. Oakes – newspaper editor
• Frances Perkins – political figure
• Frank Stanton – leader in broadcasting
The Script
• Sessions: 10 interviewees in 193 individual interview sessions
• Recordings: 205 hours on 170 tapes (109 cassettes, 53 five-inch reels, 8 seven-inch reels)
• Transcriptions
  • 11,064 pages of typescript in 72 notebook binders
  • 2,644 pages in MS Word format
• Related material: name indexes, biographies, tables of contents, photos
The Plot
• Online audio in Real & MP3 formats, both downloadable & streaming
• Audio segments directly correlated with transcriptions at the paragraph level
• Page images of transcriptions in PDF
• OCR'd transcriptions plus TEI/XML markup
• Full-text search and retrieval
• Name index entries linked back to references in text
• Abstract of each interview
• A general introduction
• A few pictures
• Rights and permissions cleared in advance
The Revised Plot
• Online audio in Real & MP3 formats, both downloadable & streaming
• Audio segments directly correlated with transcriptions at the session level (originally: paragraph level)
• Page images of transcriptions in PDF
• Re-keyed transcriptions (originally: OCR'd) plus TEI/XML markup
• Full-text search and retrieval
• Name index entries linked back to references in text
• Abstract of each interview
• A general introduction
• Three general introductory essays & a video interview with OHRO director emeritus
• Ten introductions for the interviewees
• 50 pictures (originally: a few)
• Ten new, detailed tables of contents
• Ten audio & text 'excerpts' to provide interview lead-ins
• Rights and permissions cleared in advance
• Dropped: Robert F. Wagner, Kitty Carlisle Hart, Alice Hartley Neel, Schuyler Garrison Chapin, Ed Koch (1997)
• Almost dropped: Foner (bad language)
• Added: Mamie Clark, Mary Lasker, Frances Perkins, John Oakes
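Two items in the plan, name index entries linked back to the text and audio correlated with transcriptions at the session level, can be illustrated with a minimal Python sketch. The markup below is a hypothetical TEI-like fragment: the element names (`div`, `persName`) follow common TEI usage, but the `key` and `corresp` values, the file names, and the sample sentences are assumptions for illustration, not the project's actual encoding.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical TEI-like transcript: one <div> per interview session,
# with tagged personal names. All values here are invented examples.
SAMPLE = """
<text>
  <div type="session" n="1" corresp="session01.mp3">
    <p>I first met <persName key="perkins">Frances Perkins</persName> in Albany.</p>
  </div>
  <div type="session" n="2" corresp="session02.mp3">
    <p><persName key="lasker">Mary Lasker</persName> joined us later.</p>
    <p>Then <persName key="perkins">Miss Perkins</persName> spoke.</p>
  </div>
</text>
"""


def build_name_index(tei_xml):
    """Map each name key to the sessions (and audio files) that mention it."""
    root = ET.fromstring(tei_xml)
    index = defaultdict(list)
    for div in root.iter("div"):
        if div.get("type") != "session":
            continue
        entry = (div.get("n"), div.get("corresp"))
        for name in div.iter("persName"):
            key = name.get("key")
            # Record each session once per name, in document order.
            if entry not in index[key]:
                index[key].append(entry)
    return dict(index)


NAME_INDEX = build_name_index(SAMPLE)
for key, sessions in sorted(NAME_INDEX.items()):
    print(key, sessions)
```

Because the audio is keyed at the session level, each index entry resolves to a whole session recording rather than to a paragraph-level time code.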
Cataloging & Metadata
Cataloging options:
• Audio: the original audio collection, the complete wav files, the complete MP3 files, the segmented Real files
• Transcriptions: the original typescripts and/or Word files; the converted XML files; the generated HTML files
Cataloging decisions:
• Previous catalog records for oral history transcripts left intact under “Reminiscences of …”
• New collection-level catalog record created for entire NNY site
• New “analytic” catalog records created for each Notable New Yorker subsite as a component of the NNY collection site:
  773 0_ |7 nnbc |a Notable New Yorkers |h [electronic resource]. |w (OCoLC65181290)
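The 773 field above is the MARC 21 Host Item Entry, which ties each analytic record to the collection-level record. As a rough illustration of how the field breaks down into subfields, here is a small Python sketch that splits the line exactly as it appears on the slide. The `parse_marc_field` helper is hypothetical, and it assumes the `|` delimiter display used here; it is not a general MARC parser.

```python
# The field exactly as shown on the slide.
FIELD_773 = "773 0_ |7 nnbc |a Notable New Yorkers |h [electronic resource]. |w (OCoLC65181290)"


def parse_marc_field(line):
    """Split a displayed MARC field into tag, indicators, and subfields.

    Assumes the "|x value" subfield display convention used on the slide.
    """
    head, *parts = line.split("|")
    tag, indicators = head.split()
    subfields = []
    for part in parts:
        part = part.strip()
        if part:
            # First character is the subfield code, the rest is its value.
            subfields.append((part[0], part[1:].strip()))
    return tag, indicators, subfields


TAG, INDICATORS, SUBFIELDS = parse_marc_field(FIELD_773)
print(TAG, INDICATORS)
for code, value in SUBFIELDS:
    print(f"  ${code}: {value}")
```

The `$w` subfield carries the control number of the host (collection-level) record, which is what lets a catalog link each subsite record back to the Notable New Yorkers collection.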
Ticket Prices
• Scanning, keying & XML markup: $12,200
• Audio transfers, file header edits, MP3 creation & media: $13,720
• Audio time coding & post-processing: $9,000
• Web site (outsource): $17,150
  • Pre-production, $2,600
  • Rights research & permissions, $1,000
  • Web site design, $3,850
  • Web programming, $7,500
  • Copy editing & QA, $1,400
  • XSLT generation of HTML from METS/TEI, $2,000
• Additional site content: $12,800
  • Introductory essays, $5,700
  • Tables of contents, etc., $5,900
  • Video shoot & post-production, $1,200
• Oral History Research Office contributions: “Priceless”
  • Text preprocessing
  • Audio inventory
  • Rights and permissions clearances
  • Editorial review
• Libraries Digital Program contributions: “Ditto”
  • Project and vendor coordination
  • Text QC, post-processing, METS file creation
  • Text indexing & retrieval system (Lucene)
  • Application integration
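For readers who want the overall figure, the priced lines above can be totaled with a short Python sketch. Treating the comma-priced items as sub-items of the line above them is an assumption based on the slide layout, and the figures are copied as transcribed. Under that reading the five top-level lines come to $64,870; the three items under "Additional site content" add up to the stated $12,800, while the six items under "Web site (outsource)" add up to $18,350 against a stated $17,150. The sketch only surfaces that difference and does not guess which figure is off.

```python
# Figures as transcribed from the "Ticket Prices" slide.
# Each entry: stated total, then any itemized sub-lines (nesting is assumed).
BUDGET = {
    "Scanning, keying & XML markup": (12_200, {}),
    "Audio transfers, file header edits, MP3 creation & media": (13_720, {}),
    "Audio time coding & post-processing": (9_000, {}),
    "Web site (outsource)": (17_150, {
        "Pre-production": 2_600,
        "Rights research & permissions": 1_000,
        "Web site design": 3_850,
        "Web programming": 7_500,
        "Copy editing & QA": 1_400,
        "XSLT generation of HTML from METS/TEI": 2_000,
    }),
    "Additional site content": (12_800, {
        "Introductory essays": 5_700,
        "Tables of contents, etc.": 5_900,
        "Video shoot & post-production": 1_200,
    }),
}


def grand_total(budget):
    """Sum the stated top-level amounts."""
    return sum(stated for stated, _ in budget.values())


def itemized_totals(budget):
    """Sum the sub-items for every line that has them."""
    return {name: sum(items.values())
            for name, (_, items) in budget.items() if items}


def mismatches(budget):
    """Lines whose sub-items do not add up to the stated amount."""
    return {name: (budget[name][0], total)
            for name, total in itemized_totals(budget).items()
            if total != budget[name][0]}


print(f"Stated cash total: ${grand_total(BUDGET):,}")
for name, (stated, summed) in mismatches(BUDGET).items():
    print(f"{name}: stated ${stated:,}, sub-items sum to ${summed:,}")
```

The in-kind contributions from the Oral History Research Office and the Libraries Digital Program are unpriced on the slide and are left out of the total.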
Challenges 1
Problems with Rights & Permissions
• Permission status uncertain
• Permission withdrawn
• Permission equivocal
Problems with Source Material
• Incomplete / outdated inventory of original media
• Missing tapes, audio files
• Patrons using the only (single) copy of transcripts
• Misnumbered pages in transcriptions
• Missing pages in transcriptions
Scanning & Keying Vendor / Digital Program Relations
• Novelty of / unfamiliarity with oral history content
• Delays in providing vendor with source material
• Recognition that typescripts could not be OCR’d because of poor quality; 100% rekeying of originals instead
• Clarity, interpretation, accuracy of markup specs
Challenges 2
Web Design Vendor / Digital Program Relations
• Outsourced design of a web site intended to be maintained afterwards in-house
• Differences in development process, methodology
• Difference between a “one shot” site and an ongoing collection-driven site
• Differences in design “values,” e.g., aesthetics versus usability; “teaching & learning” ethos versus “easy & effective access” ethos; role of branding
• Differences in familiarity and experience with full-text / cross-text search and retrieval
• Availability of time to meet & discuss issues, project management by email, deadlines
Curatorial / Digital Program Relations
• Curatorial time and staffing constraints
• Curatorial enthusiasm leading to requirements creep
• Assumptions about feasibility of “last minute changes”
Textual Issues
• Identity of the “master file” after online publication?
• “Fixity” of transcriptions in MS Word
• Retaining consistency of references / citations in paper version and in online version
Challenges 3
Issues Relating to the Practice of Oral History
• Publishing oral history interviews reflecting older, “outdated” practice along with those reflecting current practice
• Making available original, unedited audio files in conjunction with transcriptions reviewed & edited by the interviewees
• Web exposure of interviews that were originally intended to be available onsite to scholars and researchers
• Influence on current and prospective interview subjects who know that their comments will be published on the Web
The Moral (Lessons Learned) 1
• Commit to doing more planning up front than you think you need to do;
• Set up a rigorous schedule of face-to-face meetings with key stakeholders even if they don't think you need to;
• Make sure all content pieces are agreed to, in hand, fixed, and have clear permissions to publish before agreeing to do the project (or at least before contracting with vendors);
• Oral histories are by their nature fuzzy in their fixity;
• Widows often object to their husbands' bad language long after their husbands are gone;
• Keep detailed inventories of all content pieces before, during and after the project (good asset management);
• Enthusiasm can often lead to scope creep;
The Moral (Lessons Learned) 2
• Push off non-essential scope creep to Phase 2;
• Don't try to edit Emeritus' prose;
• Many people don't like RealMedia / RealPlayer any more (I blame Microsoft);
• Curators often have other things to do than what you're interested in having them do;
• Libraries Digital Program staff always have other things to do than the project the curator is interested in;
• If a digital project is successful it becomes a permanent part of your life and will always need care and feeding even if you think you're finished with it, so get used to it;
• There are less expensive ways to do projects like Notable New Yorkers, but not that much less expensive.