Port Townsend Leader Historical Newspaper Archive
Keith Darrock
History: Port Townsend Leader
• Original paper began in 1889; indexing for the digital repository has been completed for 1903-1913
Schema: Dublin Core
Issue & headlines: April 1, 1910; Page three; May Roberts company to open engagement tonight; Cubs and soldiers to play practice game; Schooner Inca is coming to Puget Sound
Keywords: drama; baseball;
Named individuals: Roberts, May; Rassmussen, Captain
Vessels: schooner Inca;
Publisher: The Leader Company
Place of publication: United States--Washington (State)--Port Townsend
Type of publication: Newspaper
Frequency: Daily except Monday
Title notes: The Port Townsend daily leader (1904-1916); Continues: Morning leader. Continued by: Port Townsend leader (1916).
Image format: GIF image
Scanning data: Scanned from 35 mm silver negative microfilm by OCLC Preservation Resources; GIF images are 1600 pixels wide, type 89A with 2 added colors derived from 600 dpi bitonal TIFF images.
Source of other formats: Microfilm: Port Townsend Public Library, 1220 Lawrence St., Port Townsend, WA 98368, 360-385-3181. Microfilm and bound originals: Jefferson County Historical Society, 210 Madison St., Port Townsend, WA 98368, 360-385-1003.
Rights: Use of this image is restricted to non-commercial, public access and does not include the right to create text versions.
Example taken from: http://content.lib.washington.edu/cgi-bin/viewer.exe?CISOROOT=/ptleader&CISOPTR=3978
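The sample record above can be read as a set of labeled fields, some of which hold several values. The following is a minimal sketch of how such a record might be parsed, assuming one "Label: value" pair per line and semicolons between repeated values; the function name, the choice of multi-valued fields, and the semicolon between the two named individuals are illustrative assumptions, not part of the archive's documented format.

```python
# Fields treated as multi-valued; this selection is an assumption based on the sample record.
MULTI_VALUED = {"Keywords", "Named individuals", "Vessels"}


def parse_record(text):
    """Parse 'Label: value' lines into a dict, splitting multi-valued fields on ';'."""
    record = {}
    for line in text.strip().splitlines():
        label, sep, value = line.partition(":")
        if not sep:
            continue  # skip lines without a label
        label, value = label.strip(), value.strip()
        if label in MULTI_VALUED:
            # Drop empty pieces left by a trailing semicolon.
            record[label] = [v.strip() for v in value.split(";") if v.strip()]
        else:
            record[label] = value
    return record


# A few fields from the April 1, 1910 example.
SAMPLE = """
Keywords: drama; baseball;
Named individuals: Roberts, May; Rassmussen, Captain
Vessels: schooner Inca;
Publisher: The Leader Company
Frequency: Daily except Monday
"""

record = parse_record(SAMPLE)
print(record["Keywords"])
print(record["Named individuals"])
```

Keeping repeated values as lists makes it easier to search or validate individual keywords, names, and vessels later.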
Content Standards
• MIG: Metadata Implementation Group: http://www.lib.washington.edu/msd/mig/default.html
• Provides guidelines for creating a collection
• Dublin Core Field Properties Table: http://www.lib.washington.edu/msd/mig/advice/default.html
• Date field: mm/dd/yyyy
• Issue & headlines
• Vessels
• Named individuals
• Keywords
• Page notes
• The Leader historical archive does not follow strict content standards in terms of controlled vocabulary
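The mm/dd/yyyy date convention is easy to check mechanically before a record is uploaded. A minimal sketch, assuming two-digit month and day and a four-digit year are required; the function name is hypothetical:

```python
import re
from datetime import datetime

# Strict mm/dd/yyyy shape: two digits, two digits, four digits.
DATE_SHAPE = re.compile(r"\d{2}/\d{2}/\d{4}")


def parse_issue_date(value):
    """Return a date for a valid mm/dd/yyyy string, or None otherwise."""
    value = value.strip()
    if not DATE_SHAPE.fullmatch(value):
        return None
    try:
        # strptime rejects impossible calendar dates such as 02/30/1910.
        return datetime.strptime(value, "%m/%d/%Y").date()
    except ValueError:
        return None


print(parse_issue_date("04/01/1910"))
print(parse_issue_date("4/1/1910"))
```

The explicit pattern check is there because `strptime` alone would also accept unpadded values such as 4/1/1910.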
Digitization Standards
• Put onto microfilm as a Washington State Library project
• Scanned by OCLC into images
• Images originated in 600 dpi TIFF format
• Finalized in GIF format, 1600 pixels wide
• Uploaded to UW servers via CONTENTdm clients
Harvested into a Federated Search Tool?
• The Port Townsend Leader archive has not yet been harvested by OAIster
• However, many collections within the UW digital collections have been
• The Port Townsend Leader archive can be found in OCLC’s CONTENTdm Collection of Collections: http://collections.contentdm.oclc.org/
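OAIster gathers records through the OAI-PMH protocol, in which a harvester sends simple HTTP requests to a repository. The sketch below only builds such a request URL; the base URL is a placeholder, and using "ptleader" (the collection alias in the example link) as the set name is an assumption, not a confirmed endpoint.

```python
from urllib.parse import urlencode


def list_records_url(base_url, set_spec, metadata_prefix="oai_dc"):
    """Build an OAI-PMH ListRecords request URL for one set (collection)."""
    query = urlencode({
        "verb": "ListRecords",               # standard OAI-PMH verb
        "metadataPrefix": metadata_prefix,   # oai_dc = unqualified Dublin Core
        "set": set_spec,
    })
    return f"{base_url}?{query}"


# Placeholder base URL and assumed set name, for illustration only.
url = list_records_url("https://example.org/oai", "ptleader")
print(url)
```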
Software Used to House Records & Digitized Works
• CONTENTdm
• Originally developed by the University of Washington
• In 2001, Digital Media Management, Inc. was created and the system was made available to outside entities
• OCLC purchased it in 2006 and now owns and manages it
Who’s Responsible for Indexing?
• The Port Townsend Public Library manages volunteer(s) who hand index certain fields, including:
• Issue and headlines
• Keywords
• Named individuals
• Vessels
• Using a controlled vocabulary? Sometimes: for the first three years (1903-06) and sporadically thereafter. Actual LCSH headings, probably not.
• *Automation will not remove the need for human indexing of the date and subject fields
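One way to help volunteers stay within a controlled vocabulary is to check each entered term against an approved list and flag the rest for review. A minimal sketch; the vocabulary shown is invented for illustration and is not the archive's actual term list:

```python
def check_terms(terms, vocabulary):
    """Split terms into (accepted, rejected) against a controlled vocabulary.

    Matching ignores case and surrounding whitespace; accepted terms are
    returned in the vocabulary's preferred spelling.
    """
    lookup = {term.casefold(): term for term in vocabulary}
    accepted, rejected = [], []
    for term in terms:
        cleaned = term.strip()
        preferred = lookup.get(cleaned.casefold())
        if preferred is not None:
            accepted.append(preferred)
        else:
            rejected.append(cleaned)
    return accepted, rejected


# Hypothetical approved terms.
VOCABULARY = ["Drama", "Baseball", "Schooners"]

accepted, rejected = check_terms(["drama", " baseball ", "ball game"], VOCABULARY)
print(accepted)
print(rejected)
```

Rejected terms are returned rather than discarded so that an indexer can decide whether to map them to an existing heading or propose a new one.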
Automation – Can We Do It?
• In over ten years, human indexing has completed only ten years of content. Even so, that is a lot of work: over 7,000 images so far!
• Need a more efficient solution?
• Use OCR software (ABBYY FineReader) to extract the text of the images in batches
• Add a new field, Text, that contains all of the text (searchable: yes)
• Two files per page, image and OCR text, are combined via CONTENTdm and then uploaded to the UW main server
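The step of combining each image with its OCR output can be sketched as a simple pairing by file name. This assumes the OCR software has already written one plain-text file per GIF with the same base name; the directory layout, file extensions, and function name are assumptions, and the actual CONTENTdm upload is not shown.

```python
from pathlib import Path


def build_records(image_dir, text_dir):
    """Pair each GIF with its same-named .txt OCR file and return one record per image."""
    records = []
    for image in sorted(Path(image_dir).glob("*.gif")):
        text_file = Path(text_dir) / (image.stem + ".txt")
        # An image with no OCR output gets an empty Text field rather than being skipped.
        text = text_file.read_text(encoding="utf-8") if text_file.exists() else ""
        records.append({"image": image.name, "Text": text})
    return records
```

Called as `build_records("images", "ocr")`, this returns a list of dictionaries whose Text values could populate the new searchable field; date and subject fields would still come from a human indexer.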
Challenges to Automation
• Working with volunteers; library staff involvement is needed
• Making compound or complex objects
• Subject terms and dates still need to be applied by a human indexer
• Having volunteers use an actual controlled vocabulary
• Time to do it all?