Open Past: Digital Projects from Government Libraries. Finance Canada Statistics Canada Library of Parliament June 1, 2012. CLA Conference 2012.
CLA Conference 2012 “Share your thoughts with fellow delegates and CLA members while attending the conference. The Twitter hashtag is #CLAOTT2012, or you can blog or "Facebook" from CLA 2012 in Ottawa. Go to the CLA website, http://www.cla.ca/conference/2012 for the CLA from Away links.”
Overview • Introductions • Finance Canada • Statistics Canada • Library of Parliament • Questions
Finance Canada: Digitizing the Federal Budget • Eileen Bays-Coutts • Iona Henderson • June 1, 2012
Library Digitization Goals • To increase Web access to and discoverability of federal budget publications • To address service delivery issues • Pilot: To assess digitization, repository, and metadata requirements.
Pilot Phase • March 2010, digitized the 1952 to 1994 Speech, Plan, and Budget in Brief publications. • Used in-house photocopier and casual staff. • Publications scanned to PDF and files optimized using Adobe Acrobat Pro OCR and tagging processes.
Pilot Continued • Sample of OCR coding errors underlying PDFs: I am honoured, Madam Speaker, to have the opportunity to present to Parliament the first b6'dget of this new decade. It is a bUdget which sets new directions for the economy~directionswhich willensure both energy security and economic securit'y for Canadians in the years ahead. It would b~no service to this House, nor to Qanadians, to deny that there is a deeply troubling air of uncertainty and anxiety around the world and, I am sure, in the hearts and minds of Canadians; we have inherited many difficulties from the decade of the 70s. But It would be just as wrong to deny that the decade of the 80s provides extraordinary oppo.rtunities for Canada and Canadians.
Pilot Continued • Results: • Low cost • Crawlable and searchable files • 3% to 5% OCR error rate. • Conclusion – error rate unacceptable.
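The slides do not say how the OCR error rate was measured. One common approach is a character error rate: the edit distance between the OCR output and a proofread copy, divided by the length of the proofread copy. The sketch below illustrates that approach under stated assumptions. The function names are hypothetical, the "proofread" string is an assumed correction of a fragment from the sample above, and the 0.5% threshold is taken from the project goal of 99.5% error-free text.

```python
def edit_distance(reference: str, hypothesis: str) -> int:
    """Levenshtein distance: fewest single-character edits turning reference into hypothesis."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        current = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            cost = 0 if ref_char == hyp_char else 1
            current.append(min(
                previous[j] + 1,         # delete a reference character
                current[j - 1] + 1,      # insert a hypothesis character
                previous[j - 1] + cost,  # substitute, or match at no cost
            ))
        previous = current
    return previous[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """Edit distance divided by the length of the proofread reference text."""
    if not reference:
        raise ValueError("reference text must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


# Fragment of the OCR sample shown on the slide.
ocr_output = "the first b6'dget of this new decade"
# Assumed correction, not taken from the source document.
proofread = "the first budget of this new decade"

rate = character_error_rate(proofread, ocr_output)
# Assumed acceptance threshold: 0.5% errors, i.e. 99.5% error-free text.
meets_target = rate <= 0.005
print(f"character error rate: {rate:.2%}, meets target: {meets_target}")
```

A fragment this short says nothing about the rate for a whole publication. In practice the rate would be computed over a proofread sample of pages.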
Project Phase • Goal: to produce CLF2-compliant, 99.5% error-free OCR text • Work competitively outsourced in 2010/11 to Terra Reproductions • Same scope as Pilot phase.
Project Continued • Full specs were provided to the company including generic metadata; metadata to be enhanced later. • Results: • error rate of 0.5% or lower • But discovered some gaps
Getting to the Web • Add: 1968 to 1994 • Enhance user experience • 2007 to 2012: budget.gc.ca • 1995 to 2006: fin.gc.ca
Getting to the Web Continued • Additional metadata added to files • Prime Minister • Finance Minister • Parliament number • Political party • Became our filtering criteria + the year
Getting to the Web Continued • jQuery used for sorting functionality • Some browser issues with display, so custom style sheets were developed • Cleanup of 1995 – 1999 PDFs on FIN
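The slides name the metadata fields used as filtering criteria (Prime Minister, Finance Minister, Parliament number, political party, and year) and mention sorting. The sketch below illustrates that filter-and-sort logic in Python. It is not the jQuery code used on the site. The record layout, function names, file names, and all sample values are hypothetical placeholders, not real budget metadata.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BudgetRecord:
    """One archived budget PDF with the metadata fields named on the slide."""
    year: int
    prime_minister: str
    finance_minister: str
    parliament: int
    party: str
    pdf_path: str


def filter_records(records, **criteria):
    """Keep records whose fields equal every supplied criterion."""
    return [
        record for record in records
        if all(getattr(record, field) == value for field, value in criteria.items())
    ]


def sort_records(records, key="year", descending=False):
    """Return a new list ordered by the chosen metadata field."""
    return sorted(records, key=lambda record: getattr(record, key), reverse=descending)


# Placeholder data for illustration only; none of these values are real.
records = [
    BudgetRecord(1972, "Prime Minister B", "Minister Y", 2, "Party Two", "budget-1972.pdf"),
    BudgetRecord(1970, "Prime Minister A", "Minister X", 1, "Party One", "budget-1970.pdf"),
    BudgetRecord(1971, "Prime Minister A", "Minister Z", 1, "Party One", "budget-1971.pdf"),
]

party_one = sort_records(filter_records(records, party="Party One"))
newest_first = sort_records(records, descending=True)

print([record.year for record in party_one])
print([record.year for record in newest_first])
```

Passing the criteria as keyword arguments lets one function serve any combination of filters, for example `filter_records(records, parliament=1, party="Party One")`.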
Final Product! www.budget.gc.ca/pdfarch/index-eng.html
Going Forward • Fill gaps in collections • Enhance metadata • Improve layout and functionality • Add further PDF documents from years 1994 – 2011 • Improve accessibility of PDFs
Thank you • Eileen.Bays-Coutts@fin.gc.ca • Iona.Henderson@fin.gc.ca • http://www.budget.gc.ca/pdfarch/index-eng.html