
PDF and Long Term Preservation

PDF and Long Term Preservation. May 17, 2005 Susan J. Sullivan, CRM susan.sullivan@nara.gov. Introduction. Today’s presentation will discuss NARA’s work to address the long term preservation of electronic documents in Portable Document Format (PDF)


Presentation Transcript


  1. PDF and Long Term Preservation May 17, 2005 Susan J. Sullivan, CRM susan.sullivan@nara.gov

  2. Introduction • Today’s presentation will discuss NARA’s work to address the long term preservation of electronic documents in Portable Document Format (PDF) • Explain why long term preservation of electronic documents in PDF is an issue • Describe the draft PDF/A ISO Standard in the context of NARA’s PDF Transfer Guidance for permanent records in PDF…including NARA’s expectations for PDF/A • Explain the roles of both PDF/A and the PDF Transfer Guidance in Federal recordkeeping • Provide an overview of PDF/A and its status in the ISO Process • Quiz at the end (group participation)

  3. Background – Wide Use of PDF • PDF is a digital format that electronically reproduces the visual appearance of documents whether they are: • Converted from other electronic formats, or • Digitized from paper or microform • Businesses, governments, libraries, archives, and other institutions and individuals around the world use PDF to: • Collect and disseminate information over the Internet, • Store electronic records, and/or • Make scanned images searchable by embedding OCR’d text. • As a result, large bodies of important information are maintained in PDF.

  4. Background – PDF Not a Suitable Archival Format • PDF itself is not suitable as an archival format. • Adobe is under no obligation to continue publishing the specification for future versions • Can include features incompatible with current archival requirements • Encryption • Embedded files • PDF documents not necessarily self-contained • Can depend on system fonts and other content drawn from outside the file • Multiple PDF development tools on the market • Inconsistency in the file format (all PDFs are not created equal) • Long-term solution needed to ensure that digital PDF documents remain accessible for long periods of time • Permanent archival records, in some cases • Administrative Office of U.S. Courts initiated idea for PDF/A

  5. Background – Example Business Case for Long Term Preservation of PDF • Administrative Office of the U.S. Courts (AOUSC) • Uses PDF as the electronic format for Electronic Case Filing System • System accepts filings and provides access to filed PDF documents over the Internet • Many AOUSC files must be maintained for long periods of time (e.g., 40 years) • Some will be transferred to the National Archives for permanent retention • Future use of and access to the AOUSC’s PDF documents depends on maintaining the ability to reproduce their visual appearance and other properties over the long term (i.e., across multiple generations of technology)

  6. Background - How NARA is Addressing PDF • Issued PDF Transfer Guidance • NARA partnered with Federal agencies and issued guidance allowing transfer of permanent records in PDF to NARA (March 2003) • Part of Electronic Records Management E-Gov Initiative • Agency partners identified PDF as one of six priority records transfer formats • Participating in PDF/A ISO Standard Development • NARA is participating in PDF/A development… • To influence the process so that PDF/A compliant records can be preserved by NARA over the long term, and • To provide information used in developing/maintaining NARA guidance for transferring permanent records in PDF

  7. Background - Transfer Format versus File Format Goal: To ensure that valuable electronic information in PDF is not lost Purpose: • Transfer Format - NARA’s PDF Transfer Guidance • Specifies requirements for transferring permanent records in PDF to NARA • Applies to existing and future records in PDF so that NARA can accept and process these records • File Format - The PDF/A ISO Standard (PDF/A) • Specifies a file format, based on PDF, that is more suitable than PDF for long term preservation • Will allow PDF records to be maintained longer as PDF (e.g., within agencies)

  8. Scope and Usage - NARA’s PDF Transfer Guidance • Scope • Applies to records scheduled as permanent • Supplements existing Federal regulations • Covers existing and future electronic records meeting transfer requirements, but… • Unique circumstances • NARA will work with agencies through their Appraisal Archivist to ensure that valuable electronic records are not lost • Usage • Agencies will use NARA’s PDF Transfer Guidance to transfer existing permanent PDF records to NARA

  9. Scope and Usage - PDF/A ISO Standard • Scope • Defines a file format based on PDF, that preserves the visual appearance of electronic documents over time • Provides a framework for recording and embedding metadata within PDF files • Defines a framework for representing the logical structure and other semantic information of electronic documents within PDF/A files • Usage • Vendors will use the PDF/A Standard to develop applications that read, write, and otherwise process conforming PDF/A files • Agencies will use PDF/A applications to create and process PDF/A conformant files • As part of their strategy for long term preservation of electronic records • In conjunction with PDF transfer guidance for transferring permanent records to NARA, as applicable

  10. Scope and Usage - Summary • NARA’s PDF Transfer Guidance • Applies to records in PDF scheduled as permanent • Incorporates file format(s) (e.g., PDF 1.0 - 1.4) • Incorporates quality criteria, laws and regulations, transfer documentation, NARA contact information • PDF/A ISO Standard • Addresses one aspect of the long term preservation of electronic records in PDF (i.e., file format) • Should be used as one component of an organization’s electronic archival environment • Implementation depends on: • Records management policies and procedures • Additional requirements and conditions necessary to ensure the persistence of electronic documents over time (e.g., PDF Transfer Guidance) • Quality assurance processes necessary to verify conformance with requirements

  11. Requirements - PDF/A and NARA’s PDF Transfer Guidance Embedded fonts • PDF/A and NARA’s PDF Transfer Guidance both require that all referenced fonts be embedded • For documents created before 4/1/04, NARA accepts PDFs that do not have all fonts embedded (i.e., base 14 - resident in operating system) Encryption • PDF/A and NARA’s PDF Transfer Guidance both prohibit encryption • For documents created before 4/1/04, NARA accepts PDFs with encryption that does not prevent opening, viewing, printing
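The font and encryption rules on slide 11 can be sketched as a small decision function. This is only an illustration under stated assumptions, not NARA's actual review procedure: `PdfFacts`, `transfer_issues`, and their field names are hypothetical, "4/1/04" is read as April 1, 2004, and the pre-cutoff font allowance is read as applying only to the base 14 fonts. The facts about each file are assumed to come from some other inspection step.

```python
from dataclasses import dataclass, field
from datetime import date

# Assumption: "4/1/04" on the slide means April 1, 2004.
CUTOFF = date(2004, 4, 1)

# The standard base 14 PostScript font names.
BASE_14 = frozenset({
    "Times-Roman", "Times-Bold", "Times-Italic", "Times-BoldItalic",
    "Helvetica", "Helvetica-Bold", "Helvetica-Oblique", "Helvetica-BoldOblique",
    "Courier", "Courier-Bold", "Courier-Oblique", "Courier-BoldOblique",
    "Symbol", "ZapfDingbats",
})


@dataclass
class PdfFacts:
    """Hypothetical summary of one PDF, gathered by some other tool."""
    created: date
    unembedded_fonts: set = field(default_factory=set)
    encrypted: bool = False
    # True if the encryption prevents opening, viewing, or printing.
    encryption_blocks_access: bool = False


def transfer_issues(pdf: PdfFacts) -> list:
    """Return the slide 11 rules this file appears to break (empty if none)."""
    issues = []
    legacy = pdf.created < CUTOFF

    if pdf.unembedded_fonts:
        if not legacy:
            issues.append("all referenced fonts must be embedded")
        elif pdf.unembedded_fonts - BASE_14:
            # Assumed reading: only base 14 fonts may be left unembedded.
            issues.append("unembedded fonts outside the base 14 set")

    if pdf.encrypted:
        if not legacy:
            issues.append("encryption is prohibited")
        elif pdf.encryption_blocks_access:
            issues.append("encryption prevents opening, viewing, or printing")

    return issues


if __name__ == "__main__":
    older = PdfFacts(created=date(2003, 6, 1), unembedded_fonts={"Helvetica"})
    print(transfer_issues(older))
```

Returning a list of issues rather than a single yes/no keeps the sketch in line with the guidance's case-by-case approach: a reviewer sees every rule that was tripped.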

  12. Requirements - PDF/A and NARA’s PDF Transfer Guidance Special Features • PDF/A restricts special features • Embedded files, external links, JavaScript • PDF/A promotes tagged PDF as a higher level of conformance • NARA’s PDF Transfer Guidance evaluates special features on a case-by-case basis at the time of scheduling • To evaluate recordkeeping implications and to ensure valuable records are not lost Metadata/Documentation • PDF/A requires that embedded metadata be in Adobe eXtensible Metadata Platform (XMP) • NARA’s PDF Transfer Guidance requires transfer documentation (e.g., SF-258), and would evaluate embedded metadata during the scheduling process
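To make the XMP requirement on slide 12 more concrete, the following is a minimal sketch of what an XMP metadata fragment looks like: RDF/XML carrying a Dublin Core title. The function name `build_xmp` is hypothetical, and the fragment is illustrative only. It omits the packet wrapper and other properties, so it should not be taken as a conforming PDF/A metadata stream.

```python
import xml.etree.ElementTree as ET

# Namespaces used by XMP for RDF and Dublin Core properties.
NS = {
    "x": "adobe:ns:meta/",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dc": "http://purl.org/dc/elements/1.1/",
}
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def q(prefix: str, tag: str) -> str:
    """Build an ElementTree qualified name such as '{uri}tag'."""
    return "{" + NS[prefix] + "}" + tag


def build_xmp(title: str) -> str:
    """Return a small XMP fragment that records only a document title."""
    for prefix, uri in NS.items():
        ET.register_namespace(prefix, uri)

    xmpmeta = ET.Element(q("x", "xmpmeta"))
    rdf = ET.SubElement(xmpmeta, q("rdf", "RDF"))
    desc = ET.SubElement(rdf, q("rdf", "Description"), {q("rdf", "about"): ""})

    # dc:title is a language alternative: an rdf:Alt holding rdf:li items.
    title_el = ET.SubElement(desc, q("dc", "title"))
    alt = ET.SubElement(title_el, q("rdf", "Alt"))
    item = ET.SubElement(alt, q("rdf", "li"), {XML_LANG: "x-default"})
    item.text = title

    return ET.tostring(xmpmeta, encoding="unicode")


if __name__ == "__main__":
    print(build_xmp("PDF and Long Term Preservation"))
```

Because the metadata is ordinary XML, it can be read back with the same standard-library parser, which fits the self-documentation goal the presentation describes.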

  13. Requirements - PDF/A and NARA’s PDF Transfer Guidance Quality Requirements • PDF/A as a file format does not address quality/creation requirements • Includes recommended guidelines for exact replication of source material in Informative Annex B • Agencies must implement the guidelines of Informative Annex B to comply with NARA’s PDF transfer guidance • NARA’s PDF Transfer Guidance requires minimum scanning quality, prohibits lossy compression and substitution of bitmapped characters with OCR’d text

  14. NARA’s Expectations for PDF/A • PDF/A should address some existing archival issues with PDF and enable records in PDF to be maintained for longer periods of time in that format • Standard maintained by external international organization, not just vendors • Increased degree of format reliability/decrease in “bells & whistles” • Agencies will need to implement PDF/A in conjunction with records management policies and procedures and any additional requirements and conditions necessary to ensure the persistence of electronic documents over time • Examples • NARA’s PDF Transfer Guidance • AOUSC’s document management program

  15. PDF/A ISO Process – International Joint Working Group • [Organization chart] ISO Joint Working Group (JWG) - PDF/A, formed under the auspices of TC/171 • TC/171 Document Imaging Applications • TC/171 SC 2 Application Issues • TC/171 SC 2 WG-5 PDF/A • TC/46 Information & Documentation • TC/46 SC11 Archives/Records Mgmt • TC/42 Photography • TC/130 Graphics Technology

  16. PDF/A ISO Process – Progress and Next Steps • Early 2002 - PDF/A development initiated • September 2003 - Approval of ISO New Work Item (NWI) • October 2003 - TC-171 Meeting - JWG prepared Committee Draft (CD) • November 2003/February 2004 - CD ballot circulated to National Bodies (NBs) • March 2004 - JWG reviewed NB comments on CD • June 2004/September 2004 - Second CD ballot circulated to NBs • October 2004 - JWG Meeting - JWG prepared Draft International Standard (DIS) • Winter/Spring 2005 - DIS balloted to National Bodies • Unanimous affirmative votes - Goes to publication • Up to 25% negative votes - Goes to FDIS, then 1 month ballot • Summer 2005 - TC-171 Meeting - JWG meeting to deal with DIS comments and discuss new work • Summer 2005 - International Standard/FDIS? • Software developers create PDF/A compliant applications

  17. PDF/A - Approach • PDF/A specifies: • The subset of PDF components, from the Adobe published specification for Version 1.4 (i.e., PDF 1.4 Reference), that are either required, restricted, or prohibited, and • How these components may be used by software to render the file • [Diagram: PDF/A shown within the PDF 1.4 Reference, labeled "Specifies required features", "Specifies restricted features", and "Specifies prohibited features"]

  18. PDF/A - Requirements • Prohibit or restrict features that could complicate long term preservation, and • Maximize the following PDF attributes: • Device independence • The degree to which a PDF/A file is independent of the platform on which it is interpreted and rendered • The degree to which a PDF/A file is amenable to direct analysis with basic tools, including human readability • Self-containment • The degree to which a PDF/A file contains all resources necessary for its reliable and predictable interpretation and rendering • Self-documentation • The degree to which a PDF/A file documents itself in terms of descriptive, administrative, structural, and technical metadata

  19. PDF/A - Table of Contents • 1 Scope • 2 Normative References • 3 Terms and Definitions • 4 Notation • 5 Conformance Levels • 6 Technical Requirements • 6.1 File Structure • 6.2 Graphics • 6.3 Fonts • 6.4 Transparency • 6.5 Annotations • 6.6 Actions • 6.7 Metadata • 6.8 Logical Structure • 6.9 Interactive Forms • Informative annexes • Annex A - PDF/A-1 Conformance Summary • Annex B - Best Practices for PDF/A • Bibliography

  20. Annexes of the Draft PDF/A Standard – Informative Annexes • Informative Annexes will provide supplemental information including: • PDF/A-1 Conformance Summary • Summary tables of PDF objects and keys required, restricted and prohibited in PDF/A • Best Practices for PDF/A • Guidelines for capturing or converting electronic documents to PDF/A • For documents created according to specific institutional rules • Replicates the exact quality and content of source documents within the PDF/A file • Required for compliance with NARA’s PDF Transfer Guidance

  21. PDF/A - Overview of Requirements • Two levels of conformance • Level A (e.g., Tagged PDF, UNICODE Mapping) • Level B (e.g., No Tagged PDF) • Uniform file format (header, trailer, no encryption) • Device-independent rendering of graphics • Embedded fonts, character encoding • Transparency prohibited • Annotations restricted, content should be displayed by readers • External actions restricted, no dependence on external content • Readers not required to act on hyperlinks, but may • XMP metadata “Adobe XML Metadata Framework” • Forms based on appearance, not data
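The "uniform file format (header, trailer, no encryption)" item on slide 21 lends itself to a quick first-pass screen. The sketch below is a crude byte-level heuristic, not a PDF/A validator: the function name `quick_screen` and the returned keys are hypothetical, and a text match on `/Encrypt` can produce false positives. Real conformance checking needs a purpose-built tool that parses the file.

```python
def quick_screen(data: bytes) -> dict:
    """Rough byte-level screen of a PDF's header, end marker, and encryption key.

    This only looks for marker strings; it does not parse the PDF.
    """
    return {
        # A PDF file begins with a "%PDF-" version header.
        "has_pdf_header": data.startswith(b"%PDF-"),
        # The file should end with an "%%EOF" marker after the trailer.
        "has_eof_marker": b"%%EOF" in data[-1024:],
        # An encrypted PDF carries an /Encrypt entry in its trailer dictionary.
        "mentions_encrypt_key": b"/Encrypt" in data,
    }


if __name__ == "__main__":
    sample = b"%PDF-1.4\n1 0 obj\n<< >>\nendobj\ntrailer\n<< /Root 1 0 R >>\n%%EOF\n"
    print(quick_screen(sample))
```

A file that fails this screen is worth a closer look, but passing it says nothing about fonts, transparency, annotations, or metadata, which the remaining items on the slide cover.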

  22. Quiz - True or False? • The draft PDF/A ISO Standard… • Provides quality standards for converting electronic documents to PDF • False • Should enable electronic documents in PDF to be maintained longer as PDF • True • Is intended for use as one component of an organization's electronic archival environment for long-term retention of documents • True

  23. Quiz - True or False? • For permanent records in PDF, agencies need to understand that: • Records in PDF/A are guaranteed to be readable forever • False • PDF/A, by itself, does not guarantee exact replication of source material • True • Agencies must implement PDF/A in conjunction with additional requirements to meet NARA standards for transferring permanent records to NARA (i.e., NARA’s PDF Transfer Guidance) • True • Everyone is now excited to learn more about PDF….. • True!

  24. More Information is Available • More information on NARA’s PDF Transfer Guidance on NARA’s Web Site • http://www.archives.gov/records_management/initiatives/pdf_records.html • More information on PDF/A on AIIM Web Site • http://www.aiim.org/standards.asp?ID=25013 • Contact Susan Sullivan at susan.sullivan@nara.gov

  25. Questions/Discussion
