Learn about designing an efficient metadata approach for FDsys to enhance content accessibility and authenticity. Explore top-down goal setting, metadata collection methods, and the importance of preservation and authenticity.
GPO’s Federal Digital System: Metadata collection, use, and display • December 08, 2010
Scope of FDsys Content • Published Federal information products, regardless of format or medium, which are of public interest or educational value or produced using Federal funds. • Excludes products that are administrative, operational, for official use only, not of educational value, classified, constrained by privacy considerations, or self-sustaining.
What is FDsys? • FDsys is a Content Management System • FDsys securely controls digital content throughout its lifecycle to ensure content integrity and authenticity • FDsys is a Preservation Repository • FDsys follows archival system standards to ensure long-term preservation and access of digital content • FDsys is an Advanced Search Engine • FDsys combines extensive metadata creation with modern search technology to ensure the highest quality search experience
Designing a metadata approach • Top down approach: what are our goals and what information do we need to meet them? • Access: Make finding Federal government publications easy. Help users navigate the complex web of versions, issues, and related items. • Authenticity: Maintain content integrity and provenance. • Preservation: Ensure content is usable as information as technology changes. • FDsys collects, stores, uses, and shares metadata to support each goal
FDsys & Access: our goal Government publications present unique challenges to findability • Regulatory and legislative processes make even seemingly simple questions complex • Most users are looking for a specific piece of information much lower than the item level • Content often repeats in new versions or issues with minor changes, so full-text search alone leaves users with lots of results to wade through
FDsys & Access: collecting Descriptive metadata is collected in three ways. • Closest to the source: user interface at time of submission or any time after • Highest quality: bibliographic metadata created by information professionals via interface with our ILS (manual now, automated later) • Most automated: parsing data directly from the content
High-Level Information Flow (diagram; labels only) • Raw Content • Extract Metadata • Group into Packages • Packages • Metadata • Content • Create MODS • Content Delivery • Search • Browse
Parsing Content Runs regular expressions to extract metadata • Regular Expression: (Public Law|Pub. L.|PL|P. L.) (1[0-9][0-9])-([0-9]+) • Content: Pub. L. 109-130 • MODS.xml <congress>109</congress> <number>130</number>
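The parsing step on this slide can be sketched in a few lines of Python. This is only an illustration, not the FDsys parser: the function names are hypothetical, and the dots in the pattern are escaped here (an assumption) so that "Pub. L." matches literally, whereas the pattern on the slide leaves them unescaped.

```python
import re

# Pattern from the slide, with the dots escaped (an adjustment made for this
# sketch) so that "." only matches a literal period.
PUBLIC_LAW = re.compile(
    r"(Public Law|Pub\. L\.|PL|P\. L\.) (1[0-9][0-9])-([0-9]+)"
)


def extract_public_law(text):
    """Return (congress, number) for the first citation found, or None."""
    match = PUBLIC_LAW.search(text)
    if match is None:
        return None
    return match.group(2), match.group(3)


def to_mods_fragment(congress, number):
    """Format the two captured values as the elements shown on the slide."""
    return f"<congress>{congress}</congress> <number>{number}</number>"


citation = extract_public_law("Pub. L. 109-130")
if citation is not None:
    print(to_mods_fragment(*citation))
```

Note that `1[0-9][0-9]` only accepts three-digit Congress numbers from 100 to 199, so citations to earlier Congresses would need a separate pattern.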
FDsys & Access: storing Descriptive metadata is stored in XML using the Metadata Object Description Schema (MODS) • Element set is richer than Dublin Core • Hierarchy allows for rich description, especially of complex digital objects • <extension> for local elements
FDsys & Access: using • Provide simple search with advanced results • Faceted searching: type “candy corn” into the search box, and metadata lets you restrict results to an agency or a date range • Provide advanced search features so users can efficiently retrieve specific documents • Metadata allows quick retrieval by citation • Search metadata fields directly to retrieve specific results • “Related item” element in MODS allows us to build automated navigation between objects • Preserves context • article → issue • issue → next issue • Congressional Bill → U.S. Code • Federal Register → CFR • Makes government documents usable for the non-expert
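As a rough sketch of the faceted search idea above: a full-text match narrows the records, and metadata fields then restrict or count the hits. The records, field names, and functions below are hypothetical and are not the FDsys schema or search engine.

```python
from datetime import date

# Hypothetical records; the field names are illustrative only.
RECORDS = [
    {"title": "Candy corn labeling rule", "agency": "FDA", "published": date(2009, 10, 1)},
    {"title": "Candy corn import notice", "agency": "USDA", "published": date(2010, 3, 15)},
    {"title": "Corn subsidy report", "agency": "USDA", "published": date(2008, 6, 30)},
]


def search(records, query, agency=None, start=None, end=None):
    """Match every query term in the title, then apply metadata restrictions."""
    terms = query.lower().split()
    hits = []
    for record in records:
        title = record["title"].lower()
        if not all(term in title for term in terms):
            continue
        if agency is not None and record["agency"] != agency:
            continue
        if start is not None and record["published"] < start:
            continue
        if end is not None and record["published"] > end:
            continue
        hits.append(record)
    return hits


def facet_counts(hits, field):
    """Count hits per value of one metadata field, to show as facet links."""
    counts = {}
    for record in hits:
        counts[record[field]] = counts.get(record[field], 0) + 1
    return counts


all_hits = search(RECORDS, "candy corn")
print(facet_counts(all_hits, "agency"))
print(search(RECORDS, "candy corn", agency="USDA"))
```

The point of the design is that the user types only the simple query; the restrictions come from metadata that already exists on each record.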
FDsys & Access: sharing • Provide all descriptive metadata for an item or a granule (e.g., article) in MODS.xml • User-friendly display of commonly used elements
Data Mapping: Internal Data Storage → Web Application • 110 → 110th Congress (2007-2008) • V → Part V • FR → Federal Register • RULE → Rules and Regulations • congnum → Congress Number • 2006-02-01 → February 1st, 2006 • [2006-02-01;] → After February 1st, 2006 • accode=PLAW congnum=110 billnumber=1234 publishdate=2008-01-01 → Public and Private Laws. 110th Congress. H.R. 1234. January 1, 2008.
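The data-mapping slide pairs compact internal values such as `110` and `2006-02-01` with display strings such as "110th Congress (2007-2008)". A minimal sketch of that kind of mapping follows; the function names are hypothetical, and the "Before" wording for an open-start range is an assumption, since the slide shows only the "After" case.

```python
from datetime import date

MONTHS = ("January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December")


def ordinal(n):
    """1 -> '1st', 2 -> '2nd', 11 -> '11th', 110 -> '110th'."""
    if 10 <= n % 100 <= 20:
        suffix = "th"
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return f"{n}{suffix}"


def congress_label(congnum):
    """The 1st Congress convened in 1789 and each Congress spans two years."""
    start = 1789 + 2 * (congnum - 1)
    return f"{ordinal(congnum)} Congress ({start}-{start + 1})"


def date_label(iso):
    """'2006-02-01' -> 'February 1st, 2006'."""
    d = date.fromisoformat(iso)
    return f"{MONTHS[d.month - 1]} {ordinal(d.day)}, {d.year}"


def range_label(expr):
    """'[2006-02-01;]' -> 'After February 1st, 2006'."""
    start, _, end = expr.strip("[]").partition(";")
    if start and not end:
        return f"After {date_label(start)}"
    if end and not start:
        # Assumption: the slide does not show this case.
        return f"Before {date_label(end)}"
    raise ValueError(f"unsupported range expression: {expr!r}")


print(congress_label(110))
print(date_label("2006-02-01"))
print(range_label("[2006-02-01;]"))
```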
FDsys & Authenticity: our goal GPO and users need methods to verify that the content in the repository • Has not been maliciously or accidentally altered, • Has not been removed or added without authorization, and • Has been approved by, contributed by, or harvested from an official source.
FDsys & Authenticity: collecting • The system takes a checksum of each file uploaded to the repository • A record is generated of the time, date, and user, for each event that occurs to the content
FDsys & Authenticity: storing • Events and integrity information are created according to the PREMIS data dictionary and stored in XML according to the PREMIS schema
FDsys & Authenticity: using • The system periodically regenerates the checksum for each content file in the system and compares it to the original value • Event logs serve as a provenance record • Tool for investigating unauthorized changes • Trace new renditions created to preserve content back to the original submitted to GPO
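The checksum-and-compare cycle described on the authenticity slides can be sketched as follows. This is an illustration under stated assumptions: the slides do not name a hash algorithm (SHA-256 is assumed here), the in-memory `store` stands in for the repository, and the event fields are hypothetical rather than the PREMIS schema.

```python
import hashlib
from datetime import datetime, timezone


def checksum(data: bytes) -> str:
    # Assumption: SHA-256; the slides do not say which algorithm FDsys uses.
    return hashlib.sha256(data).hexdigest()


def record_event(log, event_type, user, outcome):
    """Append one event with its time, date, and user to the provenance log."""
    log.append({
        "type": event_type,
        "user": user,
        "outcome": outcome,
        "when": datetime.now(timezone.utc).isoformat(),
    })


def ingest(store, log, name, data, user):
    """Store the content together with the checksum taken at upload."""
    store[name] = {"data": data, "checksum": checksum(data)}
    record_event(log, "ingestion", user, "success")


def fixity_check(store, log, name, user="system"):
    """Regenerate the checksum and compare it to the value stored at ingest."""
    entry = store[name]
    ok = checksum(entry["data"]) == entry["checksum"]
    record_event(log, "fixity check", user, "pass" if ok else "fail")
    return ok


store, log = {}, []
ingest(store, log, "bill.pdf", b"original bytes", user="submitter")
print(fixity_check(store, log, "bill.pdf"))
```

A failed check does not say who changed the file; that is where the event log comes in as the record to investigate.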
FDsys & Authenticity: sharing • Checksums and event records for publicly available renditions are available on the website in XML • Full records for all renditions are stored in XML in the archive
FDsys & Preservation: our goal Digital preservation processes will: • Safeguard digital content and relevant metadata • Reduce reliance on hardware and software to access content • Assess the condition and needs of collections of digital information • Meaningfully render content despite continually changing technology
FDsys & Preservation: collecting • System discerns technical information about how content is represented (e.g., file format) • DROID recognizes file format, links to format registry • Preservation specialists enhance technical metadata as needed, in bulk or for an individual file • System creates structural metadata that describes where files are physically stored and how they relate to each other • Abstraction layer from CMS
FDsys Package Metadata (e.g. METS, MODS, PREMIS) and content renditions (e.g. HTML, PDF, XML) for roughly one bound printed document. • One issue of the Federal Register • One issue of the Congressional Record • A single Congressional Bill • A single Congressional Committee Report • One volume of the Code of Federal Regulations • One title of the United States Code • The 9/11 Report
FDsys Package Structure (diagram; labels only) • AIP • package folder-1 • rendition folder-1 • content files • rendition folder-2 • package folder-2 • content files • aip.xml • mods.xml • premis.xml
FDsys & Preservation: storing • Technical metadata stored in XML according to the PREMIS schema • Structural metadata stored in XML according to the METS schema
FDsys & Preservation: using • Technical metadata used to assess when preservation intervention is needed • On an individual file or a group level • Structural metadata allows us to reconstruct the archive if content management system is destroyed
FDsys & Preservation: sharing • In addition to access at the search result level, content can also be downloaded at the item level as a Dissemination Information Package • Structural, descriptive, preservation, and authenticity metadata with the content • Technical metadata provided on website for all publicly available renditions
For more information • Contact us by email Kate Zwaard, kzwaard@gpo.gov • Visit the FDsys website www.gpo.gov/fdsys