The FDLP Web Archive. Dory Bower, Archive-It Partner Meeting, November 18, 2014.
FDLP History and Dissemination of Government Publications
• Act of 1813: Congress first passed legislation to ensure that one copy of the House and Senate Journals and other Congressional documents was provided to certain universities, historical societies, state libraries, etc.
• The Printing Act of 1895: Formed the basis for Title 44; centralized the printing, binding, and distribution of US Government documents; established the role of the FDLP; and transferred the Office of the Superintendent of Documents to the GPO. The first Monthly Catalog of US Government Publications was printed at the GPO that year.
• Title 44, US Code: Mandate for Public Printing and Documents. Chapter 19 deals with the Depository Library Program. Title 44 has seen many changes over the last century.
FDLP History and Dissemination of Government Publications
• GPO Electronic Information Access Enhancement Act of 1993: Established a means of enhancing electronic public access to a wide range of Federal electronic information.
• 1996: Launch of the Catalog of US Government Publications (CGP), the online counterpart of the Monthly Catalog of US Government Publications. Covers publications from July 1976 to the present.
• 1998: LSCM begins using PURLs for persistent access to electronic copies of government publications.
• 2011: LSCM begins using Archive-It for automated harvesting of government websites.
Web Archiving Options
Decision process for FDLP web archiving:
• Standard PURL: Individual publications and less complex websites, using Teleport software
• Archive-It: Content-rich websites
• Partnership: Hard-to-harvest sites, database sites, or real-time information
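The decision process above can be sketched as a small routing function. This is only an illustration: the `Site` fields and the order in which the checks run are assumptions made for the example, not the team's documented rules.

```python
from dataclasses import dataclass


@dataclass
class Site:
    """Hypothetical description of a candidate site; field names are illustrative."""
    name: str
    is_content_rich: bool = False
    is_database_driven: bool = False
    has_real_time_data: bool = False
    is_hard_to_harvest: bool = False


def choose_archiving_method(site: Site) -> str:
    """Map a candidate site to one of the three options named on the slide."""
    # Assumption: sites a crawler cannot capture well are checked first.
    if site.is_hard_to_harvest or site.is_database_driven or site.has_real_time_data:
        return "Partnership"
    # Content-rich websites go to Archive-It.
    if site.is_content_rich:
        return "Archive-It"
    # Individual publications and less complex sites get a standard PURL.
    return "Standard PURL"


if __name__ == "__main__":
    print(choose_archiving_method(Site("example agency site", is_content_rich=True)))
```

In practice the choice is made by team discussion rather than by fixed flags, so a function like this would at most record the outcome.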
Collection Development
Develop and build a website-level collection
• Must be within the scope of the FDLP
• Not distributed in print
• Government information disseminated through the web and not cataloged
• Avoid duplicating the efforts of other institutions or content already in FDsys
• Work with the collection development staff, drawing on their many years of experience, to help determine needs
Collection Development
• Pilot sites: 3 sites to begin testing workflow
• SuDoc Y3 sites: commissions, committees, independent agencies
• Special Collections
• Native American Resources
• Nominated sites
Collection Development
Nominations
• Document Discovery: http://usgpo.wufoo.com/forms/document-discovery/
• AskGPO: http://www.gpo.gov/askgpo/
• Team email: fdlpwebarchiving@gpo.gov
Collection Development
The decision-making process
• Nominations are sent to the team by email or discussed in the weekly meeting
• Much discussion within the FDLP web archiving team, which represents many areas of LSCM
• Is it within the scope of the FDLP and other collection development parameters?
• Decide by which means to archive
Collection Development
Moving forward
• Y3s almost complete
• Working with Collection Development staff, drawing on their extensive experience, to determine needs
• Move from smaller to larger sites
• Non-standard sites (fatherhood.gov, read.gov)
• Special Collections
• Regular frequency of crawls
• Working with other Federal collecting institutions
Archive-It Workflow
• Notification to agency
  • Webmaster: 48 hours' notice of intent to crawl
  • Full disclosure of what we are doing
  • Chosen for inclusion in the FDLP
  • Will ignore robots.txt (but only when necessary)
• Begin seed list, test crawls, QA, modifications
• Concentrate a lot of time on the test crawl
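One way to judge whether ignoring robots.txt is "necessary" is to check which URLs a site's robots.txt would keep a crawler from fetching. The sketch below uses Python's standard `urllib.robotparser`. The sample robots.txt, the `example.gov` URLs, and the `example-crawler` user agent are all made up for illustration. It does not represent how Archive-It itself evaluates robots.txt.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that blocks assets a faithful playback would need.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /images/
Disallow: /css/
"""


def blocked_urls(robots_txt, urls, user_agent="example-crawler"):
    """Return the URLs that the given robots.txt disallows for user_agent."""
    parser = RobotFileParser()
    # parse() takes the file as a list of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if not parser.can_fetch(user_agent, url)]


if __name__ == "__main__":
    candidates = [
        "https://example.gov/about.html",
        "https://example.gov/images/logo.png",
        "https://example.gov/css/site.css",
    ]
    for url in blocked_urls(SAMPLE_ROBOTS_TXT, candidates):
        print("blocked:", url)
```

If the blocked list contains stylesheets or images, a test crawl that honors robots.txt is likely to play back poorly, which is the kind of case where an override might be considered.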
Archive-It Workflow
• Run and QA production crawl
• Run patch crawls
• Submit lots of questions
• Best playback possible
• Maximize user experience and account
• Make live on Archive-It and submit for metadata
FDLP Web Archive
Collection size:
• 3.5 TB, over 24 million documents crawled
• 56 agencies represented on AIT
• 65 records on CGP (analytical cataloging)
• FDLP Project page: http://www.fdlp.gov/377-projects-active/2020-web-archiving
Resources:
• 10 contributors
Access
Two locations for access:
• Archive-It
  • Search for “GPO” or “FDLP”
• Catalog of Government Publications (CGP)
  • Identifiable through “INTERNET” in the SuDoc number
  • Expert search of wcat=web archiving retrieves all records
  • Would like to find better access to the whole collection and eliminate the need for this search
FDLP Web Archive https://archive-it.org/home/FDLPwebarchive
Questions? fdlpwebarchiving@gpo.gov