William Underwood, P.I. Georgia Tech Research Institute Atlanta, Georgia

Advanced Decision Support for Archival Processing of Presidential E-Records: Results and Demonstration. William Underwood, P.I. Georgia Tech Research Institute Atlanta, Georgia

Presentation Transcript


  1. Advanced Decision Support for Archival Processing of Presidential E-Records: Results and Demonstration
     William Underwood, P.I., Georgia Tech Research Institute, Atlanta, Georgia
     This research was sponsored by the Army Research Laboratory and NARA under Army Research Office Cooperative Agreement W911NF-06-2-0050 (Sept 22, 2006-Sept 21, 2009).

  2. Overview
     • Document Type Recognition
     • Metadata Extraction
     • Item Description
     • Speech Act Recognition
     • Decision Support for Archival Review
     • File Format Identification
     • Demonstrations

  3. Document Types, Metadata and Archival Description
     • In responding to FOIA requests, archivists need to be able to search collections of records with high precision and recall.
     • But at the time of responding to FOIA requests, archivists have not read all of the records, so they cannot index the records and search on such attributes as person, organization and location names, topics, dates, author’s and addressee’s names and document types.
     • Archivists cannot describe a collection until the collection has been manually read and reviewed.
     • With increasing volumes of electronic records, it may be decades or even centuries before new acquisitions are described.
     • Item descriptions are needed in the results of a FOIA search.

  4. Method for Recognizing Document Types
     • Document Reader
     • English Tokenizer
     • Wordlist Lookup + enhanced wordlists
     • Sentence Splitter
     • Hepple POS Tagger + lexicon
     • Semantic Tagger + Named Entity Rules
     • Intellectual Element Annotator + Intellectual Element Rules (DER)
     • SUPPLE Parser/Interpreter + Document Type Grammars augmented with Semantics
     • Extract Metadata

  5. Documentary Form: Intellectual Element Recognition

  6. Grammar for Documentary Form of a Memorandum

  7. Parse Tree and Semantics of the Document

  8. Extracted Metadata and Item Description in Manifest
     DOCTYPE = ‘White House Memorandum’
     DATE = ‘April 27, 1992’
     AUTHOR = ‘EDE HOLIDAY’
     ADDRESSEE = ‘SAM SKINNER’
     TOPIC = ‘California Earthquake’
     DESCRIPTION = ‘Memorandum dated April 27, 1992 from EDE HOLIDAY to SAM SKINNER regarding California Earthquake’
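The DESCRIPTION field on this slide appears to be composed from the other extracted fields. A minimal sketch of that composition follows, assuming a simple fill-in template; the function name `item_description` and the `form_name` parameter are hypothetical and are not taken from the presentation.

```python
def item_description(metadata: dict, form_name: str) -> str:
    """Compose a one-line item description from extracted metadata fields.

    The template is an assumption inferred from the single example on the
    slide; it is not the project's actual generation rule.
    """
    return (
        f"{form_name} dated {metadata['DATE']} "
        f"from {metadata['AUTHOR']} to {metadata['ADDRESSEE']} "
        f"regarding {metadata['TOPIC']}"
    )


# Field values copied from the slide.
metadata = {
    "DOCTYPE": "White House Memorandum",
    "DATE": "April 27, 1992",
    "AUTHOR": "EDE HOLIDAY",
    "ADDRESSEE": "SAM SKINNER",
    "TOPIC": "California Earthquake",
}

# The slide's description begins with "Memorandum" rather than the full
# DOCTYPE, so the form name is passed separately here.
description = item_description(metadata, form_name="Memorandum")
print(description)
```

With the slide's values, this reproduces the DESCRIPTION string shown in the manifest.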

  9. Speech Acts and Record Description
     Actions are a part of item descriptions:
     • Signature Memorandum from Boyden Gray to the President recommending the nomination of Ronald B. Leighton to be a US District Judge.
     • Letter from President Bush to President Mikhail Gorbachev suggesting an informal meeting.
     • Memorandum from President Bush to Boyden Gray requesting an analysis of the War Powers Resolution.
     • Letter from Susan Black to President Bush expressing appreciation for nomination and commitment to serve.

  10. Speech Acts and Archival Review
     • Archival review in response to FOIA requests requires recognition of the actions expressed in records.
     • Presidential Records Act restriction on disclosure a(5), “Confidential Advice”: “confidential communications requesting or submitting advice, between the President and his advisors, or between his advisors”
     • Example of action expressing confidential advice: “I further recommend that the President look for opportunities to speak at an appropriate event indicating his knowledge of and interest in this issue, …”

  11. Explicit & Implicit Speech Acts
     • Every complete sentence carries out a speech act.
     • Performative sentences express explicit speech acts.
       • A performative verb is a verb whose action is accomplished merely by saying it or writing it. Example: I recommend that you attend the conference.
     • Declarative, imperative and interrogative sentences express implicit speech acts.
       • Declarative (state): You completed the report.
       • Imperative (request): Please, complete the report.
       • Interrogative (ask): Did you complete the report?
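The distinction on this slide can be illustrated with a toy classifier. This is only a sketch under strong assumptions: it uses surface patterns instead of the parser-based method the presentation describes, and the verb list, cue words, and the function name `classify_speech_act` are all hypothetical.

```python
import re

# Hypothetical, deliberately tiny lexicon of performative verbs.
PERFORMATIVE_VERBS = ("recommend", "request", "suggest", "promise", "apologize")

# First-person subject, an optional adverb, then a performative verb,
# e.g. "I recommend ..." or "I further recommend ...".
PERFORMATIVE_RE = re.compile(
    r"^\s*(?:I|We)\s+(?:\w+\s+)?(" + "|".join(PERFORMATIVE_VERBS) + r")\b",
    re.IGNORECASE,
)

# Hypothetical cue words that mark a sentence as imperative.
IMPERATIVE_CUES = ("please", "complete", "send", "review")


def classify_speech_act(sentence: str) -> tuple[str, str]:
    """Return (kind, act), where kind is 'explicit' or 'implicit'."""
    text = sentence.strip()
    match = PERFORMATIVE_RE.match(text)
    if match:
        # Explicit: the performative verb itself names the act.
        return ("explicit", match.group(1).lower())
    if text.endswith("?"):
        return ("implicit", "ask")        # interrogative
    first_word = re.split(r"[\s,]+", text, maxsplit=1)[0].lower()
    if first_word in IMPERATIVE_CUES:
        return ("implicit", "request")    # imperative
    return ("implicit", "state")          # declarative (default)


for example in (
    "I recommend that you attend the conference.",
    "You completed the report",
    "Please, complete the report.",
    "Did you complete the report?",
):
    print(example, "->", classify_speech_act(example))
```

Surface cues like these break down quickly on real records (for instance, indirect speech acts), which is presumably why the method on the next slide relies on parsing and a dedicated transducer.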

  12. A Method for Recognizing Speech Acts in E-Records
     Input: Textual document & metadata from the manifest
     • Read author and addressee metadata from the manifest
     • Information extraction
     • Parse sentences in the document
     • Speech Act Transducer
     • Annotate explicit speech acts
     • Annotate implicit speech acts
     • Annotate speech acts indicated by text structure
     • Annotate indirect speech acts
     • Annotation of the primary speech acts
     Output: [document(e1), author(e1, S), addressee(e1, H), act(e1, F(P))]
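The output form on this slide can be modeled as a small record that renders itself as a list of terms. This is a sketch, not the project's data model: the class name `SpeechActAnnotation` is hypothetical, a comma is assumed between the event and the act, and reading S, H, F and P as speaker, hearer, illocutionary force and propositional content follows common speech act notation and is not stated on the slide.

```python
from dataclasses import dataclass


@dataclass
class SpeechActAnnotation:
    """One primary speech act attached to a document event (hypothetical model)."""

    event: str      # event identifier, e.g. "e1"
    author: str     # S, assumed to be the speaker/author from the manifest
    addressee: str  # H, assumed to be the hearer/addressee from the manifest
    force: str      # F, assumed to be the illocutionary force
    content: str    # P, assumed to be the propositional content

    def to_terms(self) -> list[str]:
        """Render the annotation in the bracketed term form used on the slide."""
        return [
            f"document({self.event})",
            f"author({self.event}, {self.author})",
            f"addressee({self.event}, {self.addressee})",
            f"act({self.event}, {self.force}({self.content}))",
        ]


# Placeholders S, H, F and P are kept symbolic, as on the slide.
annotation = SpeechActAnnotation(event="e1", author="S", addressee="H", force="F", content="P")
terms = annotation.to_terms()
print("[" + ", ".join(terms) + "]")
```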

  13. Decision Support for Archival Review
     • FOIA (and systematic) review of Presidential records for PRA and FOIA restrictions on disclosure requires page-by-page review of the records.
     • Due to the increasing volume of records in all branches of Government, and especially EOP, decision support is needed to assist archivists in review.

  14. Potential Benefits of Archival Review Assistant
     • Reducing the risk of opening a document or passage of a record whose access should be restricted.
     • A tutoring tool during training of review archivists.
     • A tool that novice reviewers could use to check their work.
     • Provision of additional evidence when a reviewer's judgment is uncertain, or identification of uncertainties where the reviewer thought the decision was certain.
     • Support estimation of FOIA review workload in terms of the number of restrictions and types of restrictions likely to apply.
     • Support reviews of Federal Records for FOIA exemptions.
     • Extension of the technology to support declassification of security classified records.

  15. Components of an Archival Review Assistant

  16. File Format Identification
     A capability to identify file formats is needed by ERA for:
     • Ensuring compliance with Record Transmittal Agreement
     • Viewing/playing files
     • Conversion to current or standard file formats
     • Archive extraction
     • Password recovery and decryption
     • Repair of damaged files

  17. Linux File Command & Magic File

  18. Extensions of File Command and Magic File
     • Magic for individual file formats
     • Output of file command/magic file is File Format ID
     • Rewriting file command code for identifying Characteristics of Text files and Document Types
     • Defined approx. 800 file format signatures
     • Collected examples of approx. 500 of the file format types
     • Created File Signature Database
     • Verified that File Format Identifier with magic file correctly identifies approx. 500 File Types
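Signature-based identification of the kind the file command performs can be sketched as a lookup of leading bytes ("magic numbers"). This is a simplified illustration, not the project's File Format Identifier: the signature table holds only a few well-known formats, and the names `SIGNATURES`, `identify_format` and `identify_file` are hypothetical. A real magic file also tests values at arbitrary offsets and supports several data types, which this sketch omits.

```python
# Leading-byte signatures for a few well-known formats (tiny illustrative table).
SIGNATURES = [
    (b"%PDF-", "PDF document"),
    (b"\x89PNG\r\n\x1a\n", "PNG image"),
    (b"GIF87a", "GIF image"),
    (b"GIF89a", "GIF image"),
    (b"PK\x03\x04", "ZIP archive"),
]


def identify_format(data: bytes) -> str:
    """Return a format name for the first signature that prefixes the data."""
    for magic, name in SIGNATURES:
        if data.startswith(magic):
            return name
    return "unknown"


def identify_file(path: str) -> str:
    """Read only the first bytes of a file and identify its format."""
    with open(path, "rb") as handle:
        return identify_format(handle.read(16))


print(identify_format(b"%PDF-1.4\n"))
print(identify_format(b"GIF89a\x01\x00\x01\x00"))
print(identify_format(b"plain text"))
```

Verification against collected example files, as described on the slide, would amount to running such an identifier over each sample and comparing the result with the known format.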

  19. Demonstrations
     • Document Type Recognition, Metadata Extraction & Item Description
     • Automatic Recognition and Interpretation of Performative Sentences
     • Decision Support for Archival Review
     • File Format Library & File Format Identifier

  20. Additional Information
     • W. Underwood et al. Advanced Decision Support for Archival Processing of Presidential E-records, Technical Report ITTL/CSITD 09-01, GTRI, Sept 2009
     • W. Underwood & S. Laib. Automatic Recognition of Documentary Forms, Technical Report ITTL/CSITD 08-02, GTRI, May 2008
     • W. Underwood. Recognizing Speech Acts in Presidential E-records, Technical Report ITTL/CSITD 08-03, GTRI, Oct 2008
