What Does Auto-categorization Have To Do With President Obama's Memorandum on Records Management? (Hint: Everything)
Auto-Classification: Taking A Closer Look
ARMA NOVA Spring 2012 Chapter Seminar, Falls Church, VA, March 6, 2012
Jason R. Baron, Director of Litigation, Office of General Counsel, National Archives and Records Administration
A New Era of Government
“[P]roper records management is the backbone of open Government.”
President Obama’s Memorandum dated November 28, 2011, re “Managing Government Records”
http://www.whitehouse.gov/the-press-office/2011/11/28/presidential-memorandum-managing-government-records
Email is still the 800-lb. gorilla of e-discovery (see 36 CFR 1236.22 (2009)).
Presidential Memorandum
From President Obama’s Memorandum, dated 11/28/11:
“Decades of technological advances have transformed agency operations, creating challenges and opportunities for agency records management. Greater reliance on electronic communication and systems has radically increased the volume and diversity of information that agencies must manage. With proper planning, technology can make these records less burdensome to manage and easier to use and share. But if records management policies and practices are not updated for a digital age, the surge in information could overwhelm agency systems, leading to higher costs and lost records.”
Agency Commitments to Records Management Reform
2(a) The head of each agency shall:
(i) ensure that the successful implementation of records management requirements in law, regulation, and this memorandum is a priority for senior agency management;
(ii) ensure that proper resources are allocated to the effective implementation of such requirements;
(iii) within 30 days of the date of this memorandum, designate in writing to the Archivist of the United States (Archivist), a senior agency official to supervise the review required by subsection (b) of this section, in coordination with the agency’s Records Officer, Chief Information Officer, and General Counsel.
Agency Commitments to Records Management Reform
2(b) Within 120 days of the date of this memorandum [i.e., March 2012], each agency head shall submit a report to the Archivist and the Director of the Office of Management and Budget (OMB) that:
(i) describes the agency’s current plans for improving or maintaining its records management program, particularly with respect to managing electronic records, including email and social media, deploying cloud-based services or storage solutions, and meeting other records challenges;
(ii) identifies any provisions in relevant statutes, regulations, or official NARA guidance that currently pose an obstacle to the agency’s adoption of sound, cost-effective records management policies and practices; and
(iii) identifies policies or programs that, if included in the Records Management Directive required by section 3 of this memorandum or adopted or implemented by NARA, would assist the agency’s efforts to improve records management.
Focal Points
• creating a Government-wide records management framework that is more efficient and cost-effective;
• promoting records management policies and practices that enhance the capability of agencies to fulfill their statutory missions;
• maintaining accountability through documentation of agency actions;
• increasing open government and appropriate public access to Government records;
• supporting agency compliance with applicable legal requirements related to the preservation of information relevant to litigation; and
• transitioning from paper-based records management to electronic records management where feasible.
Records Management Directive
3(a) Within 120 days of the deadline for reports submitted pursuant to section 2(b) of this memorandum [i.e., by July 2012] the Director of OMB and the Archivist, in coordination with the Associate Attorney General, shall issue a Records Management Directive that directs agency heads to take specific steps to reform and improve records management policies and practices within their agency.
Records Management Directive
3(b) In the course of developing the directive, the Archivist, in coordination with the Director of OMB and the Associate Attorney General, shall review relevant statutes, regulations, and official NARA guidance to identify opportunities for reforms that would facilitate improved Government-wide records management practices, particularly with respect to electronic records. The Archivist, in coordination with the Director of OMB and the Associate Attorney General, shall present to the President the results of this review, no later than the date of the directive's issuance, to facilitate potential updates to the laws, regulations, and policies governing the management of Federal records.
Process Optimization Problem 1: The transactional toll of user-based recordkeeping schemes (“as is” RM)
Impact of Technology on E-Records Management: Snapshot 2012 (“As is”)
• A universe of proprietary products exists in the marketplace: document management and records management applications (RMAs)
• DoD 5015.2 version 3 compliant products
• However, scalability issues exist
• Agencies must prepare to confront significant front-end process issues when transitioning to electronic recordkeeping
• Records schedule simplification is key
RM wish list for 2012:
• RM’s “easy button”: the elusive goal of zero extra keystrokes to comply with RM requirements (capture)
• A technology app that automatically tags records in compliance with RM policies and practices (categorize)
• Supervised learning RM with minimal records officer or end user involvement (learn)
• Rule-based and role-based RM
• Advanced search
Electronic Archiving As The First Step
• What is it? A 100% snapshot of (typically) email, plus in some cases other selected ESI applications
• How does it differ from an RMA? Its goal is preservation of evidence, not records management per se
• NARA Bulletin 2008-05
A Possible Path Forward?
• Email archiving in the short term, synced to existing proprietary software on the email system
• Designation of key senior officials as creating permanent records, consistent with existing records schedules
• Additional designations of permanent records by agency component
• “Smart” filters/categorical rules built in based on content, to the extent feasible
• By default, all other records go into designated temporary record buckets and are disposed of under existing records schedules
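The path forward above layers role-based designations (senior officials, then agency components), content rules, and a temporary default. A minimal Python sketch of that decision order follows, assuming each archived message carries a sender address and an owning component. The addresses, component name, phrases, and the `Email` and `assign_bucket` names are hypothetical placeholders, not any agency's schedule or any product's API.

```python
from dataclasses import dataclass

# Hypothetical designations; a real agency would derive these from its
# approved records schedules rather than hard-coding them.
SENIOR_OFFICIALS = {"agency.head@example.gov", "deputy.head@example.gov"}
PERMANENT_COMPONENTS = {"Office of General Counsel"}
# Hypothetical "smart" content rules: (phrase, bucket), checked in order.
CONTENT_RULES = [("final policy", "permanent"), ("signed regulation", "permanent")]


@dataclass
class Email:
    sender: str
    component: str
    subject: str
    body: str


def assign_bucket(msg: Email) -> str:
    """Return the records bucket for one archived email message."""
    # 1. Role-based: mail from designated senior officials is permanent.
    if msg.sender.lower() in SENIOR_OFFICIALS:
        return "permanent"
    # 2. Additional designations by agency component.
    if msg.component in PERMANENT_COMPONENTS:
        return "permanent"
    # 3. Content-based rules, where feasible.
    text = f"{msg.subject} {msg.body}".lower()
    for phrase, bucket in CONTENT_RULES:
        if phrase in text:
            return bucket
    # 4. Default: temporary bucket, disposed of under existing schedules.
    return "temporary"


print(assign_bucket(Email("Agency.Head@example.gov", "Front Office", "Lunch", "")))  # permanent
print(assign_bucket(Email("analyst@example.gov", "Budget Office", "Re: parking", "See you at 3")))  # temporary
```

Ordering the checks from role to content means the cheapest, most defensible designations decide most messages, and only the remainder depends on content rules that may misfire.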
A pyramid approach combines disposition policy with automated tools to bring FRA email under records management, preservation, and access.
[Figure: pyramid with legend: permanent records or top officials; temporary records or staff and support; slider marking the capture set-point.]
The position of the “set-point” for email capture depends on policy and resources. Setting it higher allows agencies to use tools now available to capture 100% of email at lower volumes; setting it lower means more records will be captured, and smarter tools are needed to distinguish temporary records from non-records and apply disposition. Implementing an email archiving policy is feasible now, since tools are readily available to capture 100% of email traffic at the individual or organizational level, in formats that can be archived.
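The set-point trade-off can be pictured as a cut through an ordered list of tiers. The Python sketch below assumes four hypothetical tiers (the tier names, `capture_plan`, and its parameters are illustrative, not taken from the slide) and shows how lowering the set-point enlarges the captured population that needs smarter classification tools.

```python
# Hypothetical pyramid tiers, ordered from the apex (index 0) to the base.
TIERS = ["top officials", "senior managers", "program staff", "support staff"]


def capture_plan(lowest_captured: int, permanent_tiers: int = 1) -> dict[str, list[str]]:
    """Split the pyramid at a set-point.

    lowest_captured: index of the lowest tier whose email is captured in full.
    permanent_tiers: how many tiers from the apex are treated as permanent.
    A lower set-point (larger index) captures more tiers; every captured tier
    below the permanent ones needs tools that separate temporary records
    from non-records.
    """
    if not 0 <= lowest_captured < len(TIERS):
        raise ValueError("set-point must index an existing tier")
    captured = TIERS[: lowest_captured + 1]
    return {
        "captured_permanent": captured[:permanent_tiers],
        "captured_needs_classification": captured[permanent_tiers:],
        "not_captured": TIERS[lowest_captured + 1 :],
    }


print(capture_plan(0))  # high set-point: only top officials captured
print(capture_plan(3))  # low set-point: all tiers captured, most need classification
```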
How To Avoid A Train Wreck With Email Archiving:
Capture Email, But Utilize Records Management!
Functional Requirements for Categorization Products in the Federal Workplace
• Ease of use
• Scalability
• Archiving in native formats
• Metadata preservation
• Seamless integration with existing software apps
• Versioning
• Compatibility with big bucket records schedules
• Advanced search capabilities
• Ease of training / machine learning using records officers or end users
• Cost
Process Optimization Problem 2: The Coming Age of Dark Archives (and the inability to provide access) Summit 2012
Example of Boolean search string from U.S. v. Philip Morris:
(((master settlement agreement OR msa) AND NOT (medical savings account OR metropolitan standard area)) OR s. 1415 OR (ets AND NOT educational testing service) OR (liggett AND NOT sharon a. liggett) OR atco OR lorillard OR (pmi AND NOT presidential management intern) OR pm usa OR rjr OR (b&w AND NOT photo*) OR phillip morris OR batco OR ftc test method OR star scientific OR vector group OR joe camel OR (marlboro AND NOT upper marlboro)) AND NOT (tobacco* OR cigarette* OR smoking OR tar OR nicotine OR smokeless OR synar amendment OR philip morris OR r.j. reynolds OR ("brown and williamson") OR ("brown & williamson") OR bat industries OR liggett group)
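Each "X AND NOT Y" clause in the string above pairs a hit term with known false positives. A simplified Python sketch of two of those clauses follows, assuming plain whole-word, case-insensitive phrase matching; the actual search engine's tokenization and wildcard handling (for example `photo*`) are not described on the slide, and the helper names are hypothetical.

```python
import re


def has_phrase(text: str, phrase: str) -> bool:
    """True if `phrase` occurs in `text` as whole words, ignoring case."""
    pattern = r"\b" + re.escape(phrase.lower()) + r"\b"
    return re.search(pattern, text.lower()) is not None


def msa_clause(text: str) -> bool:
    # (master settlement agreement OR msa)
    #   AND NOT (medical savings account OR metropolitan standard area)
    hit = has_phrase(text, "master settlement agreement") or has_phrase(text, "msa")
    excluded = has_phrase(text, "medical savings account") or has_phrase(
        text, "metropolitan standard area"
    )
    return hit and not excluded


def marlboro_clause(text: str) -> bool:
    # (marlboro AND NOT upper marlboro)
    return has_phrase(text, "marlboro") and not has_phrase(text, "upper marlboro")


print(msa_clause("Comments on the MSA draft"))                # True
print(msa_clause("MSA rules for a medical savings account"))  # False
print(marlboro_clause("Meeting in Upper Marlboro, MD"))       # False
```

Note that the exclusion drops the whole document whenever the false-positive phrase appears, even if it also uses the term in the intended sense; that recall cost is inherent to Boolean AND NOT.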
U.S. v. Philip Morris E-mail Winnowing Process
• 20 million email records
• 200,000 hits based on keyword terms used (1%)
• 100,000 relevant emails
• 80,000 produced to opposing party
• 20,000 placed on privilege logs
• A PROBLEM: only a handful were entered as exhibits at trial
• A BIGGER PROBLEM: the 1% figure does not scale
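The winnowing figures can be checked as shares of the original collection. The Python sketch below uses only the slide's stage counts, plus one clearly hypothetical collection size (one billion emails) to show why a 1% hit rate does not scale; the names are illustrative.

```python
# Stage counts as given on the slide for U.S. v. Philip Morris.
STAGES = [
    ("email records", 20_000_000),
    ("hits on keyword terms", 200_000),
    ("relevant emails", 100_000),
    ("produced to opposing party", 80_000),
    ("placed on privilege logs", 20_000),
]
TOTAL = STAGES[0][1]


def percent_of_collection(count: int) -> float:
    """Share of the original collection that reached a stage, in percent."""
    return 100.0 * count / TOTAL


for name, count in STAGES:
    print(f"{name:<28}{count:>12,}{percent_of_collection(count):>8.2f}%")

# The 100,000 relevant emails split into 80,000 produced and 20,000 logged.
# Why a 1% hit rate does not scale: applied to a hypothetical collection of
# one billion emails, it would still leave 10 million hits to review.
HYPOTHETICAL_COLLECTION = 1_000_000_000
print(f"{HYPOTHETICAL_COLLECTION // 100:,} hits to review")
```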
Beyond Keywords: Alternative Search Methods
• Greater Use Made of Boolean Strings
• Fuzzy Search Models
• Probabilistic models (Bayesian)
• Statistical methods (clustering)
• Machine learning approaches to semantic representation
• Categorization tools: taxonomies and ontologies
• Social network analysis
• Hybrid approaches
Reference: Appendix to The Sedona Conference® Best Practices Commentary on the Use of Search and Information Retrieval Methods in E-Discovery (2007), available at http://www.thesedonaconference.org (link to publications)
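One simple form of fuzzy search expands a query term to close spellings in the index, which catches variants such as the "phillip morris" spelling in the Boolean string. The sketch below uses Python's standard `difflib.get_close_matches` (string similarity, not the fuzzy-logic models some commercial tools use); the vocabulary, the `fuzzy_expand` name, and the 0.8 cutoff are assumptions for illustration.

```python
import difflib

# Hypothetical index vocabulary built from a document collection.
VOCABULARY = ["philip", "morris", "lorillard", "liggett", "marlboro", "reynolds"]


def fuzzy_expand(term: str, cutoff: float = 0.8) -> list[str]:
    """Return up to three indexed terms whose spelling closely matches `term`.

    Similarity is difflib's SequenceMatcher ratio (0.0 to 1.0); `cutoff`
    is an assumed threshold that would need tuning on real data.
    """
    return difflib.get_close_matches(term.lower(), VOCABULARY, n=3, cutoff=cutoff)


print(fuzzy_expand("Phillip"))   # ['philip']
print(fuzzy_expand("lorilard"))  # ['lorillard']
```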
Bayesian Statistical Models
Based on mathematical models of statistical probability, these tools recognize documents with similar content.
• Learn passively from the document content
• Position, frequency, and proximity of terms (language independent) combine to create a mathematical “thumbprint” of the concepts contained in documents
• Useful to “cluster” documents by content
• Can “learn” to build clusters from exemplar sets
• Require re-indexing, and the resulting assessments can change
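Learning categories from exemplar sets can be illustrated with a multinomial naive Bayes classifier, a basic Bayesian text model. This is a minimal sketch, not any vendor's engine: it uses term frequency only (no position or proximity), and the categories, exemplar texts, and class name are hypothetical.

```python
import math
from collections import Counter


def tokenize(text: str) -> list[str]:
    return text.lower().split()


class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing.

    Uses term frequency only; it does not model term position or proximity.
    """

    def __init__(self) -> None:
        self.word_counts: dict[str, Counter] = {}
        self.doc_counts: Counter = Counter()
        self.vocab: set[str] = set()

    def train(self, label: str, text: str) -> None:
        """Add one exemplar document for a category."""
        tokens = tokenize(text)
        self.word_counts.setdefault(label, Counter()).update(tokens)
        self.doc_counts[label] += 1
        self.vocab.update(tokens)

    def log_score(self, label: str, text: str) -> float:
        """log P(label) + sum of log P(token | label) over the document's tokens."""
        counts = self.word_counts[label]
        denom = sum(counts.values()) + len(self.vocab)
        score = math.log(self.doc_counts[label] / sum(self.doc_counts.values()))
        for tok in tokenize(text):
            score += math.log((counts[tok] + 1) / denom)
        return score

    def classify(self, text: str) -> str:
        return max(self.word_counts, key=lambda label: self.log_score(label, text))


# Hypothetical exemplar sets for two record categories.
nb = NaiveBayes()
nb.train("contracts", "lease agreement signed by contracting officer")
nb.train("contracts", "contract modification and agreement terms")
nb.train("travel", "travel voucher for conference trip")
nb.train("travel", "hotel and airfare travel authorization")
print(nb.classify("amended lease agreement"))  # contracts
print(nb.classify("travel voucher"))           # travel
```

Because the vocabulary grows as exemplars are added, scores for the same document can shift after retraining, which mirrors the slide's point that assessments can change.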
Latent Semantic Indexing (LSI)
• SVD (Singular Value Decomposition) assigns each record to a place, creating “clusters”
• “Query” documents are SVD-analyzed and placed in the matrix
• “Hits” and rankings are determined by the distance from clusters
• Vector length = relevance ranking
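The LSI steps above can be sketched with NumPy's `np.linalg.svd`: decompose a term-document matrix, keep the top k concept dimensions, project documents and the query the same way, and rank by similarity. The tiny matrix, k = 2, and the function name are assumptions; the sketch ranks by cosine similarity, a common LSI choice, rather than raw vector length.

```python
import numpy as np

# Hypothetical term-document matrix: rows are terms, columns are documents.
TERMS = ["email", "archive", "records", "tobacco", "cigarette"]
A = np.array(
    [
        [2, 1, 0, 0],  # email
        [1, 2, 0, 0],  # archive
        [1, 1, 0, 1],  # records
        [0, 0, 2, 1],  # tobacco
        [0, 0, 1, 2],  # cigarette
    ],
    dtype=float,
)

K = 2  # number of latent "concept" dimensions kept (an assumed setting)
U, s, Vt = np.linalg.svd(A, full_matrices=False)
U_k = U[:, :K]

# Documents and queries are projected the same way: d_hat = U_k^T d.
doc_vectors = A.T @ U_k  # one row per document in concept space


def rank(query_terms: list[str]) -> list[int]:
    """Return document indices ordered by cosine similarity to the query."""
    q = np.array([1.0 if t in query_terms else 0.0 for t in TERMS])
    if not q.any():
        raise ValueError("query shares no terms with the index")
    q_hat = q @ U_k
    sims = doc_vectors @ q_hat / (
        np.linalg.norm(doc_vectors, axis=1) * np.linalg.norm(q_hat)
    )
    return [int(i) for i in np.argsort(-sims)]


print(rank(["email", "archive"]))  # documents 0 and 1 rank first
print(rank(["tobacco"]))           # documents 2 and 3 rank first
```

Projecting both documents and queries through the same `U_k` keeps them in one concept space, so SVD sign conventions do not affect the ranking.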
Emerging New Strategies: “Predictive Analytics”
Improved review and case assessment: cluster documents through the use of software, with minimal human intervention at the front end to code a “seeded” data set
(Slide adapted from Gartner Conference, June 23, 2010, Washington, D.C.)
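The seeded workflow (a human codes a small set, software scores the rest) can be sketched with a nearest-centroid ranker. This Python sketch is a stand-in, not how any particular predictive coding product works; the seed texts, scoring rule, and function names are hypothetical.

```python
import math
from collections import Counter


def vectorize(text: str) -> Counter:
    """Bag-of-words term counts."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def centroid(vectors: list[Counter]) -> Counter:
    """Average term counts across the given document vectors."""
    if not vectors:
        return Counter()
    total: Counter = Counter()
    for v in vectors:
        total.update(v)
    return Counter({t: c / len(vectors) for t, c in total.items()})


def rank_unreviewed(seed: list[tuple[str, bool]], unreviewed: list[str]) -> list[str]:
    """Rank unreviewed documents using a human-coded seed set.

    Score = similarity to the 'relevant' centroid minus similarity to the
    'not relevant' centroid; higher scores go to the front of the review queue.
    """
    relevant = centroid([vectorize(t) for t, is_rel in seed if is_rel])
    not_relevant = centroid([vectorize(t) for t, is_rel in seed if not is_rel])
    scored = [
        (cosine(vectorize(d), relevant) - cosine(vectorize(d), not_relevant), d)
        for d in unreviewed
    ]
    return [d for _, d in sorted(scored, reverse=True)]


# Hypothetical seed set coded by a reviewer, plus two uncoded documents.
seed = [
    ("settlement agreement draft terms", True),
    ("msa payment schedule", True),
    ("cafeteria menu for friday", False),
    ("parking garage closure", False),
]
queue = rank_unreviewed(seed, ["friday menu update", "revised settlement agreement terms"])
print(queue)  # ['revised settlement agreement terms', 'friday menu update']
```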
Visual Analysis Examples (presentation by Dr. Victoria Lemieux, Univ. of British Columbia, at the Society of American Archivists Annual Meeting 2010, Washington, D.C.)
With acknowledgments to Jeffrey Heer, Exploring Enron, http://hci.stanford.edu/jheer/projects/enron/; Adam Perer, Contrasting Portraits, http://hcil.cs.umd.edu/trs/2006-08/2006-08.pdf; and Fernanda Viegas, Email Conversations, http://fernandaviegas.com/email.html
Social Networking/Links Analysis Example
From Marc Smith, posted on Flickr under a Creative Commons license
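A basic links analysis treats correspondents as nodes and email exchanges as edges, then looks for the best-connected people. The Python sketch below computes degree centrality from (sender, recipients) pairs; the names and messages are hypothetical, and real tools typically add edge weights, direction, and time.

```python
from collections import defaultdict


def build_graph(messages: list[tuple[str, list[str]]]) -> dict[str, set[str]]:
    """Undirected correspondent graph: an edge links a sender and each recipient."""
    graph: dict[str, set[str]] = defaultdict(set)
    for sender, recipients in messages:
        for recipient in recipients:
            if recipient != sender:
                graph[sender].add(recipient)
                graph[recipient].add(sender)
    return dict(graph)


def degree_centrality(graph: dict[str, set[str]]) -> dict[str, float]:
    """Fraction of the other people in the graph each person corresponds with."""
    n = len(graph)
    if n < 2:
        return {person: 0.0 for person in graph}
    return {person: len(links) / (n - 1) for person, links in graph.items()}


# Hypothetical (sender, recipients) pairs taken from email headers.
messages = [
    ("alice", ["bob", "carol"]),
    ("bob", ["alice"]),
    ("dave", ["alice", "bob"]),
]
centrality = degree_centrality(build_graph(messages))
print(max(centrality, key=centrality.get))  # alice
```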
Judicial second-guessing of failure to use e-search capabilities: Capitol Records v. MP3 Tunes, 261 F.R.D. 44 (S.D.N.Y. 2009)
“In [a prior case] the Court notes its dismay that the party opposing discovery of its ESI had organized its files in a manner which seemed to serve no purpose other than ‘to discourage audits. . .’ Similarly, in this case, [the party] host[ed] no ediscovery software on their servers and apparently are unable to conduct centralized email searches of groups of users without downloading them to a separate file and relying on the services of an outside vendor.”
Judicial second-guessing of failure to use e-search capabilities: Capitol Records v. MP3 Tunes (cont’d)
The court went on to add: “The day will undoubtedly come when burden arguments based on a large organization’s lack of internal ediscovery software will be received about as well as the contention that a party should be spared from retrieving paper documents because it had filed them sequentially, but in no apparent groupings, in an effort to avoid the added expense of file folders or indices.”
References
Background law review article referencing autocategorization and advanced search:
• J. Baron, “Law in the Age of Exabytes: Some Further Thoughts on ‘Information Inflation’ and Current Issues in E-Discovery Search,” 17 Richmond J. Law & Technology (2011), see http://law.richmond.edu
Latest “predictive coding” case law to follow in blogs online:
• Da Silva Moore v. Publicis Groupe & MSL Group, 11 Civ. 1279 (S.D.N.Y.) (Peck, M.J.) (opinion dated Feb. 24, 2012)
• Kleen Products, LLC v. Packaging Corp. of America, 10 C 5711 (N.D. Ill.) (Nolan, M.J.)
Jason R. Baron Director of Litigation Office of General Counsel National Archives and Records Administration (301) 837-1499 Email: jason.baron@nara.gov