Automating Document Review

Nathaniel Love
CS 244n Final Project Presentation
6/14/2006

Document Review
• Litigation cases, government investigations
• Discovery process: a company involved in a case is compelled to produce documents (internal memos, financial statements, email) in response to a discovery request.
• The company doesn’t want to release everything, only those documents that are
  • responsive to the discovery request, and
  • not privileged (privileged meaning subject to protection under attorney-client privilege).
• The company’s attorney must review all documents before they are produced.
• In a large litigation case, this may be ~500,000 documents.

Classification Problem
• 500,000 emails to review
• Inspection by attorneys at ~100 documents/hr, $275/hr
• 500,000 / 100 per hr = 5,000 attorney-hours; 5,000 × $275 = $1.375 million to pay for document review for 1 case
• Improving this process:
  • Each email must be classified as
    • Responsive / non-responsive
    • Privileged / non-privileged
  • As attorneys review, train 2 MaxEnt classifiers, one for each decision.
  • Organize documents classified by the partially trained classifiers.
  • Present sorted documents to attorneys, with suggested classifications.
  • Run the trained classifiers on all previously reviewed documents to check for errors.
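The review loop above can be sketched in a few lines of Python. For a two-class decision, a MaxEnt classifier is equivalent to logistic regression, so this minimal sketch trains one with plain stochastic gradient ascent over binary features. The slides do not say which implementation was used; the feature names, learning rate, epoch count, regularization strength, and toy documents below are all illustrative assumptions.

```python
import math

def predict(weights, feats):
    """Probability of the positive class for a set of binary features."""
    score = sum(weights.get(f, 0.0) for f in feats)
    score = max(-30.0, min(30.0, score))  # keep exp() in a safe range
    return 1.0 / (1.0 + math.exp(-score))

def train_maxent(examples, epochs=200, lr=0.5, l2=0.01):
    """Binary MaxEnt (logistic regression) via stochastic gradient ascent.

    examples: list of (set_of_feature_names, label), label in {0, 1}.
    Returns a dict mapping feature name to weight.
    """
    weights = {}
    for _ in range(epochs):
        for feats, label in examples:
            err = label - predict(weights, feats)
            for f in feats:
                w = weights.get(f, 0.0)
                weights[f] = w + lr * (err - l2 * w)  # gradient step with L2 penalty
    return weights

# Toy stand-in for documents an attorney has already reviewed:
# (features, responsive?, privileged?). Not real Enron data.
reviewed = [
    ({"word:california", "word:power"}, 1, 0),
    ({"word:trading", "word:california", "to:counsel"}, 1, 1),
    ({"word:lunch", "word:friday"}, 0, 0),
    ({"word:tokyo", "word:lunch", "to:counsel"}, 0, 1),
]

# One classifier per decision, as on the slide.
responsive_model = train_maxent([(f, r) for f, r, _ in reviewed])
privileged_model = train_maxent([(f, p) for f, _, p in reviewed])

# Sort unreviewed documents so likely-responsive ones are presented first.
unreviewed = {
    "doc_a": {"word:lunch", "word:tokyo"},
    "doc_b": {"word:california", "word:trading"},
    "doc_c": {"word:friday", "word:power"},
}
ranked = sorted(
    unreviewed,
    key=lambda d: predict(responsive_model, unreviewed[d]),
    reverse=True,
)
for doc_id in ranked:
    p_resp = predict(responsive_model, unreviewed[doc_id])
    p_priv = predict(privileged_model, unreviewed[doc_id])
    print(doc_id, "responsive" if p_resp >= 0.5 else "non-responsive",
          "privileged" if p_priv >= 0.5 else "non-privileged")

# Re-run the model over reviewed documents; disagreements are candidates
# for a second look (either a model error or a reviewer error).
flagged = [
    i for i, (f, r, _) in enumerate(reviewed)
    if (predict(responsive_model, f) >= 0.5) != bool(r)
]
```

In a real deployment the models would presumably be retrained as each batch of attorney decisions arrives, and the ranking refreshed from the latest weights.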

Feature Selection / Data
• Emails: sender, recipient, date, words/word pairs in subject, presence/type of attachments…
• Hand-built features: added based on concepts relevant to the discovery request
• Enron Corpus: solid match for data seen in an actual document review process.
• Test and training data drawn from hand-tagged Enron emails (work done by Berkeley group).
• Mapped Berkeley categories into responsive/privileged categories based on the FERC investigation into Enron (concerning manipulation of energy markets in the western U.S.)
• Issues
  • Small data set overall (1700 documents tagged out of over 600,000 in corpus)
  • Poor data for privilege classifier: tagged documents contain far fewer privileged emails than exist in the corpus overall
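As a rough illustration of the feature types listed above, the following sketch turns one raw email into a set of binary feature names using Python's standard `email` package. The feature naming scheme (`sender:`, `subj2:`, and so on) and the `CONCEPT_TERMS` list are hypothetical; the slides do not give the project's actual feature templates or hand-built concepts.

```python
import email
import re
from email.utils import getaddresses, parsedate_to_datetime

# Hypothetical stand-in for hand-built, request-specific concepts.
CONCEPT_TERMS = {"california", "ferc", "curtailment"}

def tokenize(text):
    """Lowercase word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())

def extract_features(raw_message):
    """Map a raw RFC 822 message string to a set of binary feature names."""
    msg = email.message_from_string(raw_message)
    feats = set()

    for _, addr in getaddresses(msg.get_all("From", [])):
        feats.add("sender:" + addr.lower())
    for _, addr in getaddresses(msg.get_all("To", []) + msg.get_all("Cc", [])):
        feats.add("recipient:" + addr.lower())

    date_header = msg.get("Date")
    if date_header:
        try:
            sent = parsedate_to_datetime(date_header)
            feats.add("month:" + sent.strftime("%Y-%m"))
        except (TypeError, ValueError):
            pass  # unparseable date: emit no date feature

    # Words and adjacent word pairs from the subject line.
    words = tokenize(msg.get("Subject") or "")
    feats.update("subj:" + w for w in words)
    feats.update("subj2:" + a + "_" + b for a, b in zip(words, words[1:]))

    # Presence and type of attachments.
    attachments = [part for part in msg.walk() if part.get_filename()]
    if attachments:
        feats.add("has_attachment")
    for part in attachments:
        feats.add("attach_type:" + part.get_content_type())

    # Concept features fire if a term appears in the subject or plain body.
    body = "" if msg.is_multipart() else msg.get_payload()
    for term in CONCEPT_TERMS & set(words + tokenize(body)):
        feats.add("concept:" + term)
    return feats

# Made-up message for demonstration only.
sample = (
    "From: Alice Example <alice@example.com>\n"
    "To: bob@example.com\n"
    "Date: Mon, 12 Feb 2001 09:30:00 -0800\n"
    "Subject: California power prices\n"
    "\n"
    "Notes on the FERC request.\n"
)
features = extract_features(sample)
print(sorted(features))
```

Representing each email as a set of named binary features keeps the learned weights directly readable, which matters when attorneys want to see why a document was ranked highly.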

Results
• Accuracy:
  • 75% (responsive)
  • 93% (privileged)
• Accuracy improved with more training.
• Positive feedback from attorneys on use of the system, especially on the organization and presentation of documents by the classifier as it trains.
• Weights on features (responsive classifier)
  • david.parquet@enron.com (high positive weight)
  • nicholas.oday@enron.com (high negative weight)
  • David Parquet was Enron’s Vice President for project development in the western U.S.
  • Nicholas O’Day was Vice President at Enron Japan.
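Figures like these can be computed with a few small helpers. The sketch below shows accuracy, a majority-class baseline (accuracy alone can look high when one class is rare, so the baseline is a useful comparison), and how the most positive and most negative feature weights can be read off a trained model. All labels, feature names, and weights here are made-up toy values, not the project's results.

```python
from collections import Counter

def accuracy(predicted, actual):
    """Fraction of predictions that match the attorney's label."""
    if not actual or len(predicted) != len(actual):
        raise ValueError("need two non-empty label lists of equal length")
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)

def majority_baseline(actual):
    """Accuracy obtained by always guessing the most common label."""
    if not actual:
        raise ValueError("need at least one label")
    return Counter(actual).most_common(1)[0][1] / len(actual)

def extreme_features(weights, k=1):
    """Return the k most positive and k most negative (feature, weight) pairs."""
    ranked = sorted(weights.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:k], ranked[-k:][::-1]

# Toy labels: 1 = privileged, 0 = non-privileged, with a rare positive class.
actual = [0, 0, 0, 0, 0, 0, 0, 0, 0, 1]
predicted = [0] * 10  # a model that never predicts "privileged"
print(accuracy(predicted, actual), majority_baseline(actual))

# Toy weights with hypothetical feature names.
toy_weights = {
    "sender:a@example.com": 1.8,
    "subj:power": 0.9,
    "subj:lunch": -0.4,
    "sender:b@example.com": -2.1,
}
top_positive, top_negative = extreme_features(toy_weights, k=1)
print(top_positive, top_negative)
```

When a trained model matches the majority baseline, as in the toy example, it has learned nothing about the rare class, so per-class measures such as precision and recall are worth reporting alongside accuracy.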
