
Machine Learning Classification for Document Review



Presentation Transcript


  1. Machine Learning Classification for Document Review Tom Barnett, Svetlana Godjevac, Caroline Privault, Jean-Michel Renders, John Schneider, Robert Wickstrom

  2. • Time pressure • Information growth

  3. Basic assumptions • Keyword search addresses recall • Attorney review addresses precision (a worked sketch of both metrics follows)
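
The two metrics can be made concrete with a minimal Python sketch; the counts below are hypothetical, not figures from this study:

    # Precision/recall sketch with hypothetical counts (not data from this study).
    true_positives = 80   # responsive documents that were flagged
    false_positives = 40  # non-responsive documents that were flagged
    false_negatives = 20  # responsive documents that were missed

    # Recall: share of all responsive documents that the process found.
    recall = true_positives / (true_positives + false_negatives)      # 0.80

    # Precision: share of flagged documents that are actually responsive.
    precision = true_positives / (true_positives + false_positives)   # ~0.67

    print(f"recall={recall:.2f}, precision={precision:.2f}")

A broad keyword search pushes recall up (few responsive documents missed) at the cost of precision, which the attorney review pass is then assumed to restore by weeding out false positives.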

  4. Tacit Assumption • Attorney review is superior to any other form of review

  5. Problems With Standard Process • Scalability • Speed • Cost • Quality: lower recall, lower consistency

  6. CategoriX Combines two facets of document analysis: • Clustering (grouping documents by the similarity of their text content) • Manual categorization

  7. Clustering Technology [Figure: documents grouped by content similarity into clusters, shown as red, green, and blue groups]
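
The slides do not name CategoriX's clustering algorithm, so the following is only a stand-in sketch, using scikit-learn's TfidfVectorizer and KMeans on a toy corpus:

    # Clustering stand-in sketch (the slides do not specify CategoriX's algorithm).
    from sklearn.cluster import KMeans
    from sklearn.feature_extraction.text import TfidfVectorizer

    documents = [  # toy corpus standing in for an email collection
        "quarterly earnings report attached",
        "earnings call scheduled for Friday",
        "pick up the lunch order",
        "lunch reservation for Friday",
    ]

    vectors = TfidfVectorizer().fit_transform(documents)  # text -> term vectors
    labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(vectors)
    print(labels)  # documents with similar content fall into the same cluster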

  8. Attorney Categorization [Figure: documents manually coded as responsive or non-responsive]

  9. CategoriX Process [Figure: process overview diagram]

  10. CategoriX – Training [Figure: an attorney-coded training set (responsive / non-responsive) is fed into CategoriX, which builds a classification model]
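
As a hedged sketch of this training step, fit a classifier on attorney-coded emails. The toy data and the LogisticRegression model below are assumptions; the slides do not disclose CategoriX's underlying model:

    # Training sketch on attorney-coded emails (toy data; LogisticRegression
    # is a stand-in for CategoriX's unspecified model).
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline

    training_emails = [
        "re: merger due diligence checklist",  # responsive
        "draft asset purchase agreement",      # responsive
        "fantasy football league standings",   # non-responsive
        "office holiday party RSVP",           # non-responsive
    ]
    training_labels = [1, 1, 0, 0]  # attorney coding: 1 = responsive

    model = make_pipeline(TfidfVectorizer(), LogisticRegression())
    model.fit(training_emails, training_labels)  # the learned model of slide 10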

  11. CategoriX – Model Application [Figure: the model assigns each test-set document a responsiveness score between 0 and 1, ranging here from 0.074 to 0.987]
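
Continuing the sketch, the trained model assigns each unreviewed document a responsiveness score that can be ranked or thresholded; the 0.5 cutoff below is an illustrative assumption:

    # Scoring sketch, continuing from the model fitted above.
    test_emails = [
        "merger closing conditions memo",
        "weekend plans?",
    ]
    scores = model.predict_proba(test_emails)[:, 1]  # P(responsive) per document
    for email, score in zip(test_emails, scores):
        flag = "responsive" if score >= 0.5 else "non-responsive"  # assumed cutoff
        print(f"{score:.3f}  {flag}  {email}")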

  12. Experiment 1 Data: • Population: 5,000 emails • 5 review groups: A1, A2, A3, A4, A5 • Training sets: 1,000 emails • Test sets: population minus the training set Goals: • CategoriX retrieval for the different review groups • Comparison between manual and automated classification
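
The split itself is simple to sketch. The counts come from this slide; the random sampling is an assumption, since the slides do not say how training documents were chosen:

    # Experiment-split sketch: 1,000 training emails out of a 5,000-email
    # population; the remaining 4,000 form the test set. Random sampling
    # is an assumption.
    import random

    population = list(range(5000))  # stand-ins for the 5,000 email IDs
    random.seed(0)
    random.shuffle(population)

    training_set = population[:1000]  # coded by one review group (A1..A5)
    test_set = population[1000:]      # population minus the training set
    assert len(test_set) == 4000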

  13. Attorney Review Responsiveness Rates: A2 marked 42% more documents responsive than A1

  14. CategoriX Retrieval [Figure: retrieval scores across the five review groups: 0.76, 0.83, 0.83, 0.80, 0.84]

  15. Attorney Review vs. CategoriX [Figure: attorney-to-attorney agreement compared with CategoriX-to-attorney agreement, with A5 as the gold standard]
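
One way to read this comparison is as pairwise agreement rates against the gold standard; the sketch below uses hypothetical codings, not results from the study:

    # Agreement sketch with hypothetical labels (not study data): compare a
    # reviewer's coding, and CategoriX's output, against A5 (gold standard).
    def agreement(labels_a, labels_b):
        """Fraction of documents on which two codings agree."""
        return sum(a == b for a, b in zip(labels_a, labels_b)) / len(labels_a)

    a5        = [1, 0, 1, 1, 0, 1, 0, 0]  # gold-standard coding
    a1        = [1, 0, 0, 1, 0, 1, 1, 0]  # another attorney group
    categorix = [1, 0, 1, 1, 0, 1, 0, 1]  # thresholded model output

    print(f"A1 vs A5:        {agreement(a1, a5):.2f}")         # 0.75
    print(f"CategoriX vs A5: {agreement(categorix, a5):.2f}")  # 0.88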

  16. Results Summary • Attorney responsiveness rate varied greatly • CategoriX models achieved high recall and precision • CategoriX was more consistent than attorneys

  17. Gains

  18. Conclusion Our testing indicated that the combination of clustering with attorney coding (CategoriX) was at least as accurate as, and more consistent than, attorney review.
