How Predictive Coding Can be Used by Records Managers

Presented by Rebecca Shwayri. Topics to be covered: introduction to predictive coding and its benefits; how records managers can use predictive coding; predictive coding in action; limitations of keyword searches and human review.

Presentation Transcript


  1. How Predictive Coding Can be Used by Records Managers Presented by Rebecca Shwayri

  2. Topics to be Covered • Introduction to Predictive Coding and its benefits • How can records managers use Predictive Coding • Predictive Coding in Action • Limitations of keyword searches & human review • Predictive Coding Defensibility

  3. Intro to Predictive Coding • What is predictive coding? • How does it work?

  4. What is NOT Predictive Coding • NOT Magic • NOT a cure for cancer • NOT based on voodoo

  5. What is NOT Predictive Coding • Keyword searching • Concept searching • E-mail threading • These methods can be useful but do not predict relevance of future documents based on past documents

  6. What is Predictive Coding • Expert (you) develops an understanding of the documents and classifies the documents • Old tech • In common use today • Example: Spam Filter, Amazon.com • Math and Statistics

  7. How Predictive Coding Works • Algorithms • Mathematical model built • Accuracy depends on quality of training set

  8. Predictive Coding in Practice • Random sample: review 2000-5000 randomly selected documents • Single person reviews & codes the sample as Responsive or Non-Responsive (one person’s time for 15-39 hours) • Computer learns & predicts • Repeat as needed • Computer categorizes all remaining documents as Responsive or Non-Responsive
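
The train-then-predict loop on this slide can be illustrated with a toy text classifier. The slides do not name an algorithm, so the sketch below assumes a simple Naive Bayes model; the sample documents, labels, and function names are all hypothetical and only meant to show the shape of the workflow, not any vendor's implementation.

```python
import math
from collections import Counter


def tokenize(text):
    # Lowercase and split on whitespace; real tools do far more than this.
    return text.lower().split()


def train(labeled_docs):
    """Build word statistics from (text, label) pairs coded by the reviewer."""
    word_counts = {}        # label -> Counter of words seen under that label
    doc_counts = Counter()  # label -> number of training documents
    for text, label in labeled_docs:
        doc_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(tokenize(text))
    vocab = {w for counts in word_counts.values() for w in counts}
    return word_counts, doc_counts, vocab


def predict(model, text):
    """Return the label with the highest Naive Bayes log score."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, -math.inf
    for label in doc_counts:
        score = math.log(doc_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for w in tokenize(text):
            if w in vocab:  # words never seen in training are ignored
                # Add-one smoothing so an unseen word/label pair is not fatal.
                score += math.log(
                    (word_counts[label][w] + 1) / (total_words + len(vocab))
                )
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical seed set: the sample a single reviewer has already coded.
seed_set = [
    ("invoice backdated to hide the revenue shortfall", "responsive"),
    ("adjust the accounting entries before the audit", "responsive"),
    ("lunch menu for the team offsite", "non-responsive"),
    ("parking garage closed this weekend", "non-responsive"),
]
model = train(seed_set)

# The computer then categorizes the documents nobody has reviewed yet.
remaining = [
    "please backdate the invoice before the audit",
    "team lunch this weekend",
]
predictions = [predict(model, doc) for doc in remaining]
```

In practice the reviewer would inspect the predictions, correct the wrong ones, add them to the seed set, and retrain, which is the "repeat as needed" step.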

  9. Benefits of Predictive Coding • Dramatic Reduction in e-discovery costs • More accurate than human review and keyword search • Light years faster than human review and keyword search

  10. Benefits of Predictive Coding • Fact driven, not fear driven, settlements • Learn the facts of the case in a few days rather than over months or years using traditional methods of review • Helps avoid litigation – uncovers the facts more quickly • Use as an information governance tool

  11. Predictive Coding Compared

  12. Uses of Predictive Coding • Information Governance Tool (proactive) • Litigation Tool (reactive)

  13. Information Governance • Encompasses a variety of disciplines • Records Management • Knowledge Management • Information Security and Privacy

  14. Let’s Keep Everything = Bad • Data breach risks • E-discovery costs • Unable to locate documents needed for the business units

  15. Use PC to apply IG policies • Standardized IG policies • Reduce the need to review every single document to determine the importance of the document to the company • Locate data within the company’s IT infrastructure and categorize it appropriately for the business units • Locate data that needs to be destroyed

  16. Predictive Coding in Litigation • Example: Company is sued in a dispute involving fraud and breach of contract • Custodians: 20 Potential Custodians with average e-mail box of 40 GB each (800 total GB of e-mail data) • Other electronic Files: 200 GB • Total Data: 1 Terabyte

  17. Predictive Coding in Litigation • Company is served with a Request for Production of Documents by Plaintiffs’ Counsel • Plaintiffs’ Counsel demands searching through ESI of custodians • Plaintiffs’ Counsel makes a broad demand for accounting records

  18. Predictive Coding in Litigation • What do you do? • Keyword search 1TB of data? How do you keyword search fraud? Information disadvantage! • Human review? It will take many, many months and millions of dollars to review 1TB of data!

  19. Predictive Coding in Litigation • Use Predictive Coding • Should you disclose? • One school of thought suggests disclosing use of predictive coding to opposing counsel, agreeing to precision and recall rates (Full Agreement and Full Disclosure) • The other school of thought suggests making no disclosures (Avoid litigation associated with use of predictive coding)

  20. What are Precision & Recall? • Recall (Completeness) • Recall measures how successful the system was in finding all of the responsive documents. • If 1,000 documents in the full set were actually responsive, but the system only marked 750 of those documents responsive, then the recall would be 75 percent. • Precision (Accuracy) • Precision measures how often the documents that were marked responsive were actually responsive. • If the system marked 10 documents responsive, and only six of them were actually responsive, then the precision would be 60 percent.
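
The two worked examples on this slide reduce to two ratios. The sketch below simply restates them in code; the function and variable names are illustrative and not taken from the presentation.

```python
def recall(responsive_found, responsive_total):
    # Share of all truly responsive documents that the system marked.
    return responsive_found / responsive_total


def precision(responsive_found, marked_total):
    # Share of the documents marked responsive that really are responsive.
    return responsive_found / marked_total


# Slide example 1: 1,000 responsive documents exist, the system found 750.
slide_recall = recall(750, 1000)

# Slide example 2: 10 documents marked responsive, 6 actually responsive.
slide_precision = precision(6, 10)

print(f"recall = {slide_recall:.0%}, precision = {slide_precision:.0%}")
```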

  21. How Much Time & Effort? • Depends on collection “richness” • 2-5 days – one person & one only! • 500-5000 documents reviewed • Stop when the system exhibits: • High rates of Precision & Recall – above the agreed-to rates • No longer discovering new topics to teach the computer about • Computer is predicting with consistency

  22. Quality Control • It is like Exit Polling…. • Statistics Truth: Sample of a certain size yields a certain level of confidence and a certain margin of error. • 400 randomly selected docs provides 95% confidence level in the estimate of Predictive Coding accuracy, with a ± 5% margin of error. • Reference: Cochran, WG 1977. Sampling Techniques, 3rd Ed. John Wiley & Sons, New York, New York, USA.
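
The 400-document figure is consistent with the standard textbook formula for the sample size needed to estimate a proportion, n = z² · p(1 − p) / e². The sketch below assumes simple random sampling from a large collection and the worst case p = 0.5; those assumptions and the function names are mine, not the slide's.

```python
import math
from statistics import NormalDist


def z_score(confidence):
    # Two-sided critical value of the standard normal distribution.
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)


def sample_size(confidence=0.95, margin=0.05, p=0.5):
    # Minimum number of randomly selected documents for the given margin.
    z = z_score(confidence)
    return math.ceil(z * z * p * (1 - p) / (margin * margin))


def margin_of_error(n, confidence=0.95, p=0.5):
    # Margin of error achieved by a random sample of n documents.
    return z_score(confidence) * math.sqrt(p * (1 - p) / n)


needed = sample_size()            # minimum for 95% confidence, ±5%
achieved = margin_of_error(400)   # what a 400-document sample delivers
```

Under these assumptions the minimum works out to 385 documents, so 400 is a convenient round number that lands slightly inside the ± 5% margin.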

  23. When to use CAR (computer-assisted review) • When you are out of time • If you want to save money • Consider using CAR for cases involving 5 GB or more of data • Predictive coding makes sense when you have 20,000 documents or more

  24. Clawback Agreements • Judge Facciola (D.D.C.): “If you are practicing e-discovery without a clawback, you are committing malpractice.” • Parties agree in writing that inadvertent production of privileged material does not automatically constitute a waiver

  25. Clawback Agreements • What if the other side won’t agree to the clawback agreement? • Go to the Court! • Rajala v. McGuire Woods, 2010 WL 2949582 (D. Kan. July 22, 2010): Court issued clawback order with no need to show reasonable efforts

  26. Clawback Agreements • Consider Clawback Agreement during “meet and confer” conference • Embody agreement in Court Order (Rule 502(d))

  27. Privilege Review • Predictive coding should be used to cull the data set down to a manageable level • Privilege review should occur AFTER predictive coding • Attorneys should conduct privilege review • Attorneys need to decide what is privileged: Do not put this on auto-pilot

  28. Limitations of Linear Review • Why Linear Review is Ineffective • Linear Review compared to other methods

  29. Limitations of Keywords • Catches only 20 percent of relevant evidence • Therefore…misses 80 percent • The “Google” phenomenon

  30. Problems With Keywords • Failure of imagination (Example: Nasdaq versus Stock Market) • How many synonyms for the word “think”? • Precise Terms of Art • Misspellings (Example: Mangment, Mangemnt…)
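
A small, made-up example shows how the misspelling and "failure of imagination" problems depress recall. The documents and the helper name below are hypothetical, and the exact-match logic is a deliberately naive stand-in for a keyword search.

```python
def keyword_hits(docs, keyword):
    # Exact, case-insensitive match on whole words, like a naive keyword search.
    keyword = keyword.lower()
    return [i for i, doc in enumerate(docs) if keyword in doc.lower().split()]


# Hypothetical collection: the first three documents are all relevant.
docs = [
    "Management approved the Nasdaq trade",
    "Mangment signed off on the stock market trade",
    "Mangemnt was told about the trade",
    "Cafeteria hours are changing",
]
relevant = {0, 1, 2}

hits = keyword_hits(docs, "management")
found = relevant.intersection(hits)
keyword_recall = len(found) / len(relevant)

# Searching for "nasdaq" misses the document that says "stock market" instead.
nasdaq_hits = keyword_hits(docs, "nasdaq")
```

Here the search term is spelled correctly, yet it finds only one of the three relevant documents because the other two misspell it.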

  31. Problems With Keywords • Human problem • People express concepts differently • Difficulties in learning to adopt another party’s language style • TREC (Text Retrieval Conference) was a competition and it showed a complete failure in keyword searches

  32. Financial Impact of Keywords • Human keyword-based review is expensive • It is slow & inaccurate • It unnecessarily complicates a simple process • It is widely used because, until now, there were no alternatives • Predictive coding – when “done right” – can save a corporation 80-90% of review costs.

  33. TREC Legal Track Study 2009 • Keyword searches missed 96 percent of relevant documents (recall ratio averaged less than 4 percent)

  34. TREC Legal Track Study 2010 • 97 percent of relevant documents not found • Only a 3 percent recall ratio (76,373 relevant documents not discovered) • Boolean searches reduced the initial corpus from 685,592 to 2,715 documents • 87 percent precision ratio (2,362 documents out of 2,715 are relevant)
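
The percentages on this slide can be checked against the document counts it quotes. The sketch below uses only the numbers on the slide; the variable names are illustrative.

```python
# Counts quoted on the slide.
retrieved = 2_715            # documents returned by the Boolean searches
relevant_retrieved = 2_362   # of those, documents that were relevant
relevant_missed = 76_373     # relevant documents the searches never found

# Precision: how much of what was retrieved is relevant.
precision = relevant_retrieved / retrieved

# Recall: how much of everything relevant was retrieved.
total_relevant = relevant_retrieved + relevant_missed
recall = relevant_retrieved / total_relevant

print(f"precision = {precision:.0%}, recall = {recall:.0%}")
```

The counts reproduce the slide's 87 percent precision and 3 percent recall: the searches returned mostly relevant documents but only a small fraction of the relevant documents that existed.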

  35. Blair and Maron Study • Involved a San Francisco Bay Area Rapid Transit Accident • Discovery database contained 40,000 documents and 350,000 pages • Attorneys believed keyword searches uncovered 75 percent of relevant documents • In reality: Only 20 percent of relevant documents uncovered

  36. The “Gold” Standard • Human eyeballs on every document • Judge Peck: The “gold” standard does not have any gold • Human assessors disagree on the relevance of a document to a single topic

  37. The “Gold” Standard • TREC Conclusion: 65% Recall and 65% Precision is the best retrieval effectiveness for human reviewers • Human eyeballs on every document is not working • Reviewers disagree as frequently as 50 percent of the time

  38. Predictive Coding is Defensible • Monique Da Silva Moore v. Publicis Groupe & MSL Group (SDNY) (endorsed using predictive coding) • Complicated and confusing protocol – DO NOT USE • Defendants offered plaintiffs everything they wanted – protocol was so confusing they could not see they got everything they asked for – so they went after the Judge. • Global Aerospace, Inc. v. Landow Aviation Limited Partnership (Circuit Court of Loudoun County Virginia) (authorized use of predictive coding over objection) • Nothing in news – as no controversy – everything worked!

  39. Is Keyword Search Defensible? • Expensive • Kleen case – 1400 attorney hours to determine search terms – and plaintiff was not satisfied – and neither side was aware of overall effectiveness of terms • Not effective • Over- or under-produces • Known to be very problematic • “Ostrich approach” is no longer advisable – technology has evolved • Judges know it exists, plaintiffs know it exists and ask for it

  40. Ordered use of PC • EORHB, Inc., et al. v. HOA Holdings, LLC (Delaware Chancery Court) • Court ordered the parties sua sponte to use predictive coding and ordered the parties to use the same vendor • Judge may have overstepped bounds

  41. In conclusion • Technology is your friend • Make data driven decisions • We are living in the “MoneyBall” age • If you are unsure, please ask – this is not going away

  42. Contact For more information contact Rebecca Shwayri Email: rebecca.shwayri@akerman.com Tel: (813) 209-5029
