
Statistical Validation And Data Analytics In eDiscovery

Geoff Black, Director, High Tech Investigations, Prudential. The views expressed in this presentation are solely those of the presenter and do not necessarily reflect the views of the presenter’s employer.


Presentation Transcript


  1. Statistical Validation And Data Analytics In eDiscovery. Geoff Black, Director, High Tech Investigations, Prudential. The views expressed in this presentation are solely those of the presenter and do not necessarily reflect the views of the presenter’s employer.

  2. Recommended Reading

  3. Why do we need Statistics? (Ensuring Quality in eDiscovery) Professional standards. Savvy judges already require sampling. Defensibility.

  4. Types of Sampling. Judgmental. Statistical*.

  5. A Recent Experience with Sampling: Setting the Stage.

  6. A Recent Experience with Sampling: The Challenge. Select appropriate filters for a large data set. Audit reviewers without double reviewing everything. Test our processing tools. Accomplish all of these with a high confidence level and a narrow confidence interval.

  7. Statistics for eDiscovery: Confidence Interval. The “confidence interval,” or margin of error, describes how closely our sample results will reflect the general population. Lower (narrower) is better.

  8. Statistics for eDiscovery: Confidence Interval Example. We have 100 documents and our confidence interval is ± 2%. Testing shows 10% responsiveness. The general population should show between 8% and 12% responsiveness, or 8 to 12 documents.
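The arithmetic on this slide can be sketched in a few lines of Python. The function name `expected_range` and its parameters are hypothetical, chosen only for illustration; the inputs are the slide's own numbers.

```python
def expected_range(observed_rate, margin, population):
    """Return (low, high) document counts implied by a rate and its margin of error."""
    # Clamp to [0, 1] so a rate near 0% or 100% never yields an impossible bound.
    low_rate = max(0.0, observed_rate - margin)
    high_rate = min(1.0, observed_rate + margin)
    return round(low_rate * population), round(high_rate * population)


# The slide's example: 100 documents, 10% responsive, +/- 2%.
low, high = expected_range(observed_rate=0.10, margin=0.02, population=100)
print(f"Expect between {low} and {high} responsive documents")
```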

  9. Statistics for eDiscovery: Confidence Level. The “confidence level” answers the question: how certain are we that our sample accurately represents the general population? Higher is better.

  10. Statistics for eDiscovery: Sample Sizes for a Population of 1,000,000, by Margin of Error (table values not preserved in this transcript).
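The slide's table did not survive in the transcript, and the presenter's calculator is not stated. As a rough sketch, a comparable table can be computed with Cochran's sample-size formula plus a finite population correction, assuming the worst-case proportion of 0.5. That formula choice, the function name `sample_size`, and the particular confidence levels and margins shown are assumptions, not the presenter's figures.

```python
import math
from statistics import NormalDist


def sample_size(population, confidence_level, margin_of_error, proportion=0.5):
    """Cochran's sample size with finite population correction (an assumed method)."""
    # Two-sided z-score for the requested confidence level.
    z = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)
    # Sample size for an effectively infinite population.
    n0 = (z ** 2) * proportion * (1 - proportion) / margin_of_error ** 2
    # Shrink the sample to account for the finite population.
    n = n0 / (1 + (n0 - 1) / population)
    return math.ceil(n)


# Rows are confidence levels, columns are margins of error, population 1,000,000.
margins = (0.01, 0.02, 0.05)
print("confidence  " + "  ".join(f"±{m:.0%}".rjust(7) for m in margins))
for level in (0.90, 0.95, 0.99):
    row = "  ".join(f"{sample_size(1_000_000, level, m):>7,}" for m in margins)
    print(f"{level:>10.0%}  {row}")
```

Tightening the margin of error raises the sample size much faster than raising the confidence level does, because the margin enters the formula squared in the denominator.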

  11. Statistics for eDiscovery: Population Size (scaling chart not preserved in this transcript).
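The chart for this slide is missing, but the usual point about population size can be illustrated with a short sketch: once the population is large, the required sample barely grows. The code assumes Cochran's formula with a finite population correction, a 99% confidence level, a ± 2% margin, and a worst-case proportion of 0.5. The function name and the list of populations are hypothetical, apart from the 10,000,000 and 120,000 document sets mentioned later in the deck.

```python
import math
from statistics import NormalDist


def sample_size(population, confidence_level=0.99, margin_of_error=0.02):
    """Required sample size for a finite population (assumed Cochran formula, p = 0.5)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)
    n0 = z ** 2 * 0.25 / margin_of_error ** 2  # infinite-population size
    return math.ceil(n0 / (1 + (n0 - 1) / population))


# The sample size flattens out as the population grows by orders of magnitude.
for population in (1_000, 10_000, 120_000, 1_000_000, 10_000_000):
    print(f"{population:>12,} documents -> sample of {sample_size(population):,}")
```

Under these assumptions the results for 10,000,000 and 120,000 documents land within a few documents of the 4,150 and 4,010 figures quoted on later slides, which suggests the presenter used a similar calculation and rounded.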

  12. A Recent Experience with Sampling: Filtering Selection. Finding a good search method is difficult. Who chooses search terms? Requires iterative testing and validation.

  13. A Recent Experience with Sampling: Validating Filters. Began with around 10,000,000 documents. A 99% confidence level with a ± 2% confidence interval dictated a sample size of 4,150 documents. Chose a random sample and searched. Reviewed all the results (positive and negative).
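Because both the positive and the negative search results were reviewed, the sample can be summarized as precision and recall for the search terms. The slide does not say which measures were used, so this is only one plausible way to score the filter; the function name `search_quality` and the toy data are hypothetical.

```python
def search_quality(sample):
    """Score a search filter from reviewed sample documents.

    sample: list of (search_hit, responsive) boolean pairs, one per document.
    Returns (precision, recall).
    """
    true_pos = sum(1 for hit, responsive in sample if hit and responsive)
    false_pos = sum(1 for hit, responsive in sample if hit and not responsive)
    false_neg = sum(1 for hit, responsive in sample if not hit and responsive)
    # Precision: share of search hits that reviewers judged responsive.
    precision = true_pos / (true_pos + false_pos) if true_pos + false_pos else 0.0
    # Recall: share of responsive documents that the search actually found.
    recall = true_pos / (true_pos + false_neg) if true_pos + false_neg else 0.0
    return precision, recall


# Made-up toy sample: 3 good hits, 1 false hit, 2 missed documents, 4 correct misses.
toy_sample = (
    [(True, True)] * 3 + [(True, False)] * 1
    + [(False, True)] * 2 + [(False, False)] * 4
)
precision, recall = search_quality(toy_sample)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Low precision points to terms that sweep in too much non-responsive data, while low recall points to responsive documents the terms are missing. Either result is a cue to revise the term list and test again.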

  14. A Recent Experience with Sampling: Validating Filters. Results did not match expectations. Revised the list of search terms. Tested the filtering again, and… A more accurate search with less responsive data!

  15. A Recent Experience with Sampling: Validating Filters. Wait a minute, I always test my keywords! The question is not whether you test, but on how much data…

  16. A Recent Experience with Sampling: Validating Review. After filtering, about 120,000 documents remained to review. Reviewers often disagree about relevance or simply don’t understand the material. Double and triple review kills budgets. Instead, sample a random set of 4,010 reviewed documents.
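Drawing the audit set can be as simple as a seeded random sample over the reviewed document IDs. This is a minimal sketch, not the presenter's tooling: the function name `draw_audit_sample`, the integer IDs, and the seed value are all hypothetical. Recording the seed is one way to make the selection reproducible if it is later questioned.

```python
import random


def draw_audit_sample(doc_ids, sample_size, seed=None):
    """Return a simple random sample of document IDs, without replacement."""
    # A dedicated generator keeps the draw reproducible for a given seed.
    rng = random.Random(seed)
    return rng.sample(doc_ids, sample_size)


# Stand-in IDs for the roughly 120,000 reviewed documents on the slide.
reviewed_ids = range(120_000)
audit_set = draw_audit_sample(reviewed_ids, 4_010, seed=2024)
print(f"Selected {len(audit_set):,} documents for second-level review")
```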

  17. A Recent Experience with Sampling: Validating Review. Subject matter expert noted a few anomalies. Re-reviewed items with the confusing term. One reviewer’s results could not be trusted.
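One way to spot an unreliable reviewer in the audit sample is to compare each reviewer's original calls with the subject matter expert's calls and compute a per-reviewer overturn rate. The slides do not describe how the problem reviewer was identified, so this is an illustrative sketch; the function name `overturn_rates`, the record layout, and the toy data are hypothetical.

```python
from collections import defaultdict


def overturn_rates(audit_results):
    """Fraction of each reviewer's sampled calls that the expert overturned.

    audit_results: iterable of (reviewer, original_call, expert_call) tuples.
    """
    totals = defaultdict(int)
    overturned = defaultdict(int)
    for reviewer, original_call, expert_call in audit_results:
        totals[reviewer] += 1
        if original_call != expert_call:
            overturned[reviewer] += 1
    return {reviewer: overturned[reviewer] / totals[reviewer] for reviewer in totals}


# Made-up audit records: (reviewer, reviewer's call, expert's call).
toy_audit = [
    ("reviewer_a", "responsive", "responsive"),
    ("reviewer_a", "not responsive", "not responsive"),
    ("reviewer_b", "responsive", "not responsive"),
    ("reviewer_b", "not responsive", "responsive"),
    ("reviewer_b", "responsive", "responsive"),
    ("reviewer_b", "responsive", "not responsive"),
]
for reviewer, rate in sorted(overturn_rates(toy_audit).items()):
    print(f"{reviewer}: {rate:.0%} of sampled calls overturned")
```

A reviewer whose rate stands well apart from the others is a candidate for targeted re-review rather than a full second pass over everything.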

  18. A Recent Experience with Sampling: Keeping Your Vendors Honest. How do they test their tools? How were automated tools used in your matter? Do you know what they cannot do? How did you use the results in your decisions?

  19. What’s Next? Built-in iterative review with statistical sampling. Relying solely on “Concept Searching” is a black box and a dead end. Advanced search techniques must offer explanatory reasoning.

  20. What does all this mean? (The Benefits of Using Statistics) Small dataset for testing. Minimize false positives. More accurate search, reduced data volume. Defensibility of statistically validated testing.

  21. One last thing… Technologies will always differ and change rapidly, but statistical validation is a timeless truth.

  22. References & Related Cases • The Sedona Conference Working Group Series, “Commentary on Achieving Quality in the E-Discovery Process,” May 2009. • Losey, Ralph. “The Multi-Modal ‘Where’s Waldo?’ Approach to Search…,” 2010. http://e-discoveryteam.com/2010/02/27/ • William A. Gross Construction Associates, Inc. v. American Manufacturers Mutual Insurance Co., 256 F.R.D. 134, 134 (S.D.N.Y. 2009) • Victor Stanley v. Creative Pipe, 250 F.R.D. 251 (D. Md. 2008) • In re Seroquel Products Liability Litigation, 244 F.R.D. 650, 662 (M.D. Fla. 2007)

  23. Statistical Validation And Data Analytics In eDiscovery. Geoff Black, geoff@geoffblack.com, www.geoffblack.com/ediscovery
