
Role of Enterprise Search in E-Discovery

Role of Enterprise Search in E-Discovery. June 18, 2008. Enterprise E-Discovery is a business process, and search is central to it: Identification Search, Collection Search, and Analysis/Review Search. Searches are scoped by custodians, meta-data, date range, media type, and data type, and organized by custodian or by operator.


Presentation Transcript


  1. Role of Enterprise Search in E-Discovery. June 18, 2008

  2. Enterprise E-Discovery is a business process. Search is central to E-Discovery • Identification Search • Collection Search • Analysis/Review Search • Custodians • Meta-Data • Date Range • Media Type • Data Type • By Custodian • By Operator • By Data Type • By keyword, phrase, concept • By Project • Responsiveness • Privilege Determination • Review Grouping • Near-duplicates • Quality Control [Diagram: Electronic Discovery Reference Model (www.edrm.net). Stages: Information Management, Identification, Preservation, Collection, Processing, Review, Analysis, Production, Presentation. Axes: VOLUME, RELEVANCE]

  3. FRCP Rules governing E-Discovery

  4. FRCP Rules and Their Impact on E-Discovery • Emphasis on co-operation during E-Discovery • Sedona Principles as a Guide for E-Discovery • Early Discovery Planning Conferences • No “Gaming” of E-Discovery • Prepare for Meet and Confer • Organizational Structure • Information Assets and Data Map • ILM Policies and Procedures • Backup and Disaster Recovery Practices • Preservation Hold/Legal Hold Policies and Actions • Establish E-Discovery Scope • Estimate Review Size from automated Search Results • Raw Volume, Processed Volume, Review Volume • Substantiate “Not Reasonably Accessible” Claims • Move burden of “cost provability” to the Requesting Party

  5. Enabling E-Discovery within an Enterprise [Diagram labels: Meta-Data Index; Keyword Index; Analysis, Culling, Review; Organizational Data; IT Personnel; Legal; Search/Analysts; Digital Asset Database; ECM/ILM Policies; File Shares; Messaging servers; CMS; Enterprise Intranet; Data Map; Case Data; Preservation Hold]

  6. E-Discovery Search Characteristics • Theme • Relevance • Results Management • Produce Entire Results – not sufficient to only produce Top N • No Estimates of Counts – Must provide accurate, actual counts • Stability of Results • Very large Result Sets • Fast Query Response Time • Provide Complete Hit Context • Activity Based Relevance – Responsiveness Search vs. Privilege Search • Meta-Data based Relevance – Timeliness, People, Connection to other data • Review-directed Relevance • Traditional TF/IDF based Relevance • Complete Auditing of all Searches • Document Hit Count Reports • Tie back to original Document Meta-Data • EDRM XML-2 Export to downstream processes • Group Near-Duplicates, Concept Clusters for Review Efficiency • Data Types • Flexibility • Workflow • Many data formats – 10,000 formats • New communication formats – Wiki, Blogs, SMS, IM, Unified Messaging • ESI from old, legacy applications • Incomplete and Corrupt data (Deleted Files, raw disk blocks) • Handle Multi-language ESI • Handle Low-fidelity documents – OCR-scanned images • Advanced Search/Query Language • Iterative Search and Search Refinement • Guided Navigation, one-click Filtering • Saving and Sharing Searches • Remove impediments to search – ACLs, Encryption, Container Extraction • Real-time updates for Tagging, Classifying Results • Incremental ESI Collections (Batches) • Multi-level Review • Multi-person Review • Rolling Productions • Activity Reports • Outside Counsel, Opposing Counsel interactions • Project Management

  7. Search Effectiveness: Techniques to improve Precision and Recall • Precision • Pre-filtering wildcard expansions • Boolean Queries • Proximity Specification • Keyword Scope (Sentence, Paragraph) • Meta-Data Context • Entity based Search • Recall • Misspellings/Fuzzy Search • Wildcard Specifications • Synonyms • Related Terms • Concept Search • Bayesian Search
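
Two of the techniques listed on slide 7 can be illustrated with a small sketch: a proximity check, which narrows matches and so favors precision, and fuzzy matching, which catches misspellings and so favors recall. This is only an illustration built on Python's standard library. The function names, the simple tokenizer, and the 0.8 similarity cutoff are assumptions made for the example, not anything the presentation specifies.

```python
import difflib
import re


def tokenize(text):
    """Lowercase the text and split it into word tokens (a deliberately simple tokenizer)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def within_proximity(text, term_a, term_b, max_distance):
    """Precision aid: True if term_a and term_b occur within max_distance tokens of each other."""
    tokens = tokenize(text)
    positions_a = [i for i, tok in enumerate(tokens) if tok == term_a]
    positions_b = [i for i, tok in enumerate(tokens) if tok == term_b]
    return any(abs(i - j) <= max_distance for i in positions_a for j in positions_b)


def fuzzy_hits(text, term, cutoff=0.8):
    """Recall aid: return tokens in the text that closely resemble the term (e.g. misspellings)."""
    candidates = sorted(set(tokenize(text)))
    return difflib.get_close_matches(term, candidates, n=10, cutoff=cutoff)


text = "The merger agrement was signed by the board"
print(within_proximity(text, "merger", "signed", 3))
print(fuzzy_hits(text, "agreement"))
```

An exact keyword search for "agreement" would miss this document; the fuzzy match recovers it, while the proximity constraint keeps unrelated co-occurrences out of the result set.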

  8. E-Discovery Search: Typical measures and outcomes • Number of truly responsive documents in Retrieved Collection • Number of truly responsive documents in Un-Retrieved Collection
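
The two counts named on slide 8 are what the standard precision and recall measures are built from. The sketch below shows the usual definitions; the document counts are hypothetical and the function names are chosen only for this example.

```python
def precision(responsive_retrieved, total_retrieved):
    """Share of retrieved documents that are truly responsive."""
    if total_retrieved == 0:
        return 0.0
    return responsive_retrieved / total_retrieved


def recall(responsive_retrieved, responsive_unretrieved):
    """Share of all truly responsive documents that the search retrieved."""
    total_responsive = responsive_retrieved + responsive_unretrieved
    if total_responsive == 0:
        return 0.0
    return responsive_retrieved / total_responsive


# Hypothetical counts: 1,000 documents retrieved, 800 of them responsive,
# and 200 responsive documents left behind in the un-retrieved collection.
print(precision(800, 1000))
print(recall(800, 200))
```

The responsive count in the un-retrieved collection is normally unknown and has to be estimated, for example by sampling.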

  9. Interactive Search: Key to Search Efficiency • Interactive wildcard, stemming expansion selection • Removes precision-recall tradeoff by enabling interactive review and removal of false positive expansions • Save thousands of dollars per search • Search Report • Detailed, interactive keyword search report results for iterative large query execution • Full transparency and auditing • Significant time savings
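
The interactive expansion step on slide 9 might look roughly like the following: expand a wildcard against the index vocabulary, show each expansion with its hit count, and let the reviewer drop false positives before the query runs. The vocabulary, counts, and function names are hypothetical, and the pattern matching uses Python's `fnmatch` as a stand-in for whatever the search engine actually provides.

```python
import fnmatch


def expand_wildcard(pattern, term_counts):
    """Return (term, document_count) pairs for indexed terms matching the wildcard pattern."""
    matches = [
        (term, count)
        for term, count in term_counts.items()
        if fnmatch.fnmatchcase(term, pattern)
    ]
    # Show the most frequent expansions first so costly false positives stand out.
    return sorted(matches, key=lambda pair: (-pair[1], pair[0]))


def build_query(expansions, rejected):
    """OR together only the expansions the reviewer chose to keep."""
    kept = [term for term, _ in expansions if term not in rejected]
    return " OR ".join(kept)


# Hypothetical index vocabulary with per-term document counts.
term_counts = {"privilege": 120, "privileged": 340, "private": 900, "privet": 2, "priority": 75}

expansions = expand_wildcard("priv*", term_counts)
for term, count in expansions:
    print(f"{term}: {count}")

# The reviewer rejects the expansions unrelated to legal privilege.
print(build_query(expansions, rejected={"private", "privet"}))
```

In this made-up vocabulary, dropping "private" removes the largest single source of hits from the review set without touching the terms that matter.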

  10. E-Discovery is about extracting Relevant Content [Diagram: data volume shown as 100-1000 TB, 50 TB, 500 GB, 1-2 GB. Labels: Preservation Store; Archive and Store; Collect and Preserve; Analyze and Review]

  11. Enterprise Case Study – Global Media Conglomerate • Eliminating the need to process and review 456,000 documents saved $175,000 • Data culling based on query permutations reduced data set by 99% to 417 • Time = 2.5 days [Chart: Case Data, with values 456,448; 208,628; 74,713]
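
The figures on slide 11 can be checked with simple arithmetic. The sketch assumes that 456,448 is the starting document count and 417 is the count left after culling, which is one plausible reading of the flattened chart; the per-document savings figure is derived here and is not stated in the presentation.

```python
def percent_reduction(before, after):
    """Percentage by which a document set shrank."""
    return 100.0 * (before - after) / before


initial_docs = 456_448   # assumed starting count, taken from the chart values
remaining_docs = 417     # assumed count after culling

print(round(percent_reduction(initial_docs, remaining_docs), 1))

# Savings per avoided document, using the rounded figures quoted on the slide.
print(round(175_000 / 456_000, 2))
```

Under these assumptions the reduction is slightly above the 99% the slide quotes, and the quoted savings work out to well under a dollar per document.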

  12. E-Discovery - Workflow • Rate of Ingestion: 1M files/hour; 10K directory scans; 1 TB/hour • Rate of Indexing: 100K files/hour; 10-20 GB/hour • Rate of Extraction: 20K files/hour; 2-4 GB/hour • Rate of Processing: 100 custodians; 10K files/hour; 1 GB PST/custodian • Size of Index: 0.2 TB; 1 billion rows; 10K/s Bulk-Load • Size of Index: 1 TB; 10 billion objects • Size of Index: 1 TB (each partition); up to 100 index partitions; 10 billion objects; 200-400 file types; includes meta-data • Size of Store: 32 TB FC/SCSI; 4 TB NTFS; 300 GB/custodian; 100 custodians • Size of Manifest: 10 million items [Diagram labels: Meta-Data (Shallow Index); Source SCAN; Full Text Indexer; Copy Engine; Processing; Case Mgmt; SOURCES; Processing Manifest; Deep Index Full-Text; Case ESI Store; SQL Full Text]
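
The throughput figures on slide 12 lend themselves to a quick capacity estimate. The sketch below takes the quoted extraction rate (2-4 GB/hour) and indexing rate (10-20 GB/hour) and applies them to 100 custodians with a 1 GB PST each. Treating those rates as applying to the same 100 GB volume is an assumption made for illustration, and the helper name is hypothetical.

```python
def hours_needed(volume_gb, rate_gb_per_hour):
    """Hours to push a volume through a stage running at a fixed rate."""
    if rate_gb_per_hour <= 0:
        raise ValueError("rate must be positive")
    return volume_gb / rate_gb_per_hour


custodians = 100
pst_gb_per_custodian = 1
total_gb = custodians * pst_gb_per_custodian

# (best case, worst case) in hours, using the rate ranges quoted on the slide.
extraction_hours = (hours_needed(total_gb, 4), hours_needed(total_gb, 2))
indexing_hours = (hours_needed(total_gb, 20), hours_needed(total_gb, 10))

print(extraction_hours)
print(indexing_hours)
```

On these numbers extraction, not indexing, is the slower stage.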

  13. E-Discovery Search: Collection Workflow Meta-Data (Shallow Index) Case Document Collection Source SCAN SOURCES • Search Scope • Owners/SID • Last Modification Date • Creation Date • Author/Title • Department • Search Technology • Keyword Search • Parameterized Date Range • Copy of Original • Maintain Original Locations • Hash with Meta-Data for content and location integrity • Hash without Meta-Data for content Integrity
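
The two hashes mentioned at the end of slide 13 serve different purposes: a content-only hash shows that two copies hold the same bytes, while a hash that also covers meta-data ties the content to where it was collected from. A minimal sketch follows, assuming SHA-256 and a JSON serialization of the meta-data; the presentation names neither, and the field names are hypothetical.

```python
import hashlib
import json


def content_hash(data: bytes) -> str:
    """Hash without meta-data: identical bytes give an identical hash wherever they were found."""
    return hashlib.sha256(data).hexdigest()


def content_and_metadata_hash(data: bytes, metadata: dict) -> str:
    """Hash with meta-data: changes if either the content or its recorded location changes."""
    # Sorting keys makes the serialization independent of dict insertion order.
    canonical = json.dumps(metadata, sort_keys=True).encode("utf-8")
    digest = hashlib.sha256()
    digest.update(canonical)
    digest.update(b"\0")  # separator between the meta-data and the content bytes
    digest.update(data)
    return digest.hexdigest()


data = b"Quarterly forecast, draft 3"
copy_a = {"path": "//fileserver/finance/forecast.doc", "owner": "jsmith"}
copy_b = {"path": "//fileserver/archive/forecast.doc", "owner": "jsmith"}

print(content_hash(data))
print(content_and_metadata_hash(data, copy_a) == content_and_metadata_hash(data, copy_b))
```

The content-only hash is also what makes exact-duplicate detection across custodians possible, since two copies in different locations still hash the same.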

  14. E-Discovery Search: Analysis Workflow Privileged Documents Potentially Privileged Documents Responsive Documents Privilege Review Potentially Responsive Documents Production Documents Non-Responsive Documents • Privilege Search • Documents • Emails Case Document Collection Privilege “Misses” Review • Search Scope • Documents • Emails • Search Technology • Keywords • Boolean Search • Proximity Search • Fuzzy Search • Concept Search • Tagged Search Sampling Engine • Quality Control • Documents • Emails • Tags Sample Non-Responsive Documents • Search Refinement • Additional Keywords • Additional Search Methods • Reports • Search Reports • Activity Reports • QC Reports • Project Review Reports • Privilege Log • Exceptions Reports Responsive Misses “Recall” Document Review
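
The sampling step on slide 14 (sampling non-responsive documents to find "responsive misses") can be sketched as a random draw followed by a simple extrapolation. Everything below is a hypothetical illustration: the presentation does not describe how its Sampling Engine selects documents or how the miss estimate is computed.

```python
import random


def draw_qc_sample(doc_ids, sample_size, seed=0):
    """Draw a reproducible random sample of document IDs from the non-responsive set."""
    ids = list(doc_ids)
    rng = random.Random(seed)  # fixed seed so the sample can be re-created for audit
    return rng.sample(ids, min(sample_size, len(ids)))


def estimate_misses(sample_tags, population_size):
    """Extrapolate the number of responsive misses from reviewer tags on the sample.

    sample_tags maps document ID to True when the reviewer found it responsive.
    """
    if not sample_tags:
        return 0.0
    miss_rate = sum(1 for is_responsive in sample_tags.values() if is_responsive) / len(sample_tags)
    return miss_rate * population_size


non_responsive_ids = range(10_000)
sample = draw_qc_sample(non_responsive_ids, 200, seed=42)

# Hypothetical review outcome: reviewers tag the first 4 sampled documents as responsive.
tags = {doc_id: index < 4 for index, doc_id in enumerate(sample)}
print(estimate_misses(tags, len(non_responsive_ids)))
```

A high estimate feeds the Search Refinement step: additional keywords or search methods are added and the sample is drawn again.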
