Role of Enterprise Search in E-Discovery
June 18, 2008

Enterprise E-Discovery is a business process
Search is central to E-Discovery
• Identification Search
  • Custodians
  • Meta-Data
  • Date Range
  • Media Type
  • Data Type
• Collection Search
  • By Custodian
  • By Operator
  • By Data Type
  • By keyword, phrase, concept
  • By Project
• Analysis/Review Search
  • Responsiveness
  • Privilege Determination
  • Review Grouping
  • Near-duplicates
  • Quality Control
[Diagram: Electronic Discovery Reference Model (www.edrm.net). Stages: Information Management, Identification, Preservation, Collection, Processing, Review, Analysis, Production, Presentation. Axes: Volume, Relevance.]

FRCP Rules and Their Impact on E-Discovery
• Emphasis on co-operation during E-Discovery
  • Sedona Principles as a Guide for E-Discovery
  • Early Discovery Planning Conferences
  • No “Gaming” of E-Discovery
• Prepare for Meet and Confer
  • Organizational Structure
  • Information Assets and Data Map
  • ILM Policies and Procedures
  • Backup and Disaster Recovery Practices
  • Preservation Hold/Legal Hold Policies and Actions
• Establish E-Discovery Scope
  • Estimate Review Size from automated Search Results
  • Raw Volume, Processed Volume, Review Volume
  • Substantiate “Not Reasonably Accessible” Claims
  • Move burden of “cost provability” to the Requesting Party

Enabling E-Discovery within an Enterprise
[Diagram labels: Organizational Data; File Shares; Messaging servers; CMS; Enterprise Intranet; Digital Asset Database; ECM/ILM Policies; Data Map; Meta-Data Index; Keyword Index; Case Data; Preservation Hold; Analysis, Culling, Review; IT Personnel; Legal; Search/Analysts]

E-Discovery Search Characteristics
• Theme
  • Produce Entire Results – not sufficient to only produce Top N
  • No Estimates of Counts – Must provide accurate, actual counts
  • Stability of Results
  • Very large Result Sets
  • Fast Query Response Time
  • Provide Complete Hit Context
• Relevance
  • Activity Based Relevance – Responsiveness Search vs. Privilege Search
  • Meta-Data based Relevance – Timeliness, People, Connection to other data
  • Review-directed Relevance
  • Traditional TF/IDF based Relevance
• Results Management
  • Complete Auditing of all Searches
  • Document Hit Count Reports
  • Tie back to original Document Meta-Data
  • EDRM XML-2 Export to downstream processes
  • Group Near-Duplicates, Concept Clusters for Review Efficiency
• Data Types
  • Many data formats – 10,000 formats
  • New communication formats – Wiki, Blogs, SMS, IM, Unified Messaging
  • ESI from old, legacy applications
  • Incomplete and Corrupt data (Deleted Files, raw disk blocks)
  • Handle Multi-language ESI
  • Handle Low-fidelity documents – OCR-scanned images
• Flexibility
  • Advanced Search/Query Language
  • Iterative Search and Search Refinement
  • Guided Navigation, one-click Filtering
  • Saving and Sharing Searches
  • Remove impediments to search – ACLs, Encryption, Container Extraction
  • Real-time updates for Tagging, Classifying Results
• Workflow
  • Incremental ESI Collections (Batches)
  • Multi-level Review
  • Multi-person Review
  • Rolling Productions
  • Activity Reports
  • Outside Counsel, Opposing Counsel interactions
  • Project Management

Search Effectiveness
Techniques to improve Precision and Recall
• Precision
  • Pre-filtering wildcard expansions
  • Boolean Queries
  • Proximity Specification
  • Keyword Scope (Sentence, Paragraph)
  • Meta-Data Context
  • Entity based Search
• Recall
  • Misspellings/Fuzzy Search
  • Wildcard Specifications
  • Synonyms
  • Related Terms
  • Concept Search
  • Bayesian Search
[Diagram labels: Precision, Recall]

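As a rough illustration of two of the techniques listed above, the sketch below pairs a proximity check (a precision technique) with fuzzy term matching (a recall technique). The function names `within_proximity` and `fuzzy_terms`, the sample sentence, and the 0.8 similarity cutoff are assumptions made for this example; they do not describe any particular search product.

```python
import difflib
import re


def tokenize(text):
    """Lower-case the text and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def within_proximity(tokens, term_a, term_b, max_distance):
    """Return True if term_a and term_b occur within max_distance words."""
    positions_a = [i for i, t in enumerate(tokens) if t == term_a]
    positions_b = [i for i, t in enumerate(tokens) if t == term_b]
    return any(abs(a - b) <= max_distance
               for a in positions_a for b in positions_b)


def fuzzy_terms(query_term, vocabulary, cutoff=0.8):
    """Return vocabulary terms similar to query_term (catches misspellings)."""
    return difflib.get_close_matches(query_term, vocabulary, n=10, cutoff=cutoff)


# Hypothetical document text and index vocabulary.
tokens = tokenize("The merger agreement was sent to outside counsel on Friday.")
near = within_proximity(tokens, "merger", "counsel", 3)    # tight window
far = within_proximity(tokens, "merger", "counsel", 10)    # loose window
matches = fuzzy_terms("agreement", ["agreement", "agreemnt", "argument", "merger"])
```

Tightening the proximity window rejects documents where the two terms are only loosely related, while the fuzzy match pulls in the misspelled "agreemnt" that an exact keyword search would miss.
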
E-Discovery Search: Typical measures and outcomes
• Number of truly responsive documents in the Retrieved Collection
• Number of truly responsive documents in the Un-Retrieved Collection

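The two counts above are the inputs to the standard precision and recall formulas. A minimal sketch follows, assuming the total size of the retrieved collection is also known; the function name and the sample counts are hypothetical and only illustrate the arithmetic.

```python
def precision_recall(responsive_retrieved, total_retrieved, responsive_unretrieved):
    """Compute precision and recall from document counts.

    precision = responsive retrieved / all retrieved
    recall    = responsive retrieved / all responsive (retrieved + missed)
    """
    if total_retrieved <= 0:
        raise ValueError("total_retrieved must be positive")
    total_responsive = responsive_retrieved + responsive_unretrieved
    if total_responsive <= 0:
        raise ValueError("there must be at least one responsive document")
    precision = responsive_retrieved / total_retrieved
    recall = responsive_retrieved / total_responsive
    return precision, recall


# Hypothetical counts, chosen only to illustrate the arithmetic.
precision, recall = precision_recall(
    responsive_retrieved=800, total_retrieved=1000, responsive_unretrieved=200
)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In practice the count of responsive documents in the un-retrieved collection is not known exactly and is usually estimated by reviewing a sample.
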
Interactive Search: Key to Search Efficiency
• Interactive wildcard, stemming expansion selection
  • Removes precision-recall tradeoff by enabling interactive review and removal of false positive expansions
  • Save thousands of dollars per search
• Search Report
  • Detailed, interactive keyword search report results for iterative large query execution
  • Full transparency and auditing
  • Significant time savings

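The interactive expansion step described above might look roughly like the sketch below: a wildcard is expanded against the index vocabulary, a reviewer rejects the false-positive expansions, and only the approved terms go into the query. The vocabulary, the rejected terms, and both function names are hypothetical.

```python
from fnmatch import fnmatchcase


def expand_wildcard(pattern, vocabulary):
    """Return the indexed terms that match a wildcard pattern such as 'priv*'."""
    return sorted(term for term in vocabulary if fnmatchcase(term, pattern))


def apply_reviewer_selection(expansions, rejected):
    """Drop the expansions the reviewer marked as false positives."""
    rejected = set(rejected)
    return [term for term in expansions if term not in rejected]


# Hypothetical index vocabulary and reviewer decisions.
vocabulary = {"privilege", "privileged", "private", "privet", "privacy", "counsel"}
expansions = expand_wildcard("priv*", vocabulary)
approved = apply_reviewer_selection(expansions, rejected=["privet", "private"])
print(approved)
```

Keeping the wildcard broad preserves recall, and removing unwanted expansions before the query runs avoids reviewing the documents those terms would have pulled in.
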
E-Discovery is about extracting Relevant Content
[Figure: data volume narrowing across stages. Stage labels: Archive and Store; Collect and Preserve; Analyze and Review. Volumes shown: 100-1000 TB; 50 TB (Preservation Store); 500 GB; 1-2 GB.]

Enterprise Case Study – Global Media Conglomerate
• Eliminating the need to process and review 456,000 documents saved $175,000
• Data culling based on query permutations reduced the data set by 99%, to 417
• Time = 2.5 days
[Figure: Case Data document counts: 456,448; 208,628; 74,713; 417]

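The figures in this case study can be checked with a few lines of arithmetic. The sketch below assumes the four document counts are successive culling stages ordered from largest to smallest, which the slide text does not state explicitly.

```python
# Document counts as listed on the slide, assumed here to be successive
# culling stages ordered from largest to smallest.
stage_counts = [456_448, 208_628, 74_713, 417]
reported_savings_usd = 175_000   # from the slide
documents_avoided = 456_000      # rounded figure from the slide


def percent_reduction(before, after):
    """Percentage of documents removed going from 'before' to 'after'."""
    return 100.0 * (before - after) / before


overall = percent_reduction(stage_counts[0], stage_counts[-1])
per_stage = [percent_reduction(b, a)
             for b, a in zip(stage_counts, stage_counts[1:])]
savings_per_document = reported_savings_usd / documents_avoided

print(f"overall reduction: {overall:.2f}%")
for index, pct in enumerate(per_stage, start=1):
    print(f"stage {index} reduction: {pct:.2f}%")
print(f"implied savings per document: ${savings_per_document:.2f}")
```

Under that assumption the overall reduction comes out slightly above the 99% quoted on the slide, and the reported savings imply a processing and review cost of under 40 cents per document.
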
E-Discovery - Workflow
[Diagram labels: SOURCES; Source SCAN; Meta-Data (Shallow Index); Copy Engine; Processing; Processing Manifest; Full Text Indexer; Deep Index Full-Text; Case Mgmt; Case ESI Store; SQL; Full Text]
• Rate of Ingestion
  • 1M files/hour
  • 10K directory scans
  • 1 TB/hour
• Rate of Indexing
  • 100K files/hour
  • 10-20 GB/hour
• Rate of Extraction
  • 20K files/hour
  • 2-4 GB/hour
• Rate of Processing
  • 100 custodians
  • 10K files/hour
  • 1 GB PST/custodian
• Size of Index
  • 0.2 TB
  • 1 billion rows
  • 10K/s Bulk-Load
• Size of Index
  • 1 TB
  • 10 billion objects
• Size of Index
  • 1 TB (each partition)
  • Up to 100 index partitions
  • 10 billion objects
  • 200-400 file types
  • Includes meta-data
• Size of Store
  • 32 TB FC/SCSI
  • 4 TB NTFS
  • 300 GB/custodian
  • 100 custodians
• Size of Manifest
  • 10 million items

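The rates above can be turned into rough elapsed-time estimates. The sketch below takes the indexing and extraction rates from the slide and applies them to a workload of 100 custodians with 1 GB of PST data each; combining those two figures into a single workload is an assumption made for illustration, and `hours_for_volume` is a hypothetical helper.

```python
def hours_for_volume(volume_gb, rate_gb_per_hour_low, rate_gb_per_hour_high):
    """Return (best_case_hours, worst_case_hours) for a throughput range."""
    best = volume_gb / rate_gb_per_hour_high
    worst = volume_gb / rate_gb_per_hour_low
    return best, worst


# Workload assumed for illustration: 100 custodians x 1 GB PST each.
custodians = 100
pst_gb_per_custodian = 1
workload_gb = custodians * pst_gb_per_custodian

indexing_hours = hours_for_volume(workload_gb, 10, 20)    # 10-20 GB/hour
extraction_hours = hours_for_volume(workload_gb, 2, 4)    # 2-4 GB/hour

print(f"indexing:   {indexing_hours[0]:.0f} to {indexing_hours[1]:.0f} hours")
print(f"extraction: {extraction_hours[0]:.0f} to {extraction_hours[1]:.0f} hours")
```

For this workload extraction is the slower stage, which is why the estimate treats each rate as a separate range instead of a single blended figure.
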
E-Discovery Search: Collection Workflow
[Diagram labels: SOURCES; Source SCAN; Meta-Data (Shallow Index); Case Document Collection]
• Search Scope
  • Owners/SID
  • Last Modification Date
  • Creation Date
  • Author/Title
  • Department
• Search Technology
  • Keyword Search
  • Parameterized Date Range
• Copy of Original
  • Maintain Original Locations
  • Hash with Meta-Data for content and location integrity
  • Hash without Meta-Data for content integrity

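The two hashes listed above serve different purposes, and a minimal sketch can show the difference. SHA-256 and the choice of meta-data fields (absolute path and modification time) are assumptions made for this example; the slide does not name an algorithm or the fields included.

```python
import hashlib
import os
import tempfile


def content_hash(path):
    """SHA-256 of the file bytes only: verifies content integrity."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def content_and_location_hash(path):
    """SHA-256 over the content hash plus selected meta-data.

    The meta-data fields used here (absolute path, modification time)
    are an assumption for this sketch; a real collector may record others.
    """
    stat = os.stat(path)
    digest = hashlib.sha256()
    digest.update(content_hash(path).encode("ascii"))
    digest.update(os.path.abspath(path).encode("utf-8"))
    digest.update(str(stat.st_mtime_ns).encode("ascii"))
    return digest.hexdigest()


# Two files with identical bytes stored at different locations.
with tempfile.TemporaryDirectory() as workdir:
    original = os.path.join(workdir, "original.txt")
    copy = os.path.join(workdir, "copy.txt")
    for path in (original, copy):
        with open(path, "wb") as handle:
            handle.write(b"quarterly forecast draft")
    same_content = content_hash(original) == content_hash(copy)
    same_location = (content_and_location_hash(original)
                     == content_and_location_hash(copy))

print(same_content, same_location)
```

The content-only hash matches for both files, which makes it usable for de-duplication, while the hash that includes meta-data differs because each file sits at its own location.
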
E-Discovery Search: Analysis Workflow
[Diagram labels: Case Document Collection; Potentially Responsive Documents; Document Review; Responsive Documents; Non-Responsive Documents; Potentially Privileged Documents; Privilege Review; Privileged Documents; Privilege “Misses” Review; Production Documents; Sampling Engine; Sample Non-Responsive Documents; Responsive Misses “Recall”]
• Privilege Search
  • Documents
  • Emails
• Search Scope
  • Documents
  • Emails
• Search Technology
  • Keywords
  • Boolean Search
  • Proximity Search
  • Fuzzy Search
  • Concept Search
  • Tagged Search
• Quality Control
  • Documents
  • Emails
  • Tags
• Search Refinement
  • Additional Keywords
  • Additional Search Methods
• Reports
  • Search Reports
  • Activity Reports
  • QC Reports
  • Project Review Reports
  • Privilege Log
  • Exceptions Reports

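The sampling step in this workflow (reviewing a sample of non-responsive documents to find responsive misses) can be sketched as follows. The function names, the sample size, the document counts, and the `is_responsive` callback standing in for a human reviewer are all hypothetical.

```python
import random


def estimate_missed_responsive(non_responsive_ids, is_responsive,
                               sample_size, seed=0):
    """Estimate how many responsive documents remain in the non-responsive set.

    is_responsive stands in for a reviewer's decision on each sampled document.
    Returns (miss_rate_in_sample, estimated_missed_documents).
    """
    if not non_responsive_ids:
        return 0.0, 0
    rng = random.Random(seed)
    size = min(sample_size, len(non_responsive_ids))
    sample = rng.sample(non_responsive_ids, size)
    misses = sum(1 for doc_id in sample if is_responsive(doc_id))
    miss_rate = misses / len(sample)
    return miss_rate, round(miss_rate * len(non_responsive_ids))


def estimated_recall(responsive_found, estimated_missed):
    """Recall estimate: found / (found + estimated missed)."""
    return responsive_found / (responsive_found + estimated_missed)


# Hypothetical data: 10,000 documents tagged non-responsive, of which
# every 50th is actually responsive (unknown to the search).
non_responsive_ids = list(range(10_000))
hidden_responsive = set(range(0, 10_000, 50))
miss_rate, missed = estimate_missed_responsive(
    non_responsive_ids, hidden_responsive.__contains__, sample_size=500
)
recall = estimated_recall(responsive_found=1_800, estimated_missed=missed)
print(f"sample miss rate={miss_rate:.3f} estimated recall={recall:.3f}")
```

A low estimated recall would feed the Search Refinement step, for example by adding keywords or another search method and re-running the sample.
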