E-Discovery Revisited: A Broader Perspective for IR Researchers
Jack G. Conrad, Thomson R&D
ICAIL07 / DESI Workshop, June 4, 2007
EDD Outline
• EDD: The Big Picture
• Motivations
• Background
• EDD interactions: the “dance” of the litigants
• The complete EDD pipeline
• Alternative view of the enabling technologies
EDD: The Big Picture
• Electronic Data Discovery (EDD)
• Context: Practical Research & TREC
• Motivations:
  • (1) Recent characterization of the state of the art in EDD
  • (2) Informational materials available for participants in forums like TREC
EDD: The Big Picture
• Electronic Data Discovery (EDD)
  • There are presently 300-500 companies offering some form of EDD software or services
  • Several offer complete services across the E-Discovery spectrum
    • Kroll Ontrack: recently acquired Engenium (Symetric), the concept search engine company
    • LexisNexis (LN): acquired Applied Discovery in the recent past and also offers a full spectrum of EDD services
• The EDD performance bar is constantly being raised
• There is an essential need to share diverse perspectives in the field with next-generation researchers
  • What is the “dance of the litigants”? The complete EDD pipeline? The possible interactions of the enabling technologies?
Source of EDD Survey Responses
• The Socha-Gelbmann Report, 2005
  • In total, 240 consumers/providers of EDD software and services were contacted
  • 139 expressed interest in participating
  • 72 of those were surveyed via spreadsheet or phone interview
  • 3 of the final spreadsheets did not contain enough information to be used
• Survey conducted among the remaining 69 E-Discovery consumers & providers
  • 24 consumers; 45 providers
• Consumers: a cross-section of Am Law 200 law firms plus large U.S. companies
• Providers: a broad-based collection of software & service providers who market their offerings as E-Discovery tools or services
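The survey figures above can be cross-checked with a few lines of arithmetic. The following is a minimal sketch, not part of the original slides: the variable names and the `pct` helper are illustrative assumptions, and only the counts themselves come from the slide.

```python
# Counts quoted on the slide (Socha-Gelbmann Report, 2005).
contacted = 240   # consumers/providers contacted
interested = 139  # expressed interest in participating
surveyed = 72     # surveyed via spreadsheet or phone interview
unusable = 3      # spreadsheets without enough information

consumers, providers = 24, 45

# Usable responses: surveyed minus unusable spreadsheets.
usable = surveyed - unusable

# The consumer/provider split should account for every usable response.
assert usable == consumers + providers


def pct(part: int, whole: int) -> float:
    """Hypothetical helper: percentage of `whole`, rounded to one decimal."""
    return round(100 * part / whole, 1)


funnel = {
    "interested / contacted": pct(interested, contacted),
    "surveyed / interested": pct(surveyed, interested),
    "usable / contacted": pct(usable, contacted),
    "consumers / usable": pct(consumers, usable),
    "providers / usable": pct(providers, usable),
}

for label, value in funnel.items():
    print(f"{label}: {value}%")
```

The check confirms that the 69 respondents on the slide equal 72 surveyed minus 3 unusable spreadsheets, and that 24 consumers plus 45 providers account for all of them.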
EDD Scenarios: “the dance of the litigants”
• Employment Discrimination: Party A vs. Company B (David vs. Goliath)
• Securities Fraud: Gov’t vs. Company C
• Intellectual Property: Company D vs. Company E
[Diagram: the litigating parties are labeled with their “EDD resources”.]
The EDD Work Flow Model
• Identification (relevant content and its scope): breadth and depth of discoverable materials established
E-Discovery Pipeline
• Data Entry & Scanning: hard-copy media converted (e.g., OCR) or audio records transcribed
• Data Gathering (Preservation & Collection): electronically stored information is preserved from multiple sources
• Media Restoration (data transferred to a standard medium): data transferred from original or intermediate media to uniform media for analysis
• Data Processing (filtering, format conversion): vetting performed to reduce the volume of data (including filtering, deduping, clustering, etc.)
• Online Review (Hosting & Searching): primary review stage; data transferred to a dedicated repository; searching based upon sources, dates, original file types, keywords, etc.
• Production & Delivery: delivery of reports to clients and systems in different formats & media
• E-Discovery Consulting (throughout process): advice to clients on strategies & procedures for conducting E-Discovery processing
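The Data Processing and Online Review stages above (deduping, then searching by source, date, file type and keyword) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the workflow of any vendor or of the survey: the `Doc` record, the `dedupe` and `search` functions, and the sample corpus are all hypothetical, and exact-match hashing stands in for whatever deduplication a real tool applies.

```python
import hashlib
from dataclasses import dataclass
from datetime import date
from typing import Iterable, List, Optional, Set


@dataclass(frozen=True)
class Doc:
    """Hypothetical document record; field names are assumptions."""
    doc_id: str
    source: str      # custodian or originating system
    created: date
    file_type: str   # original file type, e.g. "email", "doc"
    text: str


def dedupe(docs: Iterable[Doc]) -> List[Doc]:
    """Keep the first document for each distinct text (exact duplicates only)."""
    seen: Set[str] = set()
    unique: List[Doc] = []
    for d in docs:
        digest = hashlib.sha256(d.text.encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(d)
    return unique


def search(docs: Iterable[Doc],
           sources: Optional[Set[str]] = None,
           start: Optional[date] = None,
           end: Optional[date] = None,
           file_types: Optional[Set[str]] = None,
           keywords: Iterable[str] = ()) -> List[Doc]:
    """Return documents matching every supplied criterion (None means 'any')."""
    wanted = [k.lower() for k in keywords]
    hits: List[Doc] = []
    for d in docs:
        if sources is not None and d.source not in sources:
            continue
        if start is not None and d.created < start:
            continue
        if end is not None and d.created > end:
            continue
        if file_types is not None and d.file_type not in file_types:
            continue
        lowered = d.text.lower()
        if all(k in lowered for k in wanted):
            hits.append(d)
    return hits


# Invented sample data, for illustration only.
corpus = [
    Doc("d1", "alice", date(2006, 3, 1), "email", "Quarterly sales forecast attached."),
    Doc("d2", "bob", date(2006, 3, 2), "email", "Quarterly sales forecast attached."),
    Doc("d3", "alice", date(2006, 5, 9), "doc", "Severance terms for the sales team."),
    Doc("d4", "carol", date(2005, 1, 15), "email", "Holiday party sales pitch."),
]

unique = dedupe(corpus)
hits = search(unique, sources={"alice", "carol"},
              start=date(2006, 1, 1), end=date(2006, 12, 31),
              keywords=["sales"])
print([d.doc_id for d in unique], [d.doc_id for d in hits])
```

Note that deduplicating across sources, as this sketch does, discards the fact that a second custodian also held the document; a real review may need to preserve that information.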
The EDD Work Flow Model
• Proposed extended scope of the text ‘retrieval’ task (i.e., including filtering, organizing & report generation)
• Identification (scope, depth of information)
E-Discovery Pipeline
• Data Entry & Scanning
• Data Gathering: Preservation & Collection
• Media Restoration (data transferred to a standard medium)
• Data Processing (filtering, format conversion)
• Online Review: Hosting & Searching
• Production & Delivery
• E-Discovery Consulting (throughout process)
E-Discovery Technology Pyramid
• IV. Fourth Tier (Reporting): analyzing: consolidating & summarizing; production
• III. Third Tier (Searching & Navigating): organizing: classifying or clustering; tagging & linking
• II. Second Tier (Indexing): vetting: filtering, deduping, handling similar doc-objects
• I. Foundation (Hosting): collecting: identification, conversion, migration
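The second tier's “handling similar doc-objects” is often approached with near-duplicate detection. The sketch below uses word shingles and Jaccard similarity, one common technique that the slides do not themselves name; the function names, the shingle size `k`, the `threshold` and the sample texts are all assumptions chosen for illustration.

```python
from typing import List, Set, Tuple

Shingle = Tuple[str, ...]


def shingles(text: str, k: int = 3) -> Set[Shingle]:
    """Set of overlapping k-word sequences from lower-cased text."""
    words = text.lower().split()
    if len(words) < k:
        # Short texts become a single shingle (or none, if empty).
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: Set[Shingle], b: Set[Shingle]) -> float:
    """Jaccard similarity |A & B| / |A | B|; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def near_duplicates(texts: List[str], threshold: float = 0.5,
                    k: int = 3) -> List[Tuple[int, int]]:
    """Index pairs whose shingle sets meet the similarity threshold."""
    sets = [shingles(t, k) for t in texts]
    pairs = []
    for i in range(len(sets)):
        for j in range(i + 1, len(sets)):
            if jaccard(sets[i], sets[j]) >= threshold:
                pairs.append((i, j))
    return pairs


# Invented sample texts, for illustration only.
texts = [
    "please review the attached quarterly sales report before friday",
    "please review the attached quarterly sales report before monday",
    "lunch menu for the cafeteria this week",
]

print(near_duplicates(texts))
```

The all-pairs comparison is quadratic in the number of documents, so at E-Discovery volumes it would normally be replaced by an approximate scheme; this version only shows the similarity measure itself.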
Additional E-Discovery Challenges
• Workflow Support
• Process Efficiencies
  • Per Step
  • Overall
• Tool Integration
• Ease of Use
  • For Customers
  • For Support
• High Value-to-Cost Ratio
• Added value through advanced technologies
  • A TREC-like forum has much potential to contribute here
  • Both within and beyond the context of IR