Explore key insights on improving legal e-discovery methods, effective evaluation techniques, and the toolbox of strategies discussed at ICAIL '07. Understand the challenges in relevance feedback, shared context, and the importance of comparative evaluations. Discover ways to enhance performance and scalability.
Lessons Learned from Information Retrieval
Chris Buckley, Sabir Research
chrisb@sabir.com
Legal E-Discovery
• Important, growing problem
• Current solutions not fully understood by the people using them
• Imperative to find better solutions that scale
• Evaluation required:
  • How do we know we are doing better?
  • Can we prove a level of performance?
Lack of Shared Context
• The basic problem of both search and e-discovery
• The searcher does not necessarily know beforehand the “vocabulary” or background of either the author or the intended audience of the documents to be searched
Relevance Feedback
• A human judges some documents as relevant; the system finds others based on those judgements (see the sketch after this slide)
• The only general technique for improving the system's knowledge of context that has proven successful
  • Works from the small collections of the 1970s to the large collections of the present (TREC HARD track)
• Difficult to apply to discovery
  • Requires changing the entire discovery process
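To make the feedback loop concrete, here is a minimal sketch of Rocchio-style relevance feedback over TF-IDF vectors. It is illustrative only, not the Sabir/SMART implementation; the documents, the query, and the alpha/beta/gamma weights are invented, and it assumes scikit-learn is available.

```python
# Minimal Rocchio relevance feedback sketch (illustrative; not the SMART/Sabir code).
# Documents, query, and alpha/beta/gamma weights below are made-up examples.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

docs = [
    "contract dispute over software licensing terms",
    "email thread discussing licensing fees and renewal",
    "quarterly financial report, no licensing content",
    "memo on patent litigation strategy",
]
query = "software licensing dispute"

vectorizer = TfidfVectorizer()
doc_vecs = vectorizer.fit_transform(docs)     # document-term matrix
q_vec = vectorizer.transform([query])         # original query vector

# Initial ranking by cosine similarity.
scores = cosine_similarity(q_vec, doc_vecs).ravel()
print("initial ranking:", np.argsort(-scores))

# Human feedback: doc 0 judged relevant, doc 2 judged non-relevant.
relevant, nonrelevant = [0], [2]

# Rocchio update: move the query toward relevant docs, away from non-relevant ones.
alpha, beta, gamma = 1.0, 0.75, 0.25
q_new = (alpha * q_vec
         + beta * doc_vecs[relevant].mean(axis=0)
         - gamma * doc_vecs[nonrelevant].mean(axis=0))
q_new = np.asarray(q_new)                     # densify for cosine_similarity

new_scores = cosine_similarity(q_new, doc_vecs).ravel()
print("ranking after feedback:", np.argsort(-new_scores))
```

The point of the update is exactly the slide's point: the judged documents supply vocabulary and context that the original query lacked.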
Toolbox of Other Techniques
• Many other aids to search:
  • Ontologies, linguistic analysis, semantic analysis, data mining, term relationships
• Good IR techniques uniformly:
  • Give big wins for some searches
  • Give mild losses for others
• Need a set of techniques, a toolbox
• In practice, the issue for IR research is not finding the big wins but avoiding the losses (see the win/loss sketch below)
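A hedged sketch of the per-search analysis this implies: given per-topic scores for a baseline and for the same system with one tool switched on, count how often the tool wins big versus loses mildly. The score lists are invented numbers, not TREC results.

```python
# Per-topic win/loss analysis for one "tool" against a baseline.
# The scores below are invented for illustration; real values would come
# from an evaluation tool such as trec_eval, one score per topic.
baseline = [0.32, 0.45, 0.10, 0.58, 0.27, 0.40, 0.22, 0.51]
with_tool = [0.55, 0.43, 0.09, 0.80, 0.25, 0.39, 0.21, 0.49]

deltas = [t - b for b, t in zip(baseline, with_tool)]
big_wins = [d for d in deltas if d >= 0.10]
mild_losses = [d for d in deltas if -0.10 < d < 0]

print(f"mean change over all topics: {sum(deltas) / len(deltas):+.3f}")
print(f"big wins:    {len(big_wins)} topics")
print(f"mild losses: {len(mild_losses)} topics")
```

The pattern the slide describes, a few large gains alongside many small losses, can leave the average unchanged or negative, which is why a decision procedure for when to apply each tool matters.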
Implications of the Toolbox
• No silver-bullet AI solution is to be expected
• Boolean search will not expand to accommodate combinations of solutions
• Test collections are critical
Test Collection Importance
• Needed to develop the tools
• Needed to develop decision procedures for when to use the tools
• The toolbox requirement means we must distinguish a good overall system from one with a single good tool:
  • Every system can show searches on which individual tools work well
  • A good system shows a performance gain on the entire set of searches (see the MAP sketch below)
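One standard way to score "the entire set of searches" is average precision per topic and its mean over all topics (MAP), the usual TREC-style summary. The ranked lists and judgements below are placeholders, not collection data.

```python
# Mean Average Precision (MAP) over a whole topic set (sketch; toy data).
def average_precision(ranked_docs, relevant_docs):
    """Average precision of one ranked list against one topic's relevant set."""
    hits, precision_sum = 0, 0.0
    for rank, doc in enumerate(ranked_docs, start=1):
        if doc in relevant_docs:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant_docs) if relevant_docs else 0.0

# One ranked list and one relevant set per topic (invented examples).
runs = {
    "t1": ["d3", "d7", "d1", "d9"],
    "t2": ["d2", "d5", "d8", "d4"],
}
qrels = {
    "t1": {"d3", "d9"},
    "t2": {"d5", "d6"},
}

ap_per_topic = {t: average_precision(runs[t], qrels[t]) for t in runs}
mean_ap = sum(ap_per_topic.values()) / len(ap_per_topic)
print(ap_per_topic, f"MAP = {mean_ap:.3f}")
```

Because the mean is taken over every topic, a tool that shines on a handful of cherry-picked searches does not automatically raise the overall number.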
Test Collection Composition
• A large set of realistic documents
• A set of topics or information needs (at least 30)
• A set of judgements: which documents are responsive (or non-responsive) to each topic (see the qrels sketch below)
• Judgements are expensive and limit how test collection results can be interpreted
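A minimal sketch of how the judgements component is typically stored, using the TREC-style qrels layout (topic id, iteration, document id, judgement); the file name and its contents are assumptions.

```python
# Reading TREC-style relevance judgements ("qrels"): one line per judgement,
# "topic_id iteration doc_id judgement" (judgement > 0 means responsive).
# "qrels.txt" is a placeholder path.
from collections import defaultdict

def load_qrels(path):
    judgements = defaultdict(dict)          # topic_id -> {doc_id: judgement}
    with open(path) as f:
        for line in f:
            topic, _iteration, doc_id, judgement = line.split()
            judgements[topic][doc_id] = int(judgement)
    return judgements

qrels = load_qrels("qrels.txt")
responsive = {t: {d for d, j in docs.items() if j > 0} for t, docs in qrels.items()}
print(f"{len(qrels)} topics judged")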
Incomplete Judgements
• Judgements are too time-consuming and expensive to be complete (judge every document)
• Instead, pool the retrieved documents from a variety of systems (pooling sketch below)
• Feasible, but:
  • Known to be incomplete
  • We can't even accurately estimate how incomplete
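A sketch of the pooling step itself: take the top-k documents from each participating system's ranked list for a topic and send only their union to the assessors. The run data are invented.

```python
# Depth-k pooling (sketch with invented run data): only documents that some
# system ranked in its top k are sent to the human assessors.
def pool(ranked_runs, depth=100):
    """Union of the top-`depth` documents over all systems' ranked lists."""
    pooled = set()
    for ranked_list in ranked_runs:
        pooled.update(ranked_list[:depth])
    return pooled

# Three systems' rankings for one topic (toy example, depth=2 for brevity).
runs = [
    ["d1", "d4", "d7"],
    ["d4", "d2", "d9"],
    ["d5", "d1", "d3"],
]
print(pool(runs, depth=2))   # {'d1', 'd2', 'd4', 'd5'} goes to the assessors
```

Anything no system ranked highly is never judged at all, which is why the judgements are known to be incomplete, and why it is hard to say how incomplete.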
Inexact Judgements
• Humans differ substantially in their judgements
• Standard TREC collections:
  • Topics include 1-3 paragraphs describing what makes a document relevant
  • Given the same pool of documents, two humans overlap on 70% of their relevant sets (see the overlap sketch below)
• 76% agreement on the small TREC Legal test
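The 70% figure is an overlap between two assessors' relevant sets. A sketch of one common way such overlap is computed (intersection over union); the document ids are invented, and whether the quoted 70%/76% figures use exactly this formula is not stated in the slides.

```python
# Overlap between two assessors' relevant sets: |A ∩ B| / |A ∪ B|.
# The document ids are invented for illustration.
assessor_a = {"d1", "d2", "d3", "d5", "d8", "d9", "d12"}
assessor_b = {"d1", "d2", "d3", "d5", "d8", "d10"}

overlap = len(assessor_a & assessor_b) / len(assessor_a | assessor_b)
print(f"overlap = {overlap:.2f}")   # 5 shared / 8 total = 0.62 here
```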
Implications of Judgements
• No gold standard of perfect performance is even possible
• Any system claiming better than 70% precision at 70% recall is working on a problem other than general search
• Almost impossible to get useful absolute measures of performance
Comparative Evaluation
• Comparisons between systems on moderate-size collections (several gigabytes) are solid
• Comparative results on larger collections (500 gigabytes) are showing strains:
  • Believable, but with larger error margins
  • An active area of research
• The overall goal for e-discovery has to be comparative evaluation (see the paired-comparison sketch below)
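Comparative evaluation in practice means comparing two systems topic by topic on the same collection, for example with a paired significance test over per-topic scores. A sketch using scipy; the score arrays are placeholders, not measured results.

```python
# Paired comparison of two systems over the same topics (toy scores).
# A paired t-test on per-topic scores is one common way to attach an error
# margin to "system A beats system B"; the numbers below are invented.
from scipy.stats import ttest_rel

system_a = [0.31, 0.45, 0.12, 0.58, 0.27, 0.40, 0.22, 0.51, 0.36, 0.44]
system_b = [0.28, 0.41, 0.15, 0.50, 0.25, 0.38, 0.24, 0.47, 0.30, 0.39]

t_stat, p_value = ttest_rel(system_a, system_b)
mean_diff = sum(a - b for a, b in zip(system_a, system_b)) / len(system_a)
print(f"mean per-topic difference: {mean_diff:+.3f}, p = {p_value:.3f}")
```

With only 30-50 topics the error margins are non-trivial, which is part of why absolute performance claims are hard and comparative ones need a large enough topic set.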
Sabir TREC Legal Results
• Submitted 7 runs
  • Very basic approach (1995 technology)
  • 3 tools from my toolbox
  • 3 query variations
• One of the top systems
• All results basically the same
  • The tools did not help on average