
NLP Technology Applied to e-discovery

NLP Technology Applied to e-discovery. Bill Underwood, Principal Research Scientist, william.underwood@gtri.gatech.edu. “The Current Status and Future of Search and Retrieval Technology,” WG1 Mid-Year Meeting, Cambridge, Maryland, April 21-22, 2002. Research sponsored by the ERA Program of NARA.

Presentation Transcript


  1. NLP Technology Applied to e-discovery. Bill Underwood, Principal Research Scientist, william.underwood@gtri.gatech.edu. “The Current Status and Future of Search and Retrieval Technology,” WG1 Mid-Year Meeting, Cambridge, Maryland, April 21-22, 2002

  2. Research Sponsored by ERA Program of NARA
     • Application of Natural Language Processing technology to effectively:
       • Summarize series of Presidential e-records
       • Identify FOIA exemptions and PRA restrictions in Presidential e-records
       • Search for e-records relevant to a FOIA request
       • Search for e-records in massive collections in support of legal discovery

  3. NLP Methods in Document Retrieval
     • Morphological processing
     • Identifying words
     • Parsing: linguistic representation
     • Word sense disambiguation
     • Represent, identify and exploit semantic relationships
     • Conceptual indexing
     • Matching concepts in query to conceptual index
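To make the last two bullets of slide 3 concrete, here is a minimal Python sketch of conceptual indexing and query matching. It is not the method used in the research described here. The concept lexicon, the suffix list, and the function names (`stem`, `concepts_in`, `build_index`, `search`) are all hypothetical, and the suffix stripping is a crude stand-in for real morphological processing.

```python
import re
from collections import defaultdict

# Hypothetical toy lexicon: word stem -> concept label (an assumption for illustration).
CONCEPT_LEXICON = {
    "nominat": "APPOINTMENT",
    "appoint": "APPOINTMENT",
    "memo": "DOCUMENT",
    "letter": "DOCUMENT",
    "record": "DOCUMENT",
}

# Crude stand-in for morphological processing: strip at most one suffix.
SUFFIXES = ("ing", "ion", "ed", "es", "e", "s")


def stem(word):
    """Strip the first matching suffix, keeping at least four characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 4:
            return word[: -len(suffix)]
    return word


def concepts_in(text):
    """Identify words, normalize them, and map them to known concepts."""
    words = re.findall(r"[a-z]+", text.lower())
    return {CONCEPT_LEXICON[s] for s in map(stem, words) if s in CONCEPT_LEXICON}


def build_index(docs):
    """Conceptual index: concept label -> set of document ids."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for concept in concepts_in(text):
            index[concept].add(doc_id)
    return index


def search(query, index):
    """Return ids of documents that contain every concept found in the query."""
    wanted = concepts_in(query)
    if not wanted:
        return set()
    return set.intersection(*(index.get(c, set()) for c in wanted))
```

With this sketch, a query for "appointing" would also reach a document that only says "nominated", because both words map to the same concept label. Word sense disambiguation and parsing are not modeled at all.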

  4. Current Weaknesses of NLP in Information Retrieval
     • NLP methods of document retrieval have failed to perform better than Boolean and statistical methods. Why?
       • Broad nature of retrieval tasks
       • Lack of weighting scheme for compound terms
       • Poor word sense disambiguation for documents and queries
       • Need to handle verbs as well as nouns and noun phrases
       • Poor POS tagging
       • Need better parsing algorithms and grammars
       • Inadequate handling of negation

  5. Advanced NLP Methods Applied to PERPOS Research Tasks
     • Morphological analysis
     • Word sense disambiguation
     • Larger lexicon
     • Domain-dependent lexicons
     • Information extraction to identify classes of words
     • Template filling to identify communication acts of records (nominate, request information, provide information)
     • Learning and identification of document types
     • Method of reasoning with negation in NL
     • Conceptual taxonomy
     • Rule-based reasoning
     • Question answering technology

  6. Plausible, Hybrid Approach to Investigating e-discovery
     • Formulate the e-discovery task not just in search terms but also in terms of the complaint itself, including the parties and laws involved. Express the kinds of evidence that would enable one to prove the case as a series of questions or if-then rules drawn from precedent cases and experience.
     • Use a COTS text retrieval system with Boolean queries and statistical methods to retrieve documents using key terms related to the case.
     • Use contextual knowledge with questions and NLP methods (e.g., question answering) to review the retrieved documents to determine more precisely those relevant to the case, i.e., those that would represent evidence.
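The two-stage shape of slide 6 can be sketched in Python as follows. This is an illustration under stated assumptions, not the presenter's system. Stage 1 is a plain Boolean key-term filter standing in for a COTS retrieval system, with the statistical ranking step omitted. Stage 2 uses simple if-then keyword rules as a stand-in for question answering. The function names, the rule format, and the example rule `acknowledges_delay` are all hypothetical.

```python
import re


def tokens(text):
    """Lowercased set of alphabetic words in a document."""
    return set(re.findall(r"[a-z]+", text.lower()))


def boolean_retrieve(docs, all_of=(), any_of=(), none_of=()):
    """Stage 1: Boolean filter (AND over all_of, OR over any_of, NOT over none_of)."""
    hits = []
    for doc_id, text in docs.items():
        words = tokens(text)
        if (
            all(w in words for w in all_of)
            and (not any_of or any(w in words for w in any_of))
            and not any(w in words for w in none_of)
        ):
            hits.append(doc_id)
    return hits


# Hypothetical if-then rule: IF the document mentions a term from every group,
# THEN it is tagged as possible evidence under the rule's label.
RULES = {
    "acknowledges_delay": [
        {"acme"},                  # party to the dispute (made-up name)
        {"late", "delayed"},       # the alleged failure
        {"shipment", "delivery"},  # the obligation at issue
    ],
}


def review(docs, candidate_ids, rules=RULES):
    """Stage 2: keep only candidates for which at least one rule fires."""
    evidence = {}
    for doc_id in candidate_ids:
        words = tokens(docs[doc_id])
        fired = [
            label
            for label, groups in rules.items()
            if all(group & words for group in groups)
        ]
        if fired:
            evidence[doc_id] = fired
    return evidence
```

Calling `boolean_retrieve` with the key terms of a case and passing its result to `review` narrows a broad candidate set to the documents that match an evidentiary rule, which mirrors the retrieve-then-review order on the slide. A real review stage would need parsing and negation handling that keyword groups cannot provide.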
