
Enhance legal retrieval applications with an automatically induced knowledge base

This article explores the use of an automatically induced knowledge base to improve legal retrieval applications. It discusses the challenges in legal retrieval, the generation of background concepts, and the combination of concepts and contexts to enhance the retrieval process.


Presentation Transcript


  1. Enhance legal retrieval applications with an automatically induced knowledge base • Ka Kan Lo

  2. Contents • Introduction • Practice in legal retrieval • Generation of Background concepts • Combining concepts and contexts • Conclusion

  3. Introduction • Why do we need advanced legal retrieval and e-discovery? • Document collections • Legal requirements • Efficiency

  4. Introduction • What are the challenges? • Explosive growth of document size • Extensive document sources • Expanding collection of document formats • Informal language

  5. Introduction • Opportunities: • Use of background contexts • Search documents deeply for every possible piece of evidence • Example – TREC: the complaint as background information • More context information: the Web and its links

  6. Practice in Retrieval Process • TREC legal track practice: • Defendants devise queries • Plaintiffs' turn • Final queries for the production request • Documents retrieved

  7. Practice in Retrieval Process • What can be added to the process? • Exploit the background information – complaints • Merge with the larger background – Web and links • Proposal in this work – Use Wikipedia as an example

  8. Modeling

  9. Generation of Background concepts • Representation of Background concepts: • Entities & Relations • Ease the conversion from texts to concepts • Facilitate unsupervised operations

  10. Generation of Background concepts • Concepts sources – Wikipedia • Page: a document • Title: central concept described by a document • Links: A set of concepts / terms to other pages • Word: Set of words
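
The four parts of a Wikipedia page listed on this slide can be pictured as a small record type. The following is only a sketch: the class name `Page`, the helper `make_page`, and the crude whitespace tokenizer are assumptions for illustration and are not taken from the presentation.

```python
from dataclasses import dataclass, field
from typing import Iterable, Set


@dataclass
class Page:
    """One Wikipedia page viewed as a concept source (field names are illustrative)."""
    title: str                                     # central concept described by the page
    links: Set[str] = field(default_factory=set)   # concepts / terms pointing to other pages
    words: Set[str] = field(default_factory=set)   # set of words found in the page text


def make_page(title: str, text: str, links: Iterable[str]) -> Page:
    # Assumed tokenizer: split on whitespace, strip punctuation, lowercase.
    tokens = (token.strip(".,;:()").lower() for token in text.split())
    return Page(title=title, links=set(links), words={t for t in tokens if t})


page = make_page(
    "Contract",
    "A contract is a legally binding agreement.",
    ["Agreement", "Law"],
)
```

Keeping `links` and `words` as sets mirrors the slide's wording ("a set of concepts", "set of words") and makes later set operations on pages cheap.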

  11. Generation of Background concepts • Facilitate lexical realization from texts to concepts: • Surface concepts: Mentioned by a page • Hidden concepts: Indexed by no pages but exist in pages
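
One possible reading of the surface/hidden distinction, stated here as an assumption: a surface concept has a page of its own, while a hidden concept occurs inside page text but is not the title of any page. Under that reading the split is a plain set difference. The function name `split_concepts` and the toy data are hypothetical.

```python
from typing import Dict, Set, Tuple


def split_concepts(pages: Dict[str, Set[str]]) -> Tuple[Set[str], Set[str]]:
    """pages maps a page title to the set of terms that occur on that page."""
    titles = set(pages)                        # concepts indexed by a page
    mentioned = set().union(*pages.values())   # every term occurring in some page
    surface = titles                           # assumed reading: concepts with their own page
    hidden = mentioned - titles                # occur in pages, indexed by no page
    return surface, hidden


toy_pages = {
    "Contract": {"agreement", "consideration", "Tort"},
    "Tort": {"negligence", "Contract"},
}
surface, hidden = split_concepts(toy_pages)
```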

  12. Generation of Background concepts • Entities: • Basic objects – named entities, locations, organizations, … • Definitions: • e ⊂ c, e ≠ r, e ∈ roles of relations

  13. Generation of Background concepts • Relations: • Relationships between concepts • r ⊂ c • r ≠ e • r = <role_1, role_2, role_3>, role_i = e
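
The entity and relation definitions on slides 12 and 13 can be sketched as two disjoint record types, where every role of a relation must be filled by an entity. The class names, the role check, and the sample legal terms are assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Entity:
    """A basic object such as a named entity, location, or organization."""
    name: str


@dataclass(frozen=True)
class Relation:
    """A relationship between concepts: r = <role_1, role_2, role_3>, role_i = e."""
    name: str
    roles: Tuple[Entity, ...]

    def __post_init__(self) -> None:
        # Enforce role_i = e: each role is filled by an entity, never by a relation.
        if not all(isinstance(role, Entity) for role in self.roles):
            raise TypeError("every role of a relation must be an Entity")


plaintiff = Entity("plaintiff")
complaint = Entity("complaint")
defendant = Entity("defendant")
files = Relation("files_against", (plaintiff, complaint, defendant))

# Entities and relations are both concepts, so they can share one concept set.
concepts = {plaintiff, complaint, defendant, files}
```

Using separate frozen classes keeps entities and relations distinct (e ≠ r) while still letting both be stored in a single set of concepts.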

  14. Semantical Domain • Semantical Domain: • Group of inter-related concepts, as defined by Wikipedians • Groups can be configured and reconfigured, depending on the size and nature of the domains • Represents background information of different sizes, natures, and structures

  15. Semantical Domain • Operations: • D = {page_i}, where page_i ∈ E • Overlap • Subsumed • Join
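
The slide names the three domain operations without defining them. Assuming a domain is simply a set of page titles, a natural reading is set intersection for overlap, subset for subsumed, and union for join. The function names and the toy domains below are hypothetical.

```python
from typing import FrozenSet

Domain = FrozenSet[str]  # D = {page_i}: a domain as a set of page titles


def overlap(d1: Domain, d2: Domain) -> Domain:
    """Pages shared by both domains (assumed meaning of 'Overlap')."""
    return d1 & d2


def is_subsumed(d1: Domain, d2: Domain) -> bool:
    """True when every page of d1 also belongs to d2 (assumed meaning of 'Subsumed')."""
    return d1 <= d2


def join(d1: Domain, d2: Domain) -> Domain:
    """Merge two domains into a larger one (assumed meaning of 'Join')."""
    return d1 | d2


contract_law = frozenset({"Contract", "Breach of contract", "Damages"})
tort_law = frozenset({"Tort", "Negligence", "Damages"})
remedies = frozenset({"Damages"})

shared = overlap(contract_law, tort_law)
merged = join(contract_law, tort_law)
```

With these definitions, reconfiguring a domain to a different size amounts to joining it with, or intersecting it against, neighbouring domains.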

  16. Knowledge Extraction, Parsing • Parsing: • Conversion of syntactic parses into concept representations • Dependency parsing • Fill in the entities and relations automatically

  17. Entities & Relations • Highlights of the process: • Syntactic parsing of sentences • Conversion from linguistic representation to concept representation • Constrain the concept spaces by different sizes and scopes
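
The conversion from a dependency parse to entities and relations can be sketched without committing to a particular parser. The example assumes the parse is already available as (head, label, dependent) triples that use the Universal Dependencies labels `nsubj` and `obj`; the function name and the hand-written arcs are illustrative and are not the author's actual procedure.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Arc = Tuple[str, str, str]  # (head word, dependency label, dependent word)


def relations_from_dependencies(arcs: Iterable[Arc]) -> List[Tuple[str, str, str]]:
    """Turn subject/object arcs into (relation, subject entity, object entity) triples."""
    arguments: Dict[str, Dict[str, str]] = defaultdict(dict)
    for head, label, dependent in arcs:
        if label in ("nsubj", "obj"):   # ignore determiners, modifiers, and so on
            arguments[head][label] = dependent
    # Keep only heads that have both a subject and an object filled.
    return [
        (head, roles["nsubj"], roles["obj"])
        for head, roles in arguments.items()
        if "nsubj" in roles and "obj" in roles
    ]


# Hand-written arcs for "The plaintiff sued the company."
arcs = [
    ("sued", "nsubj", "plaintiff"),
    ("sued", "obj", "company"),
    ("plaintiff", "det", "The"),
    ("company", "det", "the"),
]
relations = relations_from_dependencies(arcs)
```

The verb becomes the relation and its subject and object become the entities filling its roles, which is one simple way of filling entities and relations automatically from a parse.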

  18. Combining the concepts and background contexts • Algorithm: • Filter the background text and request text • Match the term set against Wikipedia • Build the network of concepts and relations • Combine into a single network and filter out unnecessary concepts • Extract terms and concepts and expand the query string • Submit the query to the retrieval system
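
The steps on this slide can be sketched end to end on toy data. Everything specific here is an assumption: the stopword list, the tiny `WIKI_LINKS` table standing in for a Wikipedia link index, and the filtering rule (drop neighbours already present in the request) are placeholders, and the final retrieval call is left out.

```python
from typing import Dict, Set

STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "for", "all", "that"}

# Toy stand-in for a Wikipedia link index: concept -> linked concepts (hypothetical data).
WIKI_LINKS: Dict[str, Set[str]] = {
    "contract": {"breach", "agreement"},
    "breach": {"damages", "contract"},
    "invoice": {"payment"},
}


def filter_terms(text: str) -> Set[str]:
    """Step 1: lowercase, strip punctuation, drop stopwords."""
    tokens = (token.strip(".,;:()").lower() for token in text.split())
    return {t for t in tokens if t and t not in STOPWORDS}


def build_network(terms: Set[str]) -> Dict[str, Set[str]]:
    """Steps 2 and 3: keep terms that match a concept and attach their links."""
    return {t: set(WIKI_LINKS[t]) for t in terms if t in WIKI_LINKS}


def expand_query(request: str, background: str, max_new_terms: int = 5) -> str:
    request_terms = filter_terms(request)
    background_terms = filter_terms(background)
    # Step 4: one combined network built from both texts.
    network = build_network(request_terms | background_terms)
    neighbours = set().union(*network.values())
    # Placeholder filter: discard neighbours the request already contains.
    new_terms = sorted(neighbours - request_terms)[:max_new_terms]
    # Step 5: the expanded query string, ready to be sent to a retrieval engine.
    return " ".join(sorted(request_terms) + new_terms)


query = expand_query(
    "All documents concerning breach of contract",
    "The defendant failed to pay the invoice.",
)
```

In this toy run the complaint text contributes `payment` through the `invoice` concept, a term the request alone would never have produced, which is the kind of background-driven expansion the slide describes.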

  19. Conclusion

  20. Conclusion • Challenges in legal retrieval • Background contexts • Generation of background concepts • Project the context to concepts • Expand the queries for retrieval

  21. Conclusion • Current work: • Integration of language learning (not only parsing) with the concept generation process • Large-scale construction of networks with the full document set in 3 languages on Grid: • English: 1.7 million • Spanish: 300 thousand • Chinese: 200 thousand

  22. Conclusion • Current work: • Experiments running on a corpus of 20M web pages for expanded links • Generated language and concept spaces used in other Natural Language Technologies (NLT) • TREC-Legal: Testing the integration of the knowledge base with the complaint text for queries • TREC-Legal: Building a new matching mechanism (from KB induction) on a small, concise set of documents

  23. Thank you • Q&A
