Iuriservice II Ontology Development

This workshop discusses the development of an intelligent system using Semantic Web technologies to assist new judges with their typical problems in the legal domain. The system is designed to improve knowledge discovery and provide intelligent knowledge access to legal databases.


Presentation Transcript


  1. Iuriservice II Ontology Development • Núria Casellas, Denny Vrandečić, Joan Josep Vallbé, Aleks Jakulin, Mercedes Blázquez • Workshop on Artificial Intelligence and Law, XXII World Congress of Philosophy of Law and Social Philosophy, Granada, May 2005

  2. Agenda • Introduction to SEKT Project and Legal Case Study • Methodology • OPJK • Improving knowledge discovery on the competency questions • Architecture

  3. The inSEKTs • Vrije Universiteit Amsterdam • Empolis • University of Sheffield • Universität Karlsruhe • BT • Ontoprise • Kea-pro • Universität Innsbruck • iSOCO • Sirma AI • Universitat Autònoma de Barcelona • Jozef Stefan Institute

  4. SEKT • Main goals of SEKT • European Leadership in Semantic Technologies • Core Research • Combine Human Language Technologies, Knowledge Discovery and Ontology Technologies • Provide intelligent knowledge access

  5. Description of the Problem: Legal Domain • In General: • Complaints about the diligence of the legal administration. • Judges are overworked. • In Particular: • New Judges • A lot of theoretical knowledge, but little practical knowledge • On Duty. • When they are confronted with situations in which they are not sure what to do • They “disturb” experienced judges with typical questions. • Usually his/her former tutor (Preparador) • Existing Technology • Legal Databases • Essential in their daily work • Based on keywords and Boolean operators • A search retrieves a huge number of hits

  6. Description of the Problem: Legal Domain • Solution: • Design an intelligent system to help new judges with their typical problems. • Extended FAQ system using Semantic Web technologies • Connect the FAQ system with the existing jurisprudence. • Search jurisprudence using Semantic Web technologies.

  7. State of the Art in Legal Ontologies • LLD [Language for Legal Discourse, L.T. McCarty, 1989]: Atomic formulae, Rules and Modalities. • NOR [Norma, R.K. Stamper, 1991, 1996]: Agents, Behavioral invariants, Realizations. • LFU [Functional Ontology for Law, R.W. van Kralingen; P.R.S. Visser, 1995]: Normative knowledge, World knowledge, Responsibility knowledge, Reactive knowledge and Creative knowledge. • FBO [Frame-Based Ontology of Law, A. Valente, 1995]: Norms, Acts and Concept Descriptions. • LRI-Core Legal Ontology [J. Breuker et al., 2002]: Objects, Processes, Physical entities, Mental entities, Agents, Communicative Acts. • IKF-IF-LEX Ontology for Norm Comparison [A. Gangemi et al., 2001]: Agents, Institutive norms, Instrumental provisions, Regulative norms, Open-textured legal notions, Norm dynamics.

  8. Conceptual distinctions • Professional Knowledge (PK) • Legal Knowledge (LK) → Legal Core Ontologies (LCO) [based on General Theories of Law] • Legal Professional Knowledge (LPK) → OPLK • Judicial Professional Knowledge (JPK) → OPJK

  9. Ethnographic survey [Figure: one value per surveyed Autonomous Community: 10, 6, 1, 16, 7, 8, 14, 8, 5, 10, 16, 12, 8, 29] • Total Autonomous Communities: 14 (out of 17)

  10. Preliminary exploitation of data • Statistical analysis of results • Judicial units: heterogeneity • Judge’s profile • Protocols of analysis • Literal transcripts • Completed questionnaires • List of extracted questions

  11. OPJK Modeling • Identification of possible concepts through ALCESTE’s results and TextToOnto conceptual distribution • Domain detection • Competency questions discussion and concept extraction

  12. Intuitive ontological subdomains • CRIMINAL LAW • GENDER VIOLENCE • ON-DUTY • FAMILY ISSUES • ORDER OF PROTECTION / INJUNCTION • JUDGE • CONTRACT LAW • IMMIGRATION • COMMERCIAL LAW • REAL ESTATE • JUDICIAL CLERKS • PROCEEDINGS • DECISION-MAKING & JUDGMENTS

  13. Term extraction using TextToOnto

  14. Term extraction using TextToOnto and Spanish GATE

  15. • Identify important concepts that should be represented • Hierarchy construction • Identify relations between them • Redefine the ontology, repeating steps 1-4

  16. Competency question discussion • Selecting (underlining) all the nouns (usually concepts) and adjectives (usually properties) contained in the competency questions. Example questions (translated from Spanish): • How should reports (denuncias) that are manifestly implausible, or that concern facts that evidently do not constitute a criminal offence, be handled? • And what if it is a criminal complaint (querella) that meets all the other procedural requirements, but the facts it concerns lack criminal relevance or are manifestly false? • What happens if a person appears at the court wishing to report facts that are hardly credible and unrelated to one another, and the judge doubts the mental capacity of the person reporting them? • Before whom should the recurso de reforma (appeal for reconsideration) against detention be lodged: the on-duty judge or the judge who issued the corresponding detention order?

  17. OPJK classes identified

  18. OPJK and Proton Integration

  19. Improving knowledge discovery on the competency questions

  20. Data and Method • Data: 3 text corpora (judges’ questions): • Corpus 1: Scholar “on duty” questions (Spanish Judicial School, 99 questions) • Corpus 2: Practical “on duty” questions (163 questions, field work) • Corpus 3: All practical questions (756 questions, field work) • Method: • TEXT GARDEN (J. Stefan Institute, Ljubljana) • ALCESTE: analysis of the co-occurring lexemes within the simple statements of a text [Reinert 2002, 2003]

  21. Analysis of Text The text needs to be represented in an appropriate way for statistical analysis: • Breaking text into “units” (lines, sentences, …) • Morphological categorization (adjectives, prepositions, …) • Putting words into canonical form: • Lemmatization (is,was,are → be) • Stemming (loved, loving → lov+) • Analysis: • Clustering • Latent semantic indexing • Correspondence analysis • Classification • Visualization

  22. ALCESTE (Reinert, 1988; Folch & Habert, 2000) • Corpus • Segmented in chunks • Hierarchical descending clustering • Classes of related chunks • List of typical words related to each class • Correspondence analysis • Geometric representation

  23. Example of Correspondence Analysis and Visualization [Figure: two-dimensional correspondence-analysis map of word stems (TEXT GARDEN, ALCESTE). Groups of stems visible in the plot include: parte+, monitorio, demand+, ejecut+, ejecucion+, embarg+, finca+ • separacion+, orden+, violencia, alejamiento, proteccion, victima+ • guardia+, juez, policia, detenido+, judicial+, servicio+]

  24. Example of Clustering • Class 1 (Judicial unit): funcionar+ (21), juzgar(26), oficina(11), trabaj+(13), decir(26), llam+(16), mand+(12), acudir(11), adjunto(4), busc+(4), consult+(4), dato(6), hablar(4), jurisprudencia(3), local+(3), material(6), necesit+(7), policia(14), prensa(4), sala(4), funerari+(2), hurto(3), informacion(5), miedo(3), robo(3), servicio+(7), sustitu+(4), tecnico(2), venir(15) • Class 2 (Family law): alejamiento(22), malo(22), medida(16), orden+(23), proteccion(17), senor+(13), trat+(22), victima(11), mujer(11), padre(7), denunci+(12), domestico(8), violencia(8), agresor(4), dict+(10), madre(7), marido(6), nino(5), pension(4), psicolog+(5), separacion(5), abus+(5), alimento(3), ayud+(4), casa(3), cautelar+(3), divorcio(2), empresa(3), hijo(4), lesion+(6) • Class 3 (Proceedings): escrit+(9), fiscal+(13), instruccion(9), ordinario(5), seguir(11), acumular(5), audiencia-provincia(2), conform+(2), contradictori+(3), criterio+(10), cuantia(5), falt+(7), injusto(3), interpretacion(3), ley(6), motiv+(3), pendiente(2), perito(5) • Class 4 (Enforcement of judgments): ejecucion(14), ejecut+(15), embarg+(11), finca+(9), depositar+(6), interes+(6), pago(6), suspension(5), deposito(6), entreg+(6), quiebra(5), sentencia(9), solicit+(9), vehiculo(4), acreedor(3), administracion(4), cantidad(4), conden+(4), cost+(4), dinero(4), edicto(2), imposibilidad(3), multa(3), notificacion(4), pagar+(4)

  25. Stemming vs Lemmatization • Stemming: the longest string of characters that is common to different words. For all the variants of ‘love’, but also for ‘lover’ (noun) and ‘lovely’ (adjective), it offers the single stem lov+. • Lemmatization respects the grammatical category and yields three different lemmas: love (verb), lover (noun), lovely (adjective). • Applied to Spanish or Catalan (or any Romance language), which are highly inflected (60 forms per verb, without counting compound forms), stemming would hide a lot of information. • EXAMPLES (Stem | Lemma): acumulacion | acumulación • acumularse | acumular • acumul+ | --- • admision | admisión • admit+ | admitir • celebracion | celebración • celebr+ | celebrar • misma+ | mismo • mismo+ | --- • suspenderse | suspender • suspend+ | ---
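
The contrast on this slide can be sketched in a few lines of Python. This is a toy illustration only: the suffix list in `SUFFIXES` and the lemma table in `LEMMAS` are hypothetical stand-ins, not the stemmer or lemmatizer used in the study. It shows how a crude stemmer merges a noun and a verb into one stem, while a category-aware lemma lookup keeps them apart.

```python
# Toy illustration: SUFFIXES and LEMMAS are hypothetical, hand-made resources,
# not the tools used by the authors.
SUFFIXES = ["acion", "arse", "erse", "ar", "er", "ir", "a", "o"]

LEMMAS = {
    "acumulacion": ("acumulación", "noun"),
    "acumularse": ("acumular", "verb"),
    "acumular": ("acumular", "verb"),
    "suspenderse": ("suspender", "verb"),
}


def crude_stem(word: str) -> str:
    """Strip the first matching suffix in SUFFIXES and mark the cut with '+'."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)] + "+"
    return word


def lemmatize(word: str) -> tuple:
    """Return (lemma, category); unknown words are returned unchanged."""
    return LEMMAS.get(word, (word, "unknown"))


words = ["acumulacion", "acumularse", "acumular"]
stems = {crude_stem(w) for w in words}   # the noun and the verb share one stem
lemmas = {lemmatize(w) for w in words}   # the noun and the verb stay distinct
print(stems)
print(lemmas)
```

In this tiny sample the stemmer loses the noun/verb distinction that the lemma lookup preserves, which is the kind of information loss the slide warns about.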

  26. Quantitative Comparison • Lemmatized corpus has fewer word-forms than the stemmed version. • The LSI on the lemmatized corpus is able to reconstruct documents better, especially in few dimensions. • The lemmatized corpus clustering is more detailed.

  27. Comparison of Clustering Results • Clustering with the stemmed corpus offers us 4 classes: • ‘On-duty’ actions (mixed with Judicial Office) (54.06%) • Proceedings and Trial (18.10%) • Enforcement (judgements) (14.39%) • Family Law (gender violence, divorce, separation…) (13.46%) • Clustering with the lemmatized corpus is more detailed and offers 6 classes: • Judicial Office (20.11%) • ‘On-duty’ actions (27.25%) • Family Law (gender violence, divorce, separation…) (14.55%) • Proceedings (15.61%) • Trial (8.47%) • Enforcement (judgements) (14.02%)

  28. Take-Home Messages • Do text analysis of legal documents! • If you do, lemmatize!

  29. Methodology

  30. Initial Methodology + Based on 800 competency questions + Questions were clustered + Middle-out strategy – Usage of ontology not considered – Repetitive discussions – Long discussions

  31. Considering the “Why” • No normative knowledge • Stick to the questions as sources • Model the questions, not the answers

  32. Wiki visualization

  33. DILIGENT Argumentation Ontology • Argumentation ontology defined • Based on case studies to identify the most effective types of arguments • Argument type recognition based on RST

  34. Methodology changes • Using DILIGENT made the ontology engineering… • … much faster • … amenable to distributed development • … better documented • … trackable • … easier to manage • DILIGENT itself also changed!

  35. Outlook • Better tool support – off-the-shelf wiki had weaknesses • Moderator support in discussions • Competency question clustering • Gathering further experience from legal and other case studies

  36. Architecture

  37. High Level Requirements • Judges should not be bothered with a complex user interface. • A simple natural language interface is probably appropriate. • The decision as to whether a new question is similar to a stored question (with its corresponding answer) should be based on semantics rather than on simple word matching. • An ontology can be used to perform this semantic matching of questions. • The questions included in the system should be of high quality. • Be rather exhaustive and reflect the actual situation • An extensive survey with more than 250 Spanish judges forms the basis for the questions. • Justify the answer provided by the system with existing jurisprudence. • Jurisprudence databases. • Metadata and ontology processing of documents. • Knowledge management at all levels

  38. Example Question-Answer • Question: • What problems can we foresee with the analysis of small amounts of drugs, where the identification test destroys the drugs? • Answer: • This is an unrepeatable piece of evidence at the trial. In these cases, the Spanish Criminal Procedure Act states that the adversarial principle should be respected. While the trial proceedings are prepared, the judge must explain to all parties that they may choose an expert to perform these tests.

  39. Example of judgment: parts • Court and docket number • Grounds of Decision • Names of the magistrates • Date and place • Prefatory statement • History of the Case

  40. Relations between the Question/Answer & Judgment [Diagram labels: Judgement • Summary • FAQ • Case History • Decision Grounds • Question • Ruling • Answer • OPJK • Practical Knowledge Instances]

  41. Architecture [Diagram labels: Web browser • Natural Language • DB 1 • DB N • Decisions • Decisions • Questions-Answers • Ontology Learning & feeding • Semantic Matching • Ontology Merging • Ontology Alignment • Expert Knowledge • Jurisprudence]

  42. Expert Knowledge Retrieval: Design, technological considerations • iFAQ System • Multistage Searching Subsystem • Accuracy • Efficiency • Ontology Domain Detection • Keyword Matching • Ontology Graph Path Matching • Natural Language Processing • Ontology Technology • Caching subsystem • Persistence subsystem

  43. Expert Knowledge Retrieval: Plugged Searching Stages • Chain of Responsibility pattern • [Diagram labels: FAQ • Ontology Domain Detection • Keyword/synonym matching stage • Ontology graph path matching • FAQ Candidates • User Question • iFAQ Search Engine • Other search engines ... • Search Factory]
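
A minimal sketch of how pluggable search stages might be chained with the Chain of Responsibility pattern named on this slide. Everything here is an assumption made for illustration: the class names (`SearchStage`, `DomainStage`, `KeywordStage`), the keyword-to-domain table and the sample FAQ entries are hypothetical and do not describe the actual iFAQ implementation. Each stage narrows the list of FAQ candidates and passes the rest to the next stage; the ontology graph path matching stage is left out.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Faq:
    question: str
    domain: str


def tokens(text: str) -> set:
    """Lower-case word tokens of a text."""
    return set(re.findall(r"\w+", text.lower()))


class SearchStage:
    """One link in the chain: narrow the candidates, then delegate onwards."""

    def __init__(self, next_stage: Optional["SearchStage"] = None):
        self.next_stage = next_stage

    def narrow(self, query: str, candidates: List[Faq]) -> List[Faq]:
        raise NotImplementedError

    def handle(self, query: str, candidates: List[Faq]) -> List[Faq]:
        remaining = self.narrow(query, candidates)
        # Stop early when the chain ends or at most one candidate is left.
        if self.next_stage is None or len(remaining) <= 1:
            return remaining
        return self.next_stage.handle(query, remaining)


class DomainStage(SearchStage):
    # Hypothetical keyword-to-domain table standing in for ontology domain detection.
    DOMAIN_KEYWORDS = {"protection": "family", "seizure": "enforcement"}

    def narrow(self, query, candidates):
        words = tokens(query)
        domains = {d for kw, d in self.DOMAIN_KEYWORDS.items() if kw in words}
        if not domains:
            return candidates  # no domain detected: keep every candidate
        return [f for f in candidates if f.domain in domains]


class KeywordStage(SearchStage):
    def narrow(self, query, candidates):
        words = tokens(query)
        scores = [len(words & tokens(f.question)) for f in candidates]
        best = max(scores, default=0)
        if best == 0:
            return candidates  # nothing matched: leave the list untouched
        return [f for f, s in zip(candidates, scores) if s == best]


FAQS = [
    Faq("When can a protection order be issued?", "family"),
    Faq("How is a seizure of property enforced?", "enforcement"),
    Faq("Who decides on visiting arrangements after a protection order?", "family"),
]

chain = DomainStage(KeywordStage())
print(chain.handle("protection order visiting", FAQS))
```

Because every stage shares the same `handle` interface, a further stage can be appended to the chain without changing the existing ones, which is presumably what "plugged" stages refers to.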

  44. Expert Knowledge Retrieval: Semantic Similarity, main steps [Diagram labels: Ontology • Ontology Linking • Semantic Distance Calculation • NL query • NLP • POS list (lemmas) • Term Coverage Calculation between queries • Best match of stored queries • Semantic distance between queries]
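
The slide does not define how term coverage is computed. As one possible reading, the sketch below assumes it is the fraction of the query's lemmas that also occur in a stored question; the function names and the sample lemma lists are hypothetical.

```python
from typing import Dict, List


def term_coverage(query_lemmas: List[str], stored_lemmas: List[str]) -> float:
    """Fraction of distinct query lemmas found in the stored question (assumed definition)."""
    query_set = set(query_lemmas)
    if not query_set:
        return 0.0
    return len(query_set & set(stored_lemmas)) / len(query_set)


def best_match(query_lemmas: List[str], stored: Dict[str, List[str]]) -> str:
    """Identifier of the stored question with the highest term coverage."""
    return max(stored, key=lambda key: term_coverage(query_lemmas, stored[key]))


# Hypothetical lemma lists, as they might come out of the NLP step.
STORED = {
    "q1": ["drug", "test", "destroy", "evidence"],
    "q2": ["protection", "order", "victim"],
}
QUERY = ["drug", "analysis", "test"]

print(best_match(QUERY, STORED), term_coverage(QUERY, STORED["q1"]))
```

A purely lexical score like this would only preselect candidates; the semantic distance between queries is computed separately over the ontology.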

  45. Expert Knowledge Retrieval: Semantic Similarity [Diagram: ontology fragment with the concepts Actions, Denounce, Accuse, Follow, Mother, Son] • The semantic distance is based on the weighted navigation distance between terms in the ontology. • Navigation through the ontology means that one moves from one concept to another concept, via one of its relations or attributes. • Is a • Follows • Actor • Etc. • The task of associating distance costs: • Is domain specific • Needs to be performed by a legal expert.
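
A weighted navigation distance of this kind can be sketched as a shortest-path search over the ontology graph. The sketch below uses Dijkstra's algorithm, which is one standard way to do this and not necessarily what the system uses. The relation costs in `RELATION_COST` and the edges in `EDGES` are invented for illustration (the slide says a legal expert would assign the costs); only the concept names are taken from the slide, and treating relations as traversable in both directions is a further assumption.

```python
import heapq

# Hypothetical costs per relation type; in the source, a legal expert sets these.
RELATION_COST = {"is_a": 1.0, "actor": 2.0, "follows": 3.0}

# Hypothetical edges between the concepts named on the slide.
EDGES = [
    ("Denounce", "is_a", "Actions"),
    ("Accuse", "is_a", "Actions"),
    ("Follow", "is_a", "Actions"),
    ("Denounce", "actor", "Mother"),
    ("Accuse", "actor", "Son"),
    ("Accuse", "follows", "Denounce"),
]


def build_graph(edges):
    """Adjacency lists; every relation can be navigated in both directions."""
    graph = {}
    for source, relation, target in edges:
        cost = RELATION_COST[relation]
        graph.setdefault(source, []).append((target, cost))
        graph.setdefault(target, []).append((source, cost))
    return graph


def semantic_distance(graph, start, goal):
    """Cheapest weighted path between two concepts (Dijkstra); inf if unreachable."""
    best = {start: 0.0}
    queue = [(0.0, start)]
    while queue:
        dist, node = heapq.heappop(queue)
        if node == goal:
            return dist
        if dist > best.get(node, float("inf")):
            continue  # stale queue entry
        for neighbour, cost in graph.get(node, []):
            candidate = dist + cost
            if candidate < best.get(neighbour, float("inf")):
                best[neighbour] = candidate
                heapq.heappush(queue, (candidate, neighbour))
    return float("inf")


GRAPH = build_graph(EDGES)
print(semantic_distance(GRAPH, "Denounce", "Accuse"))
print(semantic_distance(GRAPH, "Mother", "Son"))
```

With these made-up costs, two actions that share the parent concept end up closer to each other than the two actors do, so changing a relation's cost directly changes which stored questions count as similar.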

  46. Conclusions • Decision support system for inexperienced judges • Using Semantic Web technology for handling knowledge • Provide knowledge for the decision-making process • Capture knowledge from experts • Share knowledge among all users • Extended understanding capacities • Background knowledge: Professional Legal Ontology • Decision Explanation • Improved Knowledge Acquisition
