
EUCLCORP: Multilingual Legal Corpus for EU Case Law Analysis

EUCLCORP is a standardized, multidimensional and multilingual corpus of the case law of the Court of Justice of the European Union and of EU member states' constitutional/supreme courts. It is annotated with linguistic information and metadata and offers corpus tools for empirical legal linguistics studies.



Presentation Transcript

  1. The European Union case law corpus (EUCLCORP) Aleksandar Trklja University of Birmingham

  2. What is EUCLCORP? • The European Union case law corpus (EUCLCORP) is a standardised, multidimensional and multilingual corpus of the case law of the Court of Justice of the European Union (CJEU) and of eight EU member states’ constitutional/supreme courts.

  3. Project development • The project has been developed in the following phases: • Phase one: project application • Phase two: data compilation • Phase three: data annotation • Phase four: web-interface • Supported by a European Research Council (ERC) Proof of Concept grant • Based at the University of Birmingham (July 2016 - December 2017).

  4. Not just another legal database

  5. Not just another legal database • Unlike conventional legal databases EUCLCORP contains the following corpus tools: • monolingual concordance lines • parallel concordance lines • collocations • frequency lists • n-grams • simple search • CQP-based search
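To make the first tool on this slide concrete, here is a minimal sketch of a monolingual concordance (keyword-in-context) view. It is an illustration only, not EUCLCORP's implementation: the `kwic` function, the regex tokenizer and the sample sentence are all hypothetical.

```python
import re

def kwic(text, keyword, width=3):
    """Return (left, node, right) context windows for each hit of keyword.

    Hypothetical helper: tokenizes on word characters and lowercases,
    which is far cruder than the annotation described for the corpus.
    """
    tokens = re.findall(r"\w+", text.lower())
    lines = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append((left, tok, right))
    return lines

# Invented sample text, not taken from any judgment.
sample = ("The court held that the measure was capable of restricting trade. "
          "The court dismissed the action.")

for left, node, right in kwic(sample, "court"):
    print(f"{left:>25} | {node} | {right}")
```

Each hit is shown with a fixed window of context on either side, which is the basic layout that concordance lines share.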

  6. Annotation • The corpus has been annotated with linguistic and external metadata information. • Linguistic information: tokenization, lemmatization, parts-of-speech tags, sentence and paragraph boundaries and enumeration of sentences and paragraphs.

  7. Annotation • Non-linguistic metadata for CJEU subcorpus: text sections (Summary, Parties, Grounds, Costs, Operative Part and Subject), language of the case, case name, case number, date and cellar number. • Non-linguistic metadata for national judgments: language of the case, name of the court, date, case name and names of judges. • ECJ judgments are aligned at the sentence level to enable searches on parallel concordance lines.

  8. ECJ judgments

  9. National judgments

  10. Web interface and corpus tools • User-friendly interface for the search query: [lemma="increase" & tag="V.*"] []{0,2} [tag="N.*"] :: match.meta_date="1980.*" within grounds
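The query on this slide looks for a verb form of "increase", followed by up to two arbitrary tokens and then a noun, restricted to the Grounds section of texts dated 1980. The sketch below emulates that pattern over a toy data structure. Everything in it is an assumption made for illustration: the dictionary layout, the `match_verb_gap_noun` and `search` helpers, the tag names and the sample documents. It keeps only the shortest span per verb and does not reproduce how the CQP engine itself evaluates queries.

```python
import re

def match_verb_gap_noun(tokens, lemma="increase", max_gap=2):
    """Find spans: verb with the given lemma, 0..max_gap tokens, then a noun."""
    hits = []
    for i, tok in enumerate(tokens):
        if tok["lemma"] == lemma and re.fullmatch(r"V.*", tok["tag"]):
            for gap in range(max_gap + 1):
                j = i + 1 + gap
                if j < len(tokens) and re.fullmatch(r"N.*", tokens[j]["tag"]):
                    hits.append(" ".join(t["word"] for t in tokens[i:j + 1]))
                    break  # keep only the shortest span for this verb
    return hits

def search(docs, year="1980", section="grounds"):
    """Apply the token pattern only to documents passing the metadata filter."""
    results = []
    for doc in docs:
        if doc["section"] == section and doc["date"].startswith(year):
            results.extend(match_verb_gap_noun(doc["tokens"]))
    return results

# Invented, hand-tagged tokens (hypothetical tag set).
tokens = [
    {"word": "This", "lemma": "this", "tag": "DT"},
    {"word": "increased", "lemma": "increase", "tag": "VBD"},
    {"word": "the", "lemma": "the", "tag": "DT"},
    {"word": "administrative", "lemma": "administrative", "tag": "JJ"},
    {"word": "burden", "lemma": "burden", "tag": "NN"},
]
docs = [
    {"date": "1980-03-12", "section": "grounds", "tokens": tokens},
    {"date": "1995-06-01", "section": "grounds", "tokens": tokens},
]

print(search(docs))
```

Only the first document contributes a hit, because the second fails the date filter even though its tokens match the pattern.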

  11. Web interface and corpus tools N-grams associated with the token ‘capable’
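As a rough illustration of what an n-gram listing for a token such as 'capable' involves, the sketch below counts all trigrams that contain a given token. The `ngrams_with` function, the regex tokenizer and the sample text are hypothetical and are not drawn from EUCLCORP.

```python
import re
from collections import Counter

def ngrams_with(text, token, n=3):
    """Count every n-gram of length n that contains the given token."""
    tokens = re.findall(r"\w+", text.lower())
    grams = (tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return Counter(g for g in grams if token in g)

# Invented sample text, not taken from any judgment.
sample = ("The measure is capable of restricting trade. "
          "The rule is capable of affecting trade.")

for gram, count in ngrams_with(sample, "capable").most_common():
    print(count, " ".join(gram))
```

Sorting by frequency brings recurrent sequences such as "is capable of" to the top, which is the kind of pattern an n-gram tool is meant to surface.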

  12. Web interface and corpus tools

  13. Contribution • EUCLCORP has been created with the aim of fostering the development of empirical legal linguistics studies.

  14. Contribution • EUCLCORP allows users to investigate in a systematic way: • the history of the meaning(s) of a particular legal term; • features that distinguish legal language from languages used in other registers; • in the case of ambiguous terms – the senses in which they are most frequently and most typically used; • the influence of national legal languages on EU case law (and vice versa); • the impact of translation on the development of EU case law; • discourse relations and argumentation patterns.

  15. Thank you for your attention!
