Universal Dependencies

Explore Universal Dependencies, a framework for multilingual NLP research to ensure cross-linguistic consistency in grammatical annotation while supporting language-specific needs. Benefit from annotated treebanks, part-of-speech tags, morphological features, and dependency relationships across various languages. This open community project promotes parallelism, lexicalism, and recoverability, enabling the transparent mapping of input text to word segmentation. Join the effort to contribute and expand this resource for diverse linguistic studies.

Presentation Transcript


  1. Universal Dependencies Joakim Nivre, Uppsala University

  2. Universal Dependencies • Background: • Treebank annotation schemes vary across languages • Hard to compare results across languages [Nivre et al. 2007] • Hard to evaluate cross-lingual learning [McDonald et al. 2013] • Hard to build multilingual systems • Universal Dependencies (http://universaldependencies.github.io/docs/): • Stanford universal dependencies [de Marneffe et al. 2014] • Google universal part-of-speech tags [Petrov et al. 2012] • Interset morphological features [Zeman 2008] • First guidelines released Oct 1, 2014 • First 10 treebanks released Jan 15, 2015

  3. Universal Dependencies • Syntactic words – explicit splitting of clitics and contractions • Universal part-of-speech tags + morphological features • Dependency tree + augmented dependencies (not shown)

  4. Goals • Cross-linguistically consistent grammatical annotation • Support multilingual NLP and linguistic research • Build on common usage and existing de-facto standards • Complement – not replace – language-specific schemes • Open community effort – anyone can contribute

  5. Guiding Principles • Maximize parallelism • Don't annotate the same thing in different ways • Don't make different things look the same • Don't annotate things that are not there • Languages select from a universal pool of categories • Allow language-specific extensions

  6. Design Principles • Dependency • Widely used in practical NLP systems • Available in treebanks for many languages • Lexicalism • Basic annotation units are words – syntactic words • Words have morphological properties • Words enter into syntactic relations • Recoverability • Transparent mapping from input text to word segmentation
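
The recoverability principle on this slide can be illustrated with a small sketch. It assumes a hypothetical Spanish sentence in which the contraction "del" is split into the syntactic words "de" and "el"; the sentence, the data layout, and the function names are illustrative and do not come from the slides. Each surface token records the syntactic words it covers, so both the word segmentation and the original text can be read back from the same structure.

```python
# Hypothetical example: each surface token is paired with the syntactic
# words it is split into. "del" is a contraction of "de" + "el".
SURFACE = [
    ("vengo", ["vengo"]),
    ("del", ["de", "el"]),
    ("mercado", ["mercado"]),
]


def syntactic_words(surface):
    """Return the flat list of syntactic words (the annotation units)."""
    return [word for _, words in surface for word in words]


def surface_text(surface):
    """Recover the original text from the surface tokens."""
    return " ".join(token for token, _ in surface)


if __name__ == "__main__":
    print(syntactic_words(SURFACE))
    print(surface_text(SURFACE))
```

Keeping the surface token next to its parts is one simple way to make the mapping transparent; the real annotation format may record it differently.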

  7. Morphological Annotation • The lemma represents the semantic content of a word • The part-of-speech tag represents its grammatical class • Features represent lexical and grammatical properties of the lemma or the particular word form

  8. Syntactic Annotation • Content words are related by dependency relations • Function words attach to the content word they modify • Punctuation attaches to the head of the phrase or clause

  9. CoNLL-U Format
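
The figure for this slide is not preserved in the transcript. As a rough sketch, CoNLL-U stores one word per line in ten tab-separated columns, with `_` for empty values and `#` for comment lines. The reader below is a minimal illustration under those assumptions: the lowercase field names are labels chosen here, and the two-word Italian sample is hypothetical (it reuses the feature strings from the examples slide, but the lemma, head, and relation values are illustrative, not taken from a released treebank).

```python
from collections import namedtuple

# Labels for the ten columns; the names are chosen for this sketch.
FIELDS = ["id", "form", "lemma", "upos", "xpos",
          "feats", "head", "deprel", "deps", "misc"]
Token = namedtuple("Token", FIELDS)

# Hypothetical sample, written by hand for illustration.
SAMPLE = (
    "# illustrative sentence\n"
    "1\tla\til\tDET\t_\tDefinite=Def|Gender=Fem|Number=Sing|PronType=Art\t2\tdet\t_\t_\n"
    "2\tcasa\tcasa\tNOUN\t_\tGender=Fem|Number=Sing\t0\troot\t_\t_\n"
)


def read_tokens(text):
    """Parse word lines into Token tuples, skipping comments and blanks."""
    tokens = []
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) != len(FIELDS):
            raise ValueError(f"expected {len(FIELDS)} columns, got {len(cols)}")
        tokens.append(Token(*cols))
    return tokens


if __name__ == "__main__":
    for tok in read_tokens(SAMPLE):
        print(tok.id, tok.form, tok.upos, tok.head, tok.deprel)
```

All values are kept as strings here; converting `head` to an integer or splitting `feats` would be natural next steps.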

  10. Dependency Structure • Keeping content words as heads promotes parallelism • Function words often correlate with morphology [Figure: example trees for English and Swedish]
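
The point about parallelism can be sketched in code. The slide's own example trees are not preserved, so the sentences below are hypothetical: English "the dog chased the cat" and Swedish "hunden jagade katten", where Swedish marks definiteness with a suffix instead of a separate article. The tag set treated as "function words" and the relation labels (`nsubj`, `dobj`, `det`) are assumptions made for this illustration. With content words as heads, removing the function words leaves the same skeleton in both languages.

```python
# Assumed set of function-word tags for this sketch.
FUNCTION_TAGS = {"DET", "ADP", "AUX", "CONJ", "SCONJ", "PART"}

# Tokens are (form, tag, head, relation); head is a 1-based index, 0 = root.
ENGLISH = [
    ("the", "DET", 2, "det"),
    ("dog", "NOUN", 3, "nsubj"),
    ("chased", "VERB", 0, "root"),
    ("the", "DET", 5, "det"),
    ("cat", "NOUN", 3, "dobj"),
]
SWEDISH = [
    ("hunden", "NOUN", 2, "nsubj"),
    ("jagade", "VERB", 0, "root"),
    ("katten", "NOUN", 2, "dobj"),
]


def content_skeleton(tokens):
    """Drop function words and renumber heads among the remaining words.

    Assumes every content word is headed by another content word (or root),
    which is what the content-head design is meant to guarantee.
    """
    keep = [i for i, tok in enumerate(tokens, start=1)
            if tok[1] not in FUNCTION_TAGS]
    new_id = {old: new for new, old in enumerate(keep, start=1)}
    new_id[0] = 0
    return [(new_id[tokens[i - 1][2]], tokens[i - 1][3]) for i in keep]


if __name__ == "__main__":
    print(content_skeleton(ENGLISH))
    print(content_skeleton(SWEDISH))
```

If function words were heads instead, the English tree would contain nodes with no Swedish counterpart, and the two skeletons would no longer line up.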

  11. Dependency Relations [de Marneffe et al. 2014] • Taxonomy of 42 universal grammatical relations, broadly supported across many languages in language typology • Language-specific subtypes can be added

  12. Morphology: POS • Taxonomy of 17 universal part-of-speech tags, based on the Google Universal Tagset [Petrov et al. 2012]

  13. Morphology: Universal Features • Standardized inventory of morphological features, based on the Interset system [Zeman 2008]

  14. Morphology: Examples • la: Definite=Def|Gender=Fem|Number=Sing|PronType=Art • hanno: Mood=Ind|Number=Plur|Person=3|Tense=Pres|VerbForm=Fin • fatto: Gender=Masc|Number=Sing|Tense=Past|VerbForm=Part • casa: Gender=Fem|Number=Sing
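
The feature strings on this slide follow a `Name=Value` pattern with `|` between pairs. A minimal sketch of reading them into a dictionary is shown below; the function name `parse_feats` is hypothetical, and treating `_` as "no features" is an assumption about how empty values are written.

```python
def parse_feats(feats):
    """Split 'Name=Value|Name=Value' into a dict; '_' or '' means no features."""
    if feats in ("", "_"):
        return {}
    return dict(pair.split("=", 1) for pair in feats.split("|"))


# Feature strings copied from the slide.
EXAMPLES = {
    "la": "Definite=Def|Gender=Fem|Number=Sing|PronType=Art",
    "hanno": "Mood=Ind|Number=Plur|Person=3|Tense=Pres|VerbForm=Fin",
    "fatto": "Gender=Masc|Number=Sing|Tense=Past|VerbForm=Part",
    "casa": "Gender=Fem|Number=Sing",
}

if __name__ == "__main__":
    for form, feats in EXAMPLES.items():
        print(form, parse_feats(feats))
```

With the features in a dictionary, a query such as "all feminine singular words" becomes a simple key lookup.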
