Dependency Syntax. An Introduction

Dependency Syntax. An Introduction. Leonid Iomdin Institute for Information Transmission Problems, Russian Academy of Sciences iomdin@iitp.ru, iomdin@gmail.com. Program Overview: p. 1.

Presentation Transcript


  1. Dependency Syntax. An Introduction. Leonid Iomdin, Institute for Information Transmission Problems, Russian Academy of Sciences, iomdin@iitp.ru, iomdin@gmail.com

  2. Program Overview: p. 1 • 1. Basic Principles of The Meaning-Text theory by Igor Mel’čuk. Language as a Universal Translator of Senses to Texts and Texts to Senses. Text analysis and text generation. The theory of integral linguistic description by Juri Apresjan. The grammar and the dictionary of language. • 2. Two syntactic levels of sentence representation: surface syntax and deep syntax. December 21, 2009. Lectures 13-14

  3. Program Overview: p. 2 • 3. The dependency tree structure as a syntactic representation of the sentence. Dependency tree vs. constituent tree: advantages and drawbacks of both types of representation. Limits of the dependency tree. The hypothesis of two syntactic starts. • 4. The notion of syntactic relation. Major classes of syntactic relations: actant, attributive, coordinative and auxiliary relation classes. • 5. The notion of syntactic feature. Syntactic features vs. semantic features.

  4. Program Overview: p. 3 • 6. Actants and valencies. Active, passive and distant valencies. The government pattern of a dictionary entry. An overview of actant syntactic relations. The predicative relation. The agentive relation. Completive relations. • 7. An overview of attributive syntactic relations. Grammatical Agreement. Numerals and Quantitative Constructions. The system of Quantification Syntax of Russian. • 8. Grammatical coordination as a type of grammatical subordination. An overview of coordinative syntactic relations.

  5. Program Overview: p. 4 • 9. Auxiliary syntactic relations. Analytical grammatical forms as an object of syntax. • 10. Microsyntax of Language. Minor Type Sentences. Syntactic Idioms. • 11. Lexical Functions in the Dictionary and the Grammar. • 12. Syntactic description and syntactic rules. Dependency Syntax in NLP. Dependency Syntax in Machine Translation. Syntactically Tagged Corpus of Texts.

  6. Lexical Functions • Substitute LF • synonyms, antonyms, converse terms, derivatives • Collocate LF • MAGN = ‘a high degree of what is denoted by X’ • OPER/FUNC • ...

  7. Lexical Functions: Magn • MAGN (disease) = grave • MAGN (fog) = heavy • MAGN (control) = strict • MAGN (болезнь) = тяжелый • MAGN (туман) = густой • MAGN (контроль) = строгий

  8. Lexical Functions: Oper / Func Family

  9. Examples of LF Oper • Oper1 (invitation) = issue • Oper2 (invitation) = receive • Oper1 (defeat) = suffer • Oper2 (resistance) = encounter • Oper2 (respect) = enjoy

  10. Examples of LF Func • Func1 (fear) = possess • Func2 (decision) = concern • Func1 (responsibility) = rest (with) • Func2 (vengeance) = fall (upon)
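
The Magn, Oper and Func values listed so far behave like entries in a dictionary keyed by the function name and the keyword. A minimal sketch of that idea follows; the values are taken from the slides, while the table layout and the helper name `lf_value` are illustrative assumptions and not part of any described system.

```python
# Toy lexicon of lexical functions. The values come from the slides;
# the (function, keyword) table layout is an illustrative assumption.
LEXICON = {
    ("Magn", "disease"): "grave",
    ("Magn", "fog"): "heavy",
    ("Magn", "control"): "strict",
    ("Oper1", "invitation"): "issue",
    ("Oper2", "invitation"): "receive",
    ("Oper1", "defeat"): "suffer",
    ("Oper2", "resistance"): "encounter",
    ("Oper2", "respect"): "enjoy",
    ("Func1", "fear"): "possess",
    ("Func2", "decision"): "concern",
    ("Func1", "responsibility"): "rest (with)",
    ("Func2", "vengeance"): "fall (upon)",
}


def lf_value(function, keyword):
    """Return the recorded value of a lexical function for a keyword, or None."""
    return LEXICON.get((function, keyword))


print(lf_value("Magn", "fog"))         # value recorded for MAGN (fog)
print(lf_value("Oper1", "defeat"))     # value recorded for Oper1 (defeat)
print(lf_value("Magn", "invitation"))  # no entry in this toy table
```

Because the value depends on the keyword and not only on the function, the same function yields different words for different keywords.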

  11. General Properties of Lexical Functions • Universality • Intralinguistic idiomaticity • grave disease, heavy fog • *heavy disease, *grave fog. • Cross-linguistic idiomaticity • Rus. tjazhelaja bolezn’ ‘heavy disease’ • Rus. gustoj tuman ‘dense fog’

  12. General Properties of Lexical Functions (cont.) • Paraphrasing Potential: • He respects [X] his teachers • He has [OPER1 (S0 (X))] respect [S0 (X)] for his teachers • He treats [LABOR12 (S0 (X))] his teachers with respect • His teachers enjoy [OPER2 (S0 (X))] his respect

  13. LF in Practical Applications • Syntactic and Lexical Ambiguity Resolution in Parsers • Idiomatic Translation of a Large Class of Set Expressions in Machine Translation • Sentence Paraphrasing

  14. Lexical Ambiguity Resolution • to draw a distinction - provodit' razlichie • Both verbs are extremely ambiguous: • draw - more than 50 meanings • provodit’ - more than 10 meanings

  15. Syntactic Ambiguity Resolution • support of the army • 'support by the army' • 'support (given) to the army' • The president had [Y = OPER2 (X)] the support [X] of the army

  16. Syntactic Ambiguity Resolution • The fear [X] of his wife possessed [Y = FUNC1 (X)] Peter • The fears of his wife infected Peter.
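
One way to read the "support of the army" example: the subject of Oper2 (X) fills the second actant of X, so once the verb is recognised as Oper2 (support), the of-phrase can only fill the first actant, which gives 'support by the army'. The sketch below encodes just that inference for a two-actant noun. The function name, the table and the two-actant restriction are assumptions made for illustration; it is not a description of how any actual parser implements the step.

```python
# Only the value given on the slide is recorded: had = OPER2 (support).
OPER_VALUES = {("Oper2", "support"): "have"}

READINGS = {
    1: "'support by the army'",          # of-phrase fills actant 1
    2: "'support (given) to the army'",  # of-phrase fills actant 2
}


def of_phrase_actant(verb_lemma, noun):
    """Return the actant number (1 or 2) filled by an of-phrase, or None.

    If the verb is Oper_i of the noun, the verb's subject occupies actant i,
    so the of-phrase must occupy the other actant (two-actant simplification).
    """
    for slot in (1, 2):
        if OPER_VALUES.get((f"Oper{slot}", noun)) == verb_lemma:
            return 2 if slot == 1 else 1
    return None  # the verb is not a known Oper value: ambiguity remains


actant = of_phrase_actant("have", "support")
print(READINGS.get(actant, "ambiguous"))
```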

  17. Idiomatic translation: LF Temp • March: in – mart: v2 • Tuesday: on – vtornik: v1 • dawn: at – rassvet: na2 • moment: at – moment: v1 • Easter: at – pasxa: na1

  18. Sentence Paraphrasing • X = CONV12 (X) This group consists of 20 persons – Twenty persons comprise this group; • X + Y = ANTI1(X) + ANTI2(Y) He began to observe the rules – He stopped violating the rules • X = LABOR12 + S0(X) He respects his parents – He treats his parents with respect
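
The paraphrasing rules can be illustrated with a toy generator for the verb "respect". This is a rough sketch at the level of uninflected lemmas. The LF values (have, treat, enjoy) come from the slides, whereas the function name, the data layout and the hard-coded prepositions are assumptions for illustration only.

```python
# Lexicon fragment for the keyword verb 'respect' (values from the slides).
LF = {
    "respect": {
        "S0": "respect",     # noun derived from the verb
        "Oper1": "have",     # He has respect for his teachers
        "Labor12": "treat",  # He treats his teachers with respect
        "Oper2": "enjoy",    # His teachers enjoy his respect
    }
}


def paraphrases(subj, verb, obj):
    """Return lemma-level paraphrases of 'subj verb obj' built from LF values.

    Inflection and agreement are ignored; the prepositions are hard-coded
    assumptions that a real system would read from government patterns.
    """
    entry = LF[verb]
    noun = entry["S0"]
    return [
        [subj, entry["Oper1"], noun, "for", obj],     # X = Oper1 + S0(X)
        [subj, entry["Labor12"], obj, "with", noun],  # X = Labor12 + S0(X)
        [obj, entry["Oper2"], noun, "of", subj],      # X = Oper2 + S0(X)
    ]


for variant in paraphrases("he", "respect", "teacher"):
    print(" ".join(variant))
```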

  19. ETAP-3 Options • Machine Translation • Deeply Annotated Text Corpus of Russian (SynTagRus) • Translation System Based on UNL (Universal Networking Language) Interlingua • Synonymous and Quasi-Synonymous Paraphrasing of Utterances • Computer-Aided Language Learning Tool • New Developments: Semantics and Ontologies

  20. SynTagRus Currently the treebank contains over 42,000 sentences (over 600,000 words) belonging to texts of a variety of genres (contemporary fiction, popular science, newspaper and journal articles dated between 1960 and 2009, online news texts, etc.) and is steadily growing. It is an integral but fully autonomous part of the Russian National Corpus, developed in a nationwide research project. It can be freely consulted on the Web (www.ruscorpora.ru).

  21. SynTagRus Since Russian is a language with relatively free word order, SYNTAGRUS adopted a dependency-based annotation scheme, in a way parallel to the Prague Dependency Treebank (see e.g. Hajič et al. 2000).

  22. SynTagRus [Screenshot: dependency tree for sentence (1)]

  23. SynTagRus What we have just seen is a screenshot of the dependency tree for the sentence (1) Наибольшее возмущение участников митинга вызвал продолжающийся рост цен на бензин, устанавливаемых нефтяными компаниями ‘It was the continuing growth of petrol prices set by oil companies that caused the greatest indignation of the participants of the meeting’.

  24. SynTagRus Here, nodes represent words (lemmas) assigned morphological and part-of-speech tags, whilst arcs are labeled with names of syntactic links. The tagging uses about 75 syntactic links, half of them proposed in Igor Mel’čuk’s Meaning-Text Theory (Mel’čuk 1988).

  25. SynTagRus Normally, one token corresponds to one node in the dependency tree. There are, however, a noticeable number of exceptions. The main types of exceptions include:

  26. SynTagRus • composite words like пятидесятиэтажный ‘fifty-storeyed’ where one token corresponds to two or more nodes; • so-called phantom nodes for the representation of hard cases of ellipsis, which do not correspond to any particular token in the sentence (cf. Я купил рубашку, а он галстук, lit. ‘I bought a shirt and he a tie’, which is expanded into Я купил рубашку, а он купил PHANTOM галстук ‘I bought a shirt and he bought PHANTOM a tie’); • multiword expressions like по крайней мере ‘at least’ where several tokens correspond to one node.

  27. SynTagRus Morphological tagging of SYNTAGRUS is based on a comprehensive morphological dictionary of Russian that contains about 130,000 entries (over 4 million word forms). The ETAP-3 morphological analyzer uses the dictionary to produce morphological annotation of the words in the corpus, which includes the lemma, POS tags, and, depending on the POS, a set of morphological features.

  28. Syntactic Markup Language The syntactic markup language of the corpus is XML, because it is universally accepted and because it satisfies certain important requirements that the corpus must meet:

  29. Syntactic Markup Language • the corpus must feature several layers of linguistic data that can be extracted from the annotation independently of each other; • it should be scalable and incrementable both quantitatively and qualitatively so that new types of information can be added easily; • it must be supported by standard programming tools for text parsing, sophisticated search, and conversion.

  30. Structure Editor Structure Editor (StrEd) is a complex software environment aimed at automatic generation of morpho-syntactic and lexical functional annotation of texts, manual editing of annotation results, and fully manual annotation. Automatic generation is only possible for texts in natural languages that are supported by the ETAP-3 linguistic processor.

  31. Structure Editor In principle, Structure Editor is not language-specific and can be used for annotation of texts in any natural language, primarily one with rich morphology.

  32. Structure Editor StrEd allows the annotator to use diverse dialog interfaces in order to: view the whole text; view a sentence as a table in which every line corresponds to a particular word of the sentence; view the syntactic dependency tree for a sentence; view information on a particular word of the sentence; view the discrepancies between the results of automatic tagging and manual tagging of a sentence.

  33. Structure Editor StrEd view presenting the sample text at an initial stage with no morphosyntactic tagging performed.

  34. Structure Editor • As a rule, the first step of text annotation is automatic tagging. Once it is obtained, the sentences are revised by the annotator, who detects and corrects the errors. To conveniently view and manipulate the dependency tree structure, the Edit Structure dialog can be used.

  35. Structure Editor

  36. Structure Editor • In this view, the annotator can perform all typical actions that modify the original tagging; in particular, the annotator can rearrange the structure or delete syntactic relations by simple mouse gestures, and alter the lemmas, syntactic links, or grammatical features. • If these operations do not suffice to obtain the desired results, the annotator may continue editing by switching to another dialog, intended for viewing and manipulating sentence properties, which allows less typical operations on the sentence.

  37. Structure Editor

  38. Morpho-syntactic annotation: Петр крепко спит (‘Peter sleeps soundly’)
      <S ID="1">
        <W DOM="3" EXTRAFEAT="CAP" FEAT="S ЕД МУЖ ИМ ОД" ID="1" KSNAME="ПЕТР" LEMMA="ПЕТР" LINK="предик">Петр</W>
        <W DOM="3" FEAT="ADV" ID="2" KSNAME="КРЕПКО" LEMMA="КРЕПКО" LINK="обст">крепко</W>
        <W DOM="_root" EXTRAFEAT="ЛИЧ" FEAT="V НЕСОВ НЕПРОШ ИЗЪЯВ 3-Л ЕД" ID="3" KSNAME="СПАТЬ" LEMMA="СПАТЬ">спит</W>
      </S>
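
Because the annotation is plain XML, the dependency arcs can be read back with standard tools. The sketch below uses Python's standard `xml.etree.ElementTree` on the sentence shown above; the attribute names (`DOM`, `ID`, `LINK`) are those visible on the slide, while the function `read_arcs` and the `ROOT` label are illustrative choices.

```python
import xml.etree.ElementTree as ET

SENTENCE = """<S ID="1">
<W DOM="3" EXTRAFEAT="CAP" FEAT="S ЕД МУЖ ИМ ОД" ID="1" KSNAME="ПЕТР" LEMMA="ПЕТР" LINK="предик">Петр</W>
<W DOM="3" FEAT="ADV" ID="2" KSNAME="КРЕПКО" LEMMA="КРЕПКО" LINK="обст">крепко</W>
<W DOM="_root" EXTRAFEAT="ЛИЧ" FEAT="V НЕСОВ НЕПРОШ ИЗЪЯВ 3-Л ЕД" ID="3" KSNAME="СПАТЬ" LEMMA="СПАТЬ">спит</W>
</S>"""


def read_arcs(xml_text):
    """Return (head word, link name, dependent word) triples for one <S>."""
    sentence = ET.fromstring(xml_text)
    words = {w.get("ID"): w for w in sentence.iter("W")}
    arcs = []
    for w in words.values():
        head_id = w.get("DOM")
        # DOM="_root" marks the top node, which has no head and no LINK.
        head = "ROOT" if head_id == "_root" else words[head_id].text.strip()
        arcs.append((head, w.get("LINK"), w.text.strip()))
    return arcs


for head, link, dependent in read_arcs(SENTENCE):
    print(head, link, dependent)
```

Each `W` element points to its head through `DOM`, so the whole tree is recoverable from the flat list of words.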

  39. Sentence of average complexity Пчелиные ульи и муравьиные колонии служат хорошим примером: несмотря на относительную простоту организма отдельных насекомых и незначительные возможности их мозга, образуемый ими социум представляет собой весьма сложную систему, отличающуюся исключительной прочностью и слаженностью функционирования. Beehives and ant colonies serve as a good example: despite a relative simplicity of the body of individual insects and insignificant potentials of their brains, the social medium formed by them is a very complex system which is distinguished by exceptional strength and harmony of functioning.

  40. Morpho-syntactic annotation

  41. Lexical Functional Annotation • The newest version of SYNTAGRUS contains partial lexical functional annotation: for collocations that could be presented with the apparatus of lexical functions, the tagging includes information on values and attributes of such lexical functions.

  42. Lexical Functional Annotation

  43. Lexical Functional Annotation

  44. Lexical Functional Annotation • Lexical functional annotation of a corpus sentence can be produced in three ways: • automatically, together with syntactic parsing, by running the ETAP-3 parser on the sentence; • automatically, by running a subset of ETAP-3 rules on the ready syntactic structure of the sentence approved by the expert, using the StrEd option “Let ETAP find them (LFs)”; • manually. • The list of LF arguments and values, irrespective of the way it was produced, can be manually edited: information on functions can be modified, added, or removed.

  45. Annotation Tools • Considering the significant size of SynTagRus (over 500,000 words), the annotation process has to be automated to the fullest extent possible. • On the other hand, automatic annotation has to allow for verification and, if need be, correction by a human expert. • This means that the environment has to provide for comfortable viewing and editing of annotated texts.

  46. Intellectual Debugger In order to diagnose nontrivial annotation errors, a powerful instrument, Intellectual Debugger (IntelDeb), was specially created to verify, in one quick step, whether the current syntactic annotation of a sentence (probably the result of several human interventions) is compatible with at least one of the parses achievable in principle through the automatic ETAP-3 parser.

  47. Intellectual Debugger IntelDeb can be considered a specific parser which, unlike the regular ETAP parser, does not produce multiple parses of a sentence. Instead, if IntelDeb finds that the structure under verification is inadmissible, its goal is to diagnose the cause, or causes, of the situation as precisely as possible.

  48. Intellectual Debugger • The underlying idea is to run the parser consecutively on all binary subtrees as presented by the annotation and see whether the existing syntactic rules and dictionaries permit the construction of such subtrees. The algorithm checks all rules with regard to a specific syntactic link (there may be dozens of such rules) and all possible lemmas for the given pair of words, starting with the rules and lemmas cited in the annotation but gradually loosening the grip and resorting to other rules and lemmas if the current choice cannot be confirmed.
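
The check-then-loosen strategy described above can be sketched for a single binary subtree as follows. This is a toy illustration only and not the ETAP-3 implementation: the rule table, the dictionary, the link names, the English example and the function `check_arc` are all hypothetical, and real rules are far richer than a pair of part-of-speech tags.

```python
# Hypothetical rule table: link name -> admissible (head POS, dependent POS) pairs.
RULES = {
    "predic": {("V", "S")},
    "adverb": {("V", "ADV")},
}

# Hypothetical dictionary: word form -> {candidate lemma: POS}.
DICTIONARY = {
    "saw": {"see": "V", "saw": "S"},
    "Peter": {"Peter": "S"},
}


def check_arc(head_form, head_lemma, link, dep_form, dep_lemma):
    """Diagnose one annotated binary subtree: head --link--> dependent."""
    head_readings = DICTIONARY[head_form]
    dep_readings = DICTIONARY[dep_form]
    allowed = RULES.get(link, set())

    # Step 1: the lemmas and the link exactly as cited in the annotation.
    annotated = (head_readings.get(head_lemma), dep_readings.get(dep_lemma))
    if annotated in allowed:
        return "confirmed"

    # Step 2: keep the link, but try the other lemmas of the same two words.
    for h_lemma, h_pos in head_readings.items():
        for d_lemma, d_pos in dep_readings.items():
            if (h_pos, d_pos) in allowed:
                return f"link ok with other lemmas: {h_lemma}, {d_lemma}"

    # Step 3: keep the annotated lemmas, but try the other links.
    for other_link, pairs in RULES.items():
        if annotated in pairs:
            return f"lemmas ok with other link: {other_link}"

    return "no rule licenses this subtree"


print(check_arc("saw", "see", "predic", "Peter", "Peter"))
print(check_arc("saw", "saw", "predic", "Peter", "Peter"))
print(check_arc("saw", "see", "adverb", "Peter", "Peter"))
```

The point of the ordering is diagnostic: the first step that succeeds tells the annotator which part of the annotation (lemma choice or link label) is the likely source of the error.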

  49. The Hypothesis of Two Syntactic Starts We will be dealing with a special type of sentences with embedded (semi-)phraseological expressions like He does the Devil knows what or its Russian equivalent Он занимается чёрт знает чем.

  50. The Hypothesis of Two Syntactic Starts It is very difficult to build adequate syntactic representations for such sentences. A controversial solution is proposed for this problem, admitting that sentences of this type have two syntactic starts, or syntactic heads.
