
Parsing Estonian with Constraint Grammar

Explore morphological disambiguation and syntactic analysis of Estonian using the Constraint Grammar framework. Learn about results, applications, and future work in parsing Estonian text. Developed at the Institute of Cybernetics at Tallinn Technical University.


Presentation Transcript


  1. Parsing Estonian with Constraint Grammar Kaili Müürisep Institute of Cybernetics at Tallinn Technical University

  2. Outline • Background • Constraint Grammar framework • Morphological disambiguation • Syntactic analysis • Results • Applications • Future work

  3. Background • Project started in 1995/96 • Two grammar-writers: • morphological disambiguation - Tiina Puolakainen • syntax - Kaili Müürisep

  4. Constraint Grammar • proposed by Fred Karlsson in 1990 (University of Helsinki) • employs surface-near, dependency-oriented syntax • rule-based • integrates morphological disambiguation and shallow syntactic analysis

  5. CG - Parsing Scheme: Input text → Morphological analysis → Identification of clause boundaries → Morphological disambiguation → Determination of syntactic functions → Analysed text

  6. Morphologically analyzed sentence
     Eesti
         Eesti+0 //_S_ prop sg gen #cap // Estonia
         Eesti+0 //_S_ prop sg nom #cap //
         eesti+0 //_G_ #cap // Estonian
     vanimad
         vanim+d //_A_ super pl nom // oldest
     asukad
         asukas+d //_S_ com pl nom // dwellers
     saabusid
         saabu+sid //_V_ main indic impf ps2 sg ps af #Intr // arrived
         saabu+sid //_V_ main indic impf ps3 pl ps af #Intr //
     siia
         siia+0 //_D_ // here
         siig+0 //_S_ com sg gen // whitefish

  7. pärast
         pärast+0 //_D_ // afterwards
         pärast+0 //_K_ post #gen // after
         pärast+0 //_K_ pre #part //
         pärane+t //_A_ pos sg part //
         pära+st //_S_ com sg el // residue or stern
     viimast
         viimane+t //_A_ pos sg part // last
         vii+mast //_V_ main sup ps el #NGP-P // take, lead ...
     jääaega
         jää_aeg+0 //_S_ com sg adit // ice-age
         jää_aeg+0 //_S_ com sg part //
     $.
         . //_Z_ Fst //
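The reading lines on slides 6 and 7 follow a regular pattern: lemma, `+`, ending, then `//_POS_ features //`. A minimal Python sketch of a parser for that pattern is given here; the `Reading` class, the regular expression and the function name are illustrative assumptions, not part of the authors' tools, and punctuation lines such as `. //_Z_ Fst //` (which have no `+`) are deliberately not handled.

```python
import re
from dataclasses import dataclass


@dataclass
class Reading:
    """One morphological reading (hypothetical container, not the authors' format)."""
    lemma: str       # e.g. "saabu"
    ending: str      # e.g. "sid"; "0" marks a zero ending
    pos: str         # one-letter word class, e.g. "V", "S", "A", "D", "K"
    features: list   # remaining tags, e.g. ["main", "indic", "impf", ...]


# lemma+ending //_POS_ feature feature ... //   (an optional gloss may follow)
READING_RE = re.compile(
    r"^(?P<lemma>\S+)\+(?P<ending>\S+)\s*//_(?P<pos>\w)_\s*(?P<feats>[^/]*)//"
)


def parse_reading(line: str) -> Reading:
    """Parse one reading line as printed on the slides."""
    match = READING_RE.match(line.strip())
    if match is None:
        raise ValueError(f"not a reading line: {line!r}")
    return Reading(
        lemma=match["lemma"],
        ending=match["ending"],
        pos=match["pos"],
        features=match["feats"].split(),
    )


verb = parse_reading("saabu+sid //_V_ main indic impf ps3 pl ps af #Intr //")
adverb = parse_reading("siia+0 //_D_ // here")
```

Anything after the closing `//` (the English gloss) is ignored by this sketch.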

  8. Morphologically disambiguated sentence
     Eesti
         Eesti+0 //_S_ prop sg gen #cap //
     vanimad
         vanim+d //_A_ super pl nom //
     asukad
         asukas+d //_S_ com pl nom //
     saabusid
         saabu+sid //_V_ main indic impf ps3 pl ps af #Intr //
     siia
         siia+0 //_D_ //
     pärast
         pärast+0 //_K_ pre #part //
     viimast
         viimane+t //_A_ pos sg part //
     jääaega
         jää_aeg+0 //_S_ com sg part //
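The step from the analyzed to the disambiguated sentence is carried out by context-sensitive constraints. The Python sketch below illustrates the general mechanism only: the `Cohort` class, the helper functions and the single preposition rule are hypothetical and are not taken from the authors' grammar. The rule keeps the reading of a word as a preposition governing the partitive if the next word has a partitive reading, and, as is usual in Constraint Grammar, an operation never deletes the last remaining reading.

```python
from dataclasses import dataclass


@dataclass
class Cohort:
    """A word form with its candidate readings; each reading is a set of tags."""
    word: str
    readings: list


def has_reading(cohort, tags):
    """True if some reading of the cohort contains all the given tags."""
    return any(tags <= reading for reading in cohort.readings)


def select(cohort, wanted):
    """Keep only readings containing all `wanted` tags, if any such reading exists."""
    kept = [r for r in cohort.readings if wanted <= r]
    if kept:
        cohort.readings = kept


def remove(cohort, unwanted):
    """Discard readings containing all `unwanted` tags, but never the last reading."""
    kept = [r for r in cohort.readings if not unwanted <= r]
    if kept:
        cohort.readings = kept


def apply_preposition_rule(sentence):
    """Hypothetical rule: SELECT (K pre #part) IF the next cohort has a (part) reading."""
    target = {"K", "pre", "#part"}
    for i, cohort in enumerate(sentence[:-1]):
        if has_reading(cohort, target) and has_reading(sentence[i + 1], {"part"}):
            select(cohort, target)


# Readings of "pärast viimast jääaega" as listed on slide 7.
sentence = [
    Cohort("pärast", [{"D"}, {"K", "post", "#gen"}, {"K", "pre", "#part"},
                      {"A", "pos", "sg", "part"}, {"S", "com", "sg", "el"}]),
    Cohort("viimast", [{"A", "pos", "sg", "part"},
                       {"V", "main", "sup", "ps", "el", "#NGP-P"}]),
    Cohort("jääaega", [{"S", "com", "sg", "adit"}, {"S", "com", "sg", "part"}]),
]
apply_preposition_rule(sentence)
```

After this one rule, `pärast` keeps only its preposition reading, while `viimast` and `jääaega` stay ambiguous and would need further constraints. A real grammar would use more careful context conditions than this sketch does.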

  9. After adding syntactic labels
     Eesti
         Eesti+0 //_S_ prop sg gen #cap // **CLB @OBJ @ADVL @NN>
     vanimad
         vanim+d //_A_ super pl nom // @ADVL @AN> @<AN @PRD
     asukad
         asukas+d //_S_ com pl nom // @SUBJ @PRD @OBJ @NN> @<NN @ADVL @<Q
     saabusid
         saabu+sid //_V_ main indic impf ps3 pl ps af #Intr // @+FMV
     siia
         siia+0 //_D_ // @ADVL @AD> @<AD
     pärast
         pärast+0 //_K_ pre #part // @ADVL @PN> @<PN
     viimast
         viimane+t //_A_ pos sg part // @AN> @<AN @ADVL
     jääaega
         jää_aeg+0 //_S_ com sg part // @SUBJ @OBJ @ADVL @<Q @NN> @<NN @<P

  10. Syntactically analyzed sentence
     Eesti
         Eesti+0 //_S_ prop sg gen #cap // **CLB @NN>
     vanimad
         vanim+d //_A_ super pl nom // @AN>
     asukad
         asukas+d //_S_ com pl nom // @SUBJ
     saabusid
         saabu+sid //_V_ main indic impf ps3 pl ps af #Intr // @+FMV
     siia
         siia+0 //_D_ // @ADVL
     pärast
         pärast+0 //_K_ pre #part // @ADVL
     viimast
         viimane+t //_A_ pos sg part // @AN>
     jääaega
         jää_aeg+0 //_S_ com sg part // @<P

  11. Actually ...
     Eesti @NN>
     vanimad @AN>
     asukad @SUBJ
     saabusid
         saabu+sid //_V_ main indic impf ps3 pl ps af #Intr // @+FMV
     siia
         siia+0 //_D_ // @ADVL
     pärast
         pärast+0 //_K_ pre #part // @ADVL
     viimast
         viimane+t //_A_ pos sg part // @AN>
         vii+mast //_V_ main sup ps el #NGP-P // @ADVL
     jääaega
         jää_aeg+0 //_S_ com sg part // @<P @OBJ
         jää_aeg+0 //_S_ com sg adit // @ADVL

  12. Morphological disambiguation • The morphological analyser of Estonian assigns adequate morphological descriptions to about 99% of the tokens in a text. • In morphologically analysed Estonian text, over 45% of all words are ambiguous, with 2-15 readings each. • More than 1125 constraints • After disambiguation, 85-90% of words are morphologically unambiguous; the error rate of the disambiguator is less than 2%.
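The ambiguity figures above are simple proportions over the words of a text. As a small worked example in Python, the sketch below computes the same kind of figures for the single example sentence, using the number of readings listed for each word on slides 6 and 7; the function names are illustrative assumptions, and one sentence is of course far too little to compare with the corpus-level percentages.

```python
def ambiguity_rate(reading_counts):
    """Share of words that have more than one reading."""
    ambiguous = sum(1 for _, n in reading_counts if n > 1)
    return ambiguous / len(reading_counts)


def mean_readings(reading_counts):
    """Average number of readings per word."""
    return sum(n for _, n in reading_counts) / len(reading_counts)


# (word, number of readings) as listed on slides 6 and 7, punctuation excluded.
before = [("Eesti", 3), ("vanimad", 1), ("asukad", 1), ("saabusid", 2),
          ("siia", 2), ("pärast", 5), ("viimast", 2), ("jääaega", 2)]

# After disambiguation (slide 8) every word is shown with exactly one reading.
after = [(word, 1) for word, _ in before]

print(ambiguity_rate(before))   # 0.75  (6 of 8 words are ambiguous)
print(mean_readings(before))    # 2.25
print(ambiguity_rate(after))    # 0.0
```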

  13. Morphological disambiguation (2) • The major ambiguities are between: • The adjectival and verbal readings of participles • The nominative, genitive, partitive and short illative cases of a noun • The adposition, adverb and noun readings • Some coincidences: sai (white bread, got), viis (five, melody, carried), tee (tea, road, do!), või (butter, or, may), tuli (fire - light, came)

  14. Morphological disambiguation (3) The most difficult task is to disambiguate between the nominative, genitive, partitive and short illative cases: (1) maailma-GEN juhtivad majandusriigid 'the leading economic states of the world' (2) maailma-PART juhtivad majandusriigid 'the economic states leading the world' (3) maailma-ILLAT juhtivad majandusriigid 'the economic states leading into the world'

  15. Determination of syntactic functions • 27 syntactic tags (subject, object, adverbial etc.) • no direct connection between attribute and head: professori (@NN>) nahast (@NN>) portfell (professor-GEN leather-ELAT portfolio) • More than 1300 rules • 83-90% of words become syntactically unambiguous • Correctness is 96.5-98.5%
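The point about attributes and heads can be illustrated with a short Python sketch. It assumes that `@NN>` marks a noun attribute whose head is some noun further to the right and `@<NN` one whose head is some noun to the left, which is how the arrows in the example above read; the function and the token tuples are hypothetical. Because the tag gives only a direction, the sketch can list candidate heads but cannot pick one.

```python
def candidate_heads(tokens, index):
    """List the nouns that could be the head of the attribute at `index`.

    `tokens` is a list of (word, word_class, syntactic_tag) tuples; the tag
    only says on which side the head noun lies, not which noun it is.
    """
    _, _, tag = tokens[index]
    if tag == "@NN>":   # assumed: head is a noun somewhere to the right
        return [w for w, pos, _ in tokens[index + 1:] if pos == "S"]
    if tag == "@<NN":   # assumed: head is a noun somewhere to the left
        return [w for w, pos, _ in tokens[:index] if pos == "S"]
    return []


# The example phrase; no tag is given on the slide for "portfell", so None is used.
phrase = [("professori", "S", "@NN>"),
          ("nahast", "S", "@NN>"),
          ("portfell", "S", None)]

print(candidate_heads(phrase, 0))   # ['nahast', 'portfell']
print(candidate_heads(phrase, 1))   # ['portfell']
```

For `professori` both following nouns remain possible heads, so attaching it to `portfell` would need information beyond the tag itself.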

  16. Syntactic disambiguation - problems • Adverbial versus adverbial attributes • Mees sai siiski pidada ühendust mobiiltelefoniga (@ADVL @NN> @<NN) Kosovos sõdivate poegadega. • Man could still keep connection with_mobile-phone in_Kosovo fighting with_sons. • Object in genitive or attribute • Ta asetas mantli (gen @OBJ @NN>) tooli (gen @OBJ @NN>) seljatoele (@ADVL @<NN). • He put coat-GEN chair-GEN back-ALLAT. • 'He put the coat onto the back of a chair.'

  17. Syntactic disambiguation - errors • One clause divides the other into two parts: • Seega oli samm, mille astus Eesti, palju pikem ja otsustavam. • Thus the step, that Estonia took, was bigger and more decisive. • Ellipsis • Determination of apposition, quantifiers

  18. Applications • Automatic summary generator • Noun phrase detector • Linguistic research • Promising fields of applications: • Information retrieval • Text-to-speech synthesis • Grammar and style checker • Machine translation, translation aids

  19. Future work • Improvement of lexicon, integration of analyser with semantic database • Bigger training corpus • Use of statistical methods • Improvement of tag set • Deeper structure • Prototypes of practical applications
