
Multi-word Expressions and CG




  1. Multi-word Expressions and CG How should MWEs be described?

  2. Questions discussed in a workshop on MWEs – ACL 2007 • Is it sufficient to use purely statistical methods for the extraction of MWEs from corpora, or is it necessary to harness human knowledge and linguistic insights?

  3. Questions discussed in a workshop on MWEs – ACL 2007 • Is fully automatic MWE extraction feasible, or will manual validation always be required?

  4. Questions discussed in a workshop on MWEs – ACL 2007 • What is the nature of MWEs, and how can they be defined formally?

  5. Questions discussed in a workshop on MWEs – ACL 2007 • To what extent can definitions and extraction procedures be generalised to other languages, other text types and other types of MWEs?

  6. Questions discussed in a workshop on MWEs – ACL 2007 • Can and should we distinguish subtypes of MWEs for NLP applications?

  7. Questions discussed in a workshop on MWEs – ACL 2007 • Is it sufficient to use purely statistical methods for the extraction of MWEs from corpora, or is it necessary to harness human knowledge and linguistic insights? • Comment: Underlying the question is a fundamental misunderstanding of what languages are about. And what is wrong with knowledge and linguistic insight?

  8. Questions discussed in a workshop on MWEs – ACL 2007 • Is fully automatic MWE extraction feasible, or will manual validation always be required? • Comment: Hopefully yes for both.

  9. Questions discussed in a workshop on MWEs – ACL 2007 • What is the nature of MWEs, and how can they be defined formally? • Comment: • - At least they are not the same as collocations. • - Their members do not map one-to-one in translation. • - They point to a single semantic concept.

  10. Questions discussed in a workshop on MWEs – ACL 2007 • To what extent can definitions and extraction procedures be generalised to other languages, other text types and other types of MWEs? • Comment: I think they are generalizable.

  11. Questions discussed in a workshop on MWEs – ACL 2007 • Can and should we distinguish subtypes of MWEs for NLP applications? • Comment: Definitely yes. They often comprise separate POS categories.

  12. How and where to describe MWEs? • Two categories of MWEs: • - frozen clusters of words • - clusters of words, the members of which may inflect

  13. How and where to describe MWEs? • Frozen clusters of words • - may be described in the tokenizer and analyzed as a single unit
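A minimal Python sketch of tokenizer-level handling of frozen clusters follows. It is an illustration only, not SALAMA's tokenizer: the lexicon `FROZEN_MWES` and the function `tokenize` are hypothetical names, and the single lexicon entry is chosen because "baada_ya" appears as one unit in the proverb rule later in the presentation.

```python
# Hypothetical frozen-MWE lexicon: tuples of lowercase surface forms.
FROZEN_MWES = {("baada", "ya")}
MAX_LEN = max(len(m) for m in FROZEN_MWES)


def tokenize(text):
    """Split on whitespace, then join frozen clusters into single tokens."""
    words = text.split()
    tokens = []
    i = 0
    while i < len(words):
        # Try the longest candidate first so that longer clusters win.
        for n in range(min(MAX_LEN, len(words) - i), 1, -1):
            candidate = tuple(w.lower() for w in words[i:i + n])
            if candidate in FROZEN_MWES:
                tokens.append("_".join(candidate))
                i += n
                break
        else:
            # No frozen cluster starts here: keep the word as it is.
            tokens.append(words[i])
            i += 1
    return tokens
```

Because the cluster never inflects, a plain surface-form lookup is enough here; inflecting clusters cannot be handled this way, which is the point of the next slide.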

  14. How and where to describe MWEs? • Inflecting clusters of words • - cannot be described in the tokenizer • - they must be described after analysis, when all necessary linguistic information is available

  15. How and where to describe MWEs? • One possible solution: • - describe frozen MWEs in the tokenizer • - describe inflecting MWEs after morphological analysis • This was the earlier solution in Swahili Language Manager (SALAMA)

  16. How and where to describe MWEs? • Another solution: • - describe all MWEs after morphological analysis • - exceptions are a few fully lexicalized structures that are written as separate words • This solution is applied in current SALAMA

  17. How and where to describe MWEs? • In describing inflecting MWEs, the following requirements apply: • - each member must be described • - the relative location of each member must be described • - other words and punctuation marks in between members must be allowed • - manipulation of the linguistic information (i.e. tags) must be possible, because the whole cluster will be described anew • - it must be possible to isolate the newly described cluster and treat it as a single lexical unit
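The first three requirements (describe each member, describe its relative location, allow intervening material) can be sketched in Python as a search over analysed tokens. This is an assumption-laden illustration, not the CG-2 mechanism itself: `find_mwe`, the `(lemma, tags)` representation and the `max_gap` parameter are all hypothetical.

```python
def find_mwe(readings, member_lemmas, max_gap=2):
    """Return the indices of the MWE members, or None if the pattern fails.

    readings: list of (lemma, tags) pairs, one per token, in text order.
    member_lemmas: lemmas of the members in their required relative order.
    max_gap: how many intervening tokens are tolerated between two members.
    """
    for start, (lemma, _) in enumerate(readings):
        if lemma != member_lemmas[0]:
            continue
        indices = [start]
        for wanted in member_lemmas[1:]:
            prev = indices[-1]
            # Look at the next token and up to max_gap tokens beyond it.
            window = range(prev + 1, min(prev + 2 + max_gap, len(readings)))
            hit = next((j for j in window if readings[j][0] == wanted), None)
            if hit is None:
                break
            indices.append(hit)
        else:
            # Every member was found in order.
            return indices
    return None
```

Matching on lemmas rather than surface forms is what lets the same description cover all inflected forms of the verb; the returned indices are what a later step would use to rewrite the tags of each member.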

  18. CG in describing MWEs • In SALAMA, CG-2 was used for describing MWEs

  19. CG in describing MWEs • Phase 1. • Analyze text: • ameikubali "kubali" V 1/2-SG3-SP VFIN { he/she } PERF:me 9/10-SG-OBJ { it } [kubali] { accept } SVO AR • shingo "shingo" N 9/10-0-SG { a/the } { neck } • upande "upande" ADV { aside }

  20. CG in describing MWEs • Phase 2. • Identify the MWE and describe its structure: • ameikubali "kubali" V 1/2-SG3-SP VFIN { he/she } PERF:me 9/10-SG-OBJ { it } [kubali] { accept } SVO AR • shingo "shingo" N 9/10-0-SG { a/the } { neck } • upande "upande" <<IDIOM { accept unwillingly } • Note: Only the last member is affected, and the new lexical gloss is attached to it

  21. CG in describing MWEs • Phase 3. • Remodify the other members of the MWE: • ameikubali "kubali" V CAP 1/2-SG3-SP VFIN { he/she } PERF:me 9/10-SG-OBJ { it } [kubali] IDIOM-V>> SVO AR • shingo "shingo" IDIOM<> • upande "upande" <<IDIOM { accept unwillingly } • Note: Gloss in English is rewritten, but necessary linguistic information in verb is retained

  22. CG in describing MWEs • Phase 4. • Isolate the MWE as a single lexical unit: • ("kubali_shingo_upande" V CAP 1/2-SG3-SP VFIN { he/she } PERF:me 9/10-SG-OBJ { it } SVO AR IDIOM-V>> { accept unwillingly } )
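Phases 2 to 4 can be summarised in a short Python sketch. It only mirrors the data flow shown on the slides and is not how CG-2 executes rules: the function `describe_idiom` and the dictionary layout are hypothetical, the tag lists are shortened, and the reading that the number of "<" or ">" signs equals the number of other members is an assumption drawn from the examples.

```python
def describe_idiom(readings, new_gloss):
    """Apply Phases 2-4 to a list of reading dicts and return one merged unit.

    Each reading has "lemma", "tags" (list of str) and "gloss" (str or None).
    The first member is assumed to be the verb.
    """
    others = len(readings) - 1
    verb, last = readings[0], readings[-1]

    # Phase 2: re-describe the last member and attach the new gloss to it.
    last = {"lemma": last["lemma"], "tags": ["<" * others + "IDIOM"], "gloss": new_gloss}

    # Phase 3: drop the verb's own gloss but keep its grammatical tags.
    verb = {
        "lemma": verb["lemma"],
        "tags": verb["tags"] + ["IDIOM-V" + ">" * others],
        "gloss": None,
    }
    middle = [{"lemma": r["lemma"], "tags": ["IDIOM<>"], "gloss": None} for r in readings[1:-1]]

    # Phase 4: join the lemmas and keep the verb's tags plus the idiom gloss.
    members = [verb, *middle, last]
    return {
        "lemma": "_".join(m["lemma"] for m in members),
        "tags": verb["tags"],
        "gloss": last["gloss"],
    }
```

The merged unit keeps the verb's inflectional information, which is what Phase 5 needs in order to produce "has accepted it unwillingly" rather than a bare infinitive.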

  23. CG in describing MWEs • Phase 5. • Surface form in English: • (V CAP 1/2-SG3-SP VFIN { he/she } PERF:me 9/10-SG-OBJ SVO AR IDIOM-V>> { has accepted { it } unwillingly } ) • Phase 6. • he/she has accepted it unwillingly • Note 1: Surface form is written using lexical and linguistic information • Note 2: The order of words, and their inclusion/exclusion is controlled by re-ordering rules

  24. Problematic cases • Original analysis: • amechukua "chukua" V 1/2-SG3-SP VFIN { he/she } PERF:me [chukua] { take } SVO • hatua "hatua" N 9/10-0-PL { step } AR • tatu "tatu" NUM 9/10-PL CARD { three } • Marking the idiom (wrong): • amechukua "chukua" V 1/2-SG3-SP VFIN { he/she } PERF:me SVO IDIOM-V> • hatua "hatua" <IDIOM { take action } • tatu "tatu" NUM 9/10-PL CARD { three }

  25. Safe cases • Safe case: • amepiga "piga" V 1/2-SG3-SP VFIN { he/she } PERF:me [piga] { hit } SVO • hatua "hatua" N 9/10-0-SG { a/the } { step } AR • amepiga "piga" V 1/2-SG3-SP VFIN { he/she } PERF:me SVO IDIOM-V> • hatua "hatua" <IDIOM { advance } • he/she has advanced
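One reading of the contrast between the two cases is that the numeral "tatu" modifies "hatua", so replacing the noun's gloss with an idiom gloss leaves the numeral stranded. Under that assumption, a guard that blocks idiom marking when the noun is followed by a numeral could look as follows in Python. The slides do not state which condition SALAMA actually uses, so both the condition and the name `may_mark_idiom` are hypothetical.

```python
def may_mark_idiom(readings, noun_index):
    """Return False when the token right after the noun is tagged NUM.

    readings: list of (lemma, tags) pairs in text order.
    noun_index: position of the noun member of the candidate idiom.
    """
    nxt = noun_index + 1
    if nxt < len(readings) and "NUM" in readings[nxt][1]:
        return False
    return True
```

In CG terms this would correspond to a negative context condition on the position after the target.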

  26. Types of MWEs • Several types of MWEs, and each needs to be treated in a specific way

  27. Types of MWEs • Idiomatic expressions: • - they often include a verb as a member • - a large number of surface forms • Alipiga kinanda. • REPLACE (<IDIOM { play piano }) TARGET ("kinanda") • (-1 ([piga])) ; • "<*alipiga>" "piga_kinanda" V 1/2-SG3-SP VFIN { he/she } PAST SVO ACT IDIOM-V "<kinanda>" { play piano }
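The effect of a REPLACE rule with fixed-offset conditions, such as the one above, can be emulated in Python for experimentation outside a CG parser. This sketch is not CG syntax or semantics in full: `replace_tags` is a hypothetical helper, conditions are limited to lemma matches at fixed offsets, and the tag lists are shortened.

```python
def replace_tags(readings, target, new_tags, context):
    """Emulate a REPLACE rule with fixed-offset lemma conditions.

    readings: list of dicts with "lemma" and "tags".
    target: lemma the rule applies to.
    new_tags: tag list that replaces the target's tags.
    context: list of (offset, lemma) conditions relative to the target.
    Returns a new list; the input is left unchanged.
    """
    result = [dict(r) for r in readings]
    for i, reading in enumerate(readings):
        if reading["lemma"] != target:
            continue
        ok = all(
            0 <= i + off < len(readings) and readings[i + off]["lemma"] == lemma
            for off, lemma in context
        )
        if ok:
            result[i]["tags"] = list(new_tags)
    return result
```

For the rule on this slide, the call would be `replace_tags(readings, "kinanda", ["<IDIOM", "{ play piano }"], [(-1, "piga")])`. The same shape covers the rules on the following slides, which differ only in the number of offsets and in the replacement tags.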

  28. Types of MWEs • Nouns with genitive structure: • - number of forms limited, often sg and pl • suala la jinsia • masuala ya jinsia • REPLACE (<<MW { :gender issue }) TARGET ("jinsia") • (-2 ("suala")) (-1 GEN-CON); • "<suala>" "suala_la_jinsia" N 5/6-SG { the } AR MW-N "<la>" "<jinsia>" { :gender issue } • "<masuala>" "suala_la_jinsia" N 5/6-PL { the } AR MW-N "<ya>" "<jinsia>" { :gender issue }

  29. Types of MWEs • Adjectival expressions with relative structure: • - number of forms limited by the number of noun classes • mtu mwenye akili • REPLACE (ADJ <MW { clever , cute }) TARGET ("akili") • (-1 ("enye")) (NOT 0 MW); • "<mtu>" "mtu" N 1/2-SG { the } { man } • "<mwenye>" "enye_akili" MW> "<akili>" ADJ { clever , cute }

  30. Types of MWEs • Adjectival expressions with relative structure: • - number of forms limited by the number of noun classes • - is often embedded in the verb structure • tendo lililohitimishwa vibaya • REPLACE (ADJ <MW { illegitimate }) TARGET ("vibaya") • (-1 ("hitimishwa") + REL) (NOT 0 MW); • "<tendo>" "tendo" N 5/6-SG { the } { act } • "<lililohitimishwa>" "hitimishwa_vibaya" MW> "<vibaya>" ADJ { illegitimate }

  31. Types of MWEs • Adverbial expressions with genitive structure: • - number of forms limited • kwa bahati mbaya • REPLACE ( ADV <<MW { unfortunately } ) TARGET ("baya") • (-2 ("kwa")) (-1 ("bahati")) ; • "<kwa>" "kwa_bahati_baya" MW>> "<bahati>" "<mbaya>" ADV { unfortunately }

  32. Types of MWEs • Proper names with several members: • - fixed form • Wizara ya Mawasiliano na Uchukuzi • REPLACE (<<<<MW { *ministry of *communication et *transport }) TARGET ("uchukuzi") • (-4 ("wizara")) (-3 ("ya")) (-2 ("mawasiliano")) (-1 ("na")) ; • "<*wizara>" "wizara_ya_mawasiliano_na_uchukuzi" N 9/10-SG { the } AR MW-N "<ya>" "<*mawasiliano>" "<na>" "<*uchukuzi>" { *ministry of *communication et *transport }

  33. Types of MWEs • Proverbs: • - ‘fixed’ form • - one rule for different variants • Baada ya dhiki faragha. • Baada ya dhiki faraja. • Baada ya dhiki faraji. • REPLACE (<<PROVERB { *after trouble there is relief } ) TARGET ("faragha") OR ("faraja") OR ("faraji") • (-2 ("baada_ya")) (-1 ("dhiki")) ;

  34. Types of MWEs • Proverbs: • - ‘fixed’ form • "*baada_ya_dhiki_faragha" PROVERB>> { *after trouble there is relief } • "*baada_ya_dhiki_faraja" PROVERB>> { *after trouble there is relief } • "*baada_ya_dhiki_faraji" PROVERB>> { *after trouble there is relief }
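The idea of one rule covering several variants can be sketched in Python as a fixed prefix plus a set of alternative final members. The names `PREFIX`, `VARIANTS` and `match_proverb` are hypothetical, and the sketch omits the "*" capitalisation marker shown in the output above.

```python
# Fixed leading members and the alternative final members of the proverb.
PREFIX = ("baada_ya", "dhiki")
VARIANTS = {"faragha", "faraja", "faraji"}


def match_proverb(lemmas):
    """Return the joined proverb lemma if PREFIX is followed by a known variant."""
    for i in range(len(PREFIX), len(lemmas)):
        if lemmas[i] in VARIANTS and tuple(lemmas[i - len(PREFIX):i]) == PREFIX:
            return "_".join(PREFIX + (lemmas[i],))
    return None
```

Each variant still yields its own lexical unit, as on the slide, but the description of the proverb is written only once.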

  35. MWEs in dictionary compilation • MWEs as separate dictionary entries • {tia} V [tia] { put into, pour into, bring about, cause } 296 • {tia_akili} V IDIOM-V { take note of } 1 • [akili] taz. [tia_akili] V IDIOM-V { take note of } 1

  36. MWEs in dictionary compilation • MWEs as separate dictionary entries • {afya} N 9/10 { health, sound condition } AR 1226 • [afya]a taz. [bwana_afya] MW> N 9/6 { health officer } 10 • [afya]a taz. [enye_afya] MW> ADJ { bonny } 17 • [afya]a taz. [enye_nguvu_na_afya] MW>>> ADJ { hale } 1

  37. MWEs in dictionary compilation • MWEs with use examples: • {piga} V (piga) { hit, beat } 647 • {piga picha} V IDIOM-V { photograph } 40 • [piga picha] <ALA> Ikulu kunywa chai na kupiga [piga picha] picha na Rais Mkapa (the State House to drink tea and to photograph and President Mkapa) • [piga picha] <ALA> wapige [piga picha] picha, alionekana kugoma (they should photograph, he/she was seen to boycott) • [piga picha] <DWE> Au kumpiga [piga picha] picha au hata kupeana naye (Or to photograph or even to give each other with him/her) • [piga picha] <DWE> kutoka Ujerumani, walijitahidi kupiga [piga picha] picha za ukumbusho na kiongozi wao (from Germany, they made an effort to photograph the commemoration and their leader)

  38. MWEs in dictionary compilation • MWEs with use examples: • {piga ramli} V IDIOM-V { divine } 4 • [piga ramli] <KIO> anakwenda kwa mganga ili kupiga [piga ramli] ramli na kuongeza imani za ushirikina (he/she goes to the medical person in order to divine and to increase the faith in superstition) • [piga ramli] <KIO> ikambidi amtume mtaalam wa kupiga [piga ramli] ramli kuhusu nyota hiyo (he/she was obliged to send to him/her the expert of divining concerning this star) • [piga ramli] <KIO> kwenda kwa mganga wa kupiga [piga ramli] ramli, hujui kuwa imani ya (going to the medical person of divining, you do not know that the faith of) • [piga ramli] <RAI> kuachana na mtindo wa kupiga [piga ramli] ramli (to leave with the style of divining)
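Entries of the kind shown above, with a frequency and a handful of use examples labelled by source code, can be assembled from tagged occurrences roughly as follows. This is a sketch of the bookkeeping only; `compile_entries`, the triple format and the `max_examples` cut-off are assumptions, not SALAMA's compilation procedure.

```python
from collections import defaultdict


def compile_entries(occurrences, max_examples=4):
    """Group MWE occurrences into entries with a frequency and a few examples.

    occurrences: iterable of (mwe_lemma, source_code, context) triples.
    max_examples: how many use examples to keep per entry.
    """
    entries = defaultdict(lambda: {"freq": 0, "examples": []})
    for lemma, source, context in occurrences:
        entry = entries[lemma]
        entry["freq"] += 1
        # Keep only the first few contexts as use examples.
        if len(entry["examples"]) < max_examples:
            entry["examples"].append((source, context))
    return dict(entries)
```

The input triples presuppose that the MWE has already been isolated as a single lexical unit, which is what makes the frequency count per MWE possible in the first place.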

  39. Conclusion • A detailed description of MWEs is necessary in at least two applications • - machine translation • - automatic dictionary compilation

  40. Conclusion • Improvements needed for CG parser • - possibility for ordering replace rules • - more possibilities for controlling the deletion and/or replacement of morphemes
