Merkingarbrunnur (“semantic well”) for Icelandic Language Technology

  1. Merkingarbrunnur (“semantic well”) for Icelandic Language Technology. Matthew Whelpton and Anna Björk Nikulásdóttir. Hugvísindaþing 2009

  2. Semantics in IT and NLP
     • Semantic information is an essential resource for Natural Language Processing applications, especially for:
       • information extraction
       • question answering
       • machine translation
       • integrated spoken language systems

  3. First steps for Icelandic
     • Three examples of semantic resources for English
       • WordNet
       • FrameNet
       • PropBank
     • Overview of the first project aiming to develop a comparable resource for Icelandic
       • WordNet-like semantic network
       • RANNÍS

  4. WordNet
     • WordNet is
       • an electronic database
       • of (English) words
       • and the semantic relations between them

  5. Hyponymy
     • “fox” is_a_kind_of “animal”
       • every fox is an animal
       • NOT every animal is a fox
     • “fox” is a hyponym of “animal”
       • hypo ‘under’
       • nym ‘name’
     • “animal” is a hypernym of “fox”
       • hyper ‘over’
       • nym ‘name’

  6. Two simple uses
     • Helps with semantically sensitive searches
       • find animals that eat eggs
       • WordNet tells us that “fox”, as an “animal”, meets the first criterion of this search
     • Helps with reference tracking
       • The farmer saw a fox. The animal was hurt.
       • => the fox was hurt

  7. Synonymy
     • At the heart of WordNet is the notion of a synset
       • a synset = a set of synonyms
     • synonym
       • having the same meaning
       • syn ‘same’; nym ‘name’
     • in WordNet, words count as synonyms if they can be substituted for each other in some contexts

  8. Synsets
     • WordNet lists “animal” as synonymous with “creature” and “beast”.
       • The farmer saw a fox. The animal/creature/beast was hurt.
       • => the fox was hurt
     • WordNet lists “farmer” as a hyponym of “person”, not “animal”.
     • “person” and “animal” are hyponyms of “being”

  9. Fragment of a WordNet
     being
     ├── person
     │   └── farmer
     └── {animal; creature; beast}
         └── fox
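The fragment above can be written down as data to make the two uses from slide 6 concrete. This is a minimal sketch over the five synsets on the slide only; the names `SYNSETS`, `HYPERNYM`, `is_a_kind_of` and `resolve_reference` are hypothetical and do not correspond to the real WordNet database or any WordNet API.

```python
# Toy data copied from the slide; this is NOT the real WordNet database.
# Each synset is keyed by one of its members and lists all of its synonyms.
SYNSETS = {
    "being": {"being"},
    "person": {"person"},
    "animal": {"animal", "creature", "beast"},
    "farmer": {"farmer"},
    "fox": {"fox"},
}

# Hyponymy links: child synset -> parent (hypernym) synset.
HYPERNYM = {
    "person": "being",
    "animal": "being",
    "farmer": "person",
    "fox": "animal",
}


def synset_of(word):
    """Return the key of the synset that contains `word`, or None."""
    for key, members in SYNSETS.items():
        if word in members:
            return key
    return None


def is_a_kind_of(word, ancestor):
    """True if `word` is in the synset of `ancestor` or in any synset below it."""
    node = synset_of(word)
    target = synset_of(ancestor)
    if node is None or target is None:
        return False
    while node is not None:
        if node == target:
            return True
        node = HYPERNYM.get(node)  # climb one level; None at the root
    return False


def resolve_reference(antecedents, anaphor):
    """Keep the earlier nouns that the later noun could refer back to."""
    return [noun for noun in antecedents if is_a_kind_of(noun, anaphor)]


if __name__ == "__main__":
    print(is_a_kind_of("fox", "animal"))
    print(resolve_reference(["farmer", "fox"], "creature"))
```

Because the check climbs from hyponym to hypernym only, "every fox is an animal" holds while "every animal is a fox" does not, and "The animal/creature/beast was hurt" resolves to the fox rather than the farmer.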

  10. FrameNet
     • WordNet focuses on individual words and semantic relations between them.
     • FrameNet focuses on generalised situations (frames) and the entities and activities which characterise those situations (frame elements)

  11. Apply_heat Frame
     • Frame
       • Apply_heat
     • Frame Elements
       • Cook
       • Food
       • Heating_instrument
     • Example
       • [Cook Matilde] [Apply_heat fried] [Food the catfish] [Heating_instrument in a heavy iron skillet].
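One way to picture the annotated example is as a list of labelled spans. The sketch below is an illustration only: the `Span` class and the helper functions are hypothetical and do not reflect FrameNet's actual file format or tooling.

```python
from dataclasses import dataclass


@dataclass
class Span:
    label: str  # frame name (for the target word) or frame element name
    text: str   # the stretch of the sentence carrying that label


FRAME = "Apply_heat"
FRAME_ELEMENTS = {"Cook", "Food", "Heating_instrument"}

# The example sentence from the slide, one Span per bracketed chunk.
ANNOTATION = [
    Span("Cook", "Matilde"),
    Span("Apply_heat", "fried"),
    Span("Food", "the catfish"),
    Span("Heating_instrument", "in a heavy iron skillet"),
]


def target_word(spans):
    """Return the word that evokes the frame."""
    return next(s.text for s in spans if s.label == FRAME)


def frame_elements(spans):
    """Map each frame element name to the text that fills it."""
    return {s.label: s.text for s in spans if s.label in FRAME_ELEMENTS}


def sentence(spans):
    """Rebuild the plain sentence from the annotated spans."""
    return " ".join(s.text for s in spans) + "."


if __name__ == "__main__":
    print(sentence(ANNOTATION))
    print(target_word(ANNOTATION), frame_elements(ANNOTATION))
```

The point of the representation is that the situation (Apply_heat) and its participants are stored separately from the particular verb, so a different target word would fill the same slots.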

  12. Form and Meaning
     • WordNet
       • Individual words
       • Semantic relations between them
     • FrameNet
       • General situations (world knowledge)
       • How words characterise those situations
     • PropBank
       • Relation between
         • Syntactic Structure
         • Predicate Argument Structure
       • U.Penn. TreeBank + predicate argument information

  13. Syntactic and Argument Structure
     • What do you like?
     Treebank annotation:
       (SBARQ (WHNP-1 what) (SQ do (NP-SBJ you) (VP like (NP *T*-1))))
     PropBank annotation:
       Rel: like
       Arg0: you
       Arg1: [*T*] -> What
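The step from the trace `*T*-1` to "what" relies on the shared index in the Treebank bracketing. The following is a rough sketch of that lookup, assuming only the bracketed string shown on the slide; the functions `parse`, `leaves`, `indexed_constituents` and `resolve_trace` are hypothetical helpers, not part of any Treebank or PropBank toolkit.

```python
import re


def parse(bracketed):
    """Parse a bracketed tree into nested (label, children) tuples."""
    tokens = re.findall(r"\(|\)|[^\s()]+", bracketed)
    pos = 0

    def node():
        nonlocal pos
        pos += 1                      # skip "("
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(node())
            else:
                children.append(tokens[pos])  # a word or a trace
                pos += 1
        pos += 1                      # skip ")"
        return (label, children)

    return node()


def leaves(tree):
    """Collect the words (and traces) under a node, left to right."""
    _, children = tree
    out = []
    for child in children:
        out.extend(leaves(child) if isinstance(child, tuple) else [child])
    return out


def indexed_constituents(tree, table=None):
    """Map each numeric index on a label (WHNP-1 -> "1") to that node's text."""
    if table is None:
        table = {}
    label, children = tree
    match = re.search(r"-(\d+)$", label)
    if match:
        table[match.group(1)] = " ".join(leaves(tree))
    for child in children:
        if isinstance(child, tuple):
            indexed_constituents(child, table)
    return table


def resolve_trace(tree, trace):
    """Return the text co-indexed with a trace such as *T*-1, or None."""
    match = re.fullmatch(r"\*T\*-(\d+)", trace)
    if not match:
        return None
    return indexed_constituents(tree).get(match.group(1))


TREE = parse("(SBARQ (WHNP-1 what) (SQ do (NP-SBJ you) (VP like (NP *T*-1))))")

if __name__ == "__main__":
    print(resolve_trace(TREE, "*T*-1"))
```

Function tags such as `-SBJ` are not numeric, so they are ignored by the index lookup; only `WHNP-1` is recorded as an antecedent here.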

  14. The Next Step (RANNÍS)
     • All of the resources mentioned were manually created
       • extremely labour-intensive
       • unrealistic for Icelandic
     • Search for automated methods
     • Anna Nikulásdóttir has already begun such work on the creation of a WordNet-like semantic database for Icelandic, and this will be the focus of the new RANNÍS project.

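  15. Extracting semantic relations from ÍO
     • Nine different semantic relations are extracted automatically from Íslensk orðabók (ÍO, the Icelandic Dictionary)
     • The extraction uses part-of-speech patterns and mixed word/part-of-speech patterns, where fletta is the headword and l, n and s stand for an adjective, a noun and a verb:
       l_n → yfirheiti(fletta, n) [hypernym], eiginleiki(fletta, l) [attribute]
       það að_s → tengtSo(fletta, s) [related verb]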

  16. Extracting semantic relations from ÍO
     • fuglsblundur (“a bird's nap”): örstuttur (lo., adjective, “very short”) svefn (no., noun, “sleep”)
       yfirheiti(fuglsblundur, svefn)
       eiginleiki(fuglsblundur, örstuttur)
     • mæling (“measurement”): það að mæla (so., verb; “the act of measuring”)
       tengtSo(mæling, mæla)
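The two patterns shown on slides 15 and 16 can be sketched as a small matcher over a tagged definition. This is a hedged illustration, not the authors' implementation: the input format (a list of word/tag pairs), the placeholder tag "x" for untagged function words, and the function name `extract_relations` are all assumptions, and only two of the nine relations are covered.

```python
def extract_relations(headword, definition):
    """Apply the two slide patterns to one dictionary definition.

    `definition` is a list of (word, tag) pairs. The tags "l", "n" and "s"
    follow the slide (adjective, noun, verb); any other tag is ignored.
    Returns (relation, headword, word) triples, or [] if no pattern matches.
    """
    words = [word for word, _ in definition]
    tags = [tag for _, tag in definition]

    # Pattern l_n: an adjective followed by a noun.
    if tags == ["l", "n"]:
        return [
            ("yfirheiti", headword, words[1]),   # hypernym = the noun
            ("eiginleiki", headword, words[0]),  # attribute = the adjective
        ]

    # Pattern "það að_s": the literal words "það að" followed by a verb.
    if len(definition) == 3 and words[:2] == ["það", "að"] and tags[2] == "s":
        return [("tengtSo", headword, words[2])]  # related verb

    return []


if __name__ == "__main__":
    print(extract_relations("fuglsblundur", [("örstuttur", "l"), ("svefn", "n")]))
    print(extract_relations("mæling", [("það", "x"), ("að", "x"), ("mæla", "s")]))
```

Matching on the whole definition, rather than searching inside it, keeps the sketch conservative: a definition that does not fit a known shape yields no relation instead of a guessed one.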

  17. Extracting semantic relations from ÍO

  18. Extracting semantic relations from ÍO
     • Results:
       • 77,348 definitions, or 96.45% of all noun definitions in ÍO, were analysed
       • Test set: 94.77% of definitions received no incorrect analysis

  19. Learning semantics from text
     • Learning semantics from text requires a very large amount of tagged text
     • Mörkuð íslensk málheild (the Tagged Icelandic Corpus) will be of great use here
     • The aim is to find patterns in Icelandic comparable to the so-called Hearst patterns in English

  20. Hearst patterns
     • NP0 such as NP1, NP2, ..., NPn-1 (and|or) NPn
       ... red algae, such as Gelidium, ...
     • such NP0 as NP1, NP2, ..., NPn-1 (and|or) NPn
       ... works by such authors as Herrick, Goldsmith, and Shakespeare
     • NP1, NP2, ..., NPn (and|or) other NP0
       Bruises, wounds, broken bones or other injuries ...
       ... temples, treasuries, and other important civic buildings
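The three patterns above can be sketched with regular expressions once noun phrases have been identified. The sketch assumes the input has already been NP-chunked, with each noun phrase wrapped in square brackets; that bracket convention and the name `hearst_pairs` are assumptions made for illustration, and no chunker is included.

```python
import re

NP = r"\[[^\]]+\]"  # one bracketed noun phrase, e.g. [red algae]
NP_LIST = rf"{NP}(?:\s*,\s*{NP})*(?:\s*,?\s*(?:and|or)\s+{NP})?"

PATTERNS = [
    # NP0 such as NP1, NP2, ..., (and|or) NPn
    re.compile(rf"(?P<hyper>{NP})\s*,?\s*such as\s+(?P<hypos>{NP_LIST})"),
    # such NP0 as NP1, NP2, ..., (and|or) NPn
    re.compile(rf"such\s+(?P<hyper>{NP})\s+as\s+(?P<hypos>{NP_LIST})"),
    # NP1, NP2, ..., NPn (and|or) other NP0
    re.compile(
        rf"(?P<hypos>{NP}(?:\s*,\s*{NP})*)\s*,?\s*(?:and|or)\s+other\s+(?P<hyper>{NP})"
    ),
]


def hearst_pairs(chunked_text):
    """Return (hyponym, hypernym) pairs found in NP-chunked text."""
    pairs = []
    for pattern in PATTERNS:
        for match in pattern.finditer(chunked_text):
            hypernym = match.group("hyper").strip("[]")
            for hyponym in re.findall(r"\[([^\]]+)\]", match.group("hypos")):
                pairs.append((hyponym, hypernym))
    return pairs


if __name__ == "__main__":
    examples = [
        "[red algae], such as [Gelidium]",
        "works by such [authors] as [Herrick], [Goldsmith], and [Shakespeare]",
        "[Bruises], [wounds], [broken bones] or other [injuries]",
    ]
    for text in examples:
        print(hearst_pairs(text))
```

Each match yields one pair per listed noun phrase, so a single sentence with a long enumeration contributes several hyponymy candidates at once.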

  21. Possible patterns in Icelandic?
     • NP0, utan NP1, NP2, ..., NPn-1 og NPn
       ... heimilisstörf, utan eldamennsku ... (“housework, except cooking”)
     • NP1, NP2, ..., NPn eða (aðra|annan|annað|önnur) NP0
       ... Diesel-gallabuxur eða aðra merkjavöru (“Diesel jeans or other brand-name goods”)
     • NP0 eins og NP1, NP2, ..., NPn-1 (og|eða) NPn
       ... grunngildum eins og góðum samskiptum ... (“basic values such as good communication”)
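The candidate Icelandic patterns can be tried out in the same spirit as the English ones. The sketch below assumes NP-chunked input with noun phrases in square brackets, which is an illustrative convention rather than the format of any Icelandic corpus; `icelandic_pairs` is a hypothetical name. Note that it returns inflected forms as they appear in the text, so a real system would still need lemmatisation.

```python
import re

NP = r"\[[^\]]+\]"  # one bracketed noun phrase
COMMA_LIST = rf"{NP}(?:\s*,\s*{NP})*"

PATTERNS = {
    # NP0, utan NP1, NP2, ..., NPn-1 og NPn
    "utan": re.compile(
        rf"(?P<hyper>{NP})\s*,\s*utan\s+(?P<hypos>{COMMA_LIST}(?:\s+og\s+{NP})?)"
    ),
    # NP1, NP2, ..., NPn eða (aðra|annan|annað|önnur) NP0
    "eða annað": re.compile(
        rf"(?P<hypos>{COMMA_LIST})\s+eða\s+(?:aðra|annan|annað|önnur)\s+(?P<hyper>{NP})"
    ),
    # NP0 eins og NP1, NP2, ..., NPn-1 (og|eða) NPn
    "eins og": re.compile(
        rf"(?P<hyper>{NP})\s+eins og\s+(?P<hypos>{COMMA_LIST}(?:\s+(?:og|eða)\s+{NP})?)"
    ),
}


def icelandic_pairs(chunked_text):
    """Return (pattern name, hyponym, hypernym) triples, forms left uninflected."""
    triples = []
    for name, pattern in PATTERNS.items():
        for match in pattern.finditer(chunked_text):
            hypernym = match.group("hyper").strip("[]")
            for hyponym in re.findall(r"\[([^\]]+)\]", match.group("hypos")):
                triples.append((name, hyponym, hypernym))
    return triples


if __name__ == "__main__":
    for text in [
        "[heimilisstörf], utan [eldamennsku]",
        "[Diesel-gallabuxur] eða aðra [merkjavöru]",
        "[grunngildum] eins og [góðum samskiptum]",
    ]:
        print(icelandic_pairs(text))
```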

  22. A Merkingarbrunnur?
     • The aim is to link the results to one another
     • A tool with a graphical interface will be developed for correcting and adding to the results of the automatic analysis
     • Possibilities for linking to WordNet will be explored
