HUMANITIES COMPUTING D: LEXICOGRAPHY & COMPUTERS
• Electronic dictionaries
• WordNet
ELECTRONIC DICTIONARIES
Computational tools are no longer used only to produce print dictionaries, but also to develop new kinds of dictionaries that support new forms of search
ENGLISH DICTIONARIES IN ELECTRONIC FORM
• Oxford English Dictionary, second edition
• Oxford Talking Dictionary
• Concise Oxford Dictionary
• Learner dictionaries:
  • Longman Dictionary of Contemporary English (LDOCE)
  • Collins COBUILD English Dictionary
CONCISE OXFORD DICTIONARY
• SEARCH:
  • Headword search (with *)
  • Hypertext search
  • Full-text search (also of phrases / groups)
• FILTERS:
  • etymology, phrasal verbs, suffixes
COLLINS COBUILD
• Available from:
  • http://www.biblio.unitn.it/BancheDati/BancheDati.asp
ELECTRONIC DICTIONARIES FOR ITALIAN
• Il VELI
• Zanichelli: CD-ROM Multilingue, Scaffale Elettronico
• Devoto-Oli
• Garzanti: IPA, `speaks'
EXAMPLE: DEVOTO-OLI
• Basic search
  • Citation forms (incremental)
  • Hyperlinks
  • Definition / inflection
  • Synonyms / antonyms
• Advanced search
• Not available: pronunciation; citations?
• Limited: historical information
MRDs
• An important distinction:
  • Dictionaries that can be consulted electronically
  • MACHINE READABLE dictionaries
  • MACHINE TRACTABLE dictionaries
• Particularly useful: dictionaries created for EFL:
  • LDOCE
  • COBUILD
• A more ambitious project: ODE in XML
EXAMPLE: ODE ON CD-ROM (IN XML)
An example of a lexicographic database in XML (= extremely machine tractable)
ODE IN XML: ENTRY FORMAT
<se>
  <cn>815750</cn>
  <hg>
    <hw>stock</hw>
  </hg>
  <s1>
    <ps>noun</ps>
    <s2 num="1">
      <df>the goods or merchandise kept on the premises of a shop or warehouse and available for sale or distribution:</df>
      <ex>the store has a very low turnover of stock</ex>
    </s2>
    <s2 num="2"> …… </s2>
  </s1>
  <s1>
    <ps>adjective</ps>
    …..
ODE IN XML: NLP INFORMATION
<nlp>
  <sup>merchandise</sup>
  <ss>Commerce</ss>
  <morph id="01">
    <mu sy="NN">
      <inf>stock</inf>
      <ph>stQk</ph>
    </mu>
    <mu sy="NNS">
      <ph>stQks</ph>
    </mu>
  </morph>
</nlp>
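To illustrate what "machine tractable" can mean in practice, here is a minimal sketch that reads the NLP fragment above with Python's standard XML parser. The tag and attribute names (nlp, morph, mu, sy, inf, ph) come from the slide; the function name morph_units and the choice of output format are assumptions made for this example, and the sketch does not claim to know what each tag stands for.

```python
import xml.etree.ElementTree as ET

# Fragment copied from the slide; the meaning of each tag is not documented here.
NLP_XML = """
<nlp>
  <sup>merchandise</sup>
  <ss>Commerce</ss>
  <morph id="01">
    <mu sy="NN"><inf>stock</inf><ph>stQk</ph></mu>
    <mu sy="NNS"><ph>stQks</ph></mu>
  </morph>
</nlp>
"""


def morph_units(xml_text):
    """Return one dict per <mu> element (hypothetical helper for this sketch)."""
    root = ET.fromstring(xml_text)
    units = []
    for mu in root.iter("mu"):
        units.append({
            "sy": mu.get("sy"),          # value of the sy attribute
            "inf": mu.findtext("inf"),   # None when <inf> is absent
            "ph": mu.findtext("ph"),     # transcription string as given
        })
    return units


units = morph_units(NLP_XML)
for unit in units:
    print(unit)
```

Because every piece of information sits in its own element, a few lines of code are enough to pull out the fields of interest without any text heuristics.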
ELDIT
• (Elektronisches Lern(er)wörterbuch Deutsch-Italienisch: electronic dictionary for learners, Italian-German)
• An example of a dictionary
  • For learning
  • Born in electronic form
• Lecture on ELDIT: 14 May
SEMANTICS & LEXICON: A SUMMARY
[Diagram with three columns: WORD-FORMS, LEXEMES, SENSES]
• Word-forms: “eat”, “eats”, “ate”, “eaten”
• Lexeme: EAT-LEX-1
• Senses: eat0600, eat0700
THE ORGANIZATION OF THE LEXICON
[Diagram with three columns: WORD-FORMS, LEXEMES, SENSES]
• Word-form: “stock”
• Lexemes: STOCK-LEX-1, STOCK-LEX-2, STOCK-LEX-3
• Senses: stock0100, stock0200, stock0600, stock0700, stock0900, stock1000
SYNONYMY
[Diagram with three columns: WORD-FORMS, LEXEMES, SENSES]
• Word-forms: “cheap”, “inexpensive”
• Lexemes: CHEAP-LEX-1, CHEAP-LEX-2, INEXP-LEX-3
• Senses: cheap0100, …, cheapXXXX, inexp0900, inexpYYYY
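The three-level organization (word-forms, lexemes, senses) can be sketched as a small data structure. The sketch below uses only the “eat” example, since that is the one case where the links between levels are unambiguous; the class name Lexeme and the helper functions are hypothetical names chosen for this illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Lexeme:
    """A lexeme groups its inflected word-forms and its senses (sketch only)."""
    name: str
    word_forms: list = field(default_factory=list)
    senses: list = field(default_factory=list)


# Data taken from the "eat" diagram.
EAT = Lexeme(
    name="EAT-LEX-1",
    word_forms=["eat", "eats", "ate", "eaten"],
    senses=["eat0600", "eat0700"],
)


def build_form_index(lexemes):
    """Map each word-form to the lexemes it can realize."""
    index = {}
    for lexeme in lexemes:
        for form in lexeme.word_forms:
            index.setdefault(form, []).append(lexeme)
    return index


def senses_of(form, index):
    """Go from a word-form, through its lexemes, to the candidate senses."""
    return [sense for lexeme in index.get(form, []) for sense in lexeme.senses]


index = build_form_index([EAT])
print(senses_of("ate", index))
```

A word-form such as “stock” would simply appear in the index under several lexemes, which is how the same structure accommodates homonymy.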
WORDNET
• A lexical database created at Princeton
• Freely available for research from the Princeton site
  • http://www.cogsci.princeton.edu/~wn/
• Information about a variety of SEMANTIC RELATIONS
• Three sub-databases (supported by psychological research as early as Fillenbaum and Jones, 1965):
  • NOUNS
  • VERBS
  • ADJECTIVES and ADVERBS
• Each database is organized around SYNSETS
SYNSETS
• Senses (or `lexicalized concepts’) are represented in WordNet by the set of words that can be used in AT LEAST ONE CONTEXT to express that sense / lexicalized concept: the SYNSET
• E.g., {chump, fish, fool, gull, mark, patsy, fall guy, sucker, shlemiel, soft touch, mug} (gloss: person who is gullible and easy to take advantage of)
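As a rough illustration of the synset idea, the sketch below models a synset as a set of words plus a gloss and looks words up in a tiny hand-built inventory. This is not WordNet's own storage format or API: the names Synset, synsets_for and are_synonyms are assumptions, and the second synset is a placeholder invented only to show that one word can belong to several synsets.

```python
from typing import FrozenSet, NamedTuple


class Synset(NamedTuple):
    words: FrozenSet[str]
    gloss: str


# The synset quoted on the slide.
GULLIBLE = Synset(
    frozenset({"chump", "fish", "fool", "gull", "mark", "patsy", "fall guy",
               "sucker", "shlemiel", "soft touch", "mug"}),
    "person who is gullible and easy to take advantage of",
)

# Placeholder synset written for this sketch; the gloss is NOT taken from WordNet.
AQUATIC = Synset(frozenset({"fish"}), "aquatic animal (placeholder gloss)")

ALL_SYNSETS = [GULLIBLE, AQUATIC]


def synsets_for(word, synsets):
    """Each synset containing the word corresponds to one sense of the word."""
    return [s for s in synsets if word in s.words]


def are_synonyms(word1, word2, synsets):
    """Two words are synonyms if at least one synset contains both."""
    return any(word1 in s.words and word2 in s.words for s in synsets)


print(len(synsets_for("fish", ALL_SYNSETS)))
print(are_synonyms("chump", "sucker", ALL_SYNSETS))
```

The "at least one context" condition shows up here as "at least one shared synset": synonymy is defined per sense, not per word.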
THE NOUN DATABASE
• About 90,000 forms, 116,000 senses
• Relations:
HYPERNYMY
2 senses of robin
Sense 1
robin, redbreast, robin redbreast, Old World robin, Erithacus rubecola -- (small Old World songbird with a reddish breast)
    => thrush -- (songbirds characteristically having brownish upper plumage with a spotted breast)
        => oscine, oscine bird -- (passerine bird having specialized vocal apparatus)
            => passerine, passeriform bird -- (perching birds mostly small and living near the ground with feet having 4 toes arranged to allow for gripping the perch; most are songbirds; hatchlings are helpless)
                => bird -- (warm-blooded egg-laying vertebrates characterized by feathers and forelimbs modified as wings)
                    => vertebrate, craniate -- (animals having a bony or cartilaginous skeleton with a segmented spinal column and a large brain enclosed in a skull or cranium)
                        => chordate -- (any animal of the phylum Chordata having a notochord or spinal column)
                            => animal, animate being, beast, brute, creature, fauna -- (a living organism characterized by voluntary movement)
                                => organism, being -- (a living thing that has (or can develop) the ability to act or function independently)
                                    => living thing, animate thing -- (a living (or once living) entity)
                                        => object, physical object --
                                            => entity, physical thing --
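The hypernym listing above is a chain of IS-A links, and walking it is straightforward. The following is a minimal sketch over a hand-copied version of that chain; it assumes one label per synset and a single hypernym per node (a simplification, since a synset may have more than one hypernym), and the names HYPERNYM, hypernym_chain and is_a are hypothetical.

```python
# Child -> parent links copied from the "robin" listing (first word of each synset).
HYPERNYM = {
    "robin": "thrush",
    "thrush": "oscine",
    "oscine": "passerine",
    "passerine": "bird",
    "bird": "vertebrate",
    "vertebrate": "chordate",
    "chordate": "animal",
    "animal": "organism",
    "organism": "living thing",
    "living thing": "object",
    "object": "entity",
}


def hypernym_chain(word, parent=HYPERNYM):
    """Follow IS-A links upward and return the ancestors, nearest first."""
    chain = []
    seen = {word}
    while word in parent:
        word = parent[word]
        if word in seen:  # guard against accidental cycles in the toy data
            raise ValueError("cycle in hypernym links")
        seen.add(word)
        chain.append(word)
    return chain


def is_a(word, ancestor):
    """True if ancestor is reachable from word through hypernym links."""
    return ancestor in hypernym_chain(word)


print(hypernym_chain("robin"))
print(is_a("robin", "bird"))
```

Queries such as "is a robin an animal?" reduce to reachability in this graph, which is why hypernymy is the backbone of the noun database.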
MERONYMY
wn beak -holon
Holonyms of noun beak
1 of 3 senses of beak
Sense 2
beak, bill, neb, nib
    PART OF: bird
VERBS
• About 10,000 forms, 20,000 senses
• Relations between verb meanings:
RELATIONS BETWEEN VERB MEANINGS
• ENTAILMENT: V1 ENTAILS V2 when "Someone V1" (logically) entails "Someone V2"
  • e.g., snore entails sleep
• TROPONYMY: V1 is a troponym of V2 when "To do V1" is "To do V2 in some manner"
  • e.g., limp is a troponym of walk
ADJECTIVES & ADVERBS
• About 20,000 adjective forms, 30,000 senses
• 4,000 adverbs, 5,600 senses
• Relations:
HOW TO USE IT
• Online: http://cogsci.princeton.edu/cgi-bin/webwn
• Download it, then from the command line:
  • Get synonyms:
    • wn bank -synsn
  • Get hypernyms:
    • wn robin -hypen
  • (also for adjectives and verbs) Get antonyms:
    • wn right -antsa
THE LIMITS OF WORDNET
• Coverage
  • Words not in WordNet: crocidolite, spinoff (spin-off)
• Missing information: MERONYMY
• Context-dependent senses:
  • slump, crash, bust are all synonyms in the WSJ corpus
• The structure of WordNet
  • Some information is encoded in complex ways (room, wall, floor)
• But: MOVING TARGET!!
MERONYMY IN WORDNET: AN EXPERIMENT
• 100 bridging descriptions (BDs) in a mereological relation
• Ran a script trying to find a direct link in WordNet (1.7) between one of the senses of the BD and one of the senses of any of the previous NPs
• Results: in only 6 cases is there a direct lexical relation in WordNet between a BD and one of the CFs
[Diagram: a fragment of the WordNet noun hierarchy. Nodes: ARTIFACT, HOUSING, BUILDING, HOUSE, HOME, ROOM, WALL, FLOOR. Edge labels: IS-A (four edges), PART-OF (three edges).]
Example: John looked at the HOUSE. The WALL was crumbling.
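The HOUSE / WALL example suggests why a search for direct links finds so little. The sketch below contrasts a direct PART-OF lookup with a search that also climbs IS-A links from the whole and follows PART-OF transitively. The edge set is one plausible reading of the diagram and should be treated as an assumption, not as the actual content of WordNet 1.7; the script used in the experiment is not shown in the slides, so this is not a reconstruction of it.

```python
# Assumed edges (one plausible reading of the diagram, for illustration only).
IS_A = {
    "house": {"building"},
    "home": {"housing"},
    "building": {"artifact"},
    "housing": {"artifact"},
}
PART_OF = {
    "room": {"building"},
    "wall": {"room"},
    "floor": {"room"},
}


def has_direct_link(part, whole):
    """True only if a PART-OF edge connects part to whole directly."""
    return whole in PART_OF.get(part, set())


def with_ancestors(node):
    """Return node together with everything reachable from it via IS-A."""
    seen = {node}
    stack = [node]
    while stack:
        current = stack.pop()
        for parent in IS_A.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def has_indirect_link(part, whole):
    """True if part is (transitively) PART-OF whole or one of its IS-A ancestors."""
    targets = with_ancestors(whole)
    seen = {part}
    stack = [part]
    while stack:
        current = stack.pop()
        for container in PART_OF.get(current, ()):
            if container in targets:
                return True
            if container not in seen:
                seen.add(container)
                stack.append(container)
    return False


print(has_direct_link("wall", "house"))    # no direct edge in the toy graph
print(has_indirect_link("wall", "house"))  # wall -> room -> building <- house
```

Under these assumed edges, resolving "the WALL" to "the HOUSE" needs two PART-OF steps plus one IS-A step, which a direct-link search never takes.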
SOLUTION: LEXICAL ACQUISITION
• Partial (add information to WordNet, especially for specialized domains)
• Total (create a new lexicon from scratch)
READINGS
• Jackson, ch. 6.7
• Marello, ch. 5.5
• C. Fellbaum. WordNet: An Electronic Lexical Database. MIT Press, 1998
  • ch. 1