Nicoletta Calzolari, ILC - CNR - Pisa, Italy
Language Resources & Semantic Web
COLING Workshop - 2002
To make the Semantic Web a reality...
…need to tackle the twofold challenge of
• content availability and
• multilinguality
Natural convergence with HLT:
• multilingual semantic processing
• ontologies
• semantic-syntactic computational lexicons
Computational Multilingual Lexicons: an essential component for the Semantic Web
• Language - & lexicons - are the gateway to knowledge.
• Semantic Web developers need repositories of words & terms, & knowledge of their relations in language use & ontological classification.
• The cost of adding this structured and machine-understandable lexical information can be one of the factors that delay its full deployment.
• Making millions of ‘words’ available for dozens of languages is an effort that no small group can afford.
• A radical shift in the lexical paradigm, whereby many participants add linguistic content descriptions in an open distributed lexical framework, is required to make the Web usable.
Infrastructure of Language Resources...
...static
• Semantic network: Euro-/ItalWordNet
• Lexicons: PAROLE/SIMPLE/CLIPS
• TreeBank +sw
International Standards
But … they will never be “complete”
…dynamic
• Lexical acquisition systems (syntactic & semantic) from text corpora
• Robust systems of morphosyntactic & syntactic analysis
• Word-sense disambiguation systems
Italian Semantic Network
Italian module of EuroWordNet (http://www.hum.uva.nl/~ewn/)
• ~50,000 lemmas organized in synonym groups (synsets), structured in hierarchies & linked by ~130,000 semantic relations
• ~50,000 hyperonymy/hyponymy relations
• ~16,000 relations among different POS (role, cause, derivation, etc.)
• ~2,000 part-whole relations
• ~1,500 antonymy relations, etc.
• Synsets linked to the InterLingual Index (ILI = Princeton WordNet)
• Through the ILI, linked to all the European WordNets (de facto standard)
• & to the common Top Ontology
• Possibility of plug-in with domain terminological lexicons
• Usable in IR, CLIR, IE, QA, ...
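The structure on this slide (synsets, typed relations, and an ILI link shared across languages) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the EuroWordNet database format: the class names, the relation label `has_hyperonym`, identifiers such as `it-mela`, and the ILI record ids are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Synset:
    """A group of synonymous lemmas (all names here are illustrative)."""
    synset_id: str
    lemmas: List[str]
    ili: Optional[str] = None  # hypothetical InterLingual Index record id
    relations: Dict[str, List[str]] = field(default_factory=dict)


class WordNet:
    """One language module: synsets plus typed relations between them."""

    def __init__(self, language: str) -> None:
        self.language = language
        self.synsets: Dict[str, Synset] = {}

    def add(self, synset: Synset) -> None:
        self.synsets[synset.synset_id] = synset

    def relate(self, source_id: str, relation: str, target_id: str) -> None:
        self.synsets[source_id].relations.setdefault(relation, []).append(target_id)

    def hypernym_chain(self, synset_id: str) -> List[str]:
        """Follow the first hyperonym link upwards, guarding against cycles."""
        chain: List[str] = []
        seen = {synset_id}
        current = synset_id
        while True:
            parents = self.synsets[current].relations.get("has_hyperonym", [])
            if not parents or parents[0] in seen:
                return chain
            current = parents[0]
            seen.add(current)
            chain.append(current)

    def by_ili(self, ili: str) -> List[Synset]:
        return [s for s in self.synsets.values() if s.ili == ili]


def translate(lemma: str, source: WordNet, target: WordNet) -> List[str]:
    """Collect target-language lemmas that share an ILI record with `lemma`."""
    found: List[str] = []
    for synset in source.synsets.values():
        if lemma in synset.lemmas and synset.ili is not None:
            for match in target.by_ili(synset.ili):
                found.extend(match.lemmas)
    return found


def build_demo():
    """Two tiny wordnets with made-up ids, linked through shared ILI records."""
    italian = WordNet("it")
    italian.add(Synset("it-mela", ["mela"], ili="ili-apple"))
    italian.add(Synset("it-frutto", ["frutto"], ili="ili-fruit"))
    italian.relate("it-mela", "has_hyperonym", "it-frutto")
    english = WordNet("en")
    english.add(Synset("en-apple", ["apple"], ili="ili-apple"))
    english.add(Synset("en-fruit", ["fruit"], ili="ili-fruit"))
    return italian, english
```

With the demo data, `translate("mela", italian, english)` goes from the Italian synset to its ILI record and back out to the English synset, which is the cross-lingual route the slide attributes to the ILI.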
Domain - Semantic class
[Figure: Italian food-domain lexical entries linked to semantic classes and to verbs. Semantic classes: NATURAL_SUBSTANCE, FLAVOURING, VEGETAL_ENTITY, FOOD, FRUIT, VEGETABLES, SUBSTANCE_FOOD, ARTIFACT_FOOD, INSTRUMENT, CONTAINER, FURNITURE, BUILDING, PROFESSION. Nouns: zucchero, alloro, tartufo, carne, mela, carota, coniglio, pesce, arrosto, mestolo, pentola, friggitrice, bollitore, pesciera, forchetta, posata, tavola, ristorante, cuoco. Verbs: mangiare, cucinare, cuocere, arrostire, bollire, lessare, stufare, friggere, rosolare, grigliare. Role and relation labels: TELIC, AGENTIVE, Used_for, Object_of_the_activity, Is_the_activity_of, Created_by. Feature: +edible.]
machine language learning
• linguistic learning
• development of conceptual networks
• linguistic change models
• language usage models
• adaptive classification systems
• information extraction
• bootstrapping of lexical information
• bootstrapping of grammars
Beyond MILE: towards open & distributed lexicons
• Ontology: URI = http://www.zzz…
• Semantic Lexicon: URI = http://www.xxx…
• Syntactic Constructions: URI = http://www.yyy…
• Lex_object: semFeature, URI = http://www.xxx…#HUMAN
• Lex_object: syntagmaNT, URI = http://www.zzz…#NP
• Monolingual/Multilingual Lexicon
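One way to read this slide: a lexical entry does not copy shared objects such as a semantic feature or a phrase type, it points to them by URI, with the fragment after `#` naming the object inside a remote resource. The sketch below illustrates that idea under loud assumptions: the `example.org` URIs are hypothetical stand-ins for the elided addresses on the slide, the registry is an in-memory dictionary instead of real servers, and the entry for `cuoco` is invented for illustration.

```python
from urllib.parse import urldefrag

# Hypothetical stand-ins for the elided URIs on the slide.
SEMANTIC_LEXICON = "http://example.org/semantic-lexicon"
SYNTACTIC_CONSTRUCTIONS = "http://example.org/syntactic-constructions"

# In-memory stand-in for resources that would be hosted by different sites.
REGISTRY = {
    SEMANTIC_LEXICON: {
        "HUMAN": {"type": "semFeature", "label": "human"},
    },
    SYNTACTIC_CONSTRUCTIONS: {
        "NP": {"type": "syntagmaNT", "label": "noun phrase"},
    },
}


def resolve(uri, registry=REGISTRY):
    """Split a URI into resource and fragment, then look the object up."""
    resource, fragment = urldefrag(uri)
    try:
        return registry[resource][fragment]
    except KeyError:
        raise LookupError(f"cannot resolve {uri!r}") from None


def expand_entry(entry, registry=REGISTRY):
    """Replace each URI reference in a lexical entry with the object it names."""
    return {
        "lemma": entry["lemma"],
        "objects": [resolve(uri, registry) for uri in entry["refs"]],
    }


# Illustrative entry: it stores only references, not the shared objects.
ENTRY = {
    "lemma": "cuoco",
    "refs": [SEMANTIC_LEXICON + "#HUMAN", SYNTACTIC_CONSTRUCTIONS + "#NP"],
}
```

Because entries hold references only, a shared object can be maintained in one place by whoever hosts that resource while many lexicons, monolingual or multilingual, reuse it.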
Target... Multilingual Knowledge Management
Technical Feasibility:
• Prerequisite: is a commonly agreed text/lexicon annotation protocol, also for the semantic/conceptual level (to be able to automatically establish links among different languages), an achievable goal?
Yes, at the lexical level
More complex for corpus annotation?
EAGLES/ISLE
A few Issues for discussion: lexicon standards
• Semantic Web standards and the needs of content processing technologies:
• importance of reaching consensus on (linguistic and non-linguistic) “content”, in addition to agreement on formats and encoding issues (…words convey content & knowledge)
• short/medium-term requirements with respect to standards for multilingual lexicons & content encoding, including industrial requirements
• Relation with the Spoken language community
• MILE & Asian languages: how to cooperate concretely?
• Define further steps necessary to converge on common priorities
• ….
A few Issues for discussion: “content”, priorities...
• Which types of resources should we invest in, with respect to short- vs. medium-term results?
• Is there a need for robust systems, able to acquire/tune lexical/linguistic (also multilingual) knowledge, to auto-enrich static basic resources?
• What is the relation between lexical standards and text annotation protocols?
• Knowledge management is critical. For “content” interoperability, is the field ‘mature’ enough to converge around agreed standards also for the semantic/conceptual level (e.g. to automatically establish links among different languages)?
• Is the field of multilingual lexical resources ready to tackle the challenges set by the Semantic Web development?
Towards a new paradigm??
A new paradigm for LR? Where the focus is on cooperation
New Strategic Vision? Towards a Distributed Open Lexical Infrastructure?
• for distributed & cooperative creation, management, etc. of Lexical Resources
• technical & organisational requirements
Language Resources & Semantic Web
“ELITE” (expression of interest for the 6th FP): “European Lexical Infrastructure and Technology”
New proposed paradigm for lexicon development: Open & Distributed Lexical Infrastructure for content description and content interoperability, to make lexical resources usable within the emerging Semantic Web scenario