ArchiWordNet: Integrating WordNet with Domain-Specific Knowledge
Luisa Bentivogli (1), Andrea Bocco (2), Emanuele Pianta (1)
(1) ITC-irst, Trento, Italy
(2) Politecnico di Torino, Italy

Outline
• ArchiWordNet: a WordNet-like thesaurus
• Adopting and adapting the MultiWordNet model
• Integrating ArchiWordNet with MultiWordNet
• Conclusion and future work
GWC 2004 - Brno, January 20-23, 2004

ArchiWordNet: a WordNet-like thesaurus
• A bilingual English/Italian thesaurus for the “Architecture and Construction” domain
  • structured according to the WordNet model
  • fully integrated with MultiWordNet
MultiWordNet: a multilingual lexical database in which the Italian WordNet is strictly aligned with Princeton’s English WordNet.

Motivation
• Still Image Server, an architecture image archive available at the Polytechnic of Turin
• Need for a thesaurus:
  • Image cataloguing (minimize subjectivity)
  • Image retrieval (minimize ambiguity)
• No exhaustive thesauri for the architecture domain are available

Why the (Multi)WordNet model?
• A rich and rigorous structure
  • synonyms
  • many relations explicitly and homogeneously encoded
• Allows for a more powerful and expressive retrieval mechanism
  • no ambiguities
  • extended search with related concepts
• Is more suitable for educational purposes

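The "extended search with related concepts" idea can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the toy synsets, the `expand_query` helper, and the choice to expand a query only along hyponym links. The slides do not say how the Still Image Server actually performs retrieval.

```python
from collections import defaultdict

# Toy data (hypothetical): synset id -> set of synonyms.
SYNSETS = {
    "partition": {"partition", "divider"},
    "wall": {"wall"},
    "support": {"support"},
    "structural_wall": {"structural wall", "bearing wall"},
}

# Toy hypernym links (hypothetical): (hyponym id, hypernym id).
HYPERNYM_LINKS = [
    ("wall", "partition"),
    ("structural_wall", "support"),
]


def expand_query(term, max_depth=1):
    """Return ids of synsets containing `term`, plus hyponyms up to `max_depth`."""
    hyponyms = defaultdict(set)
    for child, parent in HYPERNYM_LINKS:
        hyponyms[parent].add(child)

    # Synonym lookup: any synset listing the term counts as a match.
    matched = {sid for sid, words in SYNSETS.items() if term in words}

    result = set(matched)
    frontier = matched
    for _ in range(max_depth):
        # Step one level down the hierarchy, skipping synsets already found.
        frontier = {h for sid in frontier for h in hyponyms[sid]} - result
        result |= frontier
    return result


print(sorted(expand_query("divider")))
```

Searching for a synonym ("divider") reaches the same synset as "partition", and the expansion step also returns images catalogued under the more specific synset.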
Why integrated with MultiWN?
• General and multilingual framework for the specialized knowledge
• Integrated access allowing for a more flexible retrieval of the information
• Information already existing in the generic (Multi)WordNet can be exploited in the creation of the specialized one

Adopting the MultiWN model
• Sources:
  • Specialized sources
    • Art and Architecture Thesaurus (AAT)
    • Construction Indexing Manual of CI/SfB
    • International and national standards (ISO, CEN, UNI)
    • Architecture and building dictionaries
    • Domain literature
  • MultiWN itself
• Issues:
  • Reorganize specialized sources to make them compatible with the MultiWN model
  • Modify MultiWN synsets to make them suitable for representing the specialized domain

Reorganizing domain-specific sources
[Figure: AAT hierarchy and ArchiWN hierarchy]

Tailoring MultiWN synsets
• MultiWN synsets considered appropriate by the domain experts are included in ArchiWN
• Several options are available:
  • add synonyms to, or delete synonyms from, MultiWN synsets
  • modify the MultiWN definitions of the synsets
  • delete and add relations between synsets

New relations for ArchiWN
• HAS-FORM (n/n)
  • {tympanum} HAS-FORM {triangle, trigon, …}
• HAS-ROLE (n/n)
  • {metal section} HAS-ROLE {upright, vertical}
• HAS-FUNCTION (n/v)
  • {beam} HAS-FUNCTION {to hold, to support, …}

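One way to picture these relations is as typed edges whose endpoints must have the parts of speech given on the slide (n/n, n/n, n/v). The sketch below is illustrative only: the `Synset` and `Relation` classes and the `RELATION_POS` table are hypothetical names, not the ArchiWN or MultiWN data model.

```python
from dataclasses import dataclass

# Part-of-speech signature (source, target) of each new relation, as on the slide.
RELATION_POS = {
    "HAS-FORM": ("n", "n"),
    "HAS-ROLE": ("n", "n"),
    "HAS-FUNCTION": ("n", "v"),
}


@dataclass(frozen=True)
class Synset:
    pos: str       # "n" for noun, "v" for verb
    lemmas: tuple  # synonyms in the synset


@dataclass(frozen=True)
class Relation:
    name: str
    source: Synset
    target: Synset

    def __post_init__(self):
        # Reject edges whose endpoints do not match the relation's signature.
        expected = RELATION_POS[self.name]
        actual = (self.source.pos, self.target.pos)
        if actual != expected:
            raise ValueError(f"{self.name} expects {expected}, got {actual}")


# The three examples from the slide (the elided synonyms are left out).
tympanum = Synset("n", ("tympanum",))
triangle = Synset("n", ("triangle", "trigon"))
metal_section = Synset("n", ("metal section",))
upright = Synset("n", ("upright", "vertical"))
beam = Synset("n", ("beam",))
hold = Synset("v", ("to hold", "to support"))

relations = [
    Relation("HAS-FORM", tympanum, triangle),
    Relation("HAS-ROLE", metal_section, upright),
    Relation("HAS-FUNCTION", beam, hold),
]
```

Checking the signature at construction time keeps a noun-to-verb relation such as HAS-FUNCTION from being attached to two nouns by mistake.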
Integrating ArchiWN with MultiWN
• 5,000 terms grouped in 13 semantic areas => the main ArchiWN hierarchies
  • Architectural styles
  • Materials
  • Construction products
  • Techniques
  • Tools
  • Components of buildings
  • Single buildings and building complexes
  • Physical properties
  • Conditions
  • Disciplines
  • People
  • Documents
  • Drawings and representations

Integration issues
• Identify the MultiWN nodes where the ArchiWN hierarchies are to be inserted
• Include ArchiWN hierarchies in MultiWN
• Handle the overlaps between terms present in both MultiWN and ArchiWN
• Handle the possible inconsistencies in the hierarchies

The integration methodology
• Basic operations
  • performed on single MultiWN synsets
• Complex procedures (plug-in)
  • apply to entire hierarchies

Basic operations
• eclipse a synset
• tag a synset with the “architecture and construction” domain label
• add relations to, or delete relations from, a synset
• add or delete synonyms in a synset
• modify the synset definition

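The basic operations listed above can be sketched as small functions over a single synset record. This is a minimal illustration, not the MultiWN interface: the `Synset` class and all function names are hypothetical, and eclipsing is assumed to mean hiding a synset rather than deleting it, since the slide does not define the term. The definitions used are the ones shown for {wall} on the "Modifying MultiWN definition" slide.

```python
from dataclasses import dataclass, field

DOMAIN_LABEL = "architecture and construction"


@dataclass
class Synset:
    synonyms: set
    definition: str
    relations: set = field(default_factory=set)  # (relation name, target id) pairs
    domains: set = field(default_factory=set)
    eclipsed: bool = False  # assumption: eclipsed synsets are hidden, not deleted


def eclipse(synset):
    synset.eclipsed = True


def tag_domain(synset):
    synset.domains.add(DOMAIN_LABEL)


def add_relation(synset, name, target_id):
    synset.relations.add((name, target_id))


def delete_relation(synset, name, target_id):
    synset.relations.discard((name, target_id))


def add_synonym(synset, lemma):
    synset.synonyms.add(lemma)


def delete_synonym(synset, lemma):
    synset.synonyms.discard(lemma)


def modify_definition(synset, text):
    synset.definition = text


# Example: tailoring the generic WordNet synset {wall}.
wall = Synset(
    synonyms={"wall"},
    definition=(
        "an architectural partition with a height and length greater than "
        "its thickness; used to divide or enclose an area or to support "
        "another structure"
    ),
)
tag_domain(wall)
modify_definition(
    wall,
    "an architectural partition with a height and length greater than "
    "its thickness; used to divide or enclose an area",
)
add_relation(wall, "hypernym", "partition")
```

Each operation touches one synset only, which matches the slide's distinction between basic operations and the complex procedures that apply to entire hierarchies.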
Complex procedures
• Substitutive plug-in
• Integrative plug-in
• Hyponymic plug-in
• Inverse plug-in

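The slide names the four procedures without defining them. As an illustration only, the sketch below assumes that a hyponymic plug-in attaches the root of an ArchiWN hierarchy under a MultiWN synset, and that an inverse plug-in moves a MultiWN synset under an ArchiWN synset. The function names, the `awn:`/`mwn:` id prefixes, and the toy hierarchy are all hypothetical; the substitutive and integrative plug-ins are not covered.

```python
def attach(hypernym_of, child, parent):
    """Return a copy of the child -> hypernym map with `child` placed under `parent`."""
    updated = dict(hypernym_of)
    updated[child] = parent
    return updated


def hyponymic_plugin(hypernym_of, awn_root, mwn_synset):
    # Assumed reading: an ArchiWN hierarchy becomes a hyponym of a MultiWN synset.
    if not (awn_root.startswith("awn:") and mwn_synset.startswith("mwn:")):
        raise ValueError("expected an ArchiWN root and a MultiWN synset")
    return attach(hypernym_of, awn_root, mwn_synset)


def inverse_plugin(hypernym_of, mwn_synset, awn_synset):
    # Assumed reading: a MultiWN sub-hierarchy is moved under an ArchiWN synset.
    if not (mwn_synset.startswith("mwn:") and awn_synset.startswith("awn:")):
        raise ValueError("expected a MultiWN synset and an ArchiWN synset")
    return attach(hypernym_of, mwn_synset, awn_synset)


def path_to_root(hypernym_of, synset):
    """Follow hypernym links upward from `synset` (assumes the map has no cycles)."""
    path = [synset]
    while path[-1] in hypernym_of:
        path.append(hypernym_of[path[-1]])
    return path


# Toy hierarchy (hypothetical ids).
hierarchy = {
    "mwn:artifact": "mwn:object",
    "mwn:building": "mwn:artifact",
}
hierarchy = hyponymic_plugin(hierarchy, "awn:structure", "mwn:artifact")
hierarchy = inverse_plugin(hierarchy, "mwn:building", "awn:structure")
print(path_to_root(hierarchy, "mwn:building"))
```

Under these assumptions the two procedures differ only in which resource supplies the parent, which is why the sketch routes both through the same `attach` helper.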
Results
• 13 ArchiWN semantic areas plugged into 18 MultiWN synsets
  • 11 ArchiWN semantic areas (12 hierarchies) directly plugged into MultiWN
    • 4 substitutive plug-ins
    • 8 integrative plug-ins
  • 2 ArchiWN semantic areas (6 hierarchies) required a reorganization of some MultiWN sub-hierarchies
    • 4 hyponymic plug-ins
    • 2 inverse plug-ins
    • large-scale synset eclipsing

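The figures on this slide are mutually consistent if each hierarchy corresponds to one plug-in and each plug-in to one MultiWN synset. That one-to-one reading is an assumption, not something the slide states. A quick check of the arithmetic:

```python
# Plug-in counts as reported on the slide.
plugins = {"substitutive": 4, "integrative": 8, "hyponymic": 4, "inverse": 2}

direct = plugins["substitutive"] + plugins["integrative"]   # directly plugged in
reorganized = plugins["hyponymic"] + plugins["inverse"]     # needed reorganization

assert direct == 12               # matches "12 hierarchies"
assert reorganized == 6           # matches "6 hierarchies"
assert direct + reorganized == 18  # matches "18 MultiWN synsets"
assert 11 + 2 == 13               # semantic areas: direct + reorganized
```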
ArchiWN up to now
• “Single buildings and building complexes” sub-hierarchy
  • 900 synsets
  • Italian and English synonyms
  • accurate definitions
• Work done manually using the MultiWN graphical interface, which allows the user
  • to modify existing synsets and relations
  • to create new synsets

Conclusions
• It is possible to integrate ArchiWN with MultiWN
• MultiWN itself can be widely exploited in the creation of ArchiWN hierarchies
• Advantages of interdisciplinary cooperation
  • with respect to specialized thesauri
    • formalized structure
    • inheritance of linguistic-oriented information from the generic WordNet
  • with respect to lexical resources
    • many synsets will be associated with images

Future work
• Continue enriching the “Single buildings and building complexes” hierarchy and populate the remaining hierarchies
• Industrial applications: multilingual specialized lexicon of approximately 1,000 synsets for the window and curtain wall industry
• Agreement for the future use of ArchiWN by the Piemonte region in the cataloguing of its architectural cultural heritage

Details

Direct plug-ins

Reorganizations

Term overlapping
ITC-irst provides the Polytechnic with lists of terms:
- synsets tagged with the “architecture” label in WN-Domains
- hyponyms of WordNet plug-in synsets
WN-Domains: 2,595 synsets in total
• Architecture = 155 synsets
• Town planning = 444 synsets
• Building industry = 1,541 synsets
• Furniture = 455 synsets

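The four WN-Domains counts add up to the stated total, and overlap detection itself can be pictured as a set intersection over normalized terms. The sketch below is only an illustration: `overlapping_terms` and the sample term lists are hypothetical, and the slides do not describe how overlaps were actually detected.

```python
# Synset counts per WN-Domains label, as given on the slide.
WN_DOMAINS_COUNTS = {
    "Architecture": 155,
    "Town planning": 444,
    "Building industry": 1541,
    "Furniture": 455,
}
total = sum(WN_DOMAINS_COUNTS.values())
assert total == 2595  # matches the total on the slide


def normalize(term):
    """Lower-case a term and collapse runs of whitespace."""
    return " ".join(term.lower().split())


def overlapping_terms(multiwn_terms, archiwn_terms):
    """Return the sorted terms that occur in both lists after normalization."""
    archi = {normalize(t) for t in archiwn_terms}
    return sorted({normalize(t) for t in multiwn_terms} & archi)


# Hypothetical sample lists.
print(overlapping_terms(["Wall", "beam ", "roof"], ["wall", "Beam", "tympanum"]))
```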
Hyponyms of plug-in synsets

Reorganization of:
- Components of buildings
- Single buildings and building complexes
[Figure: hierarchy diagram. Node labels: entity/1, object/1, artifact/1, part/4, location/1, structure/1, component/3, region/1, structure (AWN), architectural component, architectural space, building/1, building complex/1, building element, room, area, building space, open space. Edge labels: eclipsing, hypo, inverse]

Modifying MultiWN definition
WordNet: {wall – “an architectural partition with a height and length greater than its thickness; used to divide or enclose an area or to support another structure”}
ArchiWN:
• {wall} ISA {partition, divider}: “an architectural partition with a height and length greater than its thickness; used to divide or enclose an area”
• {structural_wall, bearing_wall} ISA {support}: “any wall supporting a floor or the roof of a building”