An Ontology-Based Knowledge Portal for Language Technology. Hans Uszkoreit, Brigitte Jörg, Gregor Erbach. Project COLLATE.
An Ontology-Based Knowledge Portal for Language Technology
Hans Uszkoreit, Brigitte Jörg, Gregor Erbach
Project COLLATE
Theme: Computational Linguistics and Language Technology for Real World Applications
Partners: DFKI Saarbrücken, Saarland University
Support: A grant by the German Federal Ministry for Education and Research for RTD strengthening the position of Saarbrücken as a Competence Center for Language Technology
PIs: Hans Uszkoreit, Manfred Pinkal and Wolfgang Wahlster
Duration: Spring 2001 - end of 2003
Information Center: LT World • Information Service about Language Technology • www.lt-world.org • Ontology-based • XML Import and Export Formats • Visual and Structural Design
Objectives • distributed information service • combines and offers for each aspect of LT the best contents available • exploits hypermedia technology for including useful contents • is flexible and scalable enough to support the evolution of the discipline • exhibits a structure that is transparent for both experts and visitors from outside the field • increasingly utilizes language and knowledge technologies for improved management and presentation of the information • is open for exchange of data with other information services • has potential for interoperability with future knowledge services • is suited for the sophisticated metadata schemes of the envisaged semantic web
LT World - Levels and Tasks
Conceptual Level | Specification Level | Technical Realization Level | Content Level
underlying logical structure | ontology specifications | concrete architecture | selection of sources, organization of collection/production
data maintenance structure | XML specifications | DBs, XML pages, HTML pages | content in DBs, documents, links
presentational structure | generic design, CI | actual design of pages | presented contents
User View: Four Top Level Areas • Information and Knowledge • Players and Teams • Resources and Results • Communication/Interaction
Information and Knowledge • Basic knowledge about all areas of LT (source: Survey of the State of the Art in Human Language Technology, 1997, new edition in preparation) • Pointers to specialized knowledge: links to literature, projects, systems, products, people, resources, standards... (source: link collection by DFKI) • Glossary of the field (source: DFKI with input from HLT Survey)
Players and Teams • DB with all researchers in LT: names, affiliations, links to homepages (number of entries: 2235) • DB of projects (number of entries: 659) • DB of research organisations, companies, funding agencies (number of entries: 1561)
Resources and Results • DB of prototypes, research systems and products (source: ACL Software Registry, operated by DFKI) • Links to resource initiatives: ELRA, LDC, • For resources: link to the search service of OLAC
Communication/Interaction • News about technologies, people, products, centers, etc. (source: collection by DFKI and contributions by users; number of entries: 370) • List of events: conferences, workshops, summer schools, etc. (source: collection by DFKI and contributions by users; number of entries: 251) • Links to topic-centered mailing lists (source: collection of existing lists)
Systematics of the Discipline • Mature scientific or engineering disciplines have developed a systematics of the subject • Younger disciplines have outgrown their first systematics • LT or CL does not yet have a systematics or a classification scheme
Logical Structuring: Two Options Tree-Structured Classification • Libraries • Encyclopedias and Handbooks Multidimensional Structuring • Multiple-Inheritance Hierarchies • And-Or Hierarchies
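The difference between the two options can be sketched in a few lines of Python. This is only an illustration under stated assumptions: apart from "Text Technologies", the category names are hypothetical and not taken from the LT World ontology, and the helper names `ancestors` and `is_tree` are invented for this sketch.

```python
# Hypothetical category names, chosen for illustration only.
# A tree gives every node at most one parent; a multiple-inheritance
# hierarchy (a directed acyclic graph) lets a node sit under several parents.
PARENTS = {
    "Language Technology": [],
    "Speech Technologies": ["Language Technology"],
    "Text Technologies": ["Language Technology"],
    "Translation Technologies": ["Language Technology"],
    # One topic filed under two areas, which a pure tree cannot express.
    "Speech Translation": ["Speech Technologies", "Translation Technologies"],
}


def ancestors(node, parents=PARENTS):
    """Return every category reachable by following parent links upward."""
    seen = set()
    stack = list(parents[node])
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents[current])
    return seen


def is_tree(parents=PARENTS):
    """A hierarchy is tree-structured only if no node has more than one parent."""
    return all(len(p) <= 1 for p in parents.values())
```

With this data, `is_tree()` is false because "Speech Translation" has two parents, and a search under either parent area would still find it.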
Means for Ordering • Terminology • Thesaurus • Classification vs. Systematics • Taxonomy = Classification + Nomenclature • Ontology • formal ontology • relational ontology
Our Setup • Immediately visible structure: easy and transparent • Some multidimensional structuring through chapter structure of the Survey • For internal storage and DB search: complex multidimensional structure • Underlying systematics: multilayered and multidimensional ontology
Ontologies • Theoretical Ontologies • Epistemological reasons • Phenomenological systematics • Practical Ontologies • Support of processes • Data Maintenance • Information Services
Systematics/Ontologies • Generic Core: Dublin Core • Special Ontologies underlying exchange formats for special information types such as • OLAC (for linguistic resources) • BibTex (for scientific literature) • Languages (for language codes) • Generic ontologies for the scientific discipline and technology sector • General Multidimensional Classification for CL and LT
Applied Science: Actor, Subject, NewKnowledge, Means, Applications
Applied Research: Actors, Subject, ResearchGoals, Methods, Applications
Applied ResearchProject: Actors, Subject, ResearchGoals, Methods, Duration, Applications
• Science: Actor, Subject, NewKnowledge, (Scientific)Means
• Research: Actors, Subject, ResearchGoals, Means
• ResearchProject: Actors, Subject, ResearchGoals, Means, Duration
Funded Research Project • Name • Acronym • Full Name • Actors • Organizations • PI • Other Roles • Researchers • Subject • Discipline/Area • Objectives • Goals • Means • Program • Duration • StartDate • EndDate • Funding • Agency • Program • Funding Number
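One way to read the slot lists for Research, ResearchProject and Funded Research Project is as a chain of increasingly specific record types. The sketch below models that reading with Python dataclasses. It is an assumption, not the authors' Protégé schema: the field names, the flat layout of the slots, and the use of inheritance are all choices made for this illustration. The example values come from the Project COLLATE slide; the funding number is not given there and stays empty.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Research:
    actors: List[str]
    subject: str
    research_goals: List[str]
    means: List[str]


@dataclass
class ResearchProject(Research):
    # A project adds a duration to the generic research slots.
    start_date: str = ""
    end_date: str = ""


@dataclass
class FundedResearchProject(ResearchProject):
    # A funded project adds naming, PI and funding slots (assumed flat layout).
    acronym: str = ""
    full_name: str = ""
    principal_investigators: List[str] = field(default_factory=list)
    agency: str = ""
    program: str = ""
    funding_number: Optional[str] = None  # not stated in the slides


collate = FundedResearchProject(
    actors=["DFKI Saarbrücken", "Saarland University"],
    subject="Computational Linguistics and Language Technology for Real World Applications",
    research_goals=[],  # not itemized in the slides
    means=[],           # not itemized in the slides
    start_date="Spring 2001",
    end_date="end of 2003",
    acronym="COLLATE",
    principal_investigators=["Hans Uszkoreit", "Manfred Pinkal", "Wolfgang Wahlster"],
    agency="German Federal Ministry for Education and Research",
)
```

Because each class extends the previous one, any tool written against `Research` also accepts a `FundedResearchProject`.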
[Diagram; node labels: Education, Science, Search, ExtraScientific Purpose, Production, Scientific Education, Research, Technical Product, Applied Research, Technology]
Multidimensional Classification for CL and LT
Dimensions
Generic:
• Type of Resource (web page, metaindex, publication, person, product, patent, project, ...)
• People
• Geolocation
• Date/Comments
Discipline-Specific (not all may apply for a given resource):
• Application (grammar checking, text translation, IR)
• Linguality (monolingual, bilingual, multilingual, translingual, language-independent)
• Languages/Language Pairs (Romanian, Thai, <en-fr>, ...)
• Technologies (HMM, FSA, EBT, linear programming, ...)
• Linguistic Area (morphology, syntax, pragmatics, ...)
• Linguistic Approach (Two-Level Morphology, systemic functional grammar, DRT)
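A multidimensional classification of this kind behaves like a faceted index: each resource carries values along some of the dimensions, and a query constrains any subset of them. The following Python sketch shows the idea. The `Resource` class, the `matches` and `search` helpers, and the second resource ("ExampleMT") are hypothetical; the values for Babel follow its RDF export shown later in the slides.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Resource:
    name: str
    # Maps a dimension name to the set of values the resource has in it.
    # Dimensions that do not apply to a resource are simply absent.
    facets: Dict[str, Set[str]] = field(default_factory=dict)


def matches(resource: Resource, query: Dict[str, Set[str]]) -> bool:
    """True if the resource has every requested value in every queried dimension."""
    return all(
        values <= resource.facets.get(dimension, set())
        for dimension, values in query.items()
    )


def search(resources: List[Resource], query: Dict[str, Set[str]]) -> List[str]:
    """Return the names of all resources matching the query, in input order."""
    return [r.name for r in resources if matches(r, query)]


RESOURCES = [
    Resource("Babel", {
        "type": {"system"},
        "linguality": {"monolingual"},
        "languages": {"German"},
        "linguistic_area": {"syntax"},
        "linguistic_approach": {"HPSG"},
    }),
    # Hypothetical entry, included only to give the query something to exclude.
    Resource("ExampleMT", {
        "type": {"system"},
        "linguality": {"bilingual"},
        "languages": {"en-fr"},
        "application": {"text translation"},
    }),
]
```

A query such as `search(RESOURCES, {"type": {"system"}, "linguistic_area": {"syntax"}})` combines a generic and a discipline-specific dimension; resources that lack a queried dimension are not returned.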
Excerpt from the Ontology
[Diagram; node labels: Technology, Dublin Core, Language Technology, Languages, OLAC, BibTex, LT World, Communication & Events, Teams & Players, Systems & Resources, Information & Knowledge, Publications]
Area Nodes Example of the shallow hierarchy for technologies • Text Technologies ... • Text Summarization... • Information Extraction • Named Entity Recognition • Terminology Extraction • Relation Extraction • Answer Extraction... • Text Generation...
Main Info for Each Subject Area • Name • Acronyms • aka's, Term Translations • Short Definition • Explanation • Topic Websites • R&D Prototypes/Products • Projects • People • Literature
Ontology Modelling and Interchange Formats • Ontologies maintained with Protégé 2000 • Ontology Modelling with Protégé • Export / Interchange Formats
Protégé: Slot View
Protégé: Form View (Input-Configuration)
Protégé: Instance View (Input-Interface)
Protégé: RDF-Export
Instance of the Babel system
<LT:System rdf:about="&LT;LT_00398"
    LT:applications="Structure Building"
    LT:dc.coverage="66123 Saarbruecken"
    LT:dc.identifier="http://www.dfki.de/~stefan/Babel/e_babel.html"
    LT:lt.linguality="monolingual"
    LT:lt.linguistic_approach="HPSG"
    LT:lt.linguistic_area="syntax"
    LT:olac.type.functionality="Written Language"
    LT:olac.type.linguistic="HPSG"
    LT:resource.contact="Stefan.Mueller@dfki.de"
    LT:resource.homepage_url="http://www.dfki.de/~stefan/Babel/e_babel.html"
    LT:resource.name="Babel"
    LT:resource.type="system"
    LT:technological_method="Written Language"
    LT:type="system"
    rdfs:label="Babel">
  <LT:resource.description>Babel is a Prolog System with Web-Interface in Perl and Java. Its main purpose is the test of an HPSG grammar for German.</LT:resource.description>
  <LT:dc.language rdf:resource="&LT;English"/>
  <LT:lt.languages rdf:resource="&LT;German"/>
  <LT:dc.creator rdf:resource="&LT;LT_00399"/>
  <LT:developed-by rdf:resource="&LT;LT_00399"/>
  <LT:dc.rights rdf:resource="&LT;ont_051002_00178"/>
  <LT:developed-by rdf:resource="&LT;ont_051002_00209"/>
  <LT:olac.format.os>Windows 95</LT:olac.format.os>
  <LT:olac.format.os>Windows NT</LT:olac.format.os>
</LT:System>
Slide callouts: Attributes, Relations
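The export distinguishes literal-valued attributes from relations that point to other instances via rdf:resource. A consumer of the export format might separate the two roughly as in the Python sketch below, which uses the standard library's `xml.etree.ElementTree`. Several things here are assumptions: the namespace URI for the `LT` prefix is a placeholder (the slides do not show it), the entity references of the original export are written out as full URIs, the snippet is a shortened version of the Babel instance, and the helper names `local` and `split_instance` are invented for this illustration.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
LT = "http://example.org/lt#"  # placeholder: the real LT namespace URI is not shown

# Shortened version of the Babel instance, wrapped in an rdf:RDF root
# so that the namespace prefixes are declared.
SNIPPET = f"""
<rdf:RDF xmlns:rdf="{RDF}" xmlns:LT="{LT}">
  <LT:System rdf:about="{LT}LT_00398"
             LT:lt.linguality="monolingual"
             LT:lt.linguistic_area="syntax"
             LT:resource.name="Babel">
    <LT:lt.languages rdf:resource="{LT}German"/>
    <LT:dc.creator rdf:resource="{LT}LT_00399"/>
    <LT:olac.format.os>Windows 95</LT:olac.format.os>
  </LT:System>
</rdf:RDF>
"""


def local(qname):
    """Strip the '{namespace}' part that ElementTree puts in front of names."""
    return qname.split("}", 1)[-1]


def split_instance(element):
    """Separate literal attributes from relations to other instances."""
    about = f"{{{RDF}}}about"
    resource = f"{{{RDF}}}resource"
    attributes = {local(k): v for k, v in element.attrib.items() if k != about}
    relations = []
    for child in element:
        target = child.get(resource)
        if target is not None:
            # A relation: keep the property name and the target's local id.
            relations.append((local(child.tag), target.rsplit("#", 1)[-1]))
        else:
            # A literal given as element content; only the first value is kept.
            attributes.setdefault(local(child.tag), child.text)
    return attributes, relations


root = ET.fromstring(SNIPPET)
system = root.find(f"{{{LT}}}System")
attributes, relations = split_instance(system)
```

Repeated literal elements such as the two `olac.format.os` entries in the full export would need a list-valued mapping; the sketch keeps only the first value for brevity.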
Organizational Issues: Division of Labour • In the beginning, all contents and references were collected and maintained by DFKI • Input from the authors/area specialists of the Survey for distributed authoring and content maintenance • Input from the LT community via HTML forms and XML import format • News and conferences maintained and updated by DFKI
Relationships to External Resources • Included but autonomous resources: ACL NL Software Registry, Language Technology Survey • Systematically cross-linked and cross-searchable resources: all OLAC resources (such as LDC, SIL, ACL SR, and OLAC Home) • Systematically cross-linked resources: HLT Central, ELSNET, EACL ACL NLP Universe • Linked resources: all other resources relevant for LT