User-friendly Ontology Authoring using a Controlled Language [1] Brian Davis brian.davis@deri.org DERI Reading Group, July 19, 2006
Contents • Introduction / Overview • Problem / Motivation • Overview of Controlled Languages • CLIE – Controlled Language Information Extraction • Round Trip Ontology Authoring – in brief • Conclusion • Future Work • References
Overview • The SW and KM fields have produced many tools for a variety of contexts and scenarios. • These tools require structuring of IS, most commonly in the form of ontologies. • There is a growing trend to use NLP techniques in IE for the automatic extraction of semantic metadata from existing textual documents (the bootstrap layer of formal knowledge required by SW tools). • Ontologies are also popular for representing domain information used internally by NLP systems. Source [1]
Motivation and Problem (1) • Authoring of ontological data: user-friendly means are needed! • Ontology tools available to KEs: • Example: Protégé [2] • Disadvantages: • They require some training and an understanding of the underlying knowledge formalisms. • Ontology representation languages, e.g. OWL [3] and RDF-S [4], are complex and very descriptive, so non-experts find them difficult to understand. Source [1]
Motivation and Problem (2) • The number of systems that use an ontology as a data structure for modelling a domain is increasing; their output is the resulting knowledge base. • Few ontology tools are aimed at non-expert users wishing to create simple structures • without the need for “delving into intricacies of KR languages”. • Few features of KR are actually used (in the NLP field). • Requirements often include only representing a taxonomy of class members. • Despite this, the format must be adhered to in order to comply with standards. Solution: create and edit ontologies using a restricted form of English. Source [1]
Overview of Controlled Languages (1) • Human NL – very complex • highly ambiguous • difficult to process automatically • difficult to extract information from • extraneous information needed – context in discourse, extralinguistic factors • tone, facial expressions. • FL – rigidly structured • machine processable • difficult or unnatural for humans to use
Overview of Controlled Languages (2) "Controlled Natural Languages are subsets of natural languages whose grammars and dictionaries have been restricted in order to reduce or eliminate both ambiguity and complexity. Traditionally, controlled languages fall into two major categories: those that improve readability for human readers, particularly non-native speakers, and those that improve computational processing of the text." [5] • A CL attempts to resolve the dichotomy between NL and FL. • A CL is a subset of NL • and is therefore less complete. • Vocabulary and grammar are constrained to a specific task. • Challenge: balance between expressiveness and simplicity.
Overview of Controlled Languages (3) Early origins – 1970s: • CFE (Caterpillar Fundamental English) – restricted to 850 terms. • CTE (Caterpillar Technical English) – 70K carefully chosen domain-specific terms. More recently: • 1998 – KANT – CMU [6][7]. • 2000 – Attempto Controlled English (ACE). • 2003 – ClearTalk [9].
CLIE – Controlled Language Information Extraction (1) • CLIE vs typical CLs • Traditional / current CLs are grammatically more complex • and have a more restricted vocabulary. • CLIE aims for maximum expressivity within the smallest set of syntactic constructs. • Advantage: very little training is needed to master the constructs.
CLIE – Controlled Language Information Extraction (2) Based on the existing GATE NLP framework [7], [8]. Figures taken from [1].
CLIE – Controlled Language Information Extraction (3) – Example: CL input and resulting ontology. Figures and CL input taken from [1].
CLIE – Controlled Language Information Extraction (4) – CL Grammar – Example extracted from [1].
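Since the grammar figure is not reproduced in this transcript, the following is only a rough Python sketch of the general idea: a handful of fixed sentence patterns is mapped onto a class taxonomy. The three patterns, the `parse` function, the `singular` helper and the sample sentences are all assumptions made for illustration. They are not the CLIE grammar from [1], and CLIE itself is built on GATE rather than on plain regular expressions.

```python
import re

# Separators for "A, B and C" style lists (illustrative assumption).
LIST_SEP = re.compile(r"\s*,\s*(?:and\s+)?|\s+and\s+")

def singular(word):
    """Crude stand-in for a real morphological analyser."""
    if word.endswith("ies"):
        return word[:-3] + "y"
    if word.endswith("s") and not word.endswith("ss"):
        return word[:-1]
    return word

def split_list(chunk):
    return [part.strip() for part in LIST_SEP.split(chunk) if part.strip()]

# Hypothetical sentence patterns, tried in order (most specific first).
PATTERNS = [
    ("classes", re.compile(r"^There (?:is|are) (.+)$", re.I)),
    ("subclasses", re.compile(r"^(.+?) (?:is a type|are types) of (.+)$", re.I)),
    ("instances", re.compile(r"^(.+?) (?:is an?|are) (.+)$", re.I)),
]

def parse(text):
    """Turn controlled-language sentences into a small taxonomy."""
    onto = {"classes": set(), "subclass_of": {}, "instance_of": {}}
    for sentence in re.split(r"\.\s*", text):
        sentence = sentence.strip()
        if not sentence:
            continue
        for kind, pattern in PATTERNS:
            match = pattern.match(sentence)
            if not match:
                continue
            if kind == "classes":
                for name in split_list(match.group(1)):
                    onto["classes"].add(singular(name.lower()))
            elif kind == "subclasses":
                parent = singular(match.group(2).strip().lower())
                onto["classes"].add(parent)
                for name in split_list(match.group(1)):
                    child = singular(name.lower())
                    onto["classes"].add(child)
                    onto["subclass_of"][child] = parent
            else:  # instances keep their original spelling
                cls = singular(match.group(2).strip().lower())
                onto["classes"].add(cls)
                for name in split_list(match.group(1)):
                    onto["instance_of"][name] = cls
            break
        else:
            # A controlled language rejects anything outside its grammar.
            raise ValueError(f"Sentence not in the controlled language: {sentence!r}")
    return onto

onto = parse(
    "There are agents and documents. "
    "Persons and projects are types of agent. "
    "Alice and Bob are persons."
)
```

Rejecting any sentence that matches no pattern is the point of the sketch: the author only has to learn a few constructs, and every accepted sentence has exactly one interpretation.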
Round Trip Ontology Authoring – In Brief • A language generator is used to translate an existing ontology into text, OR an empty ontology is used to generate empty templates with slots.
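The generation direction can be sketched in the same spirit. The Python below is a minimal illustration under stated assumptions: the `generate` function, the sentence wording, the slot notation (`<CLASSES>` and so on) and the naive `plural` helper are all hypothetical, and are not taken from [1] or from any existing generator.

```python
from collections import defaultdict

def plural(noun):
    """Naive pluralisation; a real generator would use a proper lexicon."""
    if noun.endswith("y") and noun[-2:-1] not in "aeiou":
        return noun[:-1] + "ies"
    return noun + "s"

def join_list(items):
    items = list(items)
    if len(items) == 1:
        return items[0]
    return ", ".join(items[:-1]) + " and " + items[-1]

def sentence(text):
    return text[0].upper() + text[1:] + "."

def generate(classes, subclass_of, instance_of):
    """Render a small taxonomy as controlled-language sentences."""
    if not classes:
        # Empty ontology: return templates whose slots the user fills in.
        return [
            "There are <CLASSES>.",
            "<CLASSES> are types of <CLASS>.",
            "<INSTANCES> are <CLASSES>.",
        ]
    out = []
    # Top-level classes are those without a parent.
    top = sorted(c for c in classes if c not in subclass_of)
    if top:
        out.append(sentence("there are " + join_list(plural(c) for c in top)))
    children = defaultdict(list)
    for child, parent in subclass_of.items():
        children[parent].append(child)
    for parent in sorted(children):
        kids = sorted(children[parent])
        if len(kids) == 1:
            out.append(sentence(f"{kids[0]} is a type of {parent}"))
        else:
            out.append(sentence(f"{join_list(plural(k) for k in kids)} are types of {parent}"))
    members = defaultdict(list)
    for instance, cls in instance_of.items():
        members[cls].append(instance)
    for cls in sorted(members):
        who = sorted(members[cls])
        if len(who) == 1:
            article = "an" if cls[0] in "aeiou" else "a"
            out.append(sentence(f"{who[0]} is {article} {cls}"))
        else:
            out.append(sentence(f"{join_list(who)} are {plural(cls)}"))
    return out

text = generate(
    {"agent", "document", "person", "project"},
    {"person": "agent", "project": "agent"},
    {"Alice": "person", "Bob": "person"},
)
templates = generate(set(), {}, {})
```

If the generated sentences stay inside the same controlled language that the parser accepts, a user can edit the text and feed it back in, which is what makes the trip "round".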
Conclusions • Few ontology tools are aimed at non-expert users wishing to create simple structures • without the need for “delving into intricacies of KR languages”. • Few features of KR are actually used (in the NLP field). • Requirements often include only representing a taxonomy of class members. • Despite this, the format must be adhered to in order to comply with standards. • Solution: use Controlled Languages. Source [1]
Future Work • Extensions to CLIE • Round Trip Ontology Authoring • Existing Collaboration with Nepomuk Project and Sheffield NLP Group
References [1] V. Tablan, T. Polajnar, H. Cunningham, and K. Bontcheva. 2006. User-friendly ontology authoring using a controlled language. In Proceedings of the 5th International Conference on Language Resources and Evaluation (LREC), Genoa, Italy, May. [2] N.F. Noy, M. Sintek, S. Decker, M. Crubézy, R.W. Fergerson, and M.A. Musen. 2001. Creating Semantic Web Contents with Protégé-2000. IEEE Intelligent Systems, 16(2):60–71. [3] M. Dean, G. Schreiber, S. Bechhofer, F. van Harmelen, J. Hendler, I. Horrocks, D.L. McGuinness, P.F. Patel-Schneider, and L.A. Stein. 2004. OWL Web Ontology Language Reference. W3C Recommendation, W3C, Feb. http://www.w3.org/TR/owl-ref/. [4] O. Lassila and R.R. Swick. 1999. Resource Description Framework (RDF) Model and Syntax Specification. Technical Report 19990222, W3C Consortium. http://www.w3.org/TR/REC-rdf-syntax/.
References [5] http://www.ics.mq.edu.au/~rolfs/controlled-natural-languages/ (accessed 19 July 2006, 07:53 GMT). [6] http://www.lti.cs.cmu.edu/Research/Kant/ (accessed 19 July 2006, 08:02 GMT). [7] http://gate.ac.uk/ (accessed 19 July 2006, 08:02 GMT). [8] H. Cunningham, D. Maynard, K. Bontcheva, and V. Tablan. 2002a. GATE: A Framework and Graphical Development Environment for Robust NLP Tools and Applications. In Proceedings of the 40th Anniversary Meeting of the Association for Computational Linguistics (ACL’02). [9] D. Skuce. 2003. A Controlled Language for Knowledge Formulation on the Semantic Web. http://www.site.uottawa.ca:4321/factguru2.pdf.