GATE and the Semantic Web
Hamish Cunningham, Kalina Bontcheva, Wim Peters, Marin Dimitrov¹, Atanas Kiryakov¹
Department of Computer Science, University of Sheffield
¹ OntoText Lab, Sirma AI Ltd.
• Brief intro to GATE (a General Architecture for Text Engineering)
• Hand waving about LT and the Semantic Web
• Demo

A Ubiquitous Permeable Web
• The next generation of the web must be:
  – ubiquitous: semantics for every device, every organisation, every individual;
  – permeable: allow contextual data to penetrate and persist;
  – companionable: able to engage with us via multiple natural modalities.
• Roles for Language Technology:
  – discovery of semantics (ubiquity);
  – mediating between context and personal semantic memories (permeability);
  – conversing with people and the semantic web (companionableness).

Critical Mass for the Semantic Web
• The SW: machine-processable, repurposable data to complement hypertext
• But: semantics = 0.0000000...% of the Web
• How to achieve critical mass? Huge-scale automatic annotation. Requirements:
• Huge scale:
  – freely available to all EU citizens
  – distributed (over a Grid)
  – repurposable (delivered as Web Services)
• Portability and robustness via:
  – simple and therefore shallow HLT methods
  – +ve and –ve learning
  – analogs of IPSEs for computer-literate users

GATE is:
• An architecture
  A macro-level organisational picture for LE software systems.
• A framework
  For programmers, GATE is an object-oriented class library that implements the architecture.
• A development environment
  For language engineers, computational linguists et al., GATE is a graphical development environment bundled with a set of tools for doing e.g. Information Extraction.
• Some free components... and wrappers for other people's components
• Tools for: evaluation; visualisation/editing; persistence; IR; IE; dialogue; ontologies; etc.
• Free software (LGPL). Download at http://gate.ac.uk/download/

Architectural principles
• Non-prescriptive, theory neutral (strength and weakness)
• Re-use, interoperation, not reimplementation (e.g. v1 used LT-NSL for SGML input; v2 talks to other XML-based systems, APIs and standards)
• (Almost) everything is a component, and component sets are user-extendable
• Component-based development
  – An OO way of chunking software: Java Beans
  – GATE components: CREOLE = modified Java Beans (Collection of REusable Objects for Language Engineering)
  – The minimal component = 10 lines of Java, 10 lines of XML, 1 URL.
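
The "class plus metadata" idea behind component-based development can be sketched in a few lines of plain Java. This is an illustration only and does not use the GATE API: `TextComponent`, `TokenCounter` and `Registry` are hypothetical names invented for this sketch, and the in-memory `declare` call merely stands in for the XML metadata and URL that a real CREOLE component would supply.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ComponentSketch {

    /** Hypothetical stand-in for a processing component; not a GATE interface. */
    public interface TextComponent {
        /** Writes this component's results into the shared feature map. */
        void execute(String text, Map<String, Object> features);
    }

    /** A deliberately tiny component: counts whitespace-separated tokens. */
    public static class TokenCounter implements TextComponent {
        @Override
        public void execute(String text, Map<String, Object> features) {
            String trimmed = text.trim();
            int count = trimmed.isEmpty() ? 0 : trimmed.split("\\s+").length;
            features.put("tokenCount", count);
        }
    }

    /** Hypothetical registry mapping a declared name to an implementing class. */
    public static class Registry {
        private final Map<String, String> declared = new LinkedHashMap<>();

        /** Plays the role of the metadata file: component name -> class name. */
        public void declare(String name, String className) {
            declared.put(name, className);
        }

        /** Instantiates a declared component reflectively, by its declared name. */
        public TextComponent create(String name) throws ReflectiveOperationException {
            String className = declared.get(name);
            if (className == null) {
                throw new IllegalArgumentException("Unknown component: " + name);
            }
            Class<?> cls = Class.forName(className);
            return (TextComponent) cls.getDeclaredConstructor().newInstance();
        }
    }

    public static void main(String[] args) throws Exception {
        Registry registry = new Registry();
        registry.declare("Token Counter", TokenCounter.class.getName());

        Map<String, Object> features = new LinkedHashMap<>();
        registry.create("Token Counter").execute("GATE and the Semantic Web", features);
        System.out.println(features); // prints {tokenCount=5}
    }
}
```

Because the registry only knows components by name and class name, new components can be added without touching the code that runs them, which is the sense in which component sets are user-extendable.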

Displaying Multilingual Data
• All the visualisation and editing tools for ML LRs use enhanced Java facilities (figure not reproduced).

GATE demo
• Components and the main UI; the resources tree
• Document formats, databases
• IE, IR, annotation, evaluation, WordNet
• Ontologies, OntoGazetteer, Protégé, DAML export