Enhanced Semantic Access to Software Artefacts
Danica Damljanović and Kalina Bontcheva
Outline
• Motivation
• The GATE case study
• Semantic-based prototype
• Data collection
• Automatic content augmentation
• Storing implicit annotations
• Querying using text-based queries
• Example
• Conclusion and Future work
Motivation
Large software frameworks:
• hard to maintain: never enough documentation
• hard to find specific information
• significant learning curve for:
  • new developers working on software extensions
  • software engineers who integrate relevant parts into their applications
Can semantic technologies help?
[Figure: software documentation scattered across forum posts, source code, Web sites, and papers]
The GATE case study
• GATE (gate.ac.uk): open-source General Architecture for Text Engineering
• development team: over 15 people at present, over 30 over the years
• documentation about GATE software:
  • dispersed on the Web: not easy to find for new or existing developers and users
  • no unified interface: Google, gate.ac.uk, gmane mailing list search, etc.
The GATE case study: requirements
• Automatic generation of reference pages from the ontology:
  • provide users with a single point of access to all knowledge, continuously kept up to date
• Automatically generate a web page:
  • shown on its own or alongside the ontology tree, with the searched concept selected
Semantic-based prototype
[Architecture diagram; labels: software documentation, learn domain ontology, annotate content, store, semantic repository, text-based query]
Data collection
Downloaded around 10,000 software artefacts about GATE:
• source code,
• source documentation,
• GATE manual,
• forum posts,
• publications.
Export annotations
• Merge document metadata and annotations into the OWL file using an information-extraction ontology: PROTON KM (http://proton.semanticweb.org/2005/04/protonkm)
Information-extraction ontology
• Document class:
  • resourceType property: the type of the document
  • informationResourceIdentifier property: the URL of the annotated document
• Mention class:
  • occursIn: links the mention to the Document in which it occurs
  • hasStartOffset and hasEndOffset: store the position of the annotation
  • (new) refersAnything: preserves the URI of the resource to which the mention refers
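The Document and Mention classes above can be pictured as two small record types. The following is only an illustrative sketch in Python, not the OWL representation used in the prototype: the field names are snake_case renderings of the ontology properties, and the example URL and `covered_text` helper are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Document:
    resource_type: str                     # resourceType: the type of the document
    information_resource_identifier: str   # informationResourceIdentifier: URL of the document


@dataclass(frozen=True)
class Mention:
    occurs_in: Document    # occursIn: the document that contains the mention
    start_offset: int      # hasStartOffset
    end_offset: int        # hasEndOffset
    refers_anything: str   # refersAnything: URI of the resource being mentioned

    def __post_init__(self) -> None:
        # Offsets must describe a valid, non-negative character span.
        if not 0 <= self.start_offset <= self.end_offset:
            raise ValueError("offsets must satisfy 0 <= start <= end")

    def covered_text(self, content: str) -> str:
        """Return the span of `content` covered by this mention (hypothetical helper)."""
        return content[self.start_offset:self.end_offset]


# Usage with a made-up forum post; the URL is a placeholder, not a real artefact.
doc = Document("forum post", "http://example.org/gate-forum/post-1")
mention = Mention(doc, 0, 5, "http://gate.ac.uk/ns/gate-ontology#annic")
print(mention.covered_text("ANNIC is a GATE plugin."))  # prints "ANNIC"
```

Keeping the offsets and the referred URI on the mention, rather than on the document, means one document can carry any number of mentions that point at different ontology resources.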
Access knowledge using text-based queries
• QuestIO (Question-based interface to ontologies):
  • keyword-based queries
  • full-blown questions
QuestIO: Text-based query >> SeRQL
“Java Class for parameters for processing resources in ANNIC?”
select c0, "[inverseProperty]", p1, c2, "[inverseProperty]", p3, c4, "[inverseProperty]", p5, i6
from {c0} rdf:type {<http://gate.ac.uk/ns/gate-ontology#JavaClass>},
     {c2} p1 {c0},
     {c2} rdf:type {<http://gate.ac.uk/ns/gate-ontology#ResourceParameter>},
     {c4} p3 {c2},
     {c4} rdf:type {<http://gate.ac.uk/ns/gate-ontology#ProcessingResource>},
     {i6} p5 {c4},
     {i6} rdf:type {<http://gate.ac.uk/ns/gate-ontology#GATEPlugin>}
where p1=<http://gate.ac.uk/ns/gate-ontology#parameterHasType>
  and p3=<http://gate.ac.uk/ns/gate-ontology#hasRunTimeParameter>
  and p5=<http://gate.ac.uk/ns/gate-ontology#containsResource>
  and i6=<http://gate.ac.uk/ns/gate-ontology#annic>
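The query above follows a regular pattern: a chain of typed variables, each linked to the previous one by a property, with the last variable bound to a specific instance. As a rough illustration of that pattern only, the Python sketch below assembles a query of the same shape from a list of class names, property names, and a final instance. The function `build_serql` and its parameters are hypothetical, and this is not QuestIO's actual translation procedure, which first has to map the words of the question onto ontology concepts.

```python
GATE_NS = "http://gate.ac.uk/ns/gate-ontology#"


def build_serql(classes, properties, instance):
    """Assemble a SeRQL query for a chain of classes linked by properties.

    classes    -- local names of the classes, in the order they appear in the chain
    properties -- local names of the properties linking consecutive classes
    instance   -- local name of the instance that the last variable is bound to
    """
    if len(properties) != len(classes) - 1:
        raise ValueError("need exactly one property between each pair of classes")

    last = len(classes) - 1
    # Variables are numbered as in the slide: c0, p1, c2, p3, ..., with the last node named i<n>.
    nodes = [f"c{2 * i}" for i in range(last)] + [f"i{2 * last}"]
    props = [f"p{2 * i + 1}" for i in range(len(properties))]

    select = [nodes[0]]
    paths = [f"{{{nodes[0]}}} rdf:type {{<{GATE_NS}{classes[0]}>}}"]
    filters = []
    for i, prop in enumerate(properties):
        select += ['"[inverseProperty]"', props[i], nodes[i + 1]]
        # Each property points from the later node back to the earlier one.
        paths.append(f"{{{nodes[i + 1]}}} {props[i]} {{{nodes[i]}}}")
        paths.append(f"{{{nodes[i + 1]}}} rdf:type {{<{GATE_NS}{classes[i + 1]}>}}")
        # Full URIs are written in angle brackets.
        filters.append(f"{props[i]}=<{GATE_NS}{prop}>")
    filters.append(f"{nodes[-1]}=<{GATE_NS}{instance}>")

    return (
        "select " + ", ".join(select)
        + "\nfrom " + ",\n     ".join(paths)
        + "\nwhere " + "\n  and ".join(filters)
    )


query = build_serql(
    ["JavaClass", "ResourceParameter", "ProcessingResource", "GATEPlugin"],
    ["parameterHasType", "hasRunTimeParameter", "containsResource"],
    "annic",
)
print(query)
```

Called with the classes and properties from the ANNIC example, the sketch produces a query with the same variables, path expressions, and filters as the one on the slide.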
Demo http://gate.ac.uk/document-search
Future Work
• optimise query execution time: migrate from SeRQL to SPARQL
• include simple ontology-driven data in the interface
• evaluation to follow: user-centric evaluation with GATE users
Thank you! Questions?