The Semantic Web is big: thousands of ontologies, millions of RDF documents, billions of statements. Reusing ontologies: Semantic Web search engines like Watson, Sindice, Swoogle, Falcon-s, etc. help in finding and locating semantic information on the Web. However, they don't support the user in quickly understanding what an ontology is about and what it contains.
1. Identifying key concepts in an ontology through the integration of cognitive principles with statistical and topological measures Silvio Peroni, Enrico Motta and Mathieu d’Aquin
Knowledge Media Institute
The Open University
2. The context for the work on the SW testbeds in WP8 is given by this research programme on NGSW apps, which we started about three years ago and which is closely aligned with the OK project.
In particular, the idea of NGSW is to exploit large-scale semantics by doing away with the classic assumptions characterizing semantic systems (closed conceptualizations, design-time metadata alignment, closed KA, etc.). These features of NGSW closely match the key tenets of the OK project: open systems, the ability to acquire knowledge dynamically, and the ability to handle heterogeneity at run time.
In Open Knowledge, the core focus of our work in the initial two years was on developing the two testbeds and on run-time mapping algorithms.
3. Reusing Ontologies Semantic Web Search Engines like Watson, Sindice, Swoogle, Falcon-s, etc. help in finding and locating semantic information on the Web.
However, they don’t support the user in quickly understanding what the ontology is about, what it contains
4. Summarizing ontologies What is needed is a way to quickly get a general impression of what an ontology is about
When asked to summarize ontologies, people come up with explanations like
“The AKT portal ontology can be used to describe academic organizations. It covers concepts such as event, person, technology, project, etc.”
That is, they can extract the key concepts which can effectively summarize an ontology
5. Identifying Key Concepts
6. Research Issues What are the right concepts that could be used to describe an ontology concisely?
Are there any principles/regularities in the way human beings extract ‘key concepts’ from an ontology?
Can these principles be automated, to define algorithms that are able to characterize an ontology the way people do?
How effectively do the resulting ontology signatures allow knowledge consumers to locate the information they need?
7. Identifying key concepts: Approach Integration of cognitive criteria with lexical statistics, formal and topological criteria
Criteria
Natural categories (Rosch, 1978)
information rich concepts that are ‘basic’ from a cognitive standpoint
E.g., dog, cat, chair, etc.
Density
information rich concepts from a formal standpoint
i.e., concepts rich in attributes, instances, or subclasses
We use both local and global density measures
Popularity
Lexical statistics
Familiar words tend to be more descriptive than unfamiliar ones
We use both global and local popularity measures
Best ontology coverage (topological)
We want to ensure that for each concept C in the ontology, there is a key concept Ki, such that either C ⊑ Ki or Ki ⊑ C (that is, C is either a sub-concept or a super-concept of Ki)
8. Computing Natural Categories According to Rosch, people characterize the world primarily in terms of basic objects, such as chair or car.
These basic objects are neither the most general ones (e.g. vehicle, furniture) nor the most specific ones (e.g. red car, nice chair).
Hence, we consider as basic objects those that
Are central in the hierarchy
Have a simple label
Because of linguistic evolution, normally a natural category has a simple name.
For example, “Chair” is ‘more natural’ than “KitchenChair”
9. Basic Level: example
10. Computing “Name Simplicity” NS(C) = 1 - c(nc-1),
nc = number of compounds in the label
c a constant
in our experiments, we use c = 0.3.
NS (“Artist”) = 1
NS (“MusicalArtist”) = 0.7
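The name simplicity formula can be sketched in a few lines of Python. The formula and c = 0.3 come from the slide; the way a label is split into compounds (camelCase boundaries, hyphens, underscores, spaces) and the function names are assumptions made for illustration.

```python
import re


def count_compounds(label: str) -> int:
    """Count the words in a label (assumed splitting rule, not from the slides)."""
    # Insert a space at lower-to-upper camelCase boundaries: "MusicalArtist" -> "Musical Artist"
    spaced = re.sub(r"([a-z0-9])([A-Z])", r"\1 \2", label)
    # Split on whitespace, underscores and hyphens, dropping empty pieces
    return len([t for t in re.split(r"[\s_\-]+", spaced) if t])


def name_simplicity(label: str, c: float = 0.3) -> float:
    """NS(C) = 1 - c(nc - 1), where nc is the number of compounds in the label."""
    nc = count_compounds(label)
    return 1 - c * (nc - 1)


for label in ("Artist", "MusicalArtist"):
    print(label, round(name_simplicity(label), 2))
```

With this splitting rule, "Artist" has one compound and "MusicalArtist" has two, which reproduces the two values given on the slide.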
11. Density The density of a concept C is a measure of how richly described the concept is in the ontology
It is computed on the basis of its number of direct sub-concepts, properties and instances
We consider 2 different types of density: the global density and the local density
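As a rough illustration of the global density, the sketch below combines the three counts into a weighted sum and normalises by the largest value in the ontology. The transcript does not preserve the actual formula, so the weights, the normalisation and all names here are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ConceptStats:
    subclasses: int   # number of direct sub-concepts
    properties: int   # number of properties
    instances: int    # number of instances


def raw_density(s: ConceptStats, w_sub=0.8, w_prop=0.1, w_inst=0.1) -> float:
    # Hypothetical weights: the slides do not state how the counts are combined.
    return w_sub * s.subclasses + w_prop * s.properties + w_inst * s.instances


def global_density(stats: dict) -> dict:
    """Scale every raw density by the maximum, giving values in [0, 1] (assumed)."""
    raw = {name: raw_density(s) for name, s in stats.items()}
    top = max(raw.values(), default=0.0)
    return {name: (value / top if top else 0.0) for name, value in raw.items()}


example = {
    "Person": ConceptStats(subclasses=4, properties=0, instances=0),
    "Event": ConceptStats(subclasses=0, properties=2, instances=2),
}
print(global_density(example))
```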
12. Global Density
13. Global Density: example
14. Local Density The local density of a concept C refers to a density value which is relative to those of the surrounding concepts
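One possible reading of "relative to the surrounding concepts" is to divide a concept's density by the highest density found within a few steps of it in the hierarchy. The sketch below follows that reading; the neighbourhood radius, the ratio itself and the function names are assumptions, not the definition used in the talk.

```python
from collections import deque


def neighbours_within(graph: dict, start: str, radius: int) -> set:
    """Breadth-first search: concepts at most `radius` is-a links away (start included)."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        if dist == radius:
            continue
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return seen


def local_density(density: dict, graph: dict, concept: str, radius: int = 2) -> float:
    # Assumed definition: own density divided by the peak density in the neighbourhood.
    peak = max(density[c] for c in neighbours_within(graph, concept, radius))
    return density[concept] / peak if peak else 0.0


# Undirected adjacency (parent and child links) for the chain Thing - Animal - Dog - Puppy
graph = {"Thing": ["Animal"], "Animal": ["Thing", "Dog"],
         "Dog": ["Animal", "Puppy"], "Puppy": ["Dog"]}
density = {"Thing": 0.2, "Animal": 1.0, "Dog": 0.5, "Puppy": 0.1}
print(local_density(density, graph, "Dog", radius=1))
```

Under this reading, a concept that is only moderately dense overall can still stand out if it is the densest concept in its own part of the hierarchy.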
15. Local Density: example
16. Popularity How popular a category is is another criterion that can be used to identify whether a particular category C is a key concept
The popularity of a concept C is measured as the number of results returned by querying Yahoo with the name of C as keyword
Compound names are transformed into a sequence of lower-case keywords separated by spaces (Marine-Animal, MarineAnimal, marineAnimal and marine_animal are all transformed into “marine animal”)
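The label transformation described above can be sketched as follows. The four example labels and the expected result come from the slide; the regular expressions and the function name `label_to_query` are illustrative assumptions, and the search-engine call itself is deliberately left out.

```python
import re


def label_to_query(label: str) -> str:
    """Turn a concept label into lower-case keywords separated by single spaces."""
    # Break camelCase: "marineAnimal" -> "marine Animal"
    spaced = re.sub(r"([a-z0-9])([A-Z])", r"\1 \2", label)
    # Treat hyphens, underscores and whitespace as word separators
    tokens = [t for t in re.split(r"[\s_\-]+", spaced) if t]
    return " ".join(t.lower() for t in tokens)


for label in ("Marine-Animal", "MarineAnimal", "marineAnimal", "marine_animal"):
    print(label, "->", label_to_query(label))
```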
17. Local and global popularity As in the case of density, we also want to take into consideration both the global and local popularity of a concept
We compute these analogously to the way we derive global and local densities
18. Coverage The coverage criterion states that the set of key concepts identified by our algorithm should maximise the coverage of the ontology with respect to its is-a hierarchy
Not only do we want our method to return the right type of concepts, we also want it to achieve the right spread of concepts, so as to provide the best possible illustration of the ontology
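A minimal sketch of a coverage check over the is-a hierarchy: a concept counts as covered when it is a key concept, or lies above or below one. The formulas from the original slides are not preserved in this transcript, so measuring coverage as the fraction of covered concepts, and the names used here, are assumptions.

```python
def ancestors(parents: dict, concept: str) -> set:
    """All super-concepts reachable by following is-a links upwards."""
    seen, stack = set(), list(parents.get(concept, ()))
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(parents.get(node, ()))
    return seen


def is_covered(parents: dict, concept: str, keys: set) -> bool:
    if concept in keys or ancestors(parents, concept) & keys:
        return True  # the concept is a key concept or sits below one
    # Otherwise it is covered only if it sits above some key concept
    return any(concept in ancestors(parents, k) for k in keys)


def coverage(parents: dict, concepts: list, key_concepts) -> float:
    """Fraction of concepts covered by the key concepts (assumed measure)."""
    keys = set(key_concepts)
    if not concepts:
        return 0.0
    return sum(is_covered(parents, c, keys) for c in concepts) / len(concepts)


parents = {"Dog": ["Animal"], "Cat": ["Animal"], "Animal": ["Thing"],
           "Chair": ["Furniture"], "Furniture": ["Thing"]}
concepts = ["Thing", "Animal", "Dog", "Cat", "Furniture", "Chair"]
print(coverage(parents, concepts, {"Animal"}))
print(coverage(parents, concepts, {"Animal", "Chair"}))
```

In this toy hierarchy, choosing only "Animal" leaves the furniture branch uncovered, while adding "Chair" covers every concept.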
19. Total Coverage
20. Partial Coverage
21. Coverage: formulas
22. The algorithm (1/2) For each class C in O we compute its global and local density, global and local popularity and the natural category value.
For each class C in O we compute score(C)
Given a number k ≤ n (in our experiments k = 15), let S be the set of k classes in O with the best score and let T be the set of n - k classes in O \ S with the best score. If T is empty, we return S and we stop
Otherwise, let c be the average of all the values obtained by invoking the function contribution(Ci, S ∪ T), for each Ci ∈ S ∪ T. And let a be the average of all the values obtained by invoking the function overallScore(Ci, S ∪ T), again for each Ci ∈ S ∪ T
23. The algorithm (2/2) Let W be the class in T with the worst overallScore(W, S ∪ T) of all the classes in S ∪ T, and let R be the set (S ∪ T) \ {W}. If there is a class B ∈ O \ (S ∪ T), such that
the average a’ of all the values obtained by invoking overallScore(C, R ∪ {B}), computed for each C ∈ R ∪ {B}, is greater than a,
the average c’ of all the values obtained by invoking contribution(C, R ∪ {B}), computed for each C ∈ R ∪ {B}, is greater than or equal to c,
we swap W with B in S ∪ T and we go back to step 4. Otherwise we return S ∪ T and we stop.
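The selection-and-swap loop can be sketched as below. The transcript does not define score, overallScore or contribution, so they are passed in as plain callables; the function name, the parameter names and the choice to try replacement candidates in descending score order are assumptions made for illustration.

```python
from statistics import mean


def select_key_concepts(classes, score, overall_score, contribution, n, k):
    """Pick n key concepts: seed with the top-scoring classes, then swap while it helps."""
    ranked = sorted(classes, key=score, reverse=True)
    s, t = ranked[:k], ranked[k:n]          # S: best k classes, T: next n - k classes
    if not t:
        return set(s)
    selected, tail = set(s) | set(t), set(t)
    while True:
        current = frozenset(selected)
        c_avg = mean(contribution(x, current) for x in current)
        a_avg = mean(overall_score(x, current) for x in current)
        # W: the class of T with the worst overall score
        worst = min(tail, key=lambda x: overall_score(x, current))
        rest = current - {worst}
        for b in ranked:                    # candidate B taken from outside S ∪ T
            if b in selected:
                continue
            trial = rest | {b}
            better_score = mean(overall_score(x, trial) for x in trial) > a_avg
            no_worse_contrib = mean(contribution(x, trial) for x in trial) >= c_avg
            if better_score and no_worse_contrib:
                selected = set(trial)       # swap W with B and repeat
                tail = (tail - {worst}) | {b}
                break
        else:
            return selected                 # no improving swap exists


# Toy inputs (hypothetical numbers, only to exercise the loop)
scores = {"a": 5, "b": 4, "c": 3, "d": 2, "e": 1}
overall = {"a": 5, "b": 4, "c": 1, "d": 3, "e": 2}
result = select_key_concepts(
    list(scores), scores.get,
    lambda x, group: overall[x], lambda x, group: 1.0, n=3, k=2)
print(sorted(result))
```

Because a swap is accepted only when the average overall score strictly increases, no set of concepts can be visited twice, so the loop terminates on a finite ontology.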
24. Evaluation 4 ontologies were used for our tests
We asked 8 semantic web experts to select up to 20 concepts they considered to be the best descriptors of the ontologies. We also asked them to try and maximise ontology coverage
For each ontology, a number of concepts emerged, which a high percentage of experts considered to be key concepts. On these concepts, the experts showed on average a 74.68% agreement ratio
25. Algorithm results (v3) We implemented three versions of the algorithm
V1 performed very poorly (average agreement = 42.56%; no popularity)
V2 was much better (average agreement = 63.61%; no nat. categories)
V3 showed an excellent correlation with the experts (average agreement = 72.08%)
27. Conclusions We defined an algorithm for computing a summary of an ontology, in the form of key concepts
This algorithm is almost as good as humans at this task (72.08% average agreement for V3, against a 74.68% agreement ratio among the experts)
The implementation of the technique will be provided as a service on top of Watson, and used to provide meaningful snapshots of ontologies
Other applications include:
To support new navigation/visualization mechanisms, which can improve over the taxonomic displays provided by current ontology engineering tools
To identify priority concepts in ontology mapping, automatic classification, ontology evolution, etc.
To provide mechanisms for knowledge providers to advertise knowledge contents, without publishing the whole ontology