Discovering Concept Coverings in Ontologies of Linked Data Sources
Authors: Rahul Parundekar, Craig A. Knoblock and Jose-Luis Ambite (USC ISI)
Presented By: Harsh Singh (USC)
Outline:
• Introduction
• Motivation
• Problem
• Solution Proposed
• Key Concepts
• Algorithm
• Results
• Conclusion / Future Work
Introduction:
What is an ontology?
- An ontology is a specification of a conceptualization.
What is Linked Data?
- A set of best practices for exposing, sharing, and connecting pieces of data, information, and knowledge on the Semantic Web using URIs and RDF.
Why and how are these linked together?
- Semantic Web done right (Tim Berners-Lee).
- Achieves the interoperability goal of the Semantic Web.
- Uses owl:sameAs.
owl:sameAs
- Links an individual to an individual.
<owl:Class rdf:ID="FootballTeam">
  <owl:sameAs rdf:resource="http://sports.org/US#SoccerTeam"/>
</owl:Class>
Motivation:
• Web of Linked Data.
• Finding alignments among these sources.
• Alignments are not one-to-one concept equivalences.
• Find coverings of concepts.
Problem Example: GeoNames has only one class, geonames:Feature, while DBpedia has a rich set of attributes.
Equivalent instances in the different domains are connected with owl:sameAs.
[Figure: the instances "Los Angeles" and "City of Los Angeles" are connected by owl:sameAs; the concepts "Populated Place" and "City" have no links between them (marked "???").]
<http://dbpedia.org/resource/Los_Angeles> <http://www.w3.org/2002/07/owl#sameAs> <http://sws.geonames.org/5368361/>
Problem: Ontologies are not connected
• Only a small number of ontologies are linked: 15 out of the 190 sources.
• Existing concepts may not be sufficient for an exhaustive set of alignments.
  - Linked Data sources reflect the RDBMS schemas of the sources from which they are derived.
  - DBpedia has a rich ontology.
  - GeoNames has only one concept ("geonames:Feature").
Solution Proposed: Two-Step Approach
• Find alignments between Linked Data already present.
• Use equality (owl:sameAs) links between instances in Linked Data.
• Find alignments between concepts, using set containment theory.
• Generate new concepts to find alignments not previously possible with existing concepts.
• Using restriction classes.
• Concept covering.
Sources Used for Alignment:
• Linking GeoNames with places in DBpedia.
  - DBpedia covers multiple domains, including around 526,000 places and other geographical features.
  - GeoNames is a geographic source with about 7.8 million geographical features.
• Linking LinkedGeoData with places in DBpedia.
• Linking species from Geospecies with DBpedia.
• Linking genes from GeneID with MGI.
Key Concepts:
• Restriction Classes
  - Represented as {p = v} using OWL-DL.
  - Class assertions (rdf:type) and value restrictions on both data and object properties.
  - Either p is an object property and v is a resource, or p is a data property and v is a literal.
  - Example: {rdf:type=PopulatedPlace}, {featureClass=P}
• Equivalent Classes
  - Two classes are equivalent if their respective instance sets can be identified as equal after following the owl:sameAs links.
  - Example: {geonames:countryCode=ES} is equivalent to {dbpedia:country = dbpedia:Spain}
• Concept Covering
  - Aligns a larger concept with a union of smaller ones using set containment.
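A restriction class {p = v} can be read as the set of instances that carry value v for property p. The following is a minimal sketch of that idea, assuming the source is available as (subject, property, value) triples; the instance identifiers and the helper name restriction_classes are made up for illustration.

```python
from collections import defaultdict

# Toy triples (subject, property, value). The identifiers are hypothetical.
TRIPLES = [
    ("geo:1", "featureClass", "P"),
    ("geo:1", "countryCode", "ES"),
    ("geo:2", "featureClass", "P"),
    ("geo:2", "countryCode", "IT"),
    ("geo:3", "featureClass", "S"),
    ("geo:3", "countryCode", "ES"),
]


def restriction_classes(triples):
    """Map each (property, value) pair to the set of instances where p = v."""
    classes = defaultdict(set)
    for subject, prop, value in triples:
        classes[(prop, value)].add(subject)
    return dict(classes)


classes = restriction_classes(TRIPLES)
# Extension of the restriction class {featureClass=P}
print(sorted(classes[("featureClass", "P")]))
```

Every distinct (property, value) pair yields one candidate class, so the number of classes grows with the number of distinct values in the data.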
Restriction Class Creation:
[Figure: within the set of all instances in GeoNames, the restriction class is the set of all instances with featureClass=P; within the set of all instances in DBpedia, the restriction class is the set of all instances with rdf:type=PopulatedPlace.]
Aligning Restriction Classes using the Extensional Approach:
[Figure: r1 = {featureClass=P} and r2 = {rdf:type=PopulatedPlace}; Img(r1) is the set of instances from DBpedia that r1 is linked to, and it is compared with r2.]
If two classes are exactly equal:
|ClassA ∩ ClassB| / |ClassA| = |ClassA ∩ ClassB| / |ClassB| = 1
Otherwise, both ratios must exceed 0.9:
|Img(r1) ∩ r2| / |Img(r1)| > 0.9 and |Img(r1) ∩ r2| / |r2| > 0.9
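The two ratios can be computed directly with set operations. This is a minimal sketch, assuming owl:sameAs links are held in a one-to-one dictionary from source-1 identifiers to source-2 identifiers; the names image and scores, the labels P and R, and the toy data are assumptions of this example.

```python
def image(r1, same_as):
    """Instances in source 2 reachable from r1 through owl:sameAs links."""
    return {same_as[x] for x in r1 if x in same_as}


def scores(r1, r2, same_as):
    """Return (P, R): the overlap as a fraction of r2 and of Img(r1)."""
    img = image(r1, same_as)
    overlap = img & r2
    p = len(overlap) / len(r2) if r2 else 0.0
    r = len(overlap) / len(img) if img else 0.0
    return p, r


# Hypothetical links and classes; geo:4 has no owl:sameAs link.
same_as = {"geo:1": "dbp:A", "geo:2": "dbp:B", "geo:3": "dbp:C"}
r1 = {"geo:1", "geo:2", "geo:4"}
r2 = {"dbp:A", "dbp:B", "dbp:D"}

p, r = scores(r1, r2, same_as)
print(sorted(image(r1, same_as)), p, r)
```

Unlinked instances such as geo:4 drop out of Img(r1), so they lower neither ratio in this variant.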
Algorithm
• Step 1: Finding Alignments with Atomic Restriction Classes.
for all p1 in Source1 and distinct v1 associated with p1 do
    r1 ← restriction class {p1 = v1} containing all instances where p1 = v1
    Img(r1) ← all instances from Source2 corresponding to those in r1, linked by owl:sameAs
    for all p2 in Source2 and distinct v2 associated with p2 do
        r2 ← restriction class {p2 = v2} containing all instances where p2 = v2
        P ← |Img(r1) ∩ r2| / |r2| ,  R ← |Img(r1) ∩ r2| / |r1|
        if P ≥ θ then alignment(r1, r2) ← r2 ⊂ r1 end if   (most of r2 lies within Img(r1))
        if R ≥ θ then alignment(r1, r2) ← r1 ⊂ r2 end if   (most of r1 maps into r2)
        if P ≥ θ and R ≥ θ then alignment(r1, r2) ← r1 ≡ r2 end if
    end for
end for
Value of θ = 0.9
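Step 1 can be sketched as a nested loop over the restriction classes of both sources. The code below is an illustrative sketch, not the authors' implementation: the triple layout, the one-to-one same_as dictionary, and the toy data are assumptions. The relation labels come from reading the ratios as set containment, which is this sketch's own interpretation: P ≥ θ means at least 90% of r2 lies in Img(r1), and R ≥ θ means at least 90% of r1 maps into r2.

```python
from collections import defaultdict

THETA = 0.9  # threshold value given on the slide


def restriction_classes(triples):
    """Map each (property, value) pair to its set of instances."""
    classes = defaultdict(set)
    for subject, prop, value in triples:
        classes[(prop, value)].add(subject)
    return classes


def align(source1, source2, same_as, theta=THETA):
    """Return (key1, key2, relation, P, R) for each pair passing the threshold."""
    results = []
    classes2 = restriction_classes(source2)
    for key1, r1 in restriction_classes(source1).items():
        img = {same_as[x] for x in r1 if x in same_as}
        for key2, r2 in classes2.items():
            overlap = len(img & r2)
            if overlap == 0:
                continue  # disjoint classes cannot be aligned
            p = overlap / len(r2)
            r = overlap / len(r1)
            if p >= theta and r >= theta:
                relation = "equivalent"
            elif p >= theta:
                relation = "r2 subset of r1"
            elif r >= theta:
                relation = "r1 subset of r2"
            else:
                continue
            results.append((key1, key2, relation, p, r))
    return results


# Hypothetical data: five linked places, four in Spain and one in Italy.
source1 = [(f"geo:{i}", "countryCode", "ES") for i in range(4)]
source1.append(("geo:4", "countryCode", "IT"))
source2 = [(f"dbp:{i}", "country", "Spain") for i in range(4)]
source2.append(("dbp:4", "country", "Italy"))
source2 += [(f"dbp:{i}", "rdf:type", "PopulatedPlace") for i in range(5)]
same_as = {f"geo:{i}": f"dbp:{i}" for i in range(5)}

relations = {(k1, k2): rel for k1, k2, rel, _, _ in align(source1, source2, same_as)}
for pair, rel in sorted(relations.items()):
    print(pair, rel)
```

The loop compares every pair of classes, which is quadratic in the number of distinct (property, value) pairs; skipping pairs with an empty overlap keeps the output small.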
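Approach
• Start with the superset of all instances.
• Generate a smaller subset for each property.
• Generate a yet smaller subset for each value of that property.
[Figure: GeoNames branches into properties such as featureCode and featureClass, with featureClass narrowed to the value P; DBpedia branches into properties such as dbpedia:region and rdf:type, with rdf:type narrowed to the value Populated Place.]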
Alignment:
• Alignment between restriction classes {geonames:countryCode=ES} from GeoNames and {dbpedia:country = dbpedia:Spain} from DBpedia.
• |Img(r1)| = 3918, |r2| = 4143
• |Img(r1) ∩ r2| = 3917
• R′ = 3917 / 3918 = 0.9997 and P′ = 3917 / 4143 = 0.9454
• R′, P′ > 0.9, thus the classes are equivalent.
• If equivalence cannot be established, a subset relation can be used instead, with the smaller concept contained in the larger one.
Step 2: Identifying Concept Coverings. (Disjunction Operator for Restriction Classes)
for all alignments found in the previous step, with larger concepts from one source with multiple subclasses from the other source do
    UL ← larger restriction class {pL = vL}, and corresponding instances
    for all smaller concepts grouped by a common property (pS) do
        US ← the union restriction class {pS = {v1, v2, ...}}, and the corresponding instances of all the smaller restriction classes
        UA ← Img(UL) ∩ US ,  PU ← |UA| / |US| ,  RU ← |UA| / |UL|
        if RU ≥ θ then alignment(r1, r2) ← UL ≡ US end if
    end for
end for
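Step 2 can be sketched as grouping the smaller classes by property, taking their union, and testing RU against θ. This is a hedged illustration under stated assumptions: the function name coverings, the input layout (one larger class plus the smaller classes already found to be its subsets), and the toy data are all hypothetical.

```python
from collections import defaultdict

THETA = 0.9  # same threshold as in Step 1


def coverings(ul, smaller, same_as, theta=THETA):
    """Return {property: (values, PU, RU)} for unions that cover UL.

    ul      -- instances of the larger restriction class
    smaller -- {(property, value): instances} for classes that are subsets of UL
    same_as -- owl:sameAs links from UL's source to the other source
    """
    img_ul = {same_as[x] for x in ul if x in same_as}
    by_property = defaultdict(dict)
    for (prop, value), members in smaller.items():
        by_property[prop][value] = members

    found = {}
    for prop, classes in by_property.items():
        us = set().union(*classes.values())  # union restriction class US
        if not us or not ul:
            continue
        ua = img_ul & us                     # linked instances in both
        pu = len(ua) / len(us)
        ru = len(ua) / len(ul)
        if ru >= theta:
            found[prop] = (sorted(classes), pu, ru)
    return found


# Hypothetical data: 20 institutions, 19 of them covered by three feature codes.
ul = {f"dbp:e{i}" for i in range(20)}
same_as = {f"dbp:e{i}": f"geo:g{i}" for i in range(20)}
smaller = {
    ("featureCode", "S.SCH"): {f"geo:g{i}" for i in range(0, 12)},
    ("featureCode", "S.SCHC"): {f"geo:g{i}" for i in range(12, 15)},
    ("featureCode", "S.UNIV"): {f"geo:g{i}" for i in range(15, 19)},
    ("countryCode", "ES"): {f"geo:g{i}" for i in range(0, 5)},
}

found = coverings(ul, smaller, same_as)
print(found)
```

In this toy run only the featureCode union reaches the threshold; the countryCode group covers a quarter of UL and is discarded.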
Finding Concept Covering:
[Figure: the restriction classes featureCode=S.SCH, featureCode=S.SCHC and featureCode=S.UNIV are each shown alongside rdf:type=EducationalInstitution.]
Finding Concept Covering, continued:
[Figure: the union US = featureCode=S.SCH ∪ featureCode=S.SCHC ∪ featureCode=S.UNIV is shown as equal to Img(UL), where UL is rdf:type=EducationalInstitution.]
Approach
• For all alignments found in Step 1:
1. Group all subset alignments according to their common larger restriction class.
2. Form a union concept such that all classes have the same property.
3. Try to match the union concept to the larger class.
[Figure: Union of Smaller Restriction Classes (US), Larger Restriction Class (UL), and Intersection Set of Linked Instances (UA) = US ∩ UL.]
Finding the equivalence between the union of subset classes and the larger class:
• US: featureCode = {S.SCH, S.SCHC, S.UNIV}
• UL: rdf:type = EducationalInstitution
• UA = US ∩ UL
• |UA| / |US| > 0.9
• |UA| / |UL| = 396 / 404 = 0.98 > 0.9
What about the 8 missing educational institutions?
• 1 with featureCode = S.HSP (Hospitals). There are 31 instances with S.HSP, so hospitals as a class are not a subset.
• 3 with featureCode = S.BLDG (Buildings).
• 1 with featureCode = S.EST (Establishment).
• 1 with featureCode = S.LIBR (Library).
• 1 with featureCode = S.MUS (Museum).
• 1 doesn't have a featureCode.
Finding Outliers:
• Instances are not part of the alignment for the following reasons:
  - Their restriction class is not a subset (P′ < 0.9).
  - Some of these instances are:
    1. Linked incorrectly with owl:sameAs.
    2. Assigned a wrong value during RDF generation.
Example of an outlier:
• In aligning dbpedia:country = dbpedia:Spain with geonames:countryCode=ES:
• 3917 out of 3918 instances in GeoNames agreed with this.
• One instance had Italy as its country code, so it was flagged as an outlier.
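Once two classes are aligned, the outliers are the linked instances that fall outside the overlap. A minimal sketch, assuming the same one-to-one same_as dictionary as a stand-in for owl:sameAs links; the function name outliers and the identifiers are hypothetical.

```python
def outliers(r1, r2, same_as):
    """Linked instances of r1 whose owl:sameAs counterpart is not in r2."""
    return {x for x in r1 if x in same_as and same_as[x] not in r2}


# Hypothetical data: three places with country code ES, one linked to a
# counterpart whose country is not Spain.
r1 = {"geo:1", "geo:2", "geo:3"}   # {countryCode=ES}
r2 = {"dbp:a", "dbp:b"}            # {country=Spain}
same_as = {"geo:1": "dbp:a", "geo:2": "dbp:b", "geo:3": "dbp:c"}

flagged = outliers(r1, r2, same_as)
print(sorted(flagged))
```

A flagged instance only signals a disagreement; deciding whether the link or the property value is wrong still needs a look at the sources.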
Results • Total of 7069 Concept Coverings that cover 77966 subset relations were found for a compression ratio of 11:1
Evaluation of Results: GeoNames-DBpedia
• Manual evaluation of 236 out of 752 alignments.
• 152 identified as correct: precision of 64.4%.
• Common problems among the alignments evaluated as incorrect (84):
  - 'County' property mislabelled as 'Country' (5)
  - The '.svg' file name of a country's flag used as the value of the 'dbpedia:country' property (35)
  - Not enough support for set containment detection (P′ < 0.9) (14)
  - Incompletely detected alignments (7)
  - Misaligned with parent (14)
  - Other problems (9)
Recall and F-Measure
• 63 country-to-country-code alignments evaluated manually.
• Precision: 53/63 = 84.13%
  - 26 were correct.
  - Same place, different names: United Kingdom in GeoNames vs. England, Scotland, Wales, Northern Ireland in DBpedia.
  - 27 were assumed correct because the data had inconsistencies.
  - A '.svg' file appeared as a country in DBpedia.
• Recall: 53/169 = 31.36%
• F-Measure: 45.69%
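The reported figures can be checked with a few lines of arithmetic. This sketch assumes the F-Measure is the balanced harmonic mean of precision and recall, which reproduces the value on the slide; the variable names are illustrative.

```python
def f_measure(precision, recall):
    """Balanced F-measure: harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


correct = 53     # alignments judged or assumed correct
evaluated = 63   # alignments evaluated manually
expected = 169   # denominator used for recall on the slide

precision = correct / evaluated
recall = correct / expected
f1 = f_measure(precision, recall)

print(f"precision={precision:.2%} recall={recall:.2%} F={f1:.2%}")
# prints: precision=84.13% recall=31.36% F=45.69%
```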
Results: LinkedGeoData - DBpedia
Evaluation:
• Manually evaluated 200 out of 5843 alignments.
• 157 identified as correct: precision of 78.5%.
• Common problems among the alignments evaluated as incorrect (43):
  - Multiple spellings for the same item (14)
  - Partially or incompletely found (20)
  - Other problems (9)
Conclusion and Future Work:
Conclusion:
• Able to find concept coverings in the geospatial, biological classification and genetics domains.
• Finds alignments where no direct equivalence was evident.
• Able to find outliers.
• Helps identify inconsistencies in the data.
Future Work:
• Can be used to detect patterns within properties like geonames:countryCode and dbpedia:country.
• Flag outliers and inform their sources for correction.
Questions? Thank You