Uncertainty reasoning for Linked Data. Dave Reynolds. Linked data, a strikingly successful model for exploiting semantic web technology, exhibits uncertainty-related issues: ambiguity, misalignment, reliability. What approach could we take to address this?
Uncertainty reasoning for Linked Data
Dave Reynolds
Linked data
• a strikingly successful model for exploiting semantic web technology
• exhibits uncertainty-related issues: ambiguity, misalignment, reliability
• what approach could we take to address this?
• without losing the simplicity which has enabled significant adoption
Linked data
• Use URIs as names for things.
• Use HTTP URIs so that people can look up those names.
• When someone looks up a URI, provide useful information, using the standards (RDF, SPARQL).
• Include links to other URIs, so that they can discover more things.
Uncertainty in linked data
1. Misalignment of instance matches
• link datasets by resolving co-references and publishing links
• links published as owl:sameAs (all or nothing)
• match errors:
  • match uncertainties not accessible
  • erroneous assumptions (e.g. clinical trial example)
• can partly address by use of the SKOS mapping vocabulary
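To illustrate why "all or nothing" owl:sameAs links are fragile, here is a minimal Python sketch that computes the equivalence classes implied by a set of sameAs links. The identifiers (a:London, c:Paris, and so on) are invented for the illustration and do not come from any real dataset; the point is only that a single erroneous link silently merges two otherwise separate clusters.

```python
from collections import defaultdict


def same_as_clusters(links):
    """Group identifiers into the equivalence classes implied by sameAs links."""
    parent = {}

    def find(x):
        # Union-find lookup with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in links:
        parent[find(a)] = find(b)

    clusters = defaultdict(set)
    for node in list(parent):
        clusters[find(node)].add(node)
    return sorted(sorted(c) for c in clusters.values())


# Hypothetical identifiers, for illustration only.
good_links = [("a:London", "b:London"), ("c:Paris", "d:Paris")]
bad_link = ("b:London", "c:Paris")  # one mistaken co-reference

print(same_as_clusters(good_links))
print(same_as_clusters(good_links + [bad_link]))
```

Because sameAs is symmetric and transitive, the consumer has no way to tell from the merged cluster which link was the weak one; that information was discarded when the match score was turned into a plain assertion.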
Uncertainty in linked data
2. Ambiguity from merging datasets
• datasets have different assumptions, definitions, context (esp. time) for different measures
• leads to multiple different values
E.g.
  <http://dbpedia.org/resource/London>
      dbo:populationMetro 12300000 ;
      dbp:populationMetro "12,300,000 to 13,945,000" ;
      dbo:populationTotal 7556900 ;
      owl:sameAs <http://www.okkam.org/ens/id...> .
  <http://www.okkam.org/ens/id...> :population 7421209 .
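The effect of following the owl:sameAs link in the example above can be sketched in a few lines of Python. This is a simplified illustration, not a real RDF merge: triples are plain tuples, and because the okkam URI is elided in the slide, the placeholder key "okkam:London" stands in for it.

```python
from collections import defaultdict

# Values transcribed from the slide; "okkam:London" is a placeholder for the
# elided okkam URI, and "dbpedia:London" abbreviates the DBpedia resource.
triples = [
    ("dbpedia:London", "dbo:populationMetro", 12300000),
    ("dbpedia:London", "dbp:populationMetro", "12,300,000 to 13,945,000"),
    ("dbpedia:London", "dbo:populationTotal", 7556900),
    ("okkam:London", ":population", 7421209),
]
same_as = {"okkam:London": "dbpedia:London"}


def merge(triples, same_as):
    """Rewrite subjects to their sameAs target and collect (property, value) pairs."""
    merged = defaultdict(list)
    for s, p, o in triples:
        merged[same_as.get(s, s)].append((p, o))
    return dict(merged)


merged = merge(triples, same_as)
values = [o for _, o in merged["dbpedia:London"]]
for p, o in merged["dbpedia:London"]:
    print(p, o)
```

After the merge, a single resource carries four different population figures, with nothing in the data to say which definition, boundary, or year each one assumes.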
Uncertainty in linked data
3. Other issues
• Misalignment of models
  • e.g. Freebase/DBpedia links generated (temporary) problems such as :Musician owl:equivalentClass :Person
• Source reliability
  • not unique to linked data, but linked data amplifies it
Mitigation approaches?
1. Weighted link vocabulary
• Develop a simple, common vocabulary for expressing uncertain co-reference links
• Clients or intermediaries can choose how to map the link evidence to equivalence assertions
  void:LinkSet
  [ a ur:WeightedLink ; ur:target <…> ; ur:match <…> ; ur:weight 0.7 ]
  … a ur:UncertainLinkSet ; ur:matchAlgorithm alg:JaroStringMatch .
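How a client might turn weighted links into equivalence assertions can be sketched as follows. This is only one possible policy (a fixed threshold), assumed for the illustration; the slide does not prescribe one. The class name, field names, and the ex: identifiers are hypothetical and merely mirror the ur:target, ur:match, and ur:weight properties.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WeightedLink:
    """Hypothetical in-memory form of a ur:WeightedLink."""
    target: str
    match: str
    weight: float


def accept_links(links, threshold):
    """Return the (target, match) pairs a client chooses to treat as equivalent."""
    return [(link.target, link.match) for link in links if link.weight >= threshold]


# Made-up links for illustration.
links = [
    WeightedLink("ex:a", "ex:b", 0.7),
    WeightedLink("ex:c", "ex:d", 0.4),
]

strict = accept_links(links, threshold=0.9)   # a cautious client
lenient = accept_links(links, threshold=0.6)  # a more permissive client
print(strict)
print(lenient)
```

The design point is that the publisher keeps the evidence and each consumer applies its own tolerance, rather than the publisher committing everyone to a single sameAs decision.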
Mitigation approaches?
2. Imprecise value vocabulary
• Develop a simple, common vocabulary for expressing imprecise values that can arise from known measurement uncertainty or merge ambiguity
  :London :population [ a ur:ImpreciseValue ;
      :sampleValue [ :value 7556900 ; :source :dbpedia ; :context :year2009 ] ;
      :sampleValue [ :value 7421209 ; :source :okkam ; :context :year2008 ] ;
      :estimatedValue 7500000 ] .
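A consumer of such an imprecise value could summarise the samples itself. The sketch below is an assumption about how a client might do that (range plus arithmetic mean); the slide does not say how its :estimatedValue of 7500000 was obtained, and this code does not try to reproduce it. The Sample class and summarise function are hypothetical names.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Sample:
    """Hypothetical in-memory form of one :sampleValue entry."""
    value: float
    source: str
    context: str


def summarise(samples):
    """Report the spread of the sample values and their arithmetic mean."""
    values = [s.value for s in samples]
    return {"low": min(values), "high": max(values), "mean": mean(values)}


# Sample values taken from the slide example.
samples = [
    Sample(7556900, "dbpedia", "year2009"),
    Sample(7421209, "okkam", "year2008"),
]

summary = summarise(samples)
print(summary)
```

Keeping the source and context alongside each value lets a client filter before summarising, for example by using only samples from a given year.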
Mitigation approaches?
3. Override graphs
• Allow clients to choose which parts of merged data sources they adopt ("trust") and publish that decision
• Allow clients to publish deltas to public datasets correcting merge or other artefacts, at per-link and per-assertion granularity
[Diagram labels: void:DataSet, ur:argGraph, void:DataSet, ur:ComputedDataSet, ur:Combinator, ur:Difference, Union]
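The idea of a computed dataset built from difference and union combinators can be sketched with plain Python sets of triples. This is an illustration under assumptions: the function name is hypothetical, the ex: identifiers are invented, and the choice of rdfs:subClassOf as the corrected assertion is an example of a delta a client might publish, not something stated on the slide.

```python
def apply_override(base, removals, additions):
    """Computed dataset = (base minus removals) union additions."""
    return (base - removals) | additions


# A made-up public dataset containing one problematic assertion.
public = {
    ("ex:Musician", "owl:equivalentClass", "ex:Person"),
    ("ex:a", "owl:sameAs", "ex:b"),
}

# A client's published delta: retract one assertion, add a weaker one (assumed).
removals = {("ex:Musician", "owl:equivalentClass", "ex:Person")}
additions = {("ex:Musician", "rdfs:subClassOf", "ex:Person")}

view = apply_override(public, removals, additions)
for triple in sorted(view):
    print(triple)
```

Because the delta is itself data, other clients can inspect it and decide whether to adopt the same correction.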
Conclusion
• multiple issues in ambiguity and uncertainty in linked data
• proposed problems and solutions illustrative rather than definitive
• low-hanging fruit
• area ripe for contribution