Learning Meta-Descriptions of the FOAF Network Gunnar AAstrand Grimnes, Pete Edwards & Alun Preece University of Aberdeen UK
Outline • Motivation • Semantic Web Data Characteristics • FOAF • Clustering • Inductive Logic Programming • Results • Future Work • Conclusions ISWC04 - Hiroshima, Japan
Why the Semantic Web Needs Learning
• SW consists of:
  • inter-related classes
  • resources that are members of one or more classes
• … but ontologies are never perfect; some concept you need is missing:
  • granularity is wrong, e.g. lecturer and professor are both Academic
  • this time/location/context needs special classifications, e.g. restaurants I like
Semantic Web Topologies
Semantic Forests
• Disconnected, shallow trees
• For example:
  • RSS
  • Dublin Core
• No need for RDF, XML is fine!
Semantic Web Topologies II
Semantic Webs
• Interconnected graphs
• No clear line between resources
• Only example is Friend-Of-A-Friend
Friend-Of-A-Friend
• “Describing people, links between them and what they create and do”
• Huge user-base:
  • 259,298 known FOAF URLs
  • 6.5 million triples (as of September 2004 [1])
• Well known outside Semantic Web circles, e.g. LiveJournal, E-Academy
[1] J. Paolillo & E. Wright, Characterising FOAF, 1st FOAF Workshop, Galway, Ireland, 2004.
FOAF Example I
<foaf:Person>
  <foaf:mbox rdf:resource="mailto:ggrimnes@csd.abdn.ac.uk" />
  <foaf:name>Gunnar AAstrand Grimnes</foaf:name>
  <foaf:projectHomepage rdf:resource="...research/AgentCities"/>
  <foaf:groupHomepage rdf:resource="...research/agentsgroup" />
  <foaf:depiction rdf:resource=".../~ggrimnes/gfx/me.jpg" />
  <foaf:interest rdf:resource="http://www.w3.org/2001/sw/" />
  <foaf:interest rdf:resource="http://www.agentcities.net" />
  <foaf:made rdf:resource=".../AgentCities/GraniteNights" />
  <contact:nearestAirport>
    <airport:Airport rdf:about="http://daml.org/airport?ABZ" />
  </contact:nearestAirport>
FOAF Example II
  <foaf:knows>
    <foaf:Person>
      <foaf:mbox rdf:resource="mailto:maym@foobar.lu" />
      <rdfs:seeAlso rdf:resource="http://martinmay.net/foaf.rdf"/>
    </foaf:Person>
  </foaf:knows>
  <foaf:knows>
    <foaf:Person foaf:name="Sonja A Schramm">
      <foaf:mbox_sha1sum>83276f9127...</foaf:mbox_sha1sum>
    </foaf:Person>
  </foaf:knows>
</foaf:Person>
<rdf:Description rdf:about="">
  <wot:assurance rdf:resource="foaf.rdf.asc" />
</rdf:Description>
Cleaning FOAF
• FOAF is “scruffy”:
  • Often created in a text editor and/or by copy & paste => human errors
  • rdf:resource vs. Literal
  • rdf:seeAlso vs. rdfs:seeAlso
• Interest values are not standardised:
  • http://www.semanticweb.org vs. http://w3.org/sw vs. http://sciam.com/sw_story etc.
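One of the errors listed above, rdf:seeAlso used where rdfs:seeAlso was meant, can be repaired mechanically. The following is only a minimal sketch of such a clean-up pass, not the pre-processing actually used: it assumes triples are plain (subject, predicate, object) tuples of full URIs, and the names PREDICATE_FIXES and clean are hypothetical.

```python
# Namespace URIs of the RDF and RDF Schema vocabularies.
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"

# Hypothetical correction table: misused predicate -> intended predicate.
# Only the seeAlso confusion named on the slide is listed here.
PREDICATE_FIXES = {
    RDF + "seeAlso": RDFS + "seeAlso",
}


def clean(triples):
    """Return a new list of triples with known predicate errors corrected."""
    return [(s, PREDICATE_FIXES.get(p, p), o) for s, p, o in triples]
```

Divergent interest URLs for the same topic cannot be fixed by a table like this unless someone first decides which URLs are equivalent.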
Learning from FOAF
[Figure: processing pipeline with stages labelled Crawled RDF, Pre-processing, Clustering, Clustering Tree, Prolog Facts, ILP, Prolog rules, SWRL Rules]
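The pipeline turns RDF into Prolog facts before ILP is applied. The exact encoding is not given on the slides; the sketch below assumes the convention visible in the learned rules on the results slides, where a prefixed predicate such as dc:creator appears as dc___creator(Subject, Object). The function names and the quoting scheme are assumptions made for illustration.

```python
def predicate_name(qname: str) -> str:
    """Turn a prefixed name such as 'dc:creator' into 'dc___creator'.

    Raises ValueError if the name has no 'prefix:' part.
    """
    prefix, local = qname.split(":", 1)
    return f"{prefix}___{local}"


def quote(term: str) -> str:
    """Wrap a URI or literal in double quotes, escaping backslashes and quotes."""
    escaped = term.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def to_fact(subject: str, predicate: str, obj: str) -> str:
    """Render one RDF triple as a single Prolog fact (assumed encoding)."""
    return f"{predicate_name(predicate)}({quote(subject)},{quote(obj)})."
```

For example, to_fact("ex:paper1", "dc:creator", "ex:gunnar") yields dc___creator("ex:paper1","ex:gunnar"). where both ex: identifiers are made up.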
Clustering
• Hierarchical Agglomerative Clusterer (HAC)
• Builds tree by recursively merging most similar clusters
[Figure: example cluster tree with root Comp.Sci.Dept., sub-clusters Research Group: A3 and Research Group: AKT, an unlabelled cluster marked “?”, and leaves Pete, Gunnar, Alun, Derek, Dave]
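The merging loop of a hierarchical agglomerative clusterer can be sketched in a few lines. This is an illustration rather than the clusterer used in this work: the slides do not say which linkage was used, so average linkage is assumed, and the tree is represented as nested 2-tuples. Items must not themselves be tuples, and at least one item is required.

```python
from itertools import combinations


def leaves(tree):
    """Flatten a nested-tuple cluster tree into a list of its leaf items."""
    if isinstance(tree, tuple):
        return leaves(tree[0]) + leaves(tree[1])
    return [tree]


def average_linkage(a, b, sim):
    """Mean pairwise similarity between two lists of items (assumed linkage)."""
    return sum(sim(x, y) for x in a for y in b) / (len(a) * len(b))


def hac(items, sim):
    """Repeatedly merge the two most similar clusters until one tree remains."""
    clusters = list(items)
    while len(clusters) > 1:
        # Pick the pair of cluster indices with the highest linkage similarity.
        i, j = max(
            combinations(range(len(clusters)), 2),
            key=lambda p: average_linkage(
                leaves(clusters[p[0]]), leaves(clusters[p[1]]), sim
            ),
        )
        merged = (clusters[i], clusters[j])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters[0]
```

Any pairwise similarity function over resources can be passed as sim. This naive loop recomputes all linkages on every pass, so it is only suitable for small inputs.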
RDF Similarity Measure I
• How to compare two RDF resources?
• Initial experiments with Hamming Distance...
• ...but need to consider the graph around the resource
• Distance metric for comparison of conceptual graphs (Montes-y-Gómez et al., 2000)
  • Given two (sub)graphs, considers node and edge overlap
  • Must extract surrounding sub-graph
RDF Similarity Measure II
• Sc = node overlap for graphs G1 & G2
[Figure: example graphs G1 and G2 and their common overlap Gc, with nodes labelled A to F and edges labelled x, y, z]
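Node and edge overlap between two sub-graphs can be computed as below. This is a sketch under stated assumptions: each graph is a set of (subject, predicate, object) triples, overlap is normalised Dice-style as twice the shared count over the summed sizes, and the two scores are returned separately because the slides do not show how they are combined. All function names are hypothetical.

```python
def nodes(graph):
    """All subjects and objects appearing in a set of triples."""
    return {n for s, _, o in graph for n in (s, o)}


def dice(a, b):
    """Dice coefficient of two sets; defined as 0.0 when both are empty."""
    if not a and not b:
        return 0.0
    return 2 * len(a & b) / (len(a) + len(b))


def node_overlap(g1, g2):
    """Share of nodes that the two graphs have in common."""
    return dice(nodes(g1), nodes(g2))


def edge_overlap(g1, g2):
    """Share of labelled edges (whole triples) that the graphs have in common."""
    return dice(set(g1), set(g2))
```

With G1 = {(A, x, B), (A, y, C)} and G2 = {(A, x, B), (B, z, D)}, the graphs share two of three nodes each and one of two edges each.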
Extract Sub-Graph
• Traverse the RDF graph, 1 step backwards, and 2 steps forward
[Figure: example sub-graph around Gunnar Grimnes, with nodes University of Aberdeen, Comp.Sci. Dept., Semantic Web and RDF, and edges has-phd-student, has-dept, author-of, subject and superSubjectOf]
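The traversal can be sketched as two bounded walks from the resource. This assumes one particular reading of the slide: incoming edges are followed one step from the resource, outgoing edges two steps from the resource, and no forward steps are taken from nodes reached backwards. The function name and parameters are hypothetical.

```python
def extract_subgraph(triples, resource, back=1, forward=2):
    """Collect triples within `back` incoming and `forward` outgoing steps."""
    triples = set(triples)
    selected = set()

    def walk(depth, outgoing):
        frontier = {resource}
        seen = {resource}
        for _ in range(depth):
            next_frontier = set()
            for s, p, o in triples:
                # Follow the edge from its source when walking forward,
                # from its target when walking backward.
                here, there = (s, o) if outgoing else (o, s)
                if here in frontier:
                    selected.add((s, p, o))
                    if there not in seen:
                        seen.add(there)
                        next_frontier.add(there)
            frontier = next_frontier

    walk(forward, outgoing=True)
    walk(back, outgoing=False)
    return selected
```

The returned set of triples is the sub-graph that a similarity measure would then compare against the sub-graph of another resource.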
Inductive Logic Programming
• We used Aleph
• Learning a cluster description:
  • Cluster members as positive examples
  • May use the 100 most frequent predicates,
  • But not foaf:knows!
• Hypotheses explored are constrained by RDF types, e.g. foaf:knows is only applied to foaf:Person resources
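Choosing the predicates the learner may use is a simple frequency count. The sketch below assumes triples whose predicates are prefixed names such as foaf:name; the function name and its parameters are hypothetical, with the defaults taken from the limit of 100 and the exclusion of foaf:knows stated above.

```python
from collections import Counter


def frequent_predicates(triples, limit=100, excluded=frozenset({"foaf:knows"})):
    """Return up to `limit` predicates, most frequent first, skipping `excluded`."""
    counts = Counter(p for _, p, _ in triples if p not in excluded)
    return [p for p, _ in counts.most_common(limit)]
```

The resulting list would then determine which predicates are made available to Aleph as body literals. How that is declared to Aleph is not shown on the slides.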
Results I
• 8 people authoring a paper
• Nicer if the people were from different institutions, or if the rule generalised to several papers
• Embryonic Community of Practice
member(A) :- dc___creator(B,A), dc___title(B,"Managing Reference: Ensuring Referential Integrity of Ontologies for the Semantic Web").
Results II
• Cluster is Aberdeen Advanced Knowledge Technology (AKT) Group:
  • but AKT groups at Southampton, Open University etc. have the same group homepage
• Open World Assumption will always be problematic
member(A) :- foaf___groupHomepage(A,"http://www.aktors.org").
Results III
Learned rule:
member(A) :- contact___nearestAirport(A,"http://www.daml.org/airport?ABZ").
• Cluster is Aberdeen Agents Group, but:
  • Local AKT group is also close to ABZ
• Better might have been:
member(A) :- foaf___interest(A,"agents"), foaf___worksFor(A,"http://www.csd.abdn.ac.uk").
Results IV
• Golbeck et al., Trust Network on the Semantic Web, CIA-2004
• Looks good!
• …but not yet:
  • only 8 people
member(A) :- trust___trustsHighly(B,A).
Results V
• No way to determine interest-value of a predicate
• Use background knowledge to filter predicates?
• Would format application/rdf+xml be better?
• Filter predicate/value tuples?
member(A) :- dc___creator(B,A), dc___format(B,"application/postscript").
Future Work
• FOAF is too sparse => weird results
• We have extracted a dataset from IMDb
  • denser, but not as “real”
  • Movie-trivia Learner
• Background Knowledge for driving predicate selection
  • Common-sense knowledge
  • Ontologies
Conclusions
• The Semantic Web needs learning
• Pre-processing is time-consuming & hard
• We can learn new conceptualisations & descriptions of these… but:
  • Evaluation is tricky
• Scaling-up is the single biggest problem for the Semantic Web!