Computing FOAF Co-reference Relations with Rules and Machine Learning

Jennifer Sleeman and Tim Finin, University of Maryland, Baltimore County. The Third International Workshop on Social Data on the Web, November 2010. http://ebiquity.umbc.edu/paper/html/id/506/

Presentation Transcript


  1. Computing FOAF Co-reference Relations with Rules and Machine Learning. Jennifer Sleeman and Tim Finin, University of Maryland, Baltimore County. The Third International Workshop on Social Data on the Web, November 2010. http://ebiquity.umbc.edu/paper/html/id/506/

  2. FOAF
     • The Friend of a Friend (FOAF) vocabulary describes people and their relationships
     • One of the oldest and most widely used ontologies
     • Does not include a globally unique identifier
     • Inverse functional properties (IFPs) help
     • Multiple foaf instances referring to the same person are common
     • Increasingly so with more linked data

  3. Linking data
     • Data integration requires linking instances from different data sets
     • Linking foaf instances is a common and typical use case
     • Sindice reports 23 foaf instances all referring to Sir Tim Berners-Lee
     • Probably more than my query revealed
     • Only a handful are linked via owl:sameAs
     • Automatically linking foaf instances is not always easy

  4. Example 1: Common properties, but can we say this is the same person…

     <swivt:Subject rdf:about="http://tw.rpi.edu/wiki/Special:URIResolver/Bijan_Parsia">
       <rdfs:label>Bijan Parsia</rdfs:label>
       <swivt:page rdf:resource="http://tw.rpi.edu/wiki/Bijan_Parsia"/>
       <rdfs:isDefinedBy rdf:resource="http://tw.rpi.edu/wiki/Special:ExportRDF/Bijan_Parsia"/>
       <rdf:type rdf:resource="http://tw.rpi.edu/wiki/Special:URIResolver/Category-3APerson"/>
       <property:Foaf-3Adepiction rdf:resource="http://tw.rpi.edu/wiki/Special:URIResolver/Anonymous.png"/>
       <foaf:firstName rdf:datatype="http://www.w3.org/2001/XMLSchema#string">Bijan</foaf:firstName>
       <foaf:interest rdf:resource="http://tw.rpi.edu/wiki/Special:URIResolver/Category-3ASemantic_Web_Topic"/>
       <foaf:name rdf:datatype="http://www.w3.org/2001/XMLSchema#string">Bijan Parsia</foaf:name>
       <foaf:surname rdf:datatype="http://www.w3.org/2001/XMLSchema#string">Parsia</foaf:surname>
       <property:Has_affiliation rdf:resource="http://tw.rpi.edu/wiki/Special:URIResolver/Manchester_University"/>
       <property:Has_identifier rdf:resource="http://tw.rpi.edu/wiki/Special:URIResolver/Bijan_Parsia"/>
     </swivt:Subject>
     http://tw.rpi.edu/wiki/Special:ExportRDF/Bijan_Parsia

     <foaf:Person rdf:ID="bparsia">
       <foaf:mbox_sha1sum>f49a6854842c5fa76dc0edb8e82f8fe04fd56bc9</foaf:mbox_sha1sum>
       <foaf:firstName>Bijan</foaf:firstName>
       <foaf:surname>Parsia</foaf:surname>
       <foaf:name>Bijan Parsia</foaf:name>
       <foaf:homepage rdf:resource="http://trust.mindswap.org/cgi-bin/FilmTrust/foaf.cgi?user=bparsia"/>
       <foaf:img rdf:resource="http://www.mindswap.org/~bparsia/talks/uri-use/bijan.jpg"/>
       <foaf:depiction rdf:resource="http://www.mindswap.org/~bparsia/talks/uri-use/bijan.jpg"/>
       <foaf:nick>bparsia</foaf:nick>
       <foaf:holdsAccount>
         <foaf:OnlineAccount>
           <foaf:accountName>bparsia</foaf:accountName>
           <foaf:accountServiceHomepage rdf:resource="http://trust.mindswap.org/FilmTrust/"/>
         </foaf:OnlineAccount>
       </foaf:holdsAccount>
     http://trust.mindswap.org/cgi-bin/FilmTrust/foaf.cgi?user=bparsia#tt0084827-bparpia

  5. Example 2: Aliases and slight name variations…

     <foaf:Person>
       <foaf:name>James A. Hendler</foaf:name>
       <foaf:firstName>James</foaf:firstName>
       <foaf:surname>Hendler</foaf:surname>
       <foaf:publications>http://ebiquity.umbc.edu/papers/select/person/James/Hendler/</foaf:publications>
       <foaf:homepage rdf:resource="http://www.cs.umd.edu/~hendler/"/>
       <foaf:workInfoHomepage rdf:resource="http://www.cs.umd.edu/~hendler/"/>
     http://ebiquity.umbc.edu/person/foaf/James/A./Hendler/foaf.rdf

     <foaf:Person rdf:ID="jhendler">
       <foaf:mbox_sha1sum>0b62d4242736e64be6138547c79a811b3e82fd52</foaf:mbox_sha1sum>
       <foaf:firstName>Jim</foaf:firstName>
       <foaf:surname>Hendler</foaf:surname>
       <foaf:name>Jim Hendler</foaf:name>
       <foaf:title>Tetherless World Constellation Chair</foaf:title>
       <foaf:homepage rdf:resource="http://trust.mindswap.org/cgi-bin/FilmTrust/foaf.cgi?user=jhendler"/>
       <foaf:homepage rdf:resource="http://www.cs.umd.edu/~hendler"/>
       <foaf:depiction rdf:resource="http://www.semanticgrid.org/q-iantbljim.jpg"/>
       <foaf:workplaceHomepage rdf:resource="http://owl.mindswap.org"/>
       <foaf:img rdf:resource="http://www.cs.umd.edu/~hendler/hendler.gif"/>
       <foaf:depiction rdf:resource="http://www.cs.umd.edu/~hendler/hendler.gif"/>
       <foaf:nick>jhendler</foaf:nick>
       <foaf:openID rdf:resource="http://jhendler.pip.verisignlabs.com/" />
       <foaf:holdsAccount>
         <foaf:OnlineAccount>
           <foaf:accountName>jhendler</foaf:accountName>
           <foaf:accountServiceHomepage rdf:resource="http://trust.mindswap.org/FilmTrust/"/>
         </foaf:OnlineAccount>
       </foaf:holdsAccount>
     http://www.cs.rpi.edu/~hendler/foaf.rdf

  6. Example 3: What if mbox_sha1sums are different?

     <Agent rdf:about="http://identi.ca/user/53505">
       <mbox_sha1sum>08445a31a78661b5c746feff39a9db6e4e2cc5cf</mbox_sha1sum>
       <name>David Wood</name>
       <homepage rdf:resource="http://dw2-0.com"/>
       <weblog rdf:resource="http://identi.ca/dw2"/>
       <holdsAccount>
         <OnlineAccount rdf:about="http://identi.ca/user/53505#acct">
           <accountServiceHomepage rdf:resource="http://identi.ca/"/>
           <accountName>dw2</accountName>
           <accountProfilePage rdf:resource="http://identi.ca/dw2"/>
           <sioc:account_of rdf:resource="http://identi.ca/user/53505"/>
           <sioc:follows rdf:resource="http://identi.ca/user/136#acct"/>
         </OnlineAccount>
       </holdsAccount>
     http://identi.ca/dw2/foaf

     <foaf:Person rdf:about="http://zepheira.com/team/dave/#me">
       <foaf:name>David Wood</foaf:name>
       <foaf:title>Dr.</foaf:title>
       <foaf:givenname>David</foaf:givenname>
       <foaf:family_name>Wood</foaf:family_name>
       <foaf:nick>prototypo</foaf:nick>
       <foaf:mbox_sha1sum>37c8d030d4e615d05f31625b3460532a3f4e214e</foaf:mbox_sha1sum>
       <foaf:homepage rdf:resource="http://prototypo.blogspot.com/"/>
       <foaf:depiction rdf:resource="http://www.itee.uq.edu.au/~dwood/images/dave_w_0.jpg"/>
       <foaf:phone rdf:resource="tel:+1-(571)-331-3723"/>
       <foaf:workplaceHomepage rdf:resource="http://www.zepheira.com/"/>
       <foaf:workInfoHomepage rdf:resource="http://www.zepheira.com/team/dave"/>
       <foaf:schoolHomepage rdf:resource="http://www.vmi.edu/"/>
       <foaf:schoolHomepage rdf:resource="http://www.nps.navy.mil/"/>
       <foaf:schoolHomepage rdf:resource="http://www.itee.uq.edu.au/"/>
       <foaf:aimChatID>piprototypo</foaf:aimChatID>
     http://www.itee.uq.edu.au/~dwood/dave.rdf#me

  7. Example 3 cont.: Which David Wood was a mindswapper?

     <ms:Researcher rdf:ID="David_Wood" rdfs:label="David Wood">
       <foaf:name>David Wood</foaf:name>
       <foaf:mbox>
         <owl:Thing rdf:about="mailto:dwood@mindswap.org"/>
       </foaf:mbox>
       <foaf:homepage>
         <foaf:Document rdf:about="http://www.mindswap.org/~dwood/"/>
       </foaf:homepage>
       <foaf:workInfoHomepage>
         <foaf:Document rdf:about="http://www.mindswap.org/~dwood/"/>
       </foaf:workInfoHomepage>
     </ms:Researcher>
     http://www.mindswap.org/2004/owl/mindswappers#David.Wood

  8. Example 5: Could jgolbeck and Jennifer Golbeck be the same person…

     <foaf:Person rdf:ID="jgolbeck">
       <foaf:mbox_sha1sum>08445a31a78661b5c746feff39a9db6e4e2cc5cf</foaf:mbox_sha1sum>
       <foaf:firstName></foaf:firstName>
       <foaf:surname></foaf:surname>
       <foaf:name> </foaf:name>
       <foaf:homepage rdf:resource="http://trust.mindswap.org/cgi-bin/FilmTrust/foaf.cgi?user=jgolbeck"/>
       <foaf:img rdf:resource=""/>
       <foaf:depiction rdf:resource=""/>
       <foaf:nick>jgolbeck</foaf:nick>
       <foaf:holdsAccount>
         <foaf:OnlineAccount>
           <foaf:accountName>jgolbeck</foaf:accountName>
           <foaf:accountServiceHomepage rdf:resource="http://trust.mindswap.org/FilmTrust/"/>
         </foaf:OnlineAccount>
       </foaf:holdsAccount>
     http://trust.mindswap.org/cgi-bin/FilmTrust/foaf.cgi?user=jgolbeck

     <swivt:Subject rdf:about="http://tw.rpi.edu/wiki/Special:URIResolver/Jennifer_Golbeck">
       <rdfs:label>Jennifer Golbeck</rdfs:label>
       <swivt:page rdf:resource="http://tw.rpi.edu/wiki/Jennifer_Golbeck"/>
       <rdfs:isDefinedBy rdf:resource="http://tw.rpi.edu/wiki/Special:ExportRDF/Jennifer_Golbeck"/>
       <rdf:type rdf:resource="http://tw.rpi.edu/wiki/Special:URIResolver/Category-3AAssistant_Professor"/>
       <rdf:type rdf:resource="http://tw.rpi.edu/wiki/Special:URIResolver/Category-3APerson"/>
       <property:Foaf-3Adepiction rdf:resource="http://tw.rpi.edu/wiki/Special:URIResolver/Anonymous.png"/>
       <foaf:firstName rdf:datatype="http://www.w3.org/2001/XMLSchema#string">Jennifer</foaf:firstName>
       <foaf:interest rdf:resource="http://tw.rpi.edu/wiki/Special:URIResolver/Category-3ASemantic_Web_Topic"/>
       <foaf:name rdf:datatype="http://www.w3.org/2001/XMLSchema#string">Jennifer Golbeck</foaf:name>
       <foaf:surname rdf:datatype="http://www.w3.org/2001/XMLSchema#string">Golbeck</foaf:surname>
     http://tw.rpi.edu/wiki/Special:ExportRDF/Jennifer_Golbeck

  9. Example 5 cont.: Which profile is most recent/relevant?

     <rdf:RDF>
       <foaf:Person>
         <foaf:name>Jennifer Golbeck</foaf:name>
         <foaf:mbox rdf:resource="mailto:golbeck@cs.umd.edu"/>
         <foaf:mbox rdf:resource="mailto:golbeck@mindswap.org"/>
         <owl:sameAs rdf:resource="http://www.mindswap.org/2004/owl/mindswappers#Jennifer.Golbeck"/>
         <foaf:workplaceHomepage rdf:resource="http://www.cs.umd.edu/~golbeck"/>
         <foaf:currentProject rdf:resoruce="http://trust.mindswap.org"/>
         <foaf:publications rdf:resource="http://www.mindswap.org/papers"/>
         <foaf:knows rdf:resource="#danbri"/>
         <rdfs:seeAlso rdf:resource="http://trust.mindswap.org/cgi-bin/getList.cgi"/>
     http://www.cs.umd.edu/~golbeck/daml/golbeckFOAF.rdf

     <ms:Researcher rdf:ID="Jennifer.Golbeck" rdfs:label="Jennifer Golbeck">
       <rdfs:seeAlso rdf:resource="http://www.cs.umd.edu/~golbeck/daml/golbeckFOAF.rdf"/>
       <foaf:name>Jennifer Golbeck</foaf:name>
       <foaf:mbox><owl:Thing rdf:about="mailto:golbeck@cs.umd.edu"/></foaf:mbox>
       <foaf:homepage><foaf:Document rdf:about="http://www.cs.umd.edu/~golbeck/"/></foaf:homepage>
       <foaf:workInfoHomepage><foaf:Document rdf:about="http://www.mindswap.org/~golbeck/"/></foaf:workInfoHomepage>
     </ms:Researcher>
     http://www.mindswap.org/2004/owl/mindswappers#Jennifer.Golbeck

  10. Our Contributions
      • Treating foaf smushing as entity co-reference
      • Using machine learning to train a classifier for recognizing co-referent foaf instances
      • Combining this with rule-based evidence
      • Using narrower RDF properties to express co-reference, avoiding overuse of owl:sameAs
      • Using a greedy algorithm for iteratively clustering co-referent entities and re-evaluating their potential co-reference relations

  11. Co-Reference in FOAF
      • Approach the problem like cross-document co-reference resolution in text
      • Match pairs of FOAF agents
      • Use rules and properties
      • Assign new properties to represent coref and notCoref relationships
      • Cluster co-referent pairs

  12. Cross-Document Co-reference Resolution
      • Determine when two documents mention the same entity
      • Are two documents that talk about “George Bush” talking about the same George Bush?
      • Is a document mentioning “Mahmoud Abbas” referring to the same person as one mentioning “Muhammed Abbas”? What about “Abu Abbas”? “Abu Mazen”?
      • Drawing appropriate inferences from multiple documents demands cross-document co-reference resolution
      2008 NIST Text Analysis Conference

  13. TAC KBP: Entity Linking
      Given an entity mention in an article, find the link to the right Wikipedia entity if one exists.
      John Williams:
      • Richard Kaufman goes a long way back with John Williams. Trained as a classical violinist, Californian Kaufman started doing session work in the Hollywood studios in the 1970s. One of his movies was Jaws, with Williams conducting his score in recording sessions in 1975...
      Michael Phelps:
      • Debbie Phelps, the mother of swimming star Michael Phelps, who won a record eight gold medals in Beijing, is the author of a new memoir, ...
      • Michael Phelps is the scientist most often identified as the inventor of PET, a technique that permits the imaging of biological processes in the organ systems of living individuals. Phelps has ...
      2009 NIST TAC Knowledge Base Population Track

  14. Smushing
      • Smushing is the traditional term for recognizing that two “blank nodes” refer to the same thing and merging them
      • Past work on smushing has exploited IFPs (e.g., foaf:mbox), heuristic similarity metrics and custom SPARQL queries
      • owl:sameAs is often used to relate smushed nodes, enabling a reasoner to effect the merging
      • rdfs:seeAlso is used to find related foaf data

  15. Smushing
      [Figure: RDF graph. Labels: foaf:Person, rdf:type, foaf:nick "bar", owl:sameAs, foaf:knows, foaf:mbox (twice), "foo@gmail.com"]

  16. Smushing
      [Figure: RDF graph. Labels: foaf:Person, rdf:type, foaf:nick "bar", foaf:knows, foaf:mbox, "foo@gmail.com"]
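
The merging step of smushing can be illustrated with a small Python sketch. It is only an illustration under stated assumptions, not the authors' implementation: triples are plain `(subject, predicate, object)` tuples, the function name `smush` is hypothetical, and each group of nodes linked by owl:sameAs is rewritten to its lexicographically smallest identifier.

```python
def smush(triples, same_as):
    """Rewrite triples so nodes linked by owl:sameAs share one identifier.

    triples: iterable of (subject, predicate, object) tuples.
    same_as: iterable of (node, node) pairs asserted to be the same.
    Assumption: literal objects never collide with node identifiers.
    """
    canon = {}  # maps a node to a smaller identifier in its group

    def find(x):
        # Follow links until reaching the group's representative.
        while canon.get(x, x) != x:
            x = canon[x]
        return x

    for a, b in same_as:
        ra, rb = find(a), find(b)
        if ra != rb:
            # Keep the lexicographically smaller id as representative.
            canon[max(ra, rb)] = min(ra, rb)

    # A set removes triples that become duplicates after rewriting.
    return {(find(s), p, find(o)) for s, p, o in triples}


graph = {
    ("_:n1", "foaf:mbox", "mailto:foo@gmail.com"),
    ("_:n2", "foaf:mbox", "mailto:foo@gmail.com"),
    ("_:n2", "foaf:nick", "bar"),
    ("_:n3", "foaf:knows", "_:n2"),
}
merged = smush(graph, [("_:n1", "_:n2")])
```

After the call, the two mbox triples collapse into one and the nick and knows triples point at the surviving node.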

  17. owl:sameAs considered harmful
      • Known problems
        • Temporally qualified data (Ding vs. Ding)
        • Noisy data (Clinton vs. Clinton)
        • Referentially opaque contexts (John likes the Morning Star beautiful)
      • Halpin et al. (2010) suggest a vocabulary for similarity relations: similarity.owl
      • We use two weaker predicates: coref & notCoref
      • Defer the sameAs problem to applications

  18. Co-Reference in FOAF
      • coref: transitive, symmetric and reflexive; has sameAs as subproperty
      • notCoref: symmetric and irreflexive but not transitive; has differentFrom as subproperty

      :coref a owl:TransitiveProperty, owl:SymmetricProperty, owl:ReflexiveProperty.
      owl:sameAs rdfs:subPropertyOf :coref.
      :notCoref a owl:SymmetricProperty, owl:IrreflexiveProperty.
      owl:differentFrom rdfs:subPropertyOf :notCoref.
      {?a :notCoref ?b. ?b :coref ?c.} => {?a :notCoref ?c}
      {?a foaf:knows ?b.} => {?a :notCoref ?b}

      The :coref and :notCoref properties that we use instead of owl:sameAs
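
A minimal Python sketch of how these two properties interact, assuming entities are hashable identifiers. The class `CorefStore` and its method names are hypothetical and are not taken from the paper. Because coref is transitive, symmetric and reflexive, it is modelled as equivalence classes; notCoref is then propagated across those classes, which is what the rule `{?a :notCoref ?b. ?b :coref ?c.} => {?a :notCoref ?c}` expresses.

```python
class CorefStore:
    """Toy store for :coref and :notCoref assertions (illustrative only)."""

    def __init__(self):
        self._parent = {}        # union-find forest for coref classes
        self._not_coref = set()  # asserted notCoref pairs

    def _find(self, x):
        # Return the representative of x's coref class.
        self._parent.setdefault(x, x)
        while self._parent[x] != x:
            x = self._parent[x]
        return x

    def add_coref(self, a, b):
        # Merging the classes gives transitivity and symmetry for free.
        self._parent[self._find(a)] = self._find(b)

    def add_not_coref(self, a, b):
        if a == b:
            raise ValueError("notCoref is irreflexive")
        self._not_coref.add((a, b))

    def is_coref(self, a, b):
        return self._find(a) == self._find(b)

    def is_not_coref(self, a, b):
        # True if any asserted notCoref pair links the two coref classes.
        target = {self._find(a), self._find(b)}
        return any({self._find(x), self._find(y)} == target
                   for x, y in self._not_coref)


store = CorefStore()
store.add_coref("b", "c")
store.add_not_coref("a", "b")
```

Here `store.is_not_coref("a", "c")` holds even though it was never asserted directly, because "b" and "c" are in the same coref class.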

  19. Batch Approach
      • Given a potentially large set of foaf instances
      • Generate candidate pairs
      • Evaluate each pair for co-reference
        • Using rules and classifier independently
        • Each results in a {coref, notCoref, unknown} decision
        • Trust rules over classifier
      • Designate pairs as co-referent
      • Create clusters
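
The "trust rules over classifier" step can be sketched in Python as follows. This is an assumption-laden illustration: the `Decision` enum, `combine`, and `decide_pairs` are hypothetical names, and the rule and classifier functions are stand-ins supplied by the caller.

```python
from enum import Enum


class Decision(Enum):
    COREF = "coref"
    NOT_COREF = "notCoref"
    UNKNOWN = "unknown"


def combine(rule_decision, classifier_decision):
    """Prefer the rule-based decision; fall back to the classifier."""
    if rule_decision is not Decision.UNKNOWN:
        return rule_decision
    return classifier_decision


def decide_pairs(pairs, rule_fn, classifier_fn):
    """Evaluate each candidate pair with both models independently."""
    return {pair: combine(rule_fn(*pair), classifier_fn(*pair))
            for pair in pairs}


# Stand-in models: the rule only fires for one pair, the classifier
# always predicts coref.
def rules(a, b):
    return Decision.NOT_COREF if (a, b) == ("x", "y") else Decision.UNKNOWN


def classifier(a, b):
    return Decision.COREF


decisions = decide_pairs([("x", "y"), ("x", "z")], rules, classifier)
```

In this toy run the rule overrides the classifier for ("x", "y"), while ("x", "z") takes the classifier's prediction because the rules drew no conclusion.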

  20. Ingest
      • Extract triples from FOAF profiles
      • Add each foaf agent as a new entity in the database
      • Entity URLs followed in foaf:knows graph to get additional information

  21. Approach: System Architecture
      [Figure: system architecture. Stages shown: ingestion; abstract entity generation; candidate pair generation (potential pairs: reduces classifier workload); model generation, with rule-based reasoning (deductive decisions) and machine learning (predictions); co-referent designation and clustering (clusters form new abstract entities)]

  22. Candidate Pairs
      • Filtering pairs reduces the matching set
      • Use simple string matching predicates
      • Dice score for 3-grams
      • Apply both to values of common properties and also cross-property values
      • Experiment 2: ~30% reduction
      • Reductions vary based on data set
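
As an illustration of the 3-gram Dice filter, here is a short Python sketch. The function names and the 0.5 threshold are assumptions made for this example, not values from the paper. The Dice score of two strings is 2|A ∩ B| / (|A| + |B|), where A and B are their sets of character 3-grams.

```python
def trigrams(text):
    """Return the set of character 3-grams of a lowercased string."""
    s = text.lower()
    return {s[i:i + 3] for i in range(len(s) - 2)}


def dice_score(a, b):
    """Dice coefficient over the 3-gram sets of two strings."""
    ga, gb = trigrams(a), trigrams(b)
    if not ga or not gb:
        return 0.0  # strings shorter than 3 characters have no 3-grams
    return 2 * len(ga & gb) / (len(ga) + len(gb))


def is_candidate(values_a, values_b, threshold=0.5):
    """Keep a pair if any value of one instance resembles any value of
    the other. Comparing all values against all values covers both the
    common-property and the cross-property case."""
    return any(dice_score(x, y) >= threshold
               for x in values_a for y in values_b)


score = dice_score("Jim Hendler", "James Hendler")
```

For the name variants from Example 2, the two strings share 6 of their 9 and 11 distinct 3-grams, giving a score of 12/20 = 0.6, so the pair would pass a 0.5 threshold.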

  23. Input data sources
      • FOAF profiles extracted from Swoogle
      • Also used URLs extracted from tests conducted in previous work
      [Figure: Distribution of URLs from Experiment 2]

  24. Methodology: Rule-based Model
      • Rules conclude that two instances are co-referent, not co-referent or draw no conclusion (the most common outcome)
      • Basic co-reference rules:
        {?p a owl:IFP. ?a ?p ?x. ?b ?p ?x} => {?a :coref ?b}
        {?p a owl:FP. ?a ?p ?x. ?a ?p ?y} => {?x :coref ?y}
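
The first rule (two subjects sharing a value for an inverse functional property are co-referent) can be sketched in Python. This is a simplified illustration, not the authors' rule engine: triples are plain tuples, `ifp_coref_pairs` is a hypothetical name, and `IFPS` is an illustrative subset of FOAF's inverse functional properties.

```python
from collections import defaultdict
from itertools import combinations

# Illustrative subset of properties treated as inverse functional.
IFPS = {"foaf:mbox", "foaf:mbox_sha1sum", "foaf:homepage"}


def ifp_coref_pairs(triples, ifps=IFPS):
    """Return pairs of subjects that share a value for some IFP."""
    subjects_by_value = defaultdict(set)
    for s, p, o in triples:
        if p in ifps:
            subjects_by_value[(p, o)].add(s)

    pairs = set()
    for subjects in subjects_by_value.values():
        # Sorting makes each unordered pair appear in one fixed order.
        pairs.update(combinations(sorted(subjects), 2))
    return pairs


triples = [
    ("ex:a", "foaf:mbox", "mailto:x@example.org"),
    ("ex:b", "foaf:mbox", "mailto:x@example.org"),
    ("ex:b", "foaf:name", "X"),
    ("ex:c", "foaf:name", "X"),
]
coref = ifp_coref_pairs(triples)
```

Only "ex:a" and "ex:b" are paired: sharing foaf:name proves nothing because it is not inverse functional.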

  25. Methodology: Rule-based Model
      • In text processing, very similar name mentions in a document are more likely to be co-referent
      • This is also used in disambiguating name mentions in citations in a single paper or Web page
      • A similar heuristic is useful for a “knows graph” extracted from a single foaf profile
        {?a foaf:knows ?b. ?a foaf:knows ?c. ?b neq ?c} => {?b :notCoref ?c}
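
A small Python sketch of the knows-graph rule, assuming the foaf:knows edges come from a single profile and are given as `(a, b)` tuples. The function name `knows_not_coref` is hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def knows_not_coref(knows_edges):
    """Distinct agents known by the same agent are marked notCoref.

    knows_edges: iterable of (a, b) meaning "a foaf:knows b",
    all taken from one foaf profile.
    """
    known_by = defaultdict(set)
    for a, b in knows_edges:
        known_by[a].add(b)

    pairs = set()
    for friends in known_by.values():
        # Every pair of distinct friends of the same agent.
        pairs.update(combinations(sorted(friends), 2))
    return pairs


edges = [("me", "alice"), ("me", "bob"), ("me", "carol"), ("alice", "bob")]
not_coref = knows_not_coref(edges)
```

With three friends listed by "me", the rule yields three notCoref pairs; the single friend listed by "alice" yields none.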

  26. Methodology: Vector Model
      • Support Vector Machine with a linear kernel
      • Features:
        • Match/no match of any IFPs
        • Distance measures over common property values (Levenshtein & 3-gram Dice score)
        • Alias and entity mention resolution
        • Property-specific feature comparison
        • Knows graph comparisons: Jaccard similarity coefficient of foaf names of one-hop neighbors
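
To make a few of these features concrete, the following Python sketch builds a three-element feature vector for a pair of instances. It is an assumption-heavy illustration: the record layout (`name`, `ifp_values`, `knows_names`) and the function `pair_features` are hypothetical, only three of the listed features are covered, and the SVM itself is not shown.

```python
def levenshtein(a, b):
    """Edit distance between two strings (dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def jaccard(a, b):
    """Jaccard coefficient |A ∩ B| / |A ∪ B| of two collections."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def pair_features(a, b):
    """Feature vector for a pair of instances (hypothetical layout)."""
    name_a, name_b = a["name"].lower(), b["name"].lower()
    longest = max(len(name_a), len(name_b)) or 1
    return [
        1.0 if a["ifp_values"] & b["ifp_values"] else 0.0,  # IFP match
        1.0 - levenshtein(name_a, name_b) / longest,        # name similarity
        jaccard(a["knows_names"], b["knows_names"]),        # one-hop neighbors
    ]


p1 = {"name": "Jim Hendler", "ifp_values": {"h1"},
      "knows_names": {"Tim Finin", "Bijan Parsia"}}
p2 = {"name": "Jim Hendler", "ifp_values": {"h1"},
      "knows_names": {"Tim Finin"}}
features = pair_features(p1, p2)
```

Vectors of this kind would then be passed to whatever SVM implementation is in use, with labelled co-referent and non-co-referent pairs as training data.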

  27. Methodology: Clustering
      • Pairs form clusters
      • Clusters used as part of system evaluation
      • Can result in:
        • Entity to Entity pairing
        • Cluster to Entity pairing
        • Cluster to Cluster pairing
      • Greedy process with a confidence threshold
      • Use rule-based model to eliminate known non-coreferent pairs
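
One way to picture a greedy process with a confidence threshold is the single-pass Python sketch below. It is not the authors' algorithm: the function name `greedy_cluster`, the 0.8 threshold, and the input format are assumptions, and the re-evaluation of clusters as new abstract entities is left out. Pairs are visited from most to least confident, and a merge is skipped when any cross-cluster pair is known to be notCoref.

```python
def greedy_cluster(scored_pairs, not_coref=(), threshold=0.8):
    """Greedily merge entities into clusters.

    scored_pairs: iterable of (a, b, confidence) for predicted coref pairs.
    not_coref: iterable of (a, b) pairs the rules marked as not co-referent.
    """
    scored_pairs = list(scored_pairs)
    blocked = {frozenset(pair) for pair in not_coref}

    # Every entity starts in its own singleton cluster.
    cluster_of = {}
    for a, b, _ in scored_pairs:
        cluster_of.setdefault(a, frozenset([a]))
        cluster_of.setdefault(b, frozenset([b]))

    for a, b, score in sorted(scored_pairs, key=lambda t: -t[2]):
        if score < threshold:
            break  # remaining pairs are even less confident
        ca, cb = cluster_of[a], cluster_of[b]
        if ca == cb:
            continue
        # Refuse the merge if it would join a known notCoref pair.
        if any(frozenset((x, y)) in blocked for x in ca for y in cb):
            continue
        merged = ca | cb
        for member in merged:
            cluster_of[member] = merged

    return set(cluster_of.values())


clusters = greedy_cluster(
    [("a", "b", 0.95), ("b", "c", 0.9), ("c", "d", 0.85), ("d", "e", 0.3)],
    not_coref=[("a", "d")],
)
```

In this run "a", "b" and "c" end up together; "d" stays separate because it is notCoref with "a", and "e" stays separate because its only pair falls below the threshold. Merging an entity into an existing cluster and merging two clusters are both handled by the same union step.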

  28. Methodology: Clustering
      Instance matching can result in new cluster formation, and cluster matching can result in merged clusters.

  29. Evaluation
      • Two experiments
      • E1: 50,000 triples, over 500 entity mentions, 600 classes used for training
      • E2: 250,000 triples, over 3500 entity mentions, over 1800 classes for training
      • 10-fold cross-validation tests

  30. Evaluation
      • For E1: 900 pairs non-match, majority undetermined
      • E2: Results shown below

  31. Evaluation
      • Results promising
      • During our E2 clustering phase, the first phase reached 90% accuracy
      • Second phase: no new relationships among pairs; cluster-to-cluster pairing occurred
      [Table: Classification Results using 10-fold Validation]

  32. Evaluation
      • Retrieving additional FOAF profiles based on knows graph
      • Quickly retrieve large number of entities
      • Tightly linked
        • reduced diversity of analyzed data
        • more entities that are co-referent
      • Future experiments: a diversity filter spanning domains

  33. Future Work
      • Evaluating the contribution of each rule and SVM feature to performance
      • Other ML approaches, e.g., Markov logic, EM
      • Exploiting better clustering algorithms
      • Adding more features, e.g., non-foaf vocabulary, non-RDF data (e.g., hosting site)
      • Applying approach to other RDF instances
      • Scalability:
        • Providing a non-batch, streaming service
        • Offering a coref Web service

  34. Conclusions
      • We can treat instance linking as co-reference resolution and exploit the in-document and cross-document distinction
      • Good results with an ensemble approach combining rules and an SVM classifier
      • Apply clustering to form groups of co-referent relations and reprocess
      • Promising initial results

  35. http://ebiquity.umbc.edu/
