Modelling Users’ Profiles and Interests based on Cross-Folksonomy Analysis. Martin Szomszor University of Southampton. Outline. Introduction and Motivation Why is your folksonomy interaction useful? How could it be exploited? Making Sense of Folksonomies Distributed Contact Networks
Modelling Users’ Profiles and Interests based on Cross-Folksonomy Analysis Martin Szomszor University of Southampton TAGora: Semiotic Dynamics of Online Social Communities EU-IST-2006-034721
Outline
• Introduction and Motivation
  • Why is your folksonomy interaction useful?
  • How could it be exploited?
• Making Sense of Folksonomies
  • Distributed Contact Networks
  • Tag Filtering / Tag Senses
  • Profiles of Interests
• Future Work
  • Disambiguation
  • Building Better Profiles of Interests
Introduction http://news.bbc.co.uk/ http://slashdot.org/ Dream Theater Metallica Rush delicious.com
Increasing number of online identities
• A recent Ofcom study found that UK adults have on average 1.6 profiles; 39% of those who have a profile have at least two
  • [Ofcom 2008] Social Networking: A quantitative and qualitative research report into attitudes, behaviours, and use.
• In the future, people will maintain an increasing number of online identities to meet different information sharing tasks and to connect with different communities
Tag Clouds delicious.com
The Big Picture Profile of Interests delicious.com
Personalisation
• Profiles could be exported to other sites to improve recommendation quality
• Profiles could be used to support personalised searching
• Better user experience
[Figure: Profile of Interests, delicious.com]
Consolidation and Integration cuba cuba hotels holiday travel 2008 currency http://dbpedia.org/resource/Cuba http://dbpedia.org/resource/Travel http://dbpedia.org/resource/Holiday http://dbpedia.org/resource/Category:Tourism
Tagging Variation
[Figure: Raw Tags, Filtered Tags]
[1] Szomszor, M., Cantador, I. and Alani, H. (2008). Correlating User Profiles from Multiple Folksonomies. In: ACM Conference on Hypertext and Hypermedia, 2008, Pittsburgh, Pennsylvania.
Disconnected Identities fan of friend #me contact friend
Making Sense of Folksonomies Tagging Semantics FOAF DBpedia + Wordnet Identity Integration Tag Integration Delicious Last.fm Flickr Facebook …
1. Contact Integration Tagging Semantics FOAF DBpedia + Wordnet Identity Integration Tag Integration Delicious Last.fm Flickr Facebook …
Consolidated Contact View • Recommend new connections #me
FOAF Representation of SNS Accounts
http://tagora.ecs.soton.ac.uk/LiveSocialSemantics/ht2009/foaf/4
http://tagora.ecs.soton.ac.uk/facebook/613077109
<http://tagora.ecs.soton.ac.uk/facebook/613077109>
    <http://tagora.ecs.soton.ac.uk/schemas/facebook#hasFriend>
        <http://tagora.ecs.soton.ac.uk/facebook/1006466985>,
        <http://tagora.ecs.soton.ac.uk/facebook/684541156>,
        …
        <http://tagora.ecs.soton.ac.uk/facebook/1043367866>;
    <owl#sameAs>
http://tagora.ecs.soton.ac.uk/delicious/martinszomszor
http://tagora.ecs.soton.ac.uk/flickr/7214044@N08
http://tagora.ecs.soton.ac.uk/lastfm/mszomszor
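One way a consolidated contact view could be derived from owl:sameAs links is to treat them as equivalences and pool the contacts of every linked account. The following is a minimal sketch under that assumption, not the TAGora implementation; the function name `consolidate`, the shortened account identifiers, and the `lastfm/hypothetical_friend` contact are illustrative only.

```python
from collections import defaultdict


def consolidate(same_as, contacts):
    """Merge accounts linked by sameAs pairs and pool their contacts.

    same_as  -- iterable of (account, account) pairs asserted to be one person
    contacts -- dict mapping an account to the set of accounts it links to
    Returns a dict mapping one representative account per person to the
    union of contacts found across all of that person's accounts.
    """
    parent = {}

    def find(x):
        # Union-find lookup with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in same_as:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    merged = defaultdict(set)
    for account, friends in contacts.items():
        merged[find(account)].update(find(f) for f in friends)
    for root, friends in merged.items():
        friends.discard(root)  # a person is not their own contact
    return dict(merged)


# Shortened forms of the account URIs on the slide (assumption: one owner).
SAME_AS = [
    ("facebook/613077109", "delicious/martinszomszor"),
    ("facebook/613077109", "flickr/7214044@N08"),
    ("facebook/613077109", "lastfm/mszomszor"),
]
CONTACTS = {
    "facebook/613077109": {"facebook/1006466985", "facebook/684541156"},
    "lastfm/mszomszor": {"lastfm/hypothetical_friend"},  # hypothetical contact
}

if __name__ == "__main__":
    print(consolidate(SAME_AS, CONTACTS))
```

Contacts that appear on one site but not another are then natural candidates for the "recommend new connections" step.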
2. Tag Integration Tagging Semantics FOAF DBpedia + Wordnet Identity Integration Tag Integration Delicious Last.fm Flickr Facebook …
Folksonomy Integration: Tag Heterogeneity Web2.0 != Web_2.0
Folksonomy Integration: Tag Heterogeneity Web2.0 isFilteredTo Web_2.0
Tag Filtering
• Find canonical form for each tag:
  • Use DBpedia entry labels as reference
  • Compound terms separated by _
    • second-life, second+life, second.life -> second_life
  • Concatenated / camel case terms are expanded
    • secondlife, SecondLife -> second_life
• International characters normalised:
  • Caf%C3%A9 -> Cafe
• Recommend spelling corrections
  • resaerch -> didYouMean research
• Follow unambiguous redirections:
  • Humor, Funny -> Humour
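The syntactic part of these filtering steps can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the actual TAGora filter: `CONCATENATED_LABELS` is a hypothetical stand-in for the DBpedia label index, the result is lower-cased to match the canonical tag URIs, and spelling correction and redirect following are left out because they need external data.

```python
import re
import unicodedata
from urllib.parse import unquote

# Hypothetical stand-in for the DBpedia label index: maps a concatenated
# label to the canonical form with "_" between the compound terms.
CONCATENATED_LABELS = {"secondlife": "second_life"}


def filter_tag(raw):
    """Return a canonical form for a raw tag (syntactic steps only)."""
    tag = unquote(raw)  # Caf%C3%A9 -> Café, second%20life -> second life
    # Normalise international characters by dropping combining marks.
    tag = unicodedata.normalize("NFKD", tag)
    tag = "".join(c for c in tag if not unicodedata.combining(c))
    # Expand camel case: SecondLife -> Second_Life.
    tag = re.sub(r"(?<=[a-z])(?=[A-Z])", "_", tag)
    # Unify separators between letters only, so "web_2.0" keeps its dot.
    tag = re.sub(r"(?<=[A-Za-z])[-+.\s]+(?=[A-Za-z])", "_", tag)
    tag = tag.lower()
    # Expand concatenated terms that match a known label.
    return CONCATENATED_LABELS.get(tag, tag)


if __name__ == "__main__":
    for raw in ["second-life", "second+life", "SecondLife", "secondlife", "Caf%C3%A9"]:
        print(raw, "->", filter_tag(raw))
```

Restricting separator replacement to positions between letters is a design choice made here so that version-like tags such as web_2.0 are not rewritten; mapping web2.0 to web_2.0 would still need the label index.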
[Figure: Tagging ontology]
Classes: Tag, UserTag, DomainTag, GlobalTag, TagSegment, FinalTagSegment, CooccurrencInfo, Resource, Post, Tagger
Properties: cooccurringTag, isFilteredTo, hasCooccurrenceInfo, hasCooccurrenceFrequency, hasUserFrequency, hasDomainTag, hasDomainFrequency, hasGlobalTag, hasGlobalFrequency, tagUsed (f), hasNextSegment (f), hasTagSequence (f), usesTag, hasPost, taggedResource, taggedOn, rdfs:label
Datatypes: xsd:string, xsd:integer, xsd:datetime
Namespaces: http://tagora.ecs.soton.ac.uk/schemas/tagging# http://www.w3.org/2001/XMLSchema#
Legend: property, subclass, (f) = functional property
Finding Syntactic Variations
sparql$ select ?x where { ?x <http://tagora.ecs.soton.ac.uk/schemas/tagging#isFilteredTo> <http://tagora.ecs.soton.ac.uk/tag/web_2.0>}
┌─────────────────────────────────────────────┐
│ ?x                                          │
├─────────────────────────────────────────────┤
│ <http://tagora.ecs.soton.ac.uk/tag/web2.0>  │
│ <http://tagora.ecs.soton.ac.uk/tag/web2>    │
│ <http://tagora.ecs.soton.ac.uk/tag/web_2.0> │
│ <http://tagora.ecs.soton.ac.uk/tag/web_20>  │
│ <http://tagora.ecs.soton.ac.uk/tag/web20>   │
└─────────────────────────────────────────────┘
sparql$ select * where { ?x <http://tagora.ecs.soton.ac.uk/schemas/tagging#isFilteredTo> <http://tagora.ecs.soton.ac.uk/tag/second_life>}
┌───────────────────────────────────────────────────┐
│ ?x                                                │
├───────────────────────────────────────────────────┤
│ <http://tagora.ecs.soton.ac.uk/tag/second_Life>   │
│ <http://tagora.ecs.soton.ac.uk/tag/second.life>   │
│ <http://tagora.ecs.soton.ac.uk/tag/SecondLife>    │
│ <http://tagora.ecs.soton.ac.uk/tag/Second_Life>   │
│ <http://tagora.ecs.soton.ac.uk/tag/second%20life> │
│ <http://tagora.ecs.soton.ac.uk/tag/SECOND_LIFE>   │
│ <http://tagora.ecs.soton.ac.uk/tag/second_life>   │
│ <http://tagora.ecs.soton.ac.uk/tag/secondlife>    │
└───────────────────────────────────────────────────┘
Tag Senses
• What are the possible meanings for a tag?
• We use two reference sets:
  • DBpedia
    • Concepts
  • Wordnet
    • Synsets
Disambiguation Ontology
[Figure: Disambiguation ontology]
Classes: Tag, WordSense, DbpediaSenseInfo, Resource
Properties: didYouMean, hasWordnetSense, hasDbpediaSenseInfo, senseWeight, dbpediaSense
Datatypes: xsd:float
Namespaces: http://www.w3.org/2006/03/wn/wn20/schema/ http://tagora.ecs.soton.ac.uk/schemas/disambiguation# http://tagora.ecs.soton.ac.uk/schemas/dbpedia# http://tagora.ecs.soton.ac.uk/schemas/tagging# http://www.w3.org/2001/XMLSchema#
Legend: property, subclass, (f) = functional property
DBpedia Extraction
• Extract triples from XML dump
• Calculate normalised title string
  • Caf%C3%A9 -> cafe
• Calculate concatenated title string
  • Second_life -> secondlife
• Extract disambiguation term from title
  • Orange_(fruit)
• Identify compound labels
  • Second_Life -> Second, Life
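The title-processing steps listed above could look roughly like the sketch below. It assumes titles arrive percent-encoded with "_" between words and an optional trailing "_(term)" disambiguation suffix; the function name `describe_title` and the returned dictionary keys are hypothetical and do not come from the TAGora code.

```python
import re
import unicodedata
from urllib.parse import unquote


def describe_title(title):
    """Derive the label variants listed on the slide from one page title."""
    decoded = unquote(title)  # Caf%C3%A9 -> Café
    # Split off a trailing "_(term)" disambiguation suffix, if present.
    match = re.fullmatch(r"(.*?)_\((.+)\)", decoded)
    if match:
        label, disambiguation_term = match.group(1), match.group(2)
    else:
        label, disambiguation_term = decoded, None
    # Normalised label: strip accents and lower-case.
    stripped = unicodedata.normalize("NFKD", label)
    stripped = "".join(c for c in stripped if not unicodedata.combining(c))
    normalised = stripped.lower()
    # Compound labels are titles made of more than one "_"-separated part.
    parts = [p for p in label.split("_") if p]
    return {
        "label": label,
        "normalised": normalised,
        "concatenated": normalised.replace("_", ""),
        "disambiguation_term": disambiguation_term,
        "compound": parts if len(parts) > 1 else [],
    }


if __name__ == "__main__":
    for title in ["Caf%C3%A9", "Second_Life", "Orange_(fruit)"]:
        print(describe_title(title))
```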
DBpedia Extraction
• Number of incoming links
• Extract page redirects
• Extract disambiguation links
• Find primary disambiguation (e.g. Apple)
DBpedia Extraction
• Parse wiki text and extract terms:
  • Terms filtered using stop words (with some wiki specific additions)
  • Store term frequencies
  • Store number of distinct terms in page
  • Store total term frequency
• Can associate a vector of terms and weights to each possible sense
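As a rough illustration of the term statistics described above, the sketch below tokenises page text, drops stop words, and records the three stored quantities. The tokeniser, the tiny `STOP_WORDS` set (including the wiki-specific entries "ref" and "thumb"), and the function name are assumptions made for the example, not the lists or code used in the project.

```python
import re
from collections import Counter

# Illustrative stop-word list; "ref" and "thumb" stand in for the
# wiki-specific additions mentioned on the slide.
STOP_WORDS = {"the", "a", "an", "of", "and", "is", "in", "to", "by", "ref", "thumb"}


def term_statistics(wiki_text):
    """Count non-stop-word terms in a page and summarise them."""
    tokens = re.findall(r"[a-z]+", wiki_text.lower())
    frequencies = Counter(t for t in tokens if t not in STOP_WORDS)
    return {
        "term_frequencies": frequencies,
        "distinct_terms": len(frequencies),
        "total_term_frequency": sum(frequencies.values()),
    }


if __name__ == "__main__":
    stats = term_statistics("Apple is a fruit. The apple tree is grown in orchards.")
    print(stats["term_frequencies"].most_common(3))
    print(stats["distinct_terms"], stats["total_term_frequency"])
```

The `term_frequencies` counter is the "vector of terms and weights" that can be attached to each candidate sense.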
[Figure: DBpedia extraction ontology]
Classes: Resource, CompoundLabelSequence, FinalCompoundLabelSequence, TermFrequencyPair
Properties: hasLabel, hasNormalisedLabel, hasConcatenatedLabel, hasDisambiguationTerm, hasDisambiguation, hasPrimaryDisambiguation, isa, hasCompoundLabelSequence (f), hasNextLabelSequence (f), hasCompoundLabel (f), hasTermFrequencyPair, hasTerm, hasTermFrequency, hasTotalTerms, hasTotalTermFrequency
Datatypes: xsd:string, xsd:integer
Profiles of Interests
[2] Szomszor, M., Alani, H., Cantador, I., O'Hara, K. and Shadbolt, N. (2008) Semantic Modelling of User Interests based on Cross-Folksonomy Analysis. In: 7th International Semantic Web Conference (ISWC), October 26th - 30th, Karlsruhe, Germany.
Global Category View • What are the differences in the interests that are learnt from each domain?
Future Work
• Given a set of possible senses, how can we choose the best match?
• Folksonomy data can provide contextual information:
  • User tag-cloud
  • Cooccurrence Network
  • User Cooccurrence Network
• Can abstract this information as a vector of terms and weights (context)
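If both the tag context and each candidate sense are represented as vectors of terms and weights, one plausible way to pick a sense is to compare them by cosine similarity. The slides do not name a similarity measure, so cosine is an assumption here, as are the function names and all context weights; only the "mac" (35) and "fruit" (41) term frequencies come from the apple example in this deck.

```python
import math


def cosine(a, b):
    """Cosine similarity between two sparse term -> weight vectors."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def best_sense(context, senses):
    """Return the sense whose term vector is closest to the context.

    senses must be a non-empty dict of sense name -> term vector.
    """
    return max(senses, key=lambda name: cosine(context, senses[name]))


# Term vectors per sense; only "mac" and "fruit" are taken from the slides.
SENSES = {
    "Apple_Inc.": {"mac": 35},
    "Apple": {"fruit": 41},
}

if __name__ == "__main__":
    # Hypothetical context built from a user's co-occurring tags.
    user_context = {"mac": 3, "iphone": 2}
    print(best_sense(user_context, SENSES))
```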
Building Better Profiles
• Which tags correspond to interests?
  • Locations and topics are useful, but other terms are not
• TF / IDF approach
  • It’s not that useful to find out we are all interested in HTML
• Making use of the category hierarchy
  • If I’m interested in Facebook, Flickr, Last.fm, Delicious, etc., I can extrapolate the interest Online_Social_Networks
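Both ideas can be sketched briefly. The first function weights each user's tags by a TF-IDF variant, idf = log(N / df), so a tag used by every user scores zero; the second promotes a category to an interest once enough of a user's interests fall under it. The weighting formula, the `min_support` threshold, the function names, and the category mapping are assumptions for illustration, not results or methods reported in the slides.

```python
import math
from collections import Counter


def tf_idf_profiles(user_tags):
    """Weight each user's tags by frequency and rarity across users.

    user_tags -- dict of user -> dict of tag -> usage count
    """
    n_users = len(user_tags)
    # Document frequency: number of users who used each tag at least once.
    df = Counter(tag for tags in user_tags.values() for tag in tags)
    profiles = {}
    for user, tags in user_tags.items():
        total = sum(tags.values())
        profiles[user] = {
            tag: (count / total) * math.log(n_users / df[tag])
            for tag, count in tags.items()
        }
    return profiles


def extrapolate_categories(interests, category_of, min_support=2):
    """Return categories shared by at least min_support interests."""
    counts = Counter(category_of[i] for i in interests if i in category_of)
    return {category for category, n in counts.items() if n >= min_support}


# Hypothetical category mapping for the example on the slide.
CATEGORY_OF = {
    "Facebook": "Online_Social_Networks",
    "Flickr": "Online_Social_Networks",
    "Last.fm": "Online_Social_Networks",
    "Delicious": "Online_Social_Networks",
    "HTML": "Markup_languages",
}

if __name__ == "__main__":
    users = {"u1": {"html": 5, "cuba": 5}, "u2": {"html": 10}}
    print(tf_idf_profiles(users))
    print(extrapolate_categories(["Facebook", "Flickr", "HTML"], CATEGORY_OF))
```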
[Figure: sense information for the tag apple; resources are linked with owl:sameAs]
http://tagora.ecs.soton.ac.uk/tag/apple
• dbpedia:hasDbpediaSenseInfo http://tagora.ecs.soton.ac.uk/tag/apple/sense-info/0
  • dbpedia:senseWeight 0.30628910807
  • dbpedia:sense http://tagora.ecs.soton.ac.uk/dbpedia/resource/Apple_Inc.
    • dbpedia:hasTermFrequencyPair _:b9510f00000000a5
      • dbpedia:hasTerm “mac”
      • dbpedia:hasTermFrequency 35
• dbpedia:hasDbpediaSenseInfo http://tagora.ecs.soton.ac.uk/tag/apple/sense-info/1
  • dbpedia:senseWeight 0.248912928
  • dbpedia:sense http://tagora.ecs.soton.ac.uk/dbpedia/resource/Apple
    • dbpedia:hasTermFrequencyPair _:b9510f00000000a5
      • dbpedia:hasTerm “fruit”
      • dbpedia:hasTermFrequency 41