Connecting Distributed People and Information on the Web Jennifer Golbeck College of Information Studies Human-Computer Interaction Lab University of Maryland, College Park
Information Access on the Web • Find an mp3 of a song that was on the Billboard Top Ten that features a cowbell. The Cowbell Project - http://www.geekspeakweekly.com/cowbell/
Finding Trusted Information • How many cows in Texas? http://www.cowabduction.com/
The Social Solution • People are the sources of information • Social relationships give us information about people • Use relationships to understand the information people produce.
Current State • 250-ish social networks • 850,000,000 users • Ning claims 185,000 networks
My Research Questions • How do users behave and relate to one another in web-based social networks? • How do social connections, like trust, relate to information? • How can we estimate relationships (like trust) between people who do not know each other? • How can we use social networks to build intelligent systems to improve information access?
Social Relationships and Information: How Trust Relates to Similarity
A Study • People create information on the web • An expression of their opinions and view of the world • Focus on quantitative information (e.g. ratings) • People express trust in social networks • How does trust relate to the similarity of two people?
The Idea • We know trust correlates with overall similarity (Ziegler and Golbeck, 2006) • Does trust capture more than just overall agreement? • Two-part analysis • Controlled study to find profile similarity measures that relate to trust • Verification through application in a live system
Experimental Outline • Phase 1: Rate Movies - Subjects rate movies on the list • Ratings grouped as extreme (1,2,9,10) or far from average (≥4 different) • Create profiles of hypothetical users • Profile is a list of movies and the hypothetical user’s ratings of them • Subjects rate how much they would trust the person represented by the profile • Vary the profile’s ratings in a controlled way
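The two rating groups on this slide can be sketched as a small classifier. This is a minimal sketch, assuming that "far from average" means at least 4 points away from the film's average rating on the 1 to 10 scale; the slide does not say which average (IMDB's or the subjects') was used, and the function name `categorize` is hypothetical.

```python
# Ratings on the experiment's 1 (bad) to 10 (great) scale.
EXTREME_RATINGS = {1, 2, 9, 10}

# Assumption: "far from average" = at least 4 points from the film's
# average rating. Which average the study used is not stated.
FAR_THRESHOLD = 4


def categorize(rating: int, film_average: float) -> set:
    """Return the rating groups (possibly none, possibly both) a rating falls into."""
    groups = set()
    if rating in EXTREME_RATINGS:
        groups.add("extreme")
    if abs(rating - film_average) >= FAR_THRESHOLD:
        groups.add("far_from_average")
    return groups
```

For example, `categorize(9, 4.5)` falls into both groups, while `categorize(6, 5.0)` falls into neither.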
Phase 1: Rating Movies • Movies most subjects would have seen: the 100 worldwide top-grossing films of all time • Cover a broad spectrum of genres with the top 10 rated movies in each of these Internet Movie Database (IMDB) genres: Action, Adventure, Animation, Family, Comedy, Crime, Documentary, Drama, Fantasy, Film-Noir, Horror, Independent, Musical, Mystery, Romance, Science Fiction, Thriller, War, and Western • Include bad movies: the IMDB 100 worst-rated movies with at least 1,000 ratings • 283 total films • Ratings on a 1 (bad) to 10 (great) scale
Generating Profiles • Each profile contained exactly 10 movies, 4 from an experimental category and 6 from its complement • E.g. 4 movies with extreme ratings and 6 with non-extreme ratings • Control for average difference, standard deviation, etc. so we could see how differences on specific categories of films affected trust
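Profile construction can be sketched as follows. It is a hypothetical illustration, not the study's generator: it assumes a profile's ratings are the subject's own Phase 1 ratings shifted by a predefined difference (one difference for the 4 category films, another for the 6 complement films), with the shift flipped downward when it would leave the 1 to 10 scale. The names `make_profile` and `shift` are illustrative.

```python
import random

PROFILE_SIZE = 10
FROM_CATEGORY = 4                               # films from the experimental category
FROM_COMPLEMENT = PROFILE_SIZE - FROM_CATEGORY  # 6 films from its complement


def shift(rating: int, diff: int) -> int:
    """Move a 1-10 rating by `diff` points, going down instead if up would leave the scale."""
    up = rating + diff
    return up if up <= 10 else max(1, rating - diff)


def make_profile(subject_ratings, category, category_diff, complement_diff, seed=None):
    """Build one hypothetical profile as {film: rating}.

    subject_ratings: {film: rating} collected in Phase 1.
    category: set of films in the experimental category (e.g. extreme ratings).
    category_diff / complement_diff: assumed predefined rating differences.
    """
    rng = random.Random(seed)
    in_cat = sorted(f for f in subject_ratings if f in category)
    out_cat = sorted(f for f in subject_ratings if f not in category)
    # random.sample raises ValueError if either pool is too small.
    chosen_cat = rng.sample(in_cat, FROM_CATEGORY)
    chosen_out = rng.sample(out_cat, FROM_COMPLEMENT)
    profile = {f: shift(subject_ratings[f], category_diff) for f in chosen_cat}
    profile.update({f: shift(subject_ratings[f], complement_diff) for f in chosen_out})
    return profile
```

The controls for average difference and standard deviation are not modeled here; a real generator would need to check them before accepting a profile.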
Subjects • 59 subjects • Age 20 to 52 • Education • 6 high school, 11 bachelors, 23 masters, 11 PhD, 8 unreported • Movie Experience • Watch 1-2 times per week on average • Movie media (web, magazines, etc.) every week or two
Results • Reconfirmed that trust strongly correlates with overall similarity • Agreement on extremes • Largest single difference (r) • Subject's propensity to trust
Validation • Gather all pairs of FilmTrust users who have a known trust relationship and share movies in common • 322 total user pairs • Develop a formula using the experimental parameters to estimate trust • Compute accuracy by comparing computed trust value with known value
In FilmTrust: use weights (w1, w2, w3, w4, w5) = (7, 2, 1, 8, 2)
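The slide gives the weights but not the formula they plug into. Under the assumption that the estimate is a weighted average of five per-pair feature scores, each already scaled to the 1 to 10 trust scale, the estimate and the validation step (comparing computed trust with the known value) could look like this. Which experimental parameter each weight multiplies is not stated, so the feature order here is hypothetical, as are the function names.

```python
WEIGHTS = (7, 2, 1, 8, 2)  # (w1, w2, w3, w4, w5) as given for FilmTrust


def estimate_trust(features, weights=WEIGHTS):
    """Weighted average of per-pair feature scores (assumed form of the formula)."""
    if len(features) != len(weights):
        raise ValueError("expected one feature score per weight")
    return sum(w * f for w, f in zip(weights, features)) / sum(weights)


def mean_absolute_error(pairs):
    """Average |computed - known| trust over (features, known_trust) user pairs."""
    pairs = list(pairs)
    if not pairs:
        raise ValueError("no user pairs to evaluate")
    return sum(abs(estimate_trust(f) - known) for f, known in pairs) / len(pairs)
```

Lower mean absolute error means the computed trust values sit closer to the trust users actually stated.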
Experimental Conclusions • Social trust relationships are stronger between people who are similar in certain ways • First observed in controlled experiments • Verified through application in a real system
Applications: Using social trust for improved information access
Social Information Access • Use social relationships (e.g. trust) for • Aggregating Information • Sorting and Ranking Information • Filtering and Assessing the Quality of Information • FilmTrust
FilmTrust • Use Trust for information access • Recommender system • Review ordering • 1200 users
Information Aggregation Using Trust • Trust-based Recommender System • Generates predictive movie ratings based on trust • Weighted average of everyone's ratings of the film, where trust is the weight
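A minimal sketch of the trust-weighted average described above. Assumptions: trust values and ratings are plain numbers, raters the user has no trust value for get weight 0, and the function returns `None` when no trusted rater has rated the film. The name `trust_weighted_rating` is hypothetical, not FilmTrust's code.

```python
def trust_weighted_rating(ratings, trust):
    """Predict one film's rating as a trust-weighted average.

    ratings: {rater: that rater's rating of the film}
    trust:   {rater: how much the user trusts that rater}
    """
    num = den = 0.0
    for rater, rating in ratings.items():
        t = trust.get(rater, 0.0)  # assumption: unknown trust counts as 0
        num += t * rating
        den += t
    return num / den if den > 0 else None
```

With ratings `{"a": 4, "b": 10}` and trust `{"a": 9, "b": 1}`, the prediction is (9*4 + 1*10) / 10 = 4.6, pulled toward the heavily trusted rater, whereas the plain average would be 7.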
Figure axes: difference between known user rating and recommended rating (in stars); minimum difference between known user rating and average rating
Conclusions - Social Information Access • Use understanding and analysis of social behavior in web-based social networks to improve information access • Shown a connection between social trust and similarity • Shown how trust can be used for aggregating, sorting, and filtering information
Future Directions - General • Improved understanding of behavior in web-based social networks • How different types of social connections relate to information • How to improve information access using new social analyses
Future Directions - Specific • Ad hoc information and social networks for micro news • E.g. I have evacuated because of a natural disaster (earthquake, hurricane, flood). I want to know what's going on at my house. • Distributed information (satellite photos, ground, video, photos, blog entries, local news reports, message board text) • Needs • Provenance - is this information unique, or is it all derived from the same source? • Trust - should I trust the source of this information?
Questions • Jennifer Golbeck • golbeck@cs.umd.edu • http://trust.mindswap.org
Generating Profiles • Pre-defined rating differences • Subjects rated 54 total profiles • Six categories • Three values • Three profiles in each value-category combination
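The profile count follows from the design: 6 categories × 3 values × 3 profiles = 54. The enumeration below checks that arithmetic; the category and value names are placeholders, since the slide gives only the counts.

```python
from itertools import product

# Hypothetical placeholders: the slide gives counts, not names.
CATEGORIES = [f"category_{i}" for i in range(1, 7)]  # six categories
VALUES = ["value_a", "value_b", "value_c"]            # three rating-difference values
PROFILES_PER_CELL = 3                                 # profiles per value-category pair

design = [
    (category, value, k)
    for category, value in product(CATEGORIES, VALUES)
    for k in range(PROFILES_PER_CELL)
]
print(len(design))  # 54
```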
The Provenance Challenge • Researchers in many areas • Storage systems • Databases • Grid computing • Data mining • A challenge provides a standard for comparing approaches • Given a scientific workflow and nine challenge queries • Represent all data that we consider relevant about the history of each file • Answer as many queries as possible
FilmTrust Results • FilmTrust compared trust from the social network with overall similarity (via collaborative filtering algorithms) as a weight in recommender systems. • Trust outperformed overall similarity in some cases, suggesting that trust captures something more than overall similarity does
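The slide does not name the collaborative filtering similarity measure. Pearson correlation over co-rated films is a common choice in collaborative filtering, so it is used here as an assumed stand-in for "overall similarity"; this is a sketch, not FilmTrust's implementation, and `pearson_similarity` is a hypothetical name.

```python
from math import sqrt


def pearson_similarity(a, b):
    """Pearson correlation of two users' ratings over the films both rated.

    a, b: {film: rating}. Returns 0.0 when fewer than two films overlap
    or when either user's co-rated ratings have no variance.
    """
    common = [f for f in a if f in b]
    if len(common) < 2:
        return 0.0
    xa = [a[f] for f in common]
    xb = [b[f] for f in common]
    ma, mb = sum(xa) / len(xa), sum(xb) / len(xb)
    cov = sum((x - ma) * (y - mb) for x, y in zip(xa, xb))
    va = sum((x - ma) ** 2 for x in xa)
    vb = sum((y - mb) ** 2 for y in xb)
    if va == 0 or vb == 0:
        return 0.0
    return cov / sqrt(va * vb)
```

A score like this would take trust's place as the per-rater weight in a weighted-average recommender; negative correlations would need handling (for example, clipping to zero) before being used as weights.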
Ten Largest WBSNs • MySpace 150,000,000 • ChinaRen Xiaonei 60,000,000 • Adult Friend Finder 26,000,000 • Bebo 25,000,000 • Friendster 21,000,000 • Cyworld 21,000,000 • Tickle 20,000,000 • Black Planet 18,000,000 • Hi5 14,000,000 • LiveJournal 12,000,000
Example Queries • Find everything that caused a given Graphic to be as it is. • Find all invocations of procedure align_warp using a twelfth order nonlinear 1365 parameter that ran on a Monday. • Find all images where at least one of the input files had an entry global maximum=4095. • A user has annotated some images with a key-value pair center=UChicago. Find the outputs of align_warp where the inputs are annotated with center=UChicago.
Semantic Web Approach • Ontology represents information about the execution of services and the dependencies among files • Logical inferences connect objects to their ancestors • Role hierarchy separates direct lineage from ancestry • Semantics of transitive roles imply connections among files connected through ancestral relationships • Additional reasoning with Semantic Web Rules
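The approach described here relies on an ontology with transitive roles and a reasoner. As a swapped-in illustration only, the same ancestry inference can be computed by plain graph traversal over direct derived-from links, which is what a query such as "find everything that caused a given Graphic to be as it is" needs. The file names and the `lineage` mapping are hypothetical.

```python
from collections import deque


def ancestors(derived_from, item):
    """All items reachable from `item` by repeatedly following derived-from links.

    derived_from: {item: [direct inputs]}, i.e. direct lineage only. The result
    is the transitive closure that a transitive ancestor role would infer.
    """
    seen = set()
    queue = deque(derived_from.get(item, []))
    while queue:
        parent = queue.popleft()
        if parent in seen:  # guards against shared inputs and cycles
            continue
        seen.add(parent)
        queue.extend(derived_from.get(parent, []))
    return seen


# Hypothetical lineage in the spirit of the challenge workflow.
lineage = {
    "atlas-x.gif": ["atlas-x.pgm"],
    "atlas-x.pgm": ["atlas.img"],
    "atlas.img": ["resliced1.img", "resliced2.img"],
    "resliced1.img": ["anatomy1.img", "warp1"],
    "warp1": ["anatomy1.img", "reference.img"],
}
print(sorted(ancestors(lineage, "atlas-x.gif")))
```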
Evaluation through Query Answering • SPARQL, a W3C standard, is used to formulate queries • We were easily able to answer all nine queries for the challenge (one of only 3 teams from 15 entries) • Have already completed the second phase of the challenge, importing data from other systems and applying our techniques
Definition A Web-based Social Network (WBSN) must meet these criteria: • Accessible over the web with a web browser • Users must explicitly state their relationship with other people qua stating a relationship • Must have explicit built-in support for users making these connections. • Relationships must be visible and browsable
Why the Difference? • Ranges of disconnected members • Dogster and HAMSTERster have the lowest rates • Ecademy • FilmTrust • Mobango and Worldshine • As the non-social-networking purpose of the website becomes stronger, the number of friendless members and outsiders increases
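The share of disconnected members can be measured directly from an adjacency-list snapshot. A minimal sketch, assuming a member counts as friendless when no relationship touches them in either direction; the slide's "outsiders" are not modeled because their definition is not given here, and `friendless_fraction` is a hypothetical name.

```python
def friendless_fraction(adjacency):
    """Share of members with no relationships in one adjacency-list snapshot.

    adjacency: {member: set of members they are connected to}.
    """
    members = set(adjacency)
    for friends in adjacency.values():
        members.update(friends)
    if not members:
        return 0.0
    connected = set()
    for member, friends in adjacency.items():
        if friends:
            connected.add(member)     # the member who lists someone
            connected.update(friends)  # and everyone they list
    return len(members - connected) / len(members)
```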
Using Web-Based Social Networks (WBSNs) • If we are going to use social networks for information access, we must understand… • How do users behave in social networks? • How do social relationships relate to information?
Implications • The trust we have in people can inform how we treat information provided by those people • This and other studies suggest trust will work well for aggregating, filtering and sorting information • Important when working on the web
Outline • Motivation • Understanding Relationships in Web-based Social Networks • Behavior • Trust • Using Social Relationships for Information Access • Conclusions and Future Directions
Understanding Social Behavior In Web-Based Social Networks
Behavior and Dynamics • Social networks are not static. • Relationships constantly change, are formed, and are dropped. • New people enter the network and others leave • Do people behave the same way in social networks on the Web?
Questions • How do these networks grow (and shrink)? • How are relationships added (and removed)? • What affects social disconnect? • What affects centrality?
Methodology • 24-month study • Automatically collected adjacency lists (everyone and who they know), join dates, and last active dates for all members. • December 2004 • December 2006 • For 7 networks, I collected adjacency lists every day for 7 weeks. • Who joined or left • What relationships were added or removed
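Comparing consecutive daily snapshots answers both collection questions on this slide: who joined or left, and which relationships were added or removed. This sketch assumes each snapshot is a dict from member to the set of members they list, and treats relationships as directed; `diff_snapshots` is a hypothetical name.

```python
def diff_snapshots(before, after):
    """Compare two adjacency-list snapshots taken on consecutive days.

    before, after: {member: set of members they list}. Edges are treated as
    directed (member -> listed member); an undirected network would need
    each pair normalized first.
    """
    edges_before = {(u, v) for u, vs in before.items() for v in vs}
    edges_after = {(u, v) for u, vs in after.items() for v in vs}
    return {
        "joined": set(after) - set(before),
        "left": set(before) - set(after),
        "added": edges_after - edges_before,
        "removed": edges_before - edges_after,
    }
```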