Micro-interactions and Macro-observations

Klaas Dellschaft

Example: Naming Game (I) • Micro-interactions … • Mother talking to her child http://www.youtube.com/watch?v=kiGduwJK6SQ • Macro-observations … • Child learns to speak

Example: Naming Game (II) • User roles: Speaker / Hearer • Speaker: Utters a word for an object • Hearer: Tries to guess which object was meant • Successful round: Hearer makes a correct guess • Objective: Maximize the number of successful rounds • http://talking-heads.csl.sony.fr • [Figure: User1, User2 and User3 name the same object with the competing words "Cow" and "Kuh" (German for "cow")]

Example: Naming Game (III) • Micro-level interactions … • Speaker / hearer • Round successful? • Yes: Reinforce the used word • No: Hearer learns the new word • Macro-level observations … • A stable vocabulary emerges over time • For each object / attribute, only one word survives • The naming game explains how languages may emerge • Why are there many different languages in the world? • The naming game ignores the geographic distribution of agents
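A minimal Python sketch of these micro-interactions, assuming a single object and the "minimal naming game" reading of "reinforce the used word": after a successful round, both agents drop all competing words. Speakers and hearers are paired uniformly at random, so geography is ignored. The function name and the word labels (`w0`, `w1`, …) are illustrative only.

```python
import random


def naming_game(n_agents=20, max_rounds=100_000, seed=42):
    """Minimal naming game for one object.

    Returns (round at which consensus emerged, or None) and the final inventories.
    """
    rng = random.Random(seed)
    inventories = [set() for _ in range(n_agents)]  # words known by each agent
    invented = 0  # counter for creating new, unique words

    for round_no in range(1, max_rounds + 1):
        speaker, hearer = rng.sample(range(n_agents), 2)
        if not inventories[speaker]:
            # The speaker knows no word yet and invents one
            inventories[speaker].add(f"w{invented}")
            invented += 1
        word = rng.choice(sorted(inventories[speaker]))

        if word in inventories[hearer]:
            # Successful round: both agents keep only the used word
            inventories[speaker] = {word}
            inventories[hearer] = {word}
        else:
            # Failed round: the hearer learns the speaker's word
            inventories[hearer].add(word)

        # Macro-level check: does every agent know exactly the same single word?
        if len(inventories[0]) == 1 and all(inv == inventories[0] for inv in inventories):
            return round_no, inventories
    return None, inventories
```

Calling `naming_game()` returns the round at which a single shared word emerged together with the final inventories; trying several seeds shows that which word survives is a matter of chance.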

Model-based research • Modeling micro-interactions • Define rules for the interactions between agents • Use these rules to simulate the dynamics of a system • Objective: Explain the emergence of macro-observations • Use cases: • Biology: Spreading of diseases in a population • Sociology: Emergence of different cultural habits • Web Science: • Spreading of memes / hashtags on Twitter • Emergence of a collaborative vocabulary in tagging systems • …

Basic Models (I) • Preferential Attachment (Polya Urn Model) • There are n balls with different colors in an urn • In each step: • Randomly draw a ball • Put it back together with a second ball of the same color • Fixed number of colors • The color proportions converge to a random limit (rich get richer)
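The urn process can be simulated in a few lines. In this sketch the urn is stored as a list with one entry per ball, so drawing uniformly from the list automatically picks a color with probability proportional to its current count (preferential attachment). The function name and its parameters are illustrative.

```python
import random
from collections import Counter


def polya_urn(initial_balls, steps, seed=0):
    """Polya urn: draw a ball, put it back together with one more of its color."""
    rng = random.Random(seed)
    urn = list(initial_balls)  # one list entry per ball, the entry is its color
    for _ in range(steps):
        ball = rng.choice(urn)  # every ball is equally likely to be drawn
        urn.append(ball)        # return it plus a second ball of the same color
    return Counter(urn)         # number of balls per color
```

Running `polya_urn(["red", "green", "blue"], 10_000, seed=s)` for different seeds `s` typically ends with quite different color proportions, i.e. the limit depends on the random history of the draws.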

Basic Models (II) • Linear Preferential Attachment (Simon Model) • Like the Polya Urn Model, but additionally in each step: • With a low probability p, insert a ball with a new color instead of drawing a ball • Linearly increasing number of colors (on average p · t after t steps) • The number of colors having k balls follows a power law in k
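A corresponding sketch of the Simon model, assuming that the urn starts with a single ball of color 0 and that new colors are labeled with consecutive integers (both choices are illustrative). The hypothetical helper `count_of_counts` returns how many colors have exactly k balls, which is the distribution expected to follow a power law.

```python
import random
from collections import Counter


def simon_model(steps, p=0.1, seed=0):
    """Simon model: with probability p add a ball of a new color, otherwise
    draw a ball uniformly and add another ball of the same color."""
    rng = random.Random(seed)
    urn = [0]      # start with a single ball of color 0
    n_colors = 1
    for _ in range(steps):
        if rng.random() < p:
            urn.append(n_colors)         # new color
            n_colors += 1
        else:
            urn.append(rng.choice(urn))  # color chosen proportionally to its count
    return Counter(urn)


def count_of_counts(color_counts):
    """Map k to the number of colors that have exactly k balls."""
    return Counter(color_counts.values())
```

For example, `count_of_counts(simon_model(100_000, p=0.1))` can be plotted on doubly logarithmic axes to inspect the power law.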

Basic Models (III) • Information Cascades • Users decide rationally between two alternatives • Example: Accept (A) / Reject (R) • Each user gets a private signal • When accepting is the correct decision, a user is more likely to receive the signal to accept (i.e. P(A) > 0.5) • Each user sees the decisions of all previous users • Rational choice: • Combine the decisions of the previous users with the own private signal and follow the majority • Once the numbers of A and R decisions differ by 2 or more, the choice relies only on the previous decisions and the private signal is ignored • All subsequent users adopt the same choice → cascade • The cascaded decision is not necessarily the correct one!
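A sketch of the decision rule, assuming the common textbook convention that a user whose evidence is tied follows their own signal; with that convention, counting the previous decisions is enough. `random_signals` and its parameter `q` (probability of receiving the correct signal) are hypothetical helpers for experimenting.

```python
import random


def cascade_decisions(signals):
    """Decisions of users acting in sequence, given their private signals ('A' or 'R')."""
    decisions = []
    lead = 0  # (number of A decisions) - (number of R decisions) so far
    for signal in signals:
        if lead >= 2:
            choice = "A"      # cascade: the private signal is ignored
        elif lead <= -2:
            choice = "R"      # cascade in the other direction
        else:
            choice = signal   # |lead| <= 1: own signal decides (ties go to own signal, assumed)
        decisions.append(choice)
        lead += 1 if choice == "A" else -1
    return decisions


def random_signals(n_users, accept_is_correct=True, q=0.6, seed=0):
    """Hypothetical helper: each user gets the correct signal with probability q > 0.5."""
    rng = random.Random(seed)
    right, wrong = ("A", "R") if accept_is_correct else ("R", "A")
    return [right if rng.random() < q else wrong for _ in range(n_users)]
```

For example, `cascade_decisions(list("RRAAA"))` yields five `"R"` decisions: after two early reject signals, the remaining users ignore their accept signals, even if accepting is correct.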

Method of Model-based Research • Reality: Unknown rules of interaction (micro-interactions) → Unknown model → Observed properties (macro-observations) • Model: Assumed rules of interaction (micro-interactions) → Stochastic model → Simulated properties (macro-observations) • Compare: Observed properties vs. simulated properties

Use Case: Spreading of Memes on Twitter (I) • Meme: Topic / idea that is discussed on Twitter • Observables: • Lifetime of memes on Twitter (in hours) • Number of people contributing to a meme (per day) http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3315179/

Use Case: Spreading of Memes (II) • Assumed rules of interaction: • Each user sees the memes posted by their friends • Each user remembers their own previously tweeted memes • When tweeting, a user either … • … invents a new meme, or … • … randomly selects a meme posted by one of their friends, or … • … randomly picks one of their own previously tweeted memes • Users only remember the last n tweets of their friends and/or their own last n tweets http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3315179/
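These rules can be turned into an agent-based simulation. The following is a sketch under several assumptions that go beyond the slide: a random friendship network in which every user follows `n_friends` other users, and the hypothetical probabilities `p_new` (invent a meme) and `p_own` (reuse an own meme); otherwise a meme from the friends' recent tweets is reposted. Bounded `deque`s implement the memory of the last n tweets.

```python
import random
from collections import Counter, deque


def simulate_memes(n_users=100, n_friends=5, steps=5_000, memory_size=10,
                   p_new=0.05, p_own=0.3, seed=0):
    """Agent-based sketch; returns a list of tweets as (step, user, meme) tuples."""
    rng = random.Random(seed)
    # Hypothetical random network: each user follows n_friends other users.
    # followers[f] holds the users who see the tweets of f.
    followers = [set() for _ in range(n_users)]
    for user in range(n_users):
        others = [v for v in range(n_users) if v != user]
        for friend in rng.sample(others, n_friends):
            followers[friend].add(user)

    screen = [deque(maxlen=memory_size) for _ in range(n_users)]  # friends' last tweets
    memory = [deque(maxlen=memory_size) for _ in range(n_users)]  # own last tweets
    next_meme = 0
    tweets = []

    for step in range(steps):
        user = rng.randrange(n_users)
        r = rng.random()
        if r < p_new or not (screen[user] or memory[user]):
            meme = next_meme                    # invent a new meme
            next_meme += 1
        elif (r < p_new + p_own and memory[user]) or not screen[user]:
            meme = rng.choice(memory[user])     # reuse one of the user's own memes
        else:
            meme = rng.choice(screen[user])     # repost a meme seen from a friend
        memory[user].append(meme)
        for follower in followers[user]:
            screen[follower].append(meme)
        tweets.append((step, user, meme))
    return tweets


def meme_statistics(tweets):
    """Observables: number of tweets and number of distinct users per meme."""
    popularity = Counter(meme for _, _, meme in tweets)
    users = {}
    for _, user, meme in tweets:
        users.setdefault(meme, set()).add(user)
    return popularity, {meme: len(u) for meme, u in users.items()}
```

The simulated observables returned by `meme_statistics` can then be compared with the empirical distributions; replacing the random network with a different structure shows how the network influences meme spreading.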

Use Case: Spreading of Memes (III) • Comparing simulation and reality: • Empirical observations are better reproduced when assuming a social network between users • Structure of the friendship network influences meme spreading http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3315179/

Details of Model-based Research • How to represent observables? • Distribution functions • How to compare simulation and reality? • Analytical evaluation • Visual comparison • Goodness-of-fit tests • How to decide between competing models?

Use Case: Dynamics in Tagging Systems • Do the users agree on how to describe a resource? • How do users influence each other in tagging systems?

Folksonomies • Vertices: Users, tags, resources • Hyperedges: Tag assignments (user × tag × resource) • Postings: • All tag assignments of one user to a single resource • Can be ordered by their time stamp
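The folksonomy can be stored as a list of tag assignments, from which postings are derived by grouping on (user, resource). A sketch using the tag assignments of the co-occurrence example, assuming that the time of a posting is the earliest time stamp of its tag assignments and that time stamps are zero-padded "HH:MM" strings (so they sort correctly as text). The names `TagAssignment` and `postings` are illustrative.

```python
from collections import defaultdict
from typing import NamedTuple


class TagAssignment(NamedTuple):
    """One hyperedge of the folksonomy: user x tag x resource, plus a time stamp."""
    user: str
    tag: str
    resource: str
    time: str  # zero-padded "HH:MM", so string order equals time order


def postings(assignments):
    """Group tag assignments by (user, resource); return postings ordered by time.

    Each posting is a tuple (time, user, resource, frozenset of tags).
    """
    tags = defaultdict(set)
    times = {}
    for a in assignments:
        key = (a.user, a.resource)
        tags[key].add(a.tag)
        times[key] = min(times.get(key, a.time), a.time)
    return sorted(
        ((times[key], key[0], key[1], frozenset(tag_set)) for key, tag_set in tags.items()),
        key=lambda posting: (posting[0], posting[1], posting[2]),
    )


assignments = [
    TagAssignment("mackz", "ajax", "r1", "13:25"),
    TagAssignment("mackz", "javascript", "r1", "13:25"),
    TagAssignment("klaasd", "ajax", "r2", "13:26"),
    TagAssignment("klaasd", "rss", "r2", "13:26"),
    TagAssignment("klaasd", "web2.0", "r2", "13:26"),
    TagAssignment("mackz", "ajax", "r2", "13:27"),
    TagAssignment("mackz", "php", "r2", "13:27"),
    TagAssignment("mackz", "javascript", "r2", "13:27"),
]
```

`postings(assignments)` yields three postings: (mackz, r1), (klaasd, r2) and (mackz, r2), in this order.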

Co-occurrence Streams • Co-occurrence stream of a tag: • All tags co-occurring with the given tag in a posting • Ordered by posting time • Example postings {user, resource, tags, time} containing 'ajax': • {mackz, r1, {ajax, javascript}, 13:25} • {klaasd, r2, {ajax, rss, web2.0}, 13:26} • {mackz, r2, {ajax, php, javascript}, 13:27} • Resulting co-occurrence stream of 'ajax' (ordered by time): javascript | rss, web2.0 | php, javascript
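Computing a co-occurrence stream is a filter plus a sort. A sketch using the three example postings, represented as (user, resource, tags, time) tuples; since the tags within one posting are unordered, they are sorted alphabetically here only to make the output deterministic. The function and variable names are illustrative.

```python
def cooccurrence_stream(postings, tag):
    """Tags co-occurring with `tag`, ordered by posting time.

    postings: iterable of (user, resource, tags, time) tuples with sortable times.
    """
    stream = []
    for _user, _resource, tags, _time in sorted(postings, key=lambda p: p[3]):
        if tag in tags:
            # Tags inside one posting are unordered; sorting is only for determinism
            stream.extend(sorted(set(tags) - {tag}))
    return stream


example_postings = [
    ("mackz", "r1", {"ajax", "javascript"}, "13:25"),
    ("klaasd", "r2", {"ajax", "rss", "web2.0"}, "13:26"),
    ("mackz", "r2", {"ajax", "php", "javascript"}, "13:27"),
]
```

`cooccurrence_stream(example_postings, "ajax")` returns `['javascript', 'rss', 'web2.0', 'javascript', 'php']`; counting this list (e.g. with `collections.Counter`) gives the tag frequencies of the stream, the raw data for a Zipf plot.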

Co-occurrence Streams – Tag Frequencies [Figure: Zipf plot of the tag frequencies]

Probability Distributions • Measuring the probability of a certain event • Examples: • Rolling a die – How often do we get a 1, 2, 3, …? • Questionnaires – How often do people check 1, 2, … on a scale from 1 to 10? • Tagging – How often is the tag 'ajax' used? • Tagging – How many of the used tags are used once, twice, …? • Different types of measurement scales
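The two tagging questions correspond to two different distributions estimated from the same data: the relative frequency of each tag, and the number of tags used exactly k times. A sketch with a made-up list of tag uses; the function names are illustrative.

```python
from collections import Counter


def empirical_distribution(observations):
    """Estimate P(X = x) by the relative frequency of x in the observations."""
    counts = Counter(observations)
    total = len(observations)
    return {x: n / total for x, n in counts.items()}


def count_of_counts(observations):
    """Map k to the number of distinct values (e.g. tags) observed exactly k times."""
    return dict(Counter(Counter(observations).values()))


# Made-up tag uses for illustration
tag_uses = ["ajax", "javascript", "ajax", "php", "rss", "ajax", "javascript"]
```

Here `empirical_distribution(tag_uses)["ajax"]` is 3/7, and `count_of_counts(tag_uses)` is `{3: 1, 2: 1, 1: 2}`: one tag is used three times, one twice and two once.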

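Probability Distributions – Measurement Scales (I) • Nominal scale: categories without an order (e.g. tags) • Ordinal scale: categories with an order (e.g. answers on a scale from 1 to 10) • Interval scale: ordered values with meaningful differences (e.g. temperature in °C) • Ratio scale: interval scale with a natural zero point (e.g. number of tag uses) • Source: http://de.wikipedia.org/wiki/Skalenniveau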

Probability Distributions – Measurement Scales (II) [Figure: example distributions, one panel for a nominal scale and one for an ordinal / interval scale]

Probability Distributions – Representations (I) • Probability Distribution Function (PDF): • P(X = x): Probability of observing the value x (for discrete variables also called the probability mass function) • Cumulative Distribution Function (CDF): • P(X ≤ x): Probability of observing a value ≤ x. Requires at least an ordinal measurement scale. • Example: Normal distribution [Figure: PDF and CDF of the normal distribution] Source: http://en.wikipedia.org/wiki/Normal_distribution
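For observed data, both representations can be estimated from relative frequencies. A sketch for values on an ordinal scale (here made-up answers on a scale from 1 to 10); the CDF is accumulated over the sorted values, which is why it needs at least an ordinal scale. The function name is illustrative.

```python
from collections import Counter
from itertools import accumulate


def empirical_pdf_cdf(observations):
    """Return the sorted values with their empirical P(X = x) and P(X <= x)."""
    counts = Counter(observations)
    n = len(observations)
    values = sorted(counts)  # sorting requires an order: at least an ordinal scale
    pdf = [counts[v] / n for v in values]
    # Accumulate integer counts first so that the last CDF value is exactly 1.0
    cdf = [c / n for c in accumulate(counts[v] for v in values)]
    return values, pdf, cdf


answers = [3, 5, 5, 7, 7, 7, 10, 1]  # made-up questionnaire answers (scale 1 to 10)
```

For these answers, `empirical_pdf_cdf(answers)` returns the values `[1, 3, 5, 7, 10]`, the PDF `[0.125, 0.125, 0.25, 0.375, 0.125]` and the CDF `[0.125, 0.25, 0.5, 0.875, 1.0]`.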

Probability Distributions – Representations (II) • Zipf plot • Representation for distributions on a nominal scale • Assign ranks to the different categories • Rank 1: Most frequent category • x-axis: Categories ordered by their ranks • y-axis: Probability of the category with rank x • Often used for representing word frequencies in texts • Zipf's law: • Describes the relation between the rank k and the frequency f(k) of a word in natural language texts: $f(k) \propto k^{-s}$ with $s \approx 1$
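The points of a Zipf plot are obtained by ranking the categories by frequency. A sketch that also estimates the exponent s of the form f(k) ∝ k^(−s) by a least-squares fit of log f(k) against log k; this fit is only a rough, illustrative estimate (maximum-likelihood methods are generally preferred for power laws). The function names are illustrative.

```python
import math
from collections import Counter


def zipf_ranks(observations):
    """Rank categories by frequency (rank 1 = most frequent) and return
    (rank, probability) pairs, i.e. the data points of a Zipf plot."""
    counts = Counter(observations)
    n = len(observations)
    ranked = sorted(counts.values(), reverse=True)
    return [(k, c / n) for k, c in enumerate(ranked, start=1)]


def zipf_exponent(points):
    """Least-squares slope of log f(k) against log k; returns s = -slope."""
    xs = [math.log(k) for k, _ in points]
    ys = [math.log(p) for _, p in points]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return -slope
```

On a Zipf plot with logarithmic axes, data following Zipf's law lie on a straight line with slope −s; for frequencies exactly proportional to 1/k, `zipf_exponent` returns 1.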

Co-occurrence Streams – Tag Frequencies • Tag frequencies approximately follow Zipf's law (straight line in a Zipf plot with logarithmically scaled axes)

Comparing Model and Reality (I) • Visual comparison: • Plot the observed properties and the simulated results in the same diagram • The closer together the plots, the better the model • Advantage: Easy to understand and to implement • Disadvantage: Highly subjective (i.e. not an objective, scientific method)

Comparing Model and Reality (II) • Analytical evaluation: • Use mathematical methods for analyzing the model • Prove that the simulation results have certain properties • Example: Preferential attachment • Frequency distribution of colors is a power law • Color frequencies tend to a random limit • Advantages: • Very deep understanding of the mechanisms • Mathematical dependencies between model parameters and properties of the simulation results • Disadvantages: • Analyzed models have to be "mathematically tractable" • Does not show that the simulated properties can also be observed in reality
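To illustrate what an analytical evaluation can look like, here is the standard rate-equation (mean-field) argument for the Simon model, treating expected counts as deterministic; it is a sketch, not a rigorous proof. Let t be the number of balls in the urn, n_k(t) the number of colors with exactly k balls, p the probability of a new color, and ρ = 1/(1−p).

```latex
% t: number of balls in the urn (one ball is added per step)
% n_k(t): number of colors with exactly k balls; a drawn ball has one of these
%         colors with probability k n_k(t) / t
\begin{align}
  n_1(t+1) &= n_1(t) + p - (1-p)\,\frac{n_1(t)}{t} \\
  n_k(t+1) &= n_k(t) + (1-p)\,\frac{(k-1)\,n_{k-1}(t) - k\,n_k(t)}{t},
             \qquad k \ge 2
\end{align}
% Stationary ansatz n_k(t) = f_k t, with \rho = 1/(1-p):
\begin{align}
  f_1 &= \frac{p}{2-p}, \qquad
  \frac{f_k}{f_{k-1}} = \frac{k-1}{k+\rho} \\
  f_k &\sim C\,k^{-(1+\rho)} = C\,k^{-\left(1 + \frac{1}{1-p}\right)}
        \quad \text{for large } k
\end{align}
```

The exponent 1 + 1/(1−p) is an example of a mathematical dependency between a model parameter (p) and a property of the simulation results (the power-law exponent): for small p it is close to 2.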

Comparing Model and Reality (III) • Goodness-of-fit tests: • First step: • Define an objective measure of the distance between simulated and observed property • Relative measure of goodness-of-fit • Applicable to any property • Second step: • Compute whether simulated and observed property are statistically indistinguishable • Absolute measure of goodness-of-fit • Only applicable to properties that can be represented as probability distributions

Kolmogorov-Smirnov Test (Example) • Goodness-of-fit test for distributions with at least an ordinal measurement scale • Maximal distance between the CDFs of simulation and observation: $D = \max_x |F_{\mathrm{sim}}(x) - F_{\mathrm{obs}}(x)|$, where $F(x) = P(X \le x)$
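A pure-Python sketch of the two-sample statistic, evaluating both empirical CDFs at every data point, plus the asymptotic p-value from the Kolmogorov distribution, P(K > λ) = 2 Σ_j (−1)^(j−1) exp(−2 j² λ²) with λ = √(nm/(n+m)) · D. In practice, `scipy.stats.ks_2samp` computes the statistic and a p-value directly. The test assumes continuous distributions; for discrete data with many ties, the p-value is only approximate. The function names are illustrative.

```python
import math
from bisect import bisect_right


def ks_statistic(simulated, observed):
    """Two-sample Kolmogorov-Smirnov statistic D = max_x |F_sim(x) - F_obs(x)|."""
    sim, obs = sorted(simulated), sorted(observed)
    d = 0.0
    for x in sim + obs:  # the maximum is attained at one of the data points
        f_sim = bisect_right(sim, x) / len(sim)  # empirical CDF P(X <= x)
        f_obs = bisect_right(obs, x) / len(obs)
        d = max(d, abs(f_sim - f_obs))
    return d


def ks_p_value(d, n, m):
    """Asymptotic p-value of the two-sample test (large n and m)."""
    lam = math.sqrt(n * m / (n + m)) * d
    total, sign = 0.0, 1.0
    for j in range(1, 101):
        term = sign * math.exp(-2.0 * j * j * lam * lam)
        total += term
        if abs(term) < 1e-10:  # the series has converged
            return min(1.0, max(0.0, 2.0 * total))
        sign = -sign
    return 1.0  # no convergence: only for very small lam, where p is close to 1
```

A small p-value indicates that the simulated and observed samples are unlikely to come from the same distribution; `ks_statistic` on its own can serve as the relative distance measure of the first step.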

Details of Model-based Research • How to represent observables? • Distribution functions • How to compare simulation and reality? • Analytical evaluation • Visual comparison • Goodness-of-fit tests • How to decide between competing models? → Friday!
