Kronecker Graphs
The Kronecker Graph Model (rmat) • Start with a parameter matrix A • For n vertices, take Kronecker products • Normalize the entries
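A minimal sketch of this construction; the 2x2 parameter values and the choice of k are illustrative, not the deck's:

```python
import numpy as np

def kronecker_parameter_matrix(A, k):
    """Take the k-fold Kronecker product of the parameter matrix A,
    then normalize so the entries sum to 1."""
    P = A.astype(float)
    for _ in range(k - 1):
        P = np.kron(P, A)        # each product multiplies the dimension by A's size
    return P / P.sum()           # normalize the entries to a distribution over edges

# Illustrative 2x2 parameter matrix: gives a graph on 2**k vertices.
A = np.array([[0.9, 0.5],
              [0.5, 0.1]])
P = kronecker_parameter_matrix(A, k=3)   # 8x8 matrix of normalized entries
```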
Generating Edges • One Method • Calculate the whole Kronecker matrix • Sample each edge independently according to its entry • Another Method • Treat parameters as probabilities • Flip coins for each edge
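Sketches of both methods, using the same illustrative parameter matrix as above; the expected-edge scaling and the quadrant-descent details are my assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_edges_dense(P, expected_edges):
    """Method 1: materialize the full Kronecker matrix P (entries summing
    to 1) and flip one independent coin per entry."""
    probs = np.clip(P * expected_edges, 0.0, 1.0)
    return np.argwhere(rng.random(P.shape) < probs)

def sample_edge_recursive(A, k):
    """Method 2 (R-MAT style): never build the big matrix.  For one edge,
    descend k levels, choosing a block at each level with probability
    proportional to the corresponding entry of A."""
    probs = (A / A.sum()).ravel()
    rows, cols = A.shape
    r = c = 0
    for _ in range(k):
        q = rng.choice(len(probs), p=probs)
        r = r * rows + q // cols
        c = c * cols + q % cols
    return r, c

A = np.array([[0.9, 0.5], [0.5, 0.1]])
edges = [sample_edge_recursive(A, k=10) for _ in range(5000)]   # graph on 2**10 vertices
```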
Features • Pro • Fast to generate: parallel and distributed • Few parameters to fit • Self-similarity • Con • Doesn't have a power-law degree distribution • Parameters aren't intuitive • May not be connected • Used in Graph500 benchmark [Seshadhri, Kolda, Pinar]
Variance of Real Graphs [Moreno, Kirschner, Neville, Vishwanathan]
Web Search • Information Retrieval: Given a query "Hugh Laurie", find all documents that mention those words
Web Ranking Before 1998 • Use tf-idf (roughly) • Term frequency – inverse document frequency: score(t, d) ≈ (# of occurrences of t in d) / (# of occurrences of t in D, the corpus)
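A small illustration of tf-idf scoring; it uses the common tf × log-idf form rather than the rough ratio above, and the query and documents are made up:

```python
import math
from collections import Counter

def tfidf_scores(query_terms, docs):
    """Score each document as sum over query terms of tf(t, d) * idf(t),
    with idf(t) = log(#docs / #docs containing t)."""
    counts = [Counter(d.lower().split()) for d in docs]
    n = len(docs)
    idf = {t: math.log(n / max(1, sum(t in c for c in counts))) for t in query_terms}
    return [sum(c[t] * idf[t] for t in query_terms) for c in counts]

docs = ["hugh laurie plays house",
        "laurie is a common surname",
        "an unrelated page about graphs"]
print(tfidf_scores(["hugh", "laurie"], docs))
```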
Results • It was bad • The best documents for a topic may not mention the topic's terms explicitly very often
What are we missing? • Traditional IR only has the text to work with • We have an information network • The hyperlinks are created by intelligent, rational beings!
1998 – HITS (J. Kleinberg) • What if we ranked documents by in-links? The power law distribution on in-degree will get us every time.
HITS • Idea: Different pages and different links play different roles • Some pages are AUTHORITIES • Some pages are HUBS
Hubs • What is a good hub? A page is a good hub if it points to many authorities.
Authorities • What is a good authority? A page is a good authority if many hub pages point to it. How can we find good hubs and good authorities?
HITS • Everyone starts with a hub-score of 1 and authority-score of 1 • A-update: For each page p, auth(p) is the sum of the hub-scores of pages that point to p. • H-update: For each page p, hub(p) is the sum of the auth-scores of pages p points to.
Formally • M is the adjacency matrix, h the hub-scores and a the auth-scores: a ← Mᵀh, h ← Ma • How many iterations should we do? • Calculated on the subgraph that corresponds to the query at hand
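A direct sketch of these two updates in matrix form; the normalization step, iteration count, and toy graph are my choices:

```python
import numpy as np

def hits(M, iters=50):
    """HITS on the query subgraph: M[i, j] = 1 if page i links to page j.
    Alternate the A-update (a = M^T h) and H-update (h = M a),
    normalizing after each round so the scores stay bounded."""
    n = M.shape[0]
    h = np.ones(n)
    a = np.ones(n)
    for _ in range(iters):
        a = M.T @ h
        h = M @ a
        a /= np.linalg.norm(a)
        h /= np.linalg.norm(h)
    return h, a

# Tiny example: page 0 links to 1 and 2, page 3 links to 1.
M = np.array([[0, 1, 1, 0],
              [0, 0, 0, 0],
              [0, 0, 0, 0],
              [0, 1, 0, 0]], dtype=float)
hubs, auths = hits(M)
```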
Where does HITS fail? • Assumes a bipartite clique structure to the web • Doesn’t allow more general forms of endorsement
PageRank – try 1 • Instead of h and a scores, just one score. • PR-update(p) = sum over pages q that point to p of PR(q) / outdeg(q), i.e., each page divides its score evenly among its out-links
Where does this fail? Hint: The web graph is directed.
Actual PageRank • Make the graph strongly connected by adding epsilon weight links between all pages. • Let A be the normalized adjacency matrix
Calculating with the Power Method • Start with x = (1/n, …, 1/n) • Calculate Ax • Add ε/n to every entry • Normalize and repeat • Repeat this O((1/ε) log n) times
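A sketch of this power iteration, taking A to act so that each page splits its score evenly over its out-links; eps, the iteration count, the crude dead-end handling, and the toy graph are my choices:

```python
import numpy as np

def pagerank(A, eps=0.15, iters=100):
    """Power method for PageRank.  A is the adjacency matrix with
    A[i, j] = 1 when page i links to page j; eps is the teleport weight."""
    n = A.shape[0]
    out_deg = A.sum(axis=1)
    out_deg[out_deg == 0] = 1                  # crude dead-end handling
    W = A / out_deg[:, None]                   # each page splits its score over its out-links
    x = np.full(n, 1.0 / n)                    # start uniform
    for _ in range(iters):
        x = (1 - eps) * (W.T @ x) + eps / n    # follow links, then teleport
        x /= x.sum()                           # normalize
    return x

A = np.array([[0, 1, 1],
              [1, 0, 0],
              [0, 1, 0]], dtype=float)
print(pagerank(A))
```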
The Random Surfer Model • What natural process can justify PageRank? • How can we model how people might use the web?
The Random Surfer • Starts at some page on the web • With probability (1-ε), selects a random link on the page and follows it • With probability ε, gets bored and jumps to some new random web page.
The Random Surfer • The PageRank vector is the probability that you will visit each website in this process
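A quick simulation of the surfer on a made-up three-page web; the visit frequencies it returns should approach the PageRank vector:

```python
import random
from collections import Counter

def random_surfer(links, eps=0.15, steps=200_000):
    """Simulate the random surfer: follow a random out-link with
    probability 1 - eps, otherwise jump to a uniformly random page.
    Visit frequencies approximate the PageRank vector."""
    pages = list(links)
    page = random.choice(pages)
    visits = Counter()
    for _ in range(steps):
        if links[page] and random.random() > eps:
            page = random.choice(links[page])    # follow a link
        else:
            page = random.choice(pages)          # get bored, jump anywhere
        visits[page] += 1
    return {p: visits[p] / steps for p in pages}

links = {"a": ["b", "c"], "b": ["a"], "c": ["b"]}
print(random_surfer(links))
```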
Random Walks on Graphs [figure: a small graph with transition probabilities such as 1/3, 1/2, 1 labeling its edges]
Stationary Distributions • What does this process converge to? • Connection between eigenvectors and stationary distributions. Why is the top eigenvalue always 1?
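One standard argument for the eigenvalue question, sketched here for a row-stochastic transition matrix P:

```latex
% Each row of P sums to 1, so the all-ones vector is a right eigenvector:
P\,\mathbf{1} = \mathbf{1} \quad\Rightarrow\quad \lambda = 1 \text{ is an eigenvalue.}
% No eigenvalue is larger: if Px = \lambda x and u maximizes |x_u|, then
|\lambda|\,|x_u| = \Big|\sum_v P_{uv} x_v\Big| \le \sum_v P_{uv}\,|x_v| \le |x_u|
\quad\Rightarrow\quad |\lambda| \le 1.
% A stationary distribution is a left eigenvector for this same top eigenvalue: \pi P = \pi.
```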
Mixing Time • How long does it take to converge? • Why does PageRank converge in O((1/ε) log n) iterations?
Undirected Graphs • The stationary distribution is proportional to the degree
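A one-line check of this claim, with P the simple-random-walk matrix and m the number of edges:

```latex
\pi(v) = \frac{\deg(v)}{2m}
\quad\Longrightarrow\quad
(\pi P)(v) = \sum_{u \sim v} \pi(u)\,P_{uv}
           = \sum_{u \sim v} \frac{\deg(u)}{2m}\cdot\frac{1}{\deg(u)}
           = \frac{\deg(v)}{2m} = \pi(v),
% so \pi is stationary; its entries sum to 1 because \sum_v \deg(v) = 2m.
```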
Personalized PageRank • What if the surfer didn't jump uniformly at random, but instead jumped according to a distribution s? • s can be any distribution over the pages
Uses of Personalized PageRank • Creating personalized search results • Topic-sensitive PageRank • Local community detection • Can you compute it more efficiently than PageRank?
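Compared with the earlier power-method sketch, only the teleport target changes; s and the toy graph below are illustrative:

```python
import numpy as np

def personalized_pagerank(A, s, eps=0.15, iters=100):
    """Same power iteration as PageRank, but the teleport step jumps
    according to the personalization distribution s instead of uniformly."""
    out_deg = A.sum(axis=1)
    out_deg[out_deg == 0] = 1
    W = A / out_deg[:, None]
    s = np.array(s, dtype=float)
    s /= s.sum()
    x = s.copy()
    for _ in range(iters):
        x = (1 - eps) * (W.T @ x) + eps * s    # teleport to s, not to 1/n
        x /= x.sum()
    return x

A = np.array([[0, 1, 1],
              [1, 0, 0],
              [0, 1, 0]], dtype=float)
print(personalized_pagerank(A, s=[1, 0, 0]))   # "always jump back to page 0"
```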
The Intentional Surfer • Click data is collected by • Google/Bing Toolbar • Cookies from ad websites • Can use this to get better estimates for click-through rates of each link • Modifies our transition probabilities to improve PageRank
Search Engine Optimization • Designing your page with the ranking function in mind • Co-evolves with search engines • Obvious Tricks • Make a collection of websites to point to you • Buy old webpages • Include text in background color font • Paying others to link to you
Link spam detection [figure: a cluster labeled "Spam" attached to the rest of the web graph]
Connection to HITS • If you link to a lot of spam sites, you are probably also spam. (Hub) • If you are linked to by lots of spam sites, you are probably why that spam collection was built. (Authority) • Start with seed sites with Hub, Authority scores of 1.
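A sketch of that seeded, HITS-style propagation; the seed scoring follows the slide, while the iteration count and rescaling are my choices (run to convergence it would reduce to plain HITS, so only a few rounds are used):

```python
import numpy as np

def spam_hits(M, seed_spam, iters=5):
    """M[i, j] = 1 if page i links to page j.  Seed known spam pages with
    hub and authority score 1 (all others 0), then iterate:
    spam-authority = pointed to by spam hubs, spam-hub = points to spam."""
    n = M.shape[0]
    h = np.zeros(n); h[seed_spam] = 1.0
    a = np.zeros(n); a[seed_spam] = 1.0
    for _ in range(iters):
        a = M.T @ h
        h = M @ a
        a /= max(a.max(), 1e-12)     # rescale so scores stay bounded
        h /= max(h.max(), 1e-12)
    return h, a

# Pages 2 and 3 are known spam (hypothetical example).
M = np.array([[0, 0, 1, 1],
              [0, 0, 0, 1],
              [0, 0, 0, 1],
              [0, 0, 1, 0]], dtype=float)
hubs, auths = spam_hits(M, seed_spam=[2, 3])
```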
Trust Propagation • Given some information (i trusts j) or (i does not trust j), how can we model trust in a network?
Types of Trust Propagation • Direct Propagation • Transpose Propagation • Co-citation • Trust Coupling [diagrams: small node patterns over i, j, k, m illustrating each propagation type]
Distrust Propagation • Trust Only • 1-Step Distrust • Propagated Distrust
Propagating Trust and Distrust • Eigenvalue Propagation • Weighted Linear Combination • How do you round this matrix to give trust/distrust?
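A sketch in the spirit of this framework; the combination weights, discount factor, and the choice of B for each distrust variant are illustrative assumptions (the 1-step-distrust variant is omitted):

```python
import numpy as np

def propagate(T, D, alphas=(0.4, 0.4, 0.1, 0.1), k=3, gamma=0.9,
              distrust="propagated", scheme="weighted"):
    """Trust/distrust propagation sketch.
    B is the belief matrix: trust edges alone ("trust_only") or trust
    minus distrust ("propagated").  One atomic step combines direct
    propagation (B), co-citation (B^T B), transpose propagation (B^T),
    and trust coupling (B B^T).  The combined step is iterated either by
    taking a single power ("eigenvalue" propagation) or a discounted sum
    of powers ("weighted" linear combination)."""
    B = T if distrust == "trust_only" else T - D
    a1, a2, a3, a4 = alphas
    C = a1 * B + a2 * (B.T @ B) + a3 * B.T + a4 * (B @ B.T)
    if scheme == "eigenvalue":
        return np.linalg.matrix_power(C, k)
    return sum((gamma ** i) * np.linalg.matrix_power(C, i) for i in range(1, k + 1))

# Tiny hypothetical example: 0 trusts 1, 1 trusts 2, 0 distrusts 3.
T = np.zeros((4, 4)); T[0, 1] = T[1, 2] = 1.0
D = np.zeros((4, 4)); D[0, 3] = 1.0
P = propagate(T, D)
# Thresholding P (the "rounding" question on the slide) yields trust/distrust labels.
```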
Experiments • Epinions ‘web-of-trust’ • 841,372 edges labeled + or -. Try all combinations of trust and distrust propagation. What is the best model?
Project Proposals • Email by 9/26 to: isabelle@eecs.berkeley.edu and anirban.dasgupta+cs294@gmail.com