A Graph-Based Approach to Link Prediction in Social Networks Using a Pareto-Optimal Genetic Algorithm • Jeff Naruchitparames • University of Nevada, Reno (CSE) • CS 790: Complex Networks, Fall 2010
(Figures: biological network, social network)
Social networks: • Dynamic, judgmental environment • This environment affects friendships over time • Heterogeneous • Very dynamic
1-2 hop distance only • Friend-of-friend
Multiple hops (>1) • Structural; purely graph-based • No explicit correlation between potential friends...
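The difference between 2-hop (friend-of-friend) and multi-hop candidate generation can be illustrated with a breadth-first search. This is a minimal sketch, assuming the network is stored as an adjacency dict mapping each user to a set of friends; `candidates_within` and `max_hops` are hypothetical names, not taken from the presentation.

```python
from collections import deque

def candidates_within(adj, source, max_hops):
    """Return {user: hop distance} for non-friends reachable within max_hops."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        if dist[v] == max_hops:
            continue  # do not expand past the hop limit
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    # Exclude the user (distance 0) and existing friends (distance 1)
    return {u: d for u, d in dist.items() if d >= 2}

adj = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}
print(candidates_within(adj, "a", 2))  # {'c': 2}
print(candidates_within(adj, "a", 3))  # {'c': 2, 'd': 3}
```

With `max_hops=2` this reduces to friend-of-friend; raising it reaches the purely structural, multi-hop candidates.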
Silva et al., • A Graph-based Recommendation System Using Genetic Algorithms, 2010
Friends-of-Friends (2 hops) → Filter → Order
Filtering • “It’s more probable that you know a friend of your friend than any other random person.” (M. Mitchell, Complex Systems: Network Thinking, 2006)
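The friends-of-friends pipeline (collect 2-hop candidates, filter, order) can be sketched as below. The slides do not give the ordering criterion; this sketch assumes candidates are ranked by the number of shared friends, and `rank_friends_of_friends` is a hypothetical name.

```python
from collections import Counter

def rank_friends_of_friends(adj, user):
    """Rank 2-hop candidates by how many friends they share with `user`."""
    shared = Counter()
    for friend in adj[user]:
        for fof in adj[friend]:
            # Filter: skip the user and people who are already friends
            if fof != user and fof not in adj[user]:
                shared[fof] += 1
    # Order: most shared friends first, ties broken by name for determinism
    return sorted(shared.items(), key=lambda item: (-item[1], item[0]))

adj = {
    "ann": {"bob", "cat"},
    "bob": {"ann", "dan", "eve"},
    "cat": {"ann", "dan"},
    "dan": {"bob", "cat"},
    "eve": {"bob"},
}
print(rank_friends_of_friends(adj, "ann"))  # [('dan', 2), ('eve', 1)]
```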
Indexes
What’s missing? • Heterogeneity • Human behavior and preferences • Multiple hops
My approach • Pretty much a filtering problem...
My approach • Components (for filtering) • Betweenness centrality • Community detection • Clique Percolation Method (CPM) • Friends of friends • 10-dimensional Pareto-optimal genetic algorithm
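Two of the filtering components above can be sketched in plain Python: Brandes' algorithm for betweenness centrality on an unweighted, undirected graph, and the Clique Percolation Method as k-clique communities (maximal cliques of size at least k, merged when they share k-1 nodes). This is a minimal sketch assuming an adjacency dict of sets; how the presentation combines these components in its filter is not specified, and the function names are hypothetical. In practice NetworkX offers `nx.betweenness_centrality` and `networkx.algorithms.community.k_clique_communities`.

```python
from collections import deque

def betweenness(adj):
    """Brandes' algorithm; unweighted, undirected, unnormalized."""
    bc = dict.fromkeys(adj, 0.0)
    for s in adj:
        order, preds = [], {v: [] for v in adj}
        sigma, dist = dict.fromkeys(adj, 0), dict.fromkeys(adj, -1)
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:  # BFS that counts shortest paths from s
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(adj, 0.0)
        for w in reversed(order):  # accumulate dependencies back toward s
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return {v: b / 2 for v, b in bc.items()}  # each pair was counted twice

def maximal_cliques(adj):
    """Bron-Kerbosch with pivoting."""
    found = []
    def expand(r, p, x):
        if not p and not x:
            found.append(frozenset(r))
            return
        pivot = max(p | x, key=lambda u: len(adj[u] & p))
        for v in list(p - adj[pivot]):
            expand(r | {v}, p & adj[v], x & adj[v])
            p, x = p - {v}, x | {v}
    expand(set(), set(adj), set())
    return found

def clique_percolation(adj, k=3):
    cliques = [c for c in maximal_cliques(adj) if len(c) >= k]
    parent = list(range(len(cliques)))  # union-find over cliques
    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i
    for i in range(len(cliques)):
        for j in range(i + 1, len(cliques)):
            if len(cliques[i] & cliques[j]) >= k - 1:
                parent[find(i)] = find(j)
    communities = {}
    for i, c in enumerate(cliques):
        communities.setdefault(find(i), set()).update(c)
    return [frozenset(c) for c in communities.values()]

edges = [(1, 2), (1, 3), (2, 3), (2, 4), (3, 4), (4, 5), (5, 6), (5, 7), (6, 7)]
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)
print(betweenness(adj)[4])  # 9.0
print(sorted(sorted(c) for c in clique_percolation(adj, k=3)))  # [[1, 2, 3, 4], [5, 6, 7]]
```

Node 4 bridges the two groups, so every shortest path between {1, 2, 3} and {5, 6, 7} passes through it.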
Remove duplicates • Remove our test cases • (More on this later...)
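Removing duplicates and holding out test cases might look like the sketch below. The slides do not say how test cases are chosen; this assumes a random sample of known friendships (undirected edges) is taken out of the graph and kept aside for evaluation. `split_edges` and `n_test` are hypothetical names.

```python
import random

def split_edges(edges, n_test, seed=0):
    # Treat (u, v) and (v, u) as the same friendship and drop duplicates
    unique = sorted({tuple(sorted(e)) for e in edges})
    rng = random.Random(seed)
    test = rng.sample(unique, n_test)  # held-out friendships
    held_out = set(test)
    train = [e for e in unique if e not in held_out]
    return train, test

train, test = split_edges([(1, 2), (2, 1), (2, 3), (3, 4), (4, 3)], n_test=1)
print(len(train), len(test))  # 2 1
```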
The Features • Number of shared friends • Location • Age range • General interest • Music • Attended same events • Groups • Movies • Education • Religion/politics
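Since the conclusion describes a binary 10-dimensional chromosome, one plausible encoding gives each of the ten features one bit, where 1 means the feature is used. A sketch under that assumption; the bit order and identifier names are hypothetical.

```python
FEATURES = [
    "shared_friends", "location", "age_range", "general_interest", "music",
    "attended_same_events", "groups", "movies", "education", "religion_politics",
]

def decode(chromosome):
    """Map a 10-bit chromosome to the feature subset it switches on."""
    if len(chromosome) != len(FEATURES):
        raise ValueError("expected one bit per feature")
    return [name for bit, name in zip(chromosome, FEATURES) if bit]

print(decode([1, 1, 0, 0, 0, 0, 0, 0, 0, 1]))
# ['shared_friends', 'location', 'religion_politics']
```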
Pareto Optimality • Localized to the implementation of selection • Feature subset selection • Goal: find the combination of feature subsets that best determines friendships
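Pareto optimality rests on dominance: one solution dominates another if it is at least as good on every objective and strictly better on at least one, and the Pareto front is the set of non-dominated solutions. A minimal sketch, assuming each candidate is scored as a tuple of objectives to be maximized (which objectives the presentation uses is not stated).

```python
def dominates(a, b):
    """True if a is >= b everywhere and > b somewhere (maximization)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def pareto_front(points):
    """Keep the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]

print(pareto_front([(3, 1), (1, 3), (2, 2), (1, 1)]))
# [(3, 1), (1, 3), (2, 2)]
```

Because a point never strictly beats itself, `dominates(p, p)` is False, so no identity check is needed in `pareto_front`.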
Pareto Optimality • Compare with the test cases we removed earlier... • For each chromosome in the population: • If ALL test cases ≥ optimal Pareto front • Calculate fitness • Good to go • Else • Calculate fitness • Continue to the next chromosome
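The per-chromosome check above can be sketched as follows. Reading "test cases ≥ optimal Pareto front" as "no point on the front dominates the test case's score vector" is an assumption, as are the hypothetical callbacks `test_scores` (objective tuples for each held-out test case under a chromosome) and `fitness`.

```python
def dominates(a, b):
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def scan_population(population, test_scores, fitness, front):
    """Return (accepted chromosome or None, fitness of each chromosome scanned)."""
    fitnesses = []
    for chrom in population:
        fitnesses.append(fitness(chrom))  # fitness is calculated in both branches
        # Every held-out test case must sit on or beyond the front (assumed reading)
        if all(not any(dominates(p, s) for p in front) for s in test_scores(chrom)):
            return chrom, fitnesses  # "good to go"
    return None, fitnesses  # nothing passed; continue evolving

front = [(1, 1)]
scores = {(0, 1): [(0, 0)], (1, 1): [(1, 1), (2, 0)]}
print(scan_population(list(scores), scores.get, sum, front))  # ((1, 1), [1, 2])
```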
Fitness Function • $\sum_{i=1}^{n} \sum_{j=1}^{10} p_i \ln(f_j)\, p_{i-1}$
Continuing the Evolutionary Process • Apply fitness-proportional selection • Randomly select 2 parents to mate • Apply 1-point crossover (82% chance) • Bit mutation (0.05% chance) • Repeat until ALL test cases are better than the Pareto front OR fitness does not improve for 5 consecutive generations
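The loop above can be sketched with the slide's parameters: fitness-proportional (roulette-wheel) selection of both parents, one-point crossover with probability 0.82, mutation with probability 0.0005 (0.05%), and a stop once the test cases pass or the best fitness has not improved for 5 consecutive generations. Applying mutation per bit, the `passes_test_cases` callback, the generation cap, and the toy "count the ones" fitness in the usage lines are all assumptions for illustration.

```python
import random

CROSSOVER_RATE = 0.82
MUTATION_RATE = 0.0005  # 0.05% chance per bit (assumed per-bit)
PATIENCE = 5            # generations without improvement before stopping

def roulette(population, fits, rng):
    """Fitness-proportional selection (assumes non-negative fitness)."""
    total = sum(fits)
    if total == 0:
        return rng.choice(population)
    r, acc = rng.uniform(0, total), 0.0
    for chrom, f in zip(population, fits):
        acc += f
        if acc >= r:
            return chrom
    return population[-1]

def crossover(a, b, rng):
    """One-point crossover, applied with probability CROSSOVER_RATE."""
    if rng.random() < CROSSOVER_RATE:
        cut = rng.randint(1, len(a) - 1)
        return a[:cut] + b[cut:], b[:cut] + a[cut:]
    return a[:], b[:]

def mutate(chrom, rng):
    return [bit ^ 1 if rng.random() < MUTATION_RATE else bit for bit in chrom]

def evolve(population, fitness, passes_test_cases, rng, max_generations=1000):
    best, stale = float("-inf"), 0
    for _ in range(max_generations):
        fits = [fitness(c) for c in population]
        if max(fits) > best:
            best, stale = max(fits), 0
        else:
            stale += 1
        if any(passes_test_cases(c) for c in population) or stale >= PATIENCE:
            break
        offspring = []
        while len(offspring) < len(population):
            a, b = roulette(population, fits, rng), roulette(population, fits, rng)
            offspring.extend(mutate(child, rng) for child in crossover(a, b, rng))
        population = offspring[:len(population)]
    return population, best

rng = random.Random(42)
start = [[rng.randint(0, 1) for _ in range(10)] for _ in range(20)]
final, best = evolve(start, sum, lambda c: False, rng)
print(best)
```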
Conclusion • Complex network theory + genetic algorithm + social theory • Betweenness centrality • Community detection • Clique Percolation Method • Binary 10-dimensional Pareto-optimal genetic algorithm • Dominant, fitness-proportional selection • Several levels of filtering and selection (aka filtering ☺)
Future Work • Better fitness function (need to ask sociologists) • Weighted chromosome for Pareto optimization (as opposed to binary) • Prove all this stuff actually works (from a sociology standpoint?) • Parallelize or GPU-ize the code (it’s in Python)