Networks, Maps, Relations (Humanities Hackathon 2012, Day 4)
Objects of study: novels, species, philosophers, philosophies, words, concepts, languages, songs… The problem at hand: describe relationships between the objects (similarity, influence, equivalence, co-location…).
Graphs • Simplest case: relations between pairs of objects. • BINARY: objects are either related or they’re not (no attempt to measure extent or other qualities)
(D.P. Hayes, Social Network Theory and the Claim that Shakespeare of Stratford…)
How I made this graph (not recommended) • adj <- array(c(0,0,0,0,0,1,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,1,0,0,1,0,0,0,1,1,0,0,0,0,0,0,0,0,0,1,1,0,1,1,1,0,0,0,0,0,0,1,0,0,0,1,0,1,1,0,1,0,0,1,1,0,0,0,0,1,1,1,0,0,1,1,0,0,1,1,0,0,0,0,0,0,0,0,0,0,1,1,0,0,1,0,1,1,0,0,0,0,0,0,1,0,0,0,0,0,1,0,0,0,0,1,0,0,1,0,0,0,0,0,0,1,1,0,1,0,0,0,1,1,0,1,0,0,1,1,0,0,0,0,0,1,0,0,0,0,0,1,0,0,1,1,1,1,1,1,0,1,0,0,0,0,0,0,0,1,1,0,0,0,0,0,0,0,0,0,0,0,1,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,1,1,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,1,0,0,0,0,0,0,0,1,0,0,0,0,0,0,1,1,0,0,0,1,0,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,0,0,1,0,0,0,0,0,0,1,0,0,1,1,0,0,1,1,1,0,0,1,0,0,0,0,0,0,1,0,0,0,1,1,0,0,0,0,0,0,1,0,0,0,0,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,1,0,1,0,0,0,1,0,0,1,0,0,0,1,1,0,0,1,0,0,0,0,0,1,1,0,0,0,0,0,0,0,1,1,0,1,1,1,0,0,1,0,0,0,1,1,0,1,0,0),c(20,20)) • >PL = graph.adjacency(adj,mode="undirected")
How I made this graph
> Names = c("Beaumont", "Chapman", "Chettle", "Dekker", "Drayton", "Fletcher", "Greene", "Heywood", "Jonson", "Kyd", "Lodge", "Lyly", "Marlowe", "Marston", "Middleton", "Munday", "Nashe", "Peele", "Webster", "SHAKESPEARE")
> V(PL)$name = Names
OR
> V(PL)$name <- Names
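A less painful route than typing 400 zeros and ones is to list the edges by name and let R fill in the matrix. The sketch below uses base R only and just six edges that are quoted elsewhere in these slides (the Shakespeare/Dekker/Chettle triangle and the Shakespeare-Greene-Nashe-Lyly path); it is a small subset, not the full 20-playwright graph.

```r
# A handful of edges named in these slides (a subset, not the full graph).
edges <- rbind(
  c("SHAKESPEARE", "Dekker"),
  c("SHAKESPEARE", "Chettle"),
  c("Dekker",      "Chettle"),
  c("SHAKESPEARE", "Greene"),
  c("Greene",      "Nashe"),
  c("Nashe",       "Lyly")
)

# One row and one column per playwright, all zeros to start with.
vertex_names <- unique(as.vector(edges))
adj <- matrix(0, length(vertex_names), length(vertex_names),
              dimnames = list(vertex_names, vertex_names))

# A two-column character matrix indexes cells by (row name, column name).
adj[edges] <- 1
adj[edges[, 2:1]] <- 1   # mirror each edge, since the graph is undirected

adj
```

The resulting named matrix can be handed to graph.adjacency in the same way as the hand-typed array.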
Graphs A graph (or network) consists of: • A set of vertices (or nodes) • A set of edges of the form (v,w) where v and w are vertices. • Two vertices are adjacent if they are joined by an edge.
Directed graphs Undirected graphs model symmetric relations: A is connected to B means B is connected to A. (similarity, overlap, blood relation…) Directed graphs (or digraphs) model non-symmetric relations: (biological descent, Internet links, phone calls…)
Weighted graphs In a weighted graph, edges are assigned numbers – typically measuring the strength of a relation, not just whether it is there or not. (e.g. edge from v to w records number of e-mails from v to w, not just existence of e-mail from v to w.)
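As a rough illustration of the difference between undirected, directed and weighted graphs, here is a base R sketch of the e-mail example. The three people and the message counts are made up for illustration.

```r
# Hypothetical e-mail counts: entry [from, to] = number of messages sent.
people <- c("v", "w", "x")
emails <- matrix(0, 3, 3, dimnames = list(from = people, to = people))
emails["v", "w"] <- 12   # v wrote to w twelve times
emails["w", "v"] <- 3    # w wrote back three times
emails["x", "v"] <- 1    # x wrote to v once, and got no reply

# Forgetting the weights gives the plain directed graph
# ("has v ever e-mailed w?").
binary <- (emails > 0) * 1

# An undirected graph would have a symmetric matrix; this one does not.
is_undirected <- all(emails == t(emails))
is_undirected
```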
Shakespeare graph (undirected): • Vertices are Elizabethan playwrights • Edges are collaborations (or friendships, or co-defendancies)
MORAL: A picture of a graph is not a graph. The graph is the list of adjacencies, nothing more.
ASIDE: why do this? Oversimplification, BUT All statements about books are oversimplifications, e.g. “Raymond Carver wrote Cathedral” Our goal is “distant reading”
Basic notions • The degree (or valence) of a vertex is the number of edges attached to it. Loose measure of “importance”
> degree(PL)
Beaumont Chapman Chettle Dekker Drayton Fletcher
       2       5       7     10       5        5
…
Webster SHAKESPEARE
      4           9
For directed graphs, the in-degree of a vertex x is the number of edges pointing to x, and the out-degree is the number of edges emanating from x. • Web graph: in-degree = number of links pointing to my page, out-degree = number of outbound links on my page
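In matrix terms, degrees are just row and column sums. A minimal base R sketch on a made-up three-page web graph (the page names and links are hypothetical):

```r
# Hypothetical web graph: entry [from, to] = 1 if "from" links to "to".
pages <- c("home", "blog", "about")
links <- matrix(0, 3, 3, dimnames = list(pages, pages))
links["home",  "blog"]  <- 1
links["home",  "about"] <- 1
links["blog",  "home"]  <- 1
links["blog",  "about"] <- 1
links["about", "home"]  <- 1

out_degree <- rowSums(links)   # outbound links on each page
in_degree  <- colSums(links)   # links pointing to each page

out_degree
in_degree
```

For an undirected graph the matrix is symmetric, so row sums and column sums agree and either one gives the degree.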
Basic notions • The distance between two vertices is the length of the shortest chain of adjacencies connecting them.
> shortest.paths(PL,"SHAKESPEARE","Lyly")
            Lyly
SHAKESPEARE    3
> lapply(get.shortest.paths(PL,'SHAKESPEARE','Lyly'),function(x) V(PL)$name[x])
[[1]]
[1] "SHAKESPEARE" "Greene" "Nashe" "Lyly"
(sorry for this ugliness)
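What a shortest-path routine does under the hood can be sketched as a breadth-first search: start at one vertex, then visit its neighbors, then their neighbors, and so on. The helper name bfs_distances is made up for this illustration, and the graph is only the six-edge subset quoted in these slides, so distances in the full graph could be shorter.

```r
# Six edges quoted in these slides (a subset of the full graph).
edges <- rbind(
  c("SHAKESPEARE", "Dekker"), c("SHAKESPEARE", "Chettle"),
  c("Dekker", "Chettle"),     c("SHAKESPEARE", "Greene"),
  c("Greene", "Nashe"),       c("Nashe", "Lyly")
)
vertex_names <- unique(as.vector(edges))
adj <- matrix(0, length(vertex_names), length(vertex_names),
              dimnames = list(vertex_names, vertex_names))
adj[edges] <- 1
adj[edges[, 2:1]] <- 1

# Breadth-first search: distance from `source` to every vertex.
bfs_distances <- function(adj, source) {
  dist <- rep(Inf, nrow(adj))
  names(dist) <- rownames(adj)
  dist[source] <- 0
  frontier <- source
  while (length(frontier) > 0) {
    next_frontier <- character(0)
    for (v in frontier) {
      nbrs <- rownames(adj)[adj[v, ] > 0]
      unseen <- nbrs[is.infinite(dist[nbrs])]   # not reached yet
      dist[unseen] <- dist[v] + 1
      next_frontier <- c(next_frontier, unseen)
    }
    frontier <- next_frontier
  }
  dist
}

d <- bfs_distances(adj, "SHAKESPEARE")
d
```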
Basic notions • The diameter of a graph is the greatest distance between any two vertices.
> diameter(PL)
[1] 5
> farthest.nodes(PL)
[1] 1 12 5
(vertices 1 and 12, i.e. Beaumont and Lyly, are at distance 5)
> shortest.paths(PL,1,12)
         Lyly
Beaumont    5
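One way to see the diameter as "the largest entry in the table of all distances" is the Floyd-Warshall recipe: start from the adjacency matrix and repeatedly allow one more vertex as a stopover. This is a base R sketch on the six-edge subset quoted in these slides, not the full graph, so its diameter is not the 5 reported above; it also assumes the graph is connected.

```r
edges <- rbind(
  c("SHAKESPEARE", "Dekker"), c("SHAKESPEARE", "Chettle"),
  c("Dekker", "Chettle"),     c("SHAKESPEARE", "Greene"),
  c("Greene", "Nashe"),       c("Nashe", "Lyly")
)
vertex_names <- unique(as.vector(edges))
adj <- matrix(0, length(vertex_names), length(vertex_names),
              dimnames = list(vertex_names, vertex_names))
adj[edges] <- 1
adj[edges[, 2:1]] <- 1

# Distance 1 for adjacent pairs, Inf for "not known yet", 0 on the diagonal.
d <- ifelse(adj > 0, 1, Inf)
diag(d) <- 0

# Floyd-Warshall: allow vertex k as an intermediate stop, for each k in turn.
for (k in seq_len(nrow(d))) {
  d <- pmin(d, outer(d[, k], d[k, ], "+"))
}

diameter <- max(d)
far_pair <- rownames(d)[which(d == diameter, arr.ind = TRUE)[1, ]]
diameter
far_pair
```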
Complete graphs • Every vertex is adjacent to every other. Example: 5 vertices, 10 edges.
Complete graphs More generally: n vertices, each vertex connected to n-1 others, for a total of n(n-1) edge ends. This counts each edge twice! So there are n(n-1)/2 = (n^2-n)/2 edges. Number of edges scales as number of vertices squared: studying a graph on 10 times as many vertices can take 100 times as long. (Or more, depending on the question asked…)
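The count is easy to check numerically. In this base R sketch the function name edges_in_complete_graph is made up for illustration; the complete graph on 5 vertices is built as an all-ones matrix with the diagonal removed.

```r
# Number of edges in the complete graph on n vertices.
edges_in_complete_graph <- function(n) n * (n - 1) / 2

# Build the complete graph on 5 vertices and count its edges directly.
n <- 5
complete <- matrix(1, n, n) - diag(n)   # every pair adjacent, no self-loops
direct_count <- sum(complete) / 2       # each edge appears twice in the matrix
direct_count                            # 10

# Ten times the vertices gives roughly a hundred times the edges.
edges_in_complete_graph(c(5, 50, 500))  # 10 1225 124750
```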
Trees A tree is a graph in which every two vertices are joined by exactly one path. Equivalently: connected, with no cycles.
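A standard fact makes trees easy to test for: a graph on n vertices is a tree exactly when it is connected and has n-1 edges. The base R sketch below uses that fact; the helper names is_connected and is_tree are made up for this illustration, and both assume a symmetric 0/1 adjacency matrix without self-loops.

```r
# Connected: every vertex reaches every other. Squaring (I + A) repeatedly
# covers longer and longer paths; entries are clamped to 0/1 along the way.
is_connected <- function(adj) {
  n <- nrow(adj)
  reach <- (diag(n) + adj > 0) * 1
  for (i in seq_len(n)) reach <- (reach %*% reach > 0) * 1
  all(reach > 0)
}

# Tree: connected with exactly n - 1 edges.
is_tree <- function(adj) {
  is_connected(adj) && sum(adj) / 2 == nrow(adj) - 1
}

# A star: vertex 1 joined to vertices 2, 3, 4.
star <- matrix(0, 4, 4)
star[1, 2:4] <- 1
star[2:4, 1] <- 1

# A triangle plus an isolated vertex: also 3 edges on 4 vertices, but not a tree.
tri_plus <- matrix(0, 4, 4)
tri_plus[rbind(c(1, 2), c(2, 3), c(1, 3))] <- 1
tri_plus <- tri_plus + t(tri_plus)

is_tree(star)       # TRUE
is_tree(tri_plus)   # FALSE
```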
Communities • A clique is a set of vertices which are all mutually adjacent. (So: any pair of adjacent vertices is a clique of size 2, any “triangle” is a clique of size 3…) • e.g. Shakespeare, Dekker, Chettle.
> largest.cliques(PL)
[[1]]
[1] 4 3 16 8 20
(Dekker, Chettle, Munday, Heywood, Shakespeare)
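Checking whether a given set of vertices is a clique only needs the corresponding block of the adjacency matrix. In this base R sketch the helper name is_clique is made up, and the graph is a toy subset: the five-playwright clique reported above plus the Shakespeare-Greene edge. Missing edges in the toy say nothing about the full graph.

```r
# The largest clique reported above: every pair among these five is adjacent.
clique5 <- c("Dekker", "Chettle", "Munday", "Heywood", "SHAKESPEARE")
edges <- rbind(t(combn(clique5, 2)),          # all 10 pairs within the clique
               c("SHAKESPEARE", "Greene"))    # one extra edge from the slides

vertex_names <- unique(as.vector(edges))
adj <- matrix(0, length(vertex_names), length(vertex_names),
              dimnames = list(vertex_names, vertex_names))
adj[edges] <- 1
adj[edges[, 2:1]] <- 1

# A set is a clique if every off-diagonal entry of its sub-matrix is 1.
is_clique <- function(adj, members) {
  sub <- adj[members, members, drop = FALSE]
  diag(sub) <- 1          # ignore "is v adjacent to itself?"
  all(sub > 0)
}

is_clique(adj, c("SHAKESPEARE", "Dekker", "Chettle"))   # TRUE
is_clique(adj, c("SHAKESPEARE", "Dekker", "Greene"))    # FALSE in this toy subset
```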
Communities A graph is connected if any vertex can be reached from any other by a chain of adjacencies. Every graph breaks up into connected pieces called connected components.
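Finding the connected components amounts to picking an unlabeled vertex, labeling everything reachable from it, and repeating. A base R sketch, with a made-up helper name connected_components and a made-up seven-vertex graph (two disjoint triangles and one isolated vertex):

```r
connected_components <- function(adj) {
  n <- nrow(adj)
  label <- rep(NA_integer_, n)
  current <- 0L
  for (start in seq_len(n)) {
    if (!is.na(label[start])) next        # already placed in a component
    current <- current + 1L
    to_visit <- start
    while (length(to_visit) > 0) {
      v <- to_visit[1]
      to_visit <- to_visit[-1]
      if (!is.na(label[v])) next
      label[v] <- current
      # queue every still-unlabeled neighbor of v
      to_visit <- c(to_visit, which(adj[v, ] > 0 & is.na(label)))
    }
  }
  label
}

# Two disjoint triangles (1,2,3) and (4,5,6), plus isolated vertex 7.
edges <- rbind(c(1, 2), c(2, 3), c(1, 3), c(4, 5), c(5, 6), c(4, 6))
adj <- matrix(0, 7, 7)
adj[edges] <- 1
adj[edges[, 2:1]] <- 1

labels <- connected_components(adj)
labels   # 1 1 1 2 2 2 3
```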
A geometry of their own “Really, universally, relations stop nowhere, and the exquisite problem of the artist is eternally but to draw, by a geometry of his own, the circle within which they shall happily appear to do so.” (Henry James, preface to Roderick Hudson) How to draw this circle?
Clustering Connected component: a connected set of vertices which has no connection to the remainder of the graph. Cluster: a set of vertices which has relatively few connections to the rest of the graph. (Note that this isn’t a definition…) Many ways to cluster, no “right way”
Clustering in R
> edge.betweenness.community(PL)
Graph community structure calculated with the edge betweenness algorithm
Number of communities (best split): 2
Modularity (best split): 0.2781065
Membership vector:
    Beaumont     Chapman     Chettle      Dekker     Drayton    Fletcher
           1           1           1           1           1           1
      Greene     Heywood      Jonson         Kyd       Lodge        Lyly
           2           1           1           2           2           2
     Marlowe     Marston   Middleton      Munday       Nashe       Peele
           2           1           1           1           2           2
     Webster SHAKESPEARE
           1           1
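The "Modularity" figure in that output scores a split: it compares the number of edges inside the groups with what one would expect if edges were placed at random among vertices of the same degrees. The base R sketch below computes the usual Newman modularity for a given split; it does not implement the edge betweenness algorithm itself. The helper name modularity_of and the six-vertex toy graph are made up for illustration.

```r
# Modularity of a split: Q = (1/2m) * sum over same-group pairs (i, j)
# of [ A_ij - k_i * k_j / (2m) ], with m = number of edges, k = degrees.
modularity_of <- function(adj, membership) {
  m <- sum(adj) / 2
  k <- rowSums(adj)
  same_group <- outer(membership, membership, "==")
  sum((adj - outer(k, k) / (2 * m)) * same_group) / (2 * m)
}

# Toy graph: triangles (1,2,3) and (4,5,6) joined by the single edge 3-4.
edges <- rbind(c(1, 2), c(1, 3), c(2, 3), c(3, 4), c(4, 5), c(4, 6), c(5, 6))
adj <- matrix(0, 6, 6)
adj[edges] <- 1
adj[edges[, 2:1]] <- 1

q_split <- modularity_of(adj, c(1, 1, 1, 2, 2, 2))   # one group per triangle
q_lump  <- modularity_of(adj, rep(1, 6))             # everything in one group
q_split   # 5/14, about 0.357
q_lump    # 0
```

Lumping everything together always scores 0, so a positive score means the split captures more internal edges than chance would.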
“The University Wits were a group of late 16th century English playwrights who were educated at the universities (Oxford or Cambridge) and who became playwrights and popular secular writers. Prominent members of this group were Christopher Marlowe, Robert Greene, and Thomas Nashe from Cambridge, and John Lyly, Thomas Lodge, George Peele from Oxford.” (Wikipedia)
Clusters of characters in Macbeth
> edge.betweenness.community(Macbeth)
Graph community structure calculated with the edge betweenness algorithm
Number of communities (best split): 10
Modularity (best split): 0.06733369
Membership vector:
         MACBETH    LADY MACBETH         MACDUFF         MALCOLM
               1               2               1               1
            ROSS          BANQUO     First Witch          LENNOX
               1               3               4               1
  First Murderer          DUNCAN    Second Witch     Third Witch
               2               5               4               4
             ALL          SIWARD       Messenger Second Murderer
               1               6               7               8
         Servant          SEYTON
               9              10
Breakpoint When can networks tell us things we don’t already know?
200 names Vertices: 200 baby names for boys popular in 2011. For each name, record popularity in WI, TX, PA, CA, MA, GA, OH, MO, FL, CO, NY, IL. Edges: Two names are adjacent if their popularity distributions across states are “very similar”
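The slides do not say how “very similar” was measured. One plausible reading, shown here purely as an assumption, is to correlate the state-by-state popularity profiles and join two names when the correlation exceeds a cutoff. The names, numbers, four-state selection and the 0.9 threshold below are all invented for illustration.

```r
# Invented popularity scores for three placeholder names in four states.
states <- c("WI", "TX", "CA", "NY")
popularity <- rbind(
  NameA = c(0.9, 0.2, 0.3, 0.8),
  NameB = c(0.8, 0.3, 0.2, 0.9),
  NameC = c(0.1, 0.9, 0.8, 0.2)
)
colnames(popularity) <- states

# Correlation between every pair of names (cor works on columns, so transpose).
similarity <- cor(t(popularity))

# Assumed rule: adjacent if correlation is above an arbitrary cutoff.
threshold <- 0.9
adj <- (similarity > threshold) * 1
diag(adj) <- 0    # a name is trivially similar to itself; drop self-loops

adj
```

Moving the cutoff changes the graph, and with it every statistic computed from the graph, so the choice deserves as much scrutiny as the results.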
200 names
> lapply(largest.cliques(MaleNames), function(x) V(MaleNames)$name[x])
[[1]]
[1] "Jacob" "Anthony" "Dylan" "Matthew" "Brian"
(popular in NY, CA, MA; less so in CO, MO, GA)
200 names
> V(MaleNames)$name[neighbors(MaleNames,'Malachi')]
[1] "Ashton" "Ashton" "Kaden" "Kaden" "Malachi" "Malachi"
> V(MaleNames)$name[neighbors(MaleNames,'Owen')]
[1] "Maxwell" "Maxwell" "Brady" "Brady" "Cole" "Cole" "Owen" "Owen"
> V(MaleNames)$name[neighbors(MaleNames,'Patrick')]
[1] "Thomas" "Thomas" "Patrick" "Patrick" "John" "John" "Sean" "Sean" "Ryan" "Ryan" "Peter" "Peter"
edge.betweenness.community finds groups of girls’ names like • Alaina, Maci, Mackenzie, Lillian, Addison, Alivia • Piper, Harper, Brooklyn, Brooklynn • Aubrey, Zoey, Autumn, Ellie • Lucy, Josephine, Elise, Clara, Eleanor
Density How likely are two things to be related? The density of a graph is the probability that two randomly chosen vertices are related: i.e. [total number of edges]/[total number of pairs of vertices]
> graph.density(MaleNames)
[1] 0.1084846
> graph.density(FemaleNames)
[1] 0.09950159
> graph.density(Macbeth)
[1] 0.2810458
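The ratio is simple enough to compute by hand from an adjacency matrix. A base R sketch, assuming an undirected graph without self-loops; the helper name graph_density and the six-vertex toy graph are made up for illustration.

```r
# Density = number of edges / number of pairs of vertices.
graph_density <- function(adj) {
  n <- nrow(adj)
  (sum(adj) / 2) / choose(n, 2)
}

# Toy graph: triangles (1,2,3) and (4,5,6) joined by the single edge 3-4.
edges <- rbind(c(1, 2), c(1, 3), c(2, 3), c(3, 4), c(4, 5), c(4, 6), c(5, 6))
adj <- matrix(0, 6, 6)
adj[edges] <- 1
adj[edges[, 2:1]] <- 1

toy_density <- graph_density(adj)   # 7 edges out of 15 possible pairs
toy_density
```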
Transitivity • A relation is transitive if “A related to B” and “B related to C” implies “A related to C.” Transitive: “Is descended from,” “born in same city as” Non-transitive: “is friends with”, “lived at some point in same city as”
How transitive is a graph? Some relations are transitive, others are not. But we don’t have to stop at “yes” or “no”. How frequently are two friends of yours friends with each other? • Always • Never • Something in between
How transitive is a graph? Transitivity (or “clustering coefficient”) gives the probability that two random neighbors of the same vertex are neighbors to each other.
> transitivity(MaleNames)
[1] 0.4972335
> transitivity(FemaleNames)
[1] 0.4546713
> transitivity(Macbeth)
[1] 0.4545455
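One common way to make this precise is the global version: count the two-step paths A-B-C, and ask what fraction of them are closed up by an edge A-C. With an adjacency matrix this is a few matrix products. The sketch below is base R, assumes an undirected 0/1 matrix without self-loops, and uses a made-up helper name transitivity_of and a made-up toy graph.

```r
transitivity_of <- function(adj) {
  a2 <- adj %*% adj
  # Closed two-step paths: trace of A^3 (each triangle is counted 6 times).
  closed <- sum(diag(a2 %*% adj))
  # All two-step paths with distinct endpoints (each counted in both directions).
  two_step <- sum(a2) - sum(diag(a2))
  closed / two_step
}

# Toy graph: triangles (1,2,3) and (4,5,6) joined by the single edge 3-4.
edges <- rbind(c(1, 2), c(1, 3), c(2, 3), c(3, 4), c(4, 5), c(4, 6), c(5, 6))
adj <- matrix(0, 6, 6)
adj[edges] <- 1
adj[edges[, 2:1]] <- 1

toy_transitivity <- transitivity_of(adj)
toy_transitivity   # 0.6: the bridge creates two-step paths that are not closed
```

A lone triangle scores 1 (“always”) and a star scores 0 (“never”); real networks land somewhere in between.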
How transitive is a graph? In both name cases, two random neighbors have about a 50% chance of being connected (while two random vertices have about a 10% chance of being connected.) Quite transitive! Facebook thinks the same is true for “friends” (and makes this so by thinking so!)