Visible Symbiosis: Leveraging the Purdue cyberinfrastructure for studying large-scale knowledge communities
Sorin Adam Matei – Communication
David Braun – ITaP Research/Envision
Seungyoon Lee – Communication
Lorraine Kisselburgh – Communication
Brian Britt – Communication
A research opportunity for computational social science projects
• Wikipedia dataset
  • Editorial histories, 2001–2008
  • 17 million articles and over 280 million edits
  • Over 20 million unique editors
• Wikipedia can help us study:
  • Knowledge emergence and creation
  • Collaboration dynamics
  • The genomics of social media
Decoding the Social Media Genome
Sorin Adam Matei – COM
David Braun – ENVISION
Seungyoon Lee – COM
Lorraine Kisselburgh – COM
Brian Britt – COM
Collaborators
• Marc Smith (Connected Action / Microsoft Research / NodeXL)
• Horia Petrache, Physics, IUPUI
• WikiTrust, UC Santa Cruz
Parsimony?
• Can the bewildering complexity of social media datasets be described parsimoniously?
[Figure: editorial network for the Neutral Point of View talk page (2001–2008)]
Representing Wikipedia as a network
• Nodes: editors (contributors)
• Edges (links): co-editorial contributions
  • If two editors contributed to the same article, they are linked
  • Gravitational model: the more words contributed, the stronger the link; the longer the time between interventions, the weaker the link
• Expected size
  • Billions of edges
  • >1 TB of data
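A minimal Python sketch of how such a co-editing network could be assembled. The slide does not give the weighting formula, so the functional form below (product of word counts divided by the squared time gap plus one) is only an assumption chosen to match the stated behavior. The sample records, `gravity_weight`, and `build_coedit_network` are hypothetical names used for illustration.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical edit records: (article, editor, words_contributed, day_of_edit)
edits = [
    ("NPOV", "alice", 120, 0.0),
    ("NPOV", "bob", 30, 2.0),
    ("NPOV", "carol", 60, 10.0),
    ("Physics", "alice", 200, 1.0),
    ("Physics", "carol", 50, 1.5),
]

def gravity_weight(words_a, words_b, days_apart):
    """Assumed form: more words -> stronger link, longer gap -> weaker link."""
    return (words_a * words_b) / (1.0 + days_apart) ** 2

def build_coedit_network(edit_records):
    """Return {(editor_a, editor_b): weight} for editors sharing an article."""
    by_article = defaultdict(list)
    for article, editor, words, day in edit_records:
        by_article[article].append((editor, words, day))

    weights = defaultdict(float)
    for contributions in by_article.values():
        # Every pair of interventions on the same article adds to a link.
        for (ed_a, w_a, t_a), (ed_b, w_b, t_b) in combinations(contributions, 2):
            if ed_a == ed_b:
                continue  # an editor is not linked to themselves
            edge = tuple(sorted((ed_a, ed_b)))  # undirected edge
            weights[edge] += gravity_weight(w_a, w_b, abs(t_a - t_b))
    return dict(weights)

network = build_coedit_network(edits)
for edge, weight in sorted(network.items()):
    print(edge, round(weight, 2))
```

Pairing every intervention on an article grows quadratically with the number of edits per article, which is one reason a full-scale version would need the kind of cyberinfrastructure the deck describes.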
Why the “social media genome”?
• Does the network have a number of “bases” (i.e., subnetwork structures), similar to DNA?
• Do these bases recombine to create “genes” (i.e., functional structures)?
• Are genes organized into larger aggregates?
• How can we tell where a “gene” starts and where it ends?
• How can we tell what functions specific “genes” have?
Research agenda
• Can the complexity of a gigantic network be explained by a relatively simple “network alphabet”? (DNA-like “structure,” “chromosomes”)
• What are the smallest structural units of this network? (“bases,” similar to A, T, G, C)
• Are they restricted to a small set of topologies?
• Are these topologies associated with specific roles?
• Do the roles/topologies combine and recombine at various scales to create more complex structures? (“genes”)
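One common way to look for small structural units in a network is to count three-node patterns. The Python sketch below is offered only as an illustration of that idea, not as the method the authors used. It tallies open triads (two links among three editors) and closed triads (triangles) in an undirected graph. The function name `triad_census` and the sample edges are hypothetical.

```python
from itertools import combinations

def triad_census(edges):
    """Count connected three-node patterns in an undirected graph."""
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)

    counts = {"open": 0, "closed": 0}
    # Brute force over all node triples: fine for a toy graph only.
    for a, b, c in combinations(sorted(adjacency), 3):
        links = (b in adjacency[a]) + (c in adjacency[a]) + (c in adjacency[b])
        if links == 3:
            counts["closed"] += 1   # triangle: all three editors linked
        elif links == 2:
            counts["open"] += 1     # path: one editor bridges the other two
    return counts

sample_edges = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")]
census = triad_census(sample_edges)
print(census)
```

Checking every triple is cubic in the number of nodes, so a network with millions of editors would need a sampling or neighbor-based counting strategy instead.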
Using entropy to measure complexity of the social media genome
• As networks get more complex, do they decrease the entropy of the social system?
  • Entropy: system evenness/diversity/complexity
• If yes, by how much?
• How do the complexity and entropy of the social network described by Wikipedia evolve over time?
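To make the entropy measure concrete, here is a minimal Python sketch that assumes Shannon entropy (natural log) over each contributor's share of the work on an article. The deck does not specify the exact estimator, so this is an assumption, and the function names and sample counts are hypothetical. When every contributor does an equal share, the entropy reaches its maximum, ln(n) for n contributors.

```python
import math

def contribution_entropy(word_counts):
    """Shannon entropy (in nats) of contributors' shares of the total work."""
    total = sum(word_counts)
    if total == 0:
        return 0.0
    shares = [count / total for count in word_counts if count > 0]
    return -sum(p * math.log(p) for p in shares)

def max_entropy(n_contributors):
    """Ceiling reached when all contributors do an equal share: ln(n)."""
    return math.log(n_contributors)

even = [25, 25, 25, 25]    # four editors, equal shares
skewed = [97, 1, 1, 1]     # one editor does almost all of the work

print(contribution_entropy(even), max_entropy(4))
print(contribution_entropy(skewed))
```

The gap between an article's observed entropy and ln(n) gives a simple reading of how far actual contributions fall from perfectly even participation.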
Wikipedia has become more top-heavy
• As Wikipedia has added more and more users (13 million at the last count), its core group (<100,000) accounts, percentage-wise, for most of the total work
• Wikipedia is not an expression of wise crowds, but of a clear and well-oiled ad hoc bureaucracy: an “ad-hocracy”
[Figure: intervention entropy versus intervention number (events); curve label: ln(x)]
• As articles are edited more, the proportion of people who account for most of the work decreases
• Orange line: entropy (unevenness) of actual contributions
• Dotted line: wisdom-of-crowds ceiling, where all contributions would be equal
The emergence of knowledge in large scale networks Seungyoon Lee * Lorraine Kisselburgh * Sorin Adam Matei Department of Communication
Research agenda
• The dynamic processes of knowledge production as a large-scale collaborative effort are embedded within a social community
  • Communicative relationships influence knowledge creation
• Questions:
  • Is knowledge socially embedded?
  • How do knowledge networks emerge alongside social networks?
Co-evolution of networks
• Using network theory and large-scale data to extend understanding of how knowledge is created and produced by “crowds”
• Few opportunities to analyze the processes of knowledge creation by large groups over long periods of time
• Little research available to understand how social relationships influence the process of knowledge production
Co-evolution of networks
Knowledge networks
• Relationships between articles
• Semantic ties among knowledge content and concepts
Social/communication networks
• Relationships between contributors
• Social ties among people
[Figure: sample visualization]
What are the patterns of emerging relationships, and how are they embedded within each other?
Implications
• Network science enables us to analyze patterns of behavior in human, biological, and technological systems
• The structures of knowledge collaboration
• The factors that influence how knowledge is produced in collaborative networks
• Cyberinfrastructures provide support for processing and visualizing large-scale data, allowing us to analyze both massive and longitudinal data systems to understand and model human behavior