Automated Social Hierarchy Detection through Email Network Analysis (SNAKDD07). Ryan Rowe, Germán Creamer, Shlomo Hershkop, Salvatore J. Stolfo. Advisor: Dr. Koh Jia-Ling. Reporter: Che-Wei, Liang. Date: 2008/12/11.
Outline • Introduction • SNA algorithm • Results and Discussion • Conclusions and Future Work
Introduction • Recent bankruptcy scandals at US companies such as Enron and WorldCom have increased the need to analyze electronic information • The goal is to define risk and identify any conflict of interest among the entities of a corporate household • Identifying the relationships between entities, or the corporate hierarchy, is not a straightforward task • These relationships can be extracted by analyzing email communication data
SNA Algorithm • For each email user • Analyze and calculate several statistics for each feature of that user • Construct an email network graph • Vertices represent accounts; edges represent communication between two accounts • Analyze cliques and other graph-theoretical properties • Combine all statistics into a single social score
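The graph construction step above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the input format (a list of `(sender, recipient)` pairs), the function name `build_email_graph`, and the choice of an undirected graph weighted by message count are all assumptions made for the example.

```python
from collections import defaultdict


def build_email_graph(messages):
    """Build an undirected, weighted email graph.

    messages: iterable of (sender, recipient) pairs (assumed input format).
    Returns {account: {neighbor: number_of_messages_exchanged}}.
    """
    graph = defaultdict(lambda: defaultdict(int))
    for sender, recipient in messages:
        if sender == recipient:
            continue  # ignore self-addressed mail
        # Count the message on both endpoints so the graph stays symmetric.
        graph[sender][recipient] += 1
        graph[recipient][sender] += 1
    return {account: dict(neighbors) for account, neighbors in graph.items()}


# Hypothetical toy data, for illustration only.
messages = [("alice", "bob"), ("bob", "alice"), ("alice", "carol")]
graph = build_email_graph(messages)
```

Here vertices are the dictionary keys and an edge exists wherever a neighbor entry is present; the stored count is one possible edge weight.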
SNA Algorithm • Two sets of statistics about a user’s “importance” • Average response time • The average time elapsed between a user sending an email to another account and later receiving an email from that account • A received mail is considered a “response” if it follows a sent mail within three days • Cliques (maximal complete subgraphs) • Find all cliques in the graph • Assumption: users associated with a larger set and higher frequency of cliques will be ranked higher
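One way to read the average response time definition is sketched below. The message format `(timestamp_seconds, sender, recipient)` and the function name are assumptions for illustration; the three-day window comes from the slide. For each mail the user sends, the sketch looks for the first mail received from the same correspondent afterwards and counts it only if it arrives within the window.

```python
from bisect import bisect_right

THREE_DAYS = 3 * 24 * 3600  # response window from the slide, in seconds


def average_response_time(user, messages):
    """Average delay between `user` sending a mail and getting a reply.

    messages: list of (timestamp_seconds, sender, recipient) tuples
    (assumed format). Returns None if no sent mail received a response.
    """
    sent = {}      # correspondent -> times at which user wrote to them
    received = {}  # correspondent -> times at which they wrote to user
    for ts, sender, recipient in sorted(messages):
        if sender == user and recipient != user:
            sent.setdefault(recipient, []).append(ts)
        elif recipient == user and sender != user:
            received.setdefault(sender, []).append(ts)

    delays = []
    for other, sent_times in sent.items():
        replies = received.get(other, [])
        for ts in sent_times:
            i = bisect_right(replies, ts)  # first mail strictly after ts
            if i < len(replies) and replies[i] - ts <= THREE_DAYS:
                delays.append(replies[i] - ts)
    return sum(delays) / len(delays) if delays else None


# Hypothetical toy data: bob answers within an hour, carol after four days.
messages = [
    (0, "alice", "bob"),
    (3600, "bob", "alice"),
    (10000, "alice", "carol"),
    (10000 + 4 * 86400, "carol", "alice"),
]
alice_avg = average_response_time("alice", messages)
bob_avg = average_response_time("bob", messages)
```

Carol's late mail falls outside the three-day window, so only Bob's reply contributes to Alice's average.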
Communication Networks • Number of cliques • The number of cliques that the account is contained within • Raw clique score • A score computed using the size of the clique set • Weighted clique score • A score computed using the “importance” of the people in each clique
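The clique statistics above could be computed along these lines. Clique enumeration uses the standard Bron-Kerbosch algorithm with pivoting. The slide does not give the raw clique score formula, so the exponential weighting `2 ** (size - 1)` and the `min_size` parameter below are illustrative assumptions, as are the function names.

```python
def maximal_cliques(graph):
    """Yield every maximal clique (Bron-Kerbosch with pivoting).

    graph: {vertex: set_of_neighbors} for an undirected graph.
    """
    def expand(r, p, x):
        if not p and not x:
            yield frozenset(r)  # r cannot be extended: it is maximal
            return
        # Pivot on the vertex with the most neighbors among the candidates.
        pivot = max(p | x, key=lambda v: len(graph[v] & p))
        for v in list(p - graph[pivot]):
            yield from expand(r | {v}, p & graph[v], x & graph[v])
            p = p - {v}
            x = x | {v}

    yield from expand(set(), set(graph), set())


def clique_statistics(graph, min_size=2):
    """Return (number of cliques, raw clique score) per account.

    The score 2 ** (size - 1) per clique is an assumed weighting that
    makes larger cliques count more than smaller ones.
    """
    counts = {v: 0 for v in graph}
    raw = {v: 0 for v in graph}
    for clique in maximal_cliques(graph):
        if len(clique) < min_size:
            continue
        for v in clique:
            counts[v] += 1
            raw[v] += 2 ** (len(clique) - 1)
    return counts, raw


# Hypothetical toy graph: a triangle a-b-c plus a pendant edge c-d.
graph = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"}}
cliques = set(maximal_cliques(graph))
counts, raw = clique_statistics(graph)
```

A weighted clique score would replace the constant per-member contribution with a term based on the other members' importance, which the slide does not specify further.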
Communication Networks • Degree centrality • $\mathrm{Deg}(v_i) = \sum_j a_{ij}$, where $a_{ij}$ is an entry of the adjacency matrix $A$ of $G$ • Clustering coefficient • How close the vertex and its neighbors are to being a clique
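As a small sketch of these two measures, assuming an unweighted, undirected graph stored as `{vertex: set_of_neighbors}` (so the row sum of the adjacency matrix is just the neighbor count), and using the common local clustering coefficient, which is the fraction of possible links among a vertex's neighbors that actually exist:

```python
def degree(graph, v):
    """Degree of v: the sum of row v of a 0/1 adjacency matrix."""
    return len(graph[v])


def clustering_coefficient(graph, v):
    """Fraction of pairs of v's neighbors that are themselves connected."""
    neighbors = graph[v]
    k = len(neighbors)
    if k < 2:
        return 0.0  # fewer than two neighbors: no pairs to connect
    # Each link among neighbors is seen from both endpoints, hence // 2.
    links = sum(1 for u in neighbors for w in graph[u] if w in neighbors) // 2
    return 2 * links / (k * (k - 1))


# Hypothetical toy graph: a triangle a-b-c plus a pendant edge c-d.
graph = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"}}
deg_c = degree(graph, "c")
cc_a = clustering_coefficient(graph, "a")
cc_c = clustering_coefficient(graph, "c")
cc_d = clustering_coefficient(graph, "d")
```

A coefficient of 1.0 means the vertex and its neighbors already form a clique; vertex `c` scores lower because `d` is not connected to `a` or `b`.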
Communication Networks • Mean of the shortest path lengths from a specific vertex $v_i$ to all vertices in the graph $G$, computed from the entries $d_{ij} \in D$, where $D$ is the geodesic distance matrix of $G$ • Betweenness centrality • Proportion of all geodesics (shortest paths) between all other vertices that include vertex $v_i$
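Both path-based measures can be sketched with breadth-first search on an unweighted graph. The normalizations are assumptions, since the slide does not show them: the mean below averages over the other reachable vertices, and the betweenness value is the unnormalized sum, over pairs of other vertices, of the fraction of shortest paths passing through the vertex (computed with Brandes' algorithm).

```python
from collections import deque


def mean_shortest_path(graph, source):
    """Mean geodesic distance from source to every other reachable vertex."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in graph[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    others = [d for v, d in dist.items() if v != source]
    return sum(others) / len(others) if others else 0.0


def betweenness(graph):
    """Unnormalized betweenness centrality (Brandes, unweighted graph)."""
    score = {v: 0.0 for v in graph}
    for s in graph:
        order = []                        # vertices in BFS visiting order
        preds = {v: [] for v in graph}    # predecessors on shortest paths
        sigma = {v: 0 for v in graph}     # number of shortest paths from s
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in graph[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in graph}
        for w in reversed(order):         # accumulate dependencies backwards
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # In an undirected graph each pair is counted from both ends.
    return {v: b / 2 for v, b in score.items()}


# Hypothetical toy graph: a triangle a-b-c plus a pendant edge c-d.
graph = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"}}
mean_c = mean_shortest_path(graph, "c")
mean_d = mean_shortest_path(graph, "d")
bc = betweenness(graph)
```

Dividing each betweenness value by the number of pairs of other vertices, `(n - 1) * (n - 2) / 2`, would turn it into a proportion.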
Communication Networks • “Hubs-and-authorities” importance • Calculates the “hubs-and-authorities” importance of each vertex • J. Kleinberg. Authoritative sources in a hyperlinked environment. Journal of the ACM, 46, 1999.
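Kleinberg's hubs-and-authorities scores can be approximated by simple power iteration, sketched below. Treating each email as a directed edge from sender to recipient, the fixed iteration count, and the function name `hits` are assumptions for this example, not details taken from the slides.

```python
import math


def hits(edges, iterations=50):
    """Approximate hub and authority scores by power iteration.

    edges: list of directed (source, target) pairs, here assumed to be
    (sender, recipient). Returns (hub, authority) dictionaries, each
    normalized to unit Euclidean length.
    """
    nodes = {n for edge in edges for n in edge}
    hub = {n: 1.0 for n in nodes}
    auth = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # Authority update: sum the hub scores of vertices pointing at you.
        auth = {n: 0.0 for n in nodes}
        for src, dst in edges:
            auth[dst] += hub[src]
        norm = math.sqrt(sum(a * a for a in auth.values())) or 1.0
        auth = {n: a / norm for n, a in auth.items()}
        # Hub update: sum the authority scores of vertices you point at.
        hub = {n: 0.0 for n in nodes}
        for src, dst in edges:
            hub[src] += auth[dst]
        norm = math.sqrt(sum(h * h for h in hub.values())) or 1.0
        hub = {n: h / norm for n, h in hub.items()}
    return hub, auth


# Hypothetical toy data: a and b both write to c, and c writes to d.
edges = [("a", "c"), ("b", "c"), ("c", "d")]
hub, auth = hits(edges)
top_authority = max(auth, key=auth.get)
```

In this toy graph `c` emerges as the strongest authority because two accounts point at it, while `a` and `b` share the top hub score.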
Social Score • Social score • Rank users from most important to least important • Group users that have similar social scores and clique connectivity • Determine n different levels of social hierarchy in which to place all the users
Compute Social Score • Scale and normalize each statistic • Social score • A score between 0 and 100
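The slide does not reproduce the scoring formula, so the following is only one plausible sketch of "scale, normalize, combine": each statistic is min-max scaled to the range 0 to 100, and the social score is a weighted average of the scaled values. The function names, the `weights` mapping, and the `lower_is_better` option (used here to invert statistics such as response time) are assumptions introduced for illustration.

```python
def scale_0_100(values, invert=False):
    """Min-max scale {user: value} to the range 0..100."""
    lo, hi = min(values.values()), max(values.values())
    if hi == lo:
        return {user: 0.0 for user in values}  # no spread: nothing to rank
    scaled = {u: 100.0 * (v - lo) / (hi - lo) for u, v in values.items()}
    if invert:
        scaled = {u: 100.0 - s for u, s in scaled.items()}
    return scaled


def social_scores(stats, weights, lower_is_better=()):
    """Weighted average of scaled statistics, giving a 0..100 score.

    stats:   {statistic_name: {user: raw_value}}, same users in each.
    weights: {statistic_name: weight}.
    """
    total_weight = sum(weights[name] for name in stats)
    users = next(iter(stats.values())).keys()
    scaled = {
        name: scale_0_100(values, invert=name in lower_is_better)
        for name, values in stats.items()
    }
    return {
        u: sum(weights[n] * scaled[n][u] for n in stats) / total_weight
        for u in users
    }


# Hypothetical toy statistics, for illustration only.
stats = {
    "degree": {"alice": 10, "bob": 4, "carol": 1},
    "response_time": {"alice": 2.0, "bob": 6.0, "carol": 10.0},
}
weights = {"degree": 1.0, "response_time": 1.0}
scores = social_scores(stats, weights, lower_is_better={"response_time"})
ranking = sorted(scores, key=scores.get, reverse=True)
```

Changing the entries of `weights` shifts the emphasis between statistics, and sorting by score yields the ranking from most to least important user.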
Results and Discussion • Using EMT • A Java-based email analysis engine built on a database back-end • The JUNG library is used for the degree and centrality measures • Presents the analysis of the North American West Power Traders division of Enron Corporation
Conclusions and Future Work • The Enron dataset provides an excellent starting point of real-world data • By varying the feature weights, it is possible to • Pick out the most important individual • Group individuals with similar social qualities • Graphically draw an organization chart that approximates the real social hierarchy
Conclusions and Future Work • The concept of average response time can be reworked by considering the order of responses • Consider common email usage times for each user and adjust the received time of each email accordingly • New grouping and division algorithms are being considered • Graph edges should be taken into account when arranging users into different levels