Discovering Important Nodes through Graph Entropy • Jitesh Shetty, Jafar Adibi [KDD '05] • Advisor: Dr. Koh Jia-Ling • Reporter: Che-Wei, Liang • Date: 2008/09/18
Outline • Introduction • Order In Networks • Graph Entropy • Experimental Result • Conclusions
Introduction • A new challenge in the area of Link Discovery and Social Network Analysis • To exploit communication-pattern information and text information within knowledge discovery processes • Such as the discovery of hidden organizational structure and the selection of interesting, prominent members
Introduction • Email logs • Of prime importance and relevance to the study of information flow in an organization • An evidence database that law enforcement and intelligence organizations can use to detect hidden groups engaged in illegal activities within an organization • Graph entropy • Used to determine the most prominent and interesting people
Order In Networks • A graph model might not be the best representation of some organizations • Such as drug dealers, terrorist organizations, and threat groups • Graph models usually ignore their hierarchy • These organizations are composed of leaders and followers
Order In Networks • Example
Graph Entropy (1/6) • Goal: find prominent people in a network • Need to aggregate the links between them and discover which node has the most effect on the network • The entropy model can identify the entity that has the most effect on the graph entropy • Transform the problem space into a multigraph • Each node represents an entity; each link represents an action between entities
Graph Entropy (3/6) • Let G = (V, E) be a graph. P is the probability distribution on the vertex set V(G) • P(A email B) =
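A minimal sketch of the idea on these slides, under stated assumptions: each directed (sender, recipient) pair is treated as one event, its probability is estimated as its share of all emails (an assumption, not the slide's definition of P), and a node's effect is measured as the absolute change in Shannon entropy when its events are removed. The function names and the toy email log are hypothetical.

```python
import math
from collections import Counter


def event_entropy(events):
    """Shannon entropy (in bits) of the empirical distribution over events.

    Each event is a (sender, recipient) tuple; repeated tuples raise that
    event's estimated probability (assumed estimator: count / total).
    """
    counts = Counter(events)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def rank_by_entropy_change(events):
    """Rank nodes by how much removing their events changes the entropy."""
    base = event_entropy(events)
    nodes = {node for event in events for node in event}
    effect = {}
    for node in nodes:
        # Drop every event in which this node is sender or recipient.
        remaining = [e for e in events if node not in e]
        effect[node] = abs(base - event_entropy(remaining))
    # Largest effect first; ties broken alphabetically for determinism.
    return sorted(effect.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical toy log: A takes part in most of the traffic.
emails = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("D", "A")]
ranking = rank_by_entropy_change(emails)
print(ranking[0][0])  # node whose removal changes the entropy the most
```

In this toy log, removing A leaves a single event, so the entropy collapses to zero and A ranks first.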
Graph Entropy (4/6) • A major concern in the Link Discovery (LD) domain is that data elements are not independent • Ex: the links "A sends email to B" and "B sends email to C" may depend on each other, since B may forward A’s email to C • Three approaches to discovering dependency • Examine the similarity of emails • check
Graph Entropy (5/6) • 3. Exploit a Markov-blanket type of model • Assume an event (link) between two nodes depends only on those nodes’ events
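The locality assumption above can be illustrated with a small sketch: for a given event, collect only the other events that share one of its two endpoints. This is a plain neighborhood filter, not a full Markov blanket from a probabilistic graphical model; the function name `neighborhood_events` and the sample log are hypothetical.

```python
def neighborhood_events(events, index):
    """Return the events that touch either endpoint of events[index].

    Under the slide's assumption, these are the only events the target
    event may depend on. The target event itself is excluded.
    """
    sender, recipient = events[index]
    endpoints = {sender, recipient}
    return [
        event
        for i, event in enumerate(events)
        if i != index and endpoints & set(event)
    ]


# Hypothetical chain of emails: A -> B -> C -> D -> E
log = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "E")]
print(neighborhood_events(log, 1))  # events sharing B or C with ("B", "C")
```

Restricting attention to this neighborhood keeps the dependency check local: the event ("A", "B") is compared against ("B", "C") but never against ("D", "E").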
Experiment • Enron Email Dataset • 151 users, mostly senior management of Enron • contains 252,759 email messages • Almost all users use folders to organize their emails
Experiment • Created an Enron dictionary • Normalized all emails using the Porter stemming algorithm • Compared the email vectors using the Jaccard similarity coefficient • Ordered emails by time stamp
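The comparison step can be sketched as a Jaccard similarity over token sets. As an assumption, lowercasing and splitting on non-alphanumeric characters stands in for the Porter stemming and dictionary steps, which are not reproduced here; the helper names and the sample messages are hypothetical.

```python
import re


def tokens(text):
    """Lowercased alphanumeric tokens as a set (stand-in for stemming)."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    """Jaccard similarity |a & b| / |a | b|, defined as 0.0 for two empty sets."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


# Hypothetical messages: a forwarded copy should score close to 1.
original = tokens("Meeting moved to Friday at noon")
forwarded = tokens("FW: Meeting moved to Friday at noon")
unrelated = tokens("Lunch order for the team")

print(jaccard(original, forwarded))
print(jaccard(original, unrelated))
```

A high score between an email A sent to B and a later email B sent to C suggests the second may be a forward of the first, which is the kind of dependency the similarity check is meant to surface.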
Conclusions • Defined and addressed the problem of finding important nodes and the closed groups around them • Used event-based entropy to find influential nodes in a graph, and showed that the entropy model can serve as a good means of detecting them