1 / 15

Email Alias Detection Using Social Network Analysis 

Email Alias Detection Using Social Network Analysis . Ralf Holzer, Bradley Malin, Latanya Sweeney LinkKDD 2005 Advisor: Dr. Koh Jia-Ling Reporter: Che-Wei, Liang Date: 2008/08/14. Outline. Introduction Alias Detection Method Data Representation Ranking Algorithms Experiment.

sun
Download Presentation

Email Alias Detection Using Social Network Analysis 

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Email Alias Detection Using Social Network Analysis  Ralf Holzer, Bradley Malin, Latanya Sweeney LinkKDD 2005 Advisor: Dr. Koh Jia-Ling Reporter: Che-Wei, Liang Date: 2008/08/14

  2. Outline • Introduction • Alias Detection Method • Data Representation • Ranking Algorithms • Experiment

  3. Introduction • Individuals use aliases for various communication purposes • Alias detection • Useful to both legitimate and illegitimate applications • Important to understand the extent to which the process can be automated

  4. Introduction • Aliases are listed on the same webpage can indicate there exists some form of relationship between them • Many people use several email addresses • This paper attempt to determine which email addresses correspond to the same entity

  5. Introduction • Email addresses, a type of alias, can be distilled from a large number of web pages • Such as class rosters, research papers, discussion boards • Email addresses provide a unique mapping from address to a specific entity

  6. Data Representation • Let S represent the set of sources Modeled as an undirected graph G = (I, E) • Ibe the set of unique email addresses • Cab = |eab| denote the number of sources associated with each edge connecting a and b

  7. Ranking Algorithms • Ranking method • Top-k list of possible aliases • Shortest path algorithm • Used geodesic distance to generate a ranking of nodes closest to a given originating node • Relationship strength is augmented with • Number of aliases on a source • Number of collocations of aliases

  8. Ranking Algorithms • Geodesic distance • Length of the shortest path from a to b • Potential aliases are ranked from lowest to highest geodesic distance • Multiple Collocation • Two aliases which collocate on more than one webpage signifies a stronger relationship

  9. Ranking Algorithms • Source Size • Strength between two aliases in inversely correlated with the number of aliases in a source • Combined • Integrates both of previous assumptions

  10. Experiment • Derived from CMU web pages • 1978 distinct email aliases • Data Set Statistics

  11. Experiment

  12. Experiment • Geodesic Alias Distances

  13. Experiment

  14. Experiment

  15. Experiment

More Related