
Connecting Users across Social Media Sites: A Behavioral-Modeling Approach

Connecting Users across Social Media Sites: A Behavioral-Modeling Approach. Reza Zafarani and Huan Liu, Data Mining and Machine Learning Laboratory (DMML), Arizona State University. KDD 2013 – Chicago, Illinois. How hard can it be to identify an individual across sites?


Presentation Transcript


  1. Connecting Users across Social Media Sites: A Behavioral-Modeling Approach. Reza Zafarani and Huan Liu, Data Mining and Machine Learning Laboratory (DMML), Arizona State University. KDD 2013 – Chicago, Illinois

  2. How hard can it be to identify an individual across sites? Privacy experts claim advertisers know a lot about people. Can they stop showing you the same repetitive ads across sites?

  3. Huan Liu • More information about individuals • Many social media sites • Partial Information • Complementary Information • Better User Profiles • Connectivity is not available • Consistency in Information Availability • Can we connect individuals across sites?

  4. Can we verify that the information provided across sites belongs to the same individual?

  5. Human behavior generates information redundancy. Information shared across sites provides a behavioral fingerprint. • Behavioral Modeling • Minimum Information. MOBIUS: MOdeling Behavior for Identifying Users across Sites

  6. Identification Function. Minimum information available on ALL sites: usernames. Prior Usernames ({jsmith, john.s}), Candidate Username (john.smith)

  7. Learning framework (diagram): Behavior 1 … Behavior n each generate Information Redundancy, which is captured via Feature Set 1 … Feature Set n. The feature sets and Data feed the Learning Framework, which produces the Identification Function.
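The pipeline on slide 7 can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the two feature functions, the `featurize` helper, and the `FEATURES` list are hypothetical stand-ins for the behavior-specific feature sets, and the learning step (any off-the-shelf classifier trained on such vectors) is left out.

```python
from typing import Callable, List, Sequence

# A feature maps (prior usernames, candidate username) to a number.
# All names below are hypothetical; the slides do not specify them.
Feature = Callable[[Sequence[str], str], float]


def exact_match(priors: Sequence[str], candidate: str) -> float:
    """1.0 if the candidate equals any prior username, else 0.0."""
    return 1.0 if candidate in priors else 0.0


def mean_length_gap(priors: Sequence[str], candidate: str) -> float:
    """Absolute gap between the candidate's length and the mean prior length."""
    mean_len = sum(len(p) for p in priors) / len(priors)
    return abs(len(candidate) - mean_len)


def featurize(priors: Sequence[str], candidate: str,
              features: Sequence[Feature]) -> List[float]:
    """Apply every feature function to get one vector for a learner."""
    return [f(priors, candidate) for f in features]


FEATURES = [exact_match, mean_length_gap]

# Slide 6 example: prior usernames {jsmith, john.s}, candidate john.smith.
vector = featurize(["jsmith", "john.s"], "john.smith", FEATURES)
```

Each labeled (prior usernames, candidate) pair would yield one such vector, and a supervised learner trained on those vectors would play the role of the identification function.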

  8. Time and Memory Limitation: 59% of individuals use the same username

  9. Knowledge Limitation: identifying individuals by their vocabulary size. Alphabet size is correlated to language: शमंतकुमार -> Shamanth Kumar
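As a rough illustration of the alphabet-size idea on slide 9, one could count the distinct characters a person has used so far and check whether a candidate username stays within that alphabet. The function names here are assumptions made for this sketch, not features named in the slides.

```python
from typing import Sequence


def alphabet_size(usernames: Sequence[str]) -> int:
    """Number of distinct characters used across a set of usernames."""
    return len(set("".join(usernames)))


def new_characters(priors: Sequence[str], candidate: str) -> int:
    """How many characters in the candidate never appear in the priors."""
    return len(set(candidate) - set("".join(priors)))


priors = ["jsmith", "john.s"]
size = alphabet_size(priors)                       # distinct characters seen so far
same_script = new_characters(priors, "john.smith")
other_script = new_characters(priors, "शमंतकुमार")  # Devanagari: every character is new
```

A candidate that introduces many unseen characters, for instance one written in a different script, is weak evidence for the same individual under this feature alone.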

  10. Typing Patterns: QWER1234 (QWERTY keyboard; variants: AZERTY, QWERTZ), AOEUISNTH (DVORAK keyboard). Keyboard type impacts your usernames.
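One hedged way to capture the keyboard effect on slide 10 is to measure what fraction of a username's letters sit on a given row of a given layout. The row sets and the `row_fraction` helper below are assumptions chosen for illustration; the slides do not list the exact typing features used.

```python
# Letter keys on selected rows of two layouts (punctuation keys omitted).
QWERTY_TOP = set("qwertyuiop")
QWERTY_HOME = set("asdfghjkl")
DVORAK_HOME = set("aoeuidhtns")


def row_fraction(username: str, row: set) -> float:
    """Fraction of the username's letters that lie on the given keyboard row."""
    letters = [ch for ch in username.lower() if ch.isalpha()]
    if not letters:
        return 0.0
    return sum(ch in row for ch in letters) / len(letters)


# The two example strings from the slide.
qwerty_like = row_fraction("QWER1234", QWERTY_TOP)
dvorak_like = row_fraction("AOEUISNTH", DVORAK_HOME)
dvorak_on_qwerty = row_fraction("AOEUISNTH", QWERTY_HOME)
```

A username made of adjacent keys scores high on one layout's row and low on the other's, which is the kind of signal a keyboard-aware feature could pass to the learner.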

  11. Habits (old habits die hard): adding prefixes/suffixes, abbreviating, swapping or adding/removing characters. Nametag and Gateman: usernames come from a language model.
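The modification habits on slide 11 (prefixes and suffixes, abbreviation, character swaps, additions, removals) suggest string-similarity features. The sketch below uses two standard measures, Levenshtein edit distance and longest common substring, as plausible stand-ins; whether these match the features actually used is an assumption.

```python
from difflib import SequenceMatcher


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum insertions, deletions, substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete from a
                            curr[j - 1] + 1,      # insert into a
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def longest_common_substring(a: str, b: str) -> str:
    """Longest contiguous run of characters shared by both strings."""
    match = SequenceMatcher(None, a, b).find_longest_match(0, len(a), 0, len(b))
    return a[match.a:match.a + match.size]


suffix_added = edit_distance("john.s", "john.smith")
shared = longest_common_substring("jsmith", "john.smith")
```

Small edit distances and long shared substrings would be consistent with one person reusing and lightly modifying an earlier username.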

  12. Experiment Setup. Previous Methods: • Zafarani and Liu, 2009 • Perito et al., 2011. Baselines: • Exact Username Match • Substring Match • Patterns in Letters. Data: 200,000 instances (50% class balance), 414 features.
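The first two baselines on slide 12 might look like the following. The slides only name them, so the exact matching rules here (case-insensitive comparison, containment in either direction) are assumptions, and the third baseline, patterns in letters, is not sketched because the slides give no detail on it.

```python
from typing import Sequence


def exact_match_baseline(priors: Sequence[str], candidate: str) -> bool:
    """Predict 'same individual' only if the candidate equals a prior username."""
    c = candidate.lower()
    return any(p.lower() == c for p in priors)


def substring_match_baseline(priors: Sequence[str], candidate: str) -> bool:
    """Predict 'same individual' if a prior contains the candidate or vice versa."""
    c = candidate.lower()
    return any(p.lower() in c or c in p.lower() for p in priors)


priors = ["jsmith", "john.s"]
exact = exact_match_baseline(priors, "john.smith")
substring = substring_match_baseline(priors, "john.smith")
```

On the slide 6 example the exact-match rule misses the candidate while the substring rule accepts it, because `john.s` is contained in `john.smith`.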

  13. MOBIUS Performance

  14. Choice of Learning Algorithm

  15. Diminishing Returns for Adding More Usernames

  16. Conclusions + Future Work • Information shared across sites acts as a behavioral fingerprint • Discover applications of connecting users across sites • Human behavior results in information redundancy • A methodology for connecting individuals across sites: a behavioral modeling approach; uses minimum information across sites; allows for integration of additional behaviors when required • Incorporating features indigenous to specific sites
