Identifying Multi-ID Users in Open Forums
Hung-Ching Chen, Mark Goldberg, Malik Magdon-Ismail
Who is Using Multiple IDs?
Consider two chatrooms, “letters” and “numbers”.
letters
dogg: hey anna, Z rocks!
anna: hey dogg whats new
mack: what’s that dogg?
dogg: not much
dogg: Z is the best
numbers
mack: i love 5
catt: i don’t
joop: 27 rules!
catt: i agree joop :)
Multi-ID Users
Actors: anna, mack, dogg/catt, joop
Messages in the order they appear across the two chatrooms:
[letters] dogg: hey anna, Z rocks!
[numbers] mack: i love 5
[letters] anna: hey dogg whats new
[letters] mack: what’s that dogg?
[numbers] catt: i don’t
[letters] dogg: not much
[numbers] joop: 27 rules!
[letters] dogg: Z is the best
[numbers] catt: i agree joop :)
Overview
• A model of a public forum
• Two efficient statistics-based algorithms for identification of multi-ID users
• Simulation results
A Model of a Public Forum
• Every actor has one response queue
• Common average and variance of response delay
• The server has a queue to process messages with very short delay
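The model above can be sketched as a small simulation. This is only an illustration under stated assumptions: the `Actor` class, the `simulate` function, the Gaussian delay clipped at a small positive value, and the default `mean` and `std` are all hypothetical choices, and the server queue is ignored because its delay is described as very short.

```python
import random
from collections import deque


class Actor:
    """One person; may post under several IDs but has a single response queue."""

    def __init__(self, ids):
        self.ids = list(ids)
        self.queue = deque()  # pending (arrival_time, reply_as_id) items, FIFO
        self.free_at = 0.0    # time at which the actor finishes the current reply


def simulate(actors, arrivals, mean=10.0, std=2.0, seed=0):
    """Return sorted (post_time, id) pairs produced by serving each queue in order.

    arrivals: list of (time, id), meaning a message addressed to `id` arrives at `time`.
    All actors draw delays from the same distribution (assumed Gaussian here).
    """
    rng = random.Random(seed)
    owner = {i: a for a in actors for i in a.ids}
    for t, i in sorted(arrivals):
        owner[i].queue.append((t, i))
    posts = []
    for a in actors:
        while a.queue:
            t, i = a.queue.popleft()
            delay = max(0.1, rng.gauss(mean, std))  # clip to keep delays positive
            start = max(t, a.free_at)               # one reply at a time per actor
            a.free_at = start + delay
            posts.append((a.free_at, i))
    return sorted(posts)
```

Because dogg and catt would share one `Actor`, their posts can never be closer together than one response delay, which is the property the identification algorithms rely on.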
Model: Friendship Graph
(figure: graph on the IDs anna, mack, dogg, joop, catt)
Model: Alias Graph
(figure: graph on the IDs anna, mack, dogg, joop, catt)
Multi-ID Users
• One response queue for each actor
(figure: animation replaying the two-chatroom conversation above, showing the response queues of anna, mack, dogg/catt, and joop as messages arrive; the IDs dogg and catt are served from a single queue)
First Algorithm
• Collect information {‹time_i, ID_i›}
• Compute the minimum time delay minD for every pair of IDs
• Cluster the delays into two groups using k-means
• Pairs in the cluster with the larger center are suspected to be the same actor (red)
• Pairs in the cluster with the smaller center are suspected to be different actors (blue)
• Connected components of red edges are ID groups, each representing one actor
1. Collect information
(figure: timeline of posts per ID: dogg 3 posts, mack 2, anna 1, catt 2, joop 1)
2. minD(dogg, mack)
(figure: timelines of the dogg and mack posts)
2. minD(dogg, catt)
(figure: timelines of the dogg and catt posts)
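A minimal sketch of step 2, assuming minD(a, b) is the smallest absolute gap between any post by ID a and any post by ID b. The slides do not spell out the definition, so this reading, the function names `min_delay` and `all_min_delays`, and the `(time, id)` log format are assumptions.

```python
from itertools import combinations


def min_delay(times_a, times_b):
    """Smallest absolute time gap between a post by one ID and a post by the other."""
    return min(abs(ta - tb) for ta in times_a for tb in times_b)


def all_min_delays(log):
    """Map each unordered ID pair to its minD, given a log of (time, id) records."""
    by_id = {}
    for t, user_id in log:
        by_id.setdefault(user_id, []).append(t)
    return {
        (a, b): min_delay(by_id[a], by_id[b])
        for a, b in combinations(sorted(by_id), 2)
    }
```

The brute-force comparison is quadratic in the number of posts per pair; merging two sorted timestamp lists would give the same result in linear time.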
3. Cluster minD values
(figure: minD values clustered into two groups labeled “Different actors” and “Same actor”, with minD(dogg, catt) and minD(dogg, mack) marked)
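The clustering step can be sketched as one-dimensional k-means with k = 2. The initialization at the minimum and maximum value, the iteration cap, and the names `two_means_1d` and `label_pairs` are assumptions made for this illustration.

```python
def two_means_1d(values, iters=100):
    """Lloyd's algorithm for k = 2 on scalars; returns (small_center, large_center)."""
    lo, hi = min(values), max(values)
    for _ in range(iters):
        small = [v for v in values if abs(v - lo) <= abs(v - hi)]
        large = [v for v in values if abs(v - lo) > abs(v - hi)]
        if not large:  # all values identical: nothing to split
            break
        new_lo, new_hi = sum(small) / len(small), sum(large) / len(large)
        if (new_lo, new_hi) == (lo, hi):  # converged
            break
        lo, hi = new_lo, new_hi
    return lo, hi


def label_pairs(min_d):
    """Color each ID pair: red if its minD is nearer the larger center, else blue."""
    lo, hi = two_means_1d(list(min_d.values()))
    return {
        pair: ("red" if abs(d - hi) < abs(d - lo) else "blue")
        for pair, d in min_d.items()
    }
```

Red marks pairs suspected to be the same actor and blue marks pairs suspected to be different actors, following the coloring used in the First Algorithm slide.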
3. Resulting Alias Graph
(figure: alias graph on anna, mack, dogg, joop, catt, annotated “Contradictions”)
4. One Red Component
(figure: anna, mack, dogg, joop, catt joined in a single red component)
9 false positives, 10% accuracy
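Step 4 amounts to finding connected components over the red edges. A minimal union-find sketch follows; the function name `red_components` and the edge lists used with it are hypothetical, since the actual red edges in the slide figure are not recoverable.

```python
def red_components(ids, red_edges):
    """Group IDs into connected components using only red (same-actor) edges."""
    parent = {i: i for i in ids}

    def find(x):
        # Walk to the root, halving the path along the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in red_edges:
        parent[find(a)] = find(b)

    groups = {}
    for i in ids:
        groups.setdefault(find(i), set()).add(i)
    return sorted(sorted(g) for g in groups.values())
```

A chain of red edges is enough to merge every ID into one group, even when most individual pairs are blue, which is how a few misclassified pairs can produce the single red component shown here.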
Refinement Algorithm
• Find groups connected with red edges (follow the original algorithm)
• Color IDs into smaller ID groups using blue edges
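One way to read the coloring step: inside a red component, two IDs joined by a blue edge must receive different colors, and IDs that share a color form one refined group. The sketch below uses greedy coloring with the most constrained IDs first. The slides do not say which coloring procedure is used, so the greedy strategy and the name `color_component` are assumptions.

```python
def color_component(component, blue_edges):
    """Split one red component into groups so that no blue edge joins two group members."""
    blue = {i: set() for i in component}
    for a, b in blue_edges:
        if a in blue and b in blue:
            blue[a].add(b)
            blue[b].add(a)

    color = {}
    # Greedy coloring: IDs with the most blue neighbors first, ties broken by name.
    for i in sorted(component, key=lambda x: (-len(blue[x]), x)):
        used = {color[n] for n in blue[i] if n in color}
        c = 0
        while c in used:
            c += 1
        color[i] = c

    groups = {}
    for i, c in color.items():
        groups.setdefault(c, []).append(i)
    return sorted(sorted(g) for g in groups.values())
```

Greedy coloring does not guarantee the fewest groups in general; it is used here only because it is short and deterministic.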
4. One Red Component
(figure: anna, mack, dogg, joop, catt joined in a single red component)
5. Color Blue Graph
(figure: the same IDs colored into smaller groups using blue edges)
1 false positive, 90% accuracy
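The reported figures are consistent with measuring accuracy over ID pairs: 5 IDs give 10 pairs, and only dogg/catt is a true alias pair. The sketch below makes that arithmetic explicit. The pair-level definition is an inference from the numbers, and the refined grouping used with it is hypothetical because the slide figure is lost.

```python
from itertools import combinations


def pair_accuracy(ids, predicted_groups, true_groups):
    """Return (false_positives, accuracy) computed over all unordered ID pairs."""

    def same(groups):
        label = {i: k for k, g in enumerate(groups) for i in g}
        return lambda a, b: label[a] == label[b]

    pred, truth = same(predicted_groups), same(true_groups)
    pairs = list(combinations(ids, 2))
    false_pos = sum(pred(a, b) and not truth(a, b) for a, b in pairs)
    correct = sum(pred(a, b) == truth(a, b) for a, b in pairs)
    return false_pos, correct / len(pairs)


ids = ["anna", "mack", "dogg", "joop", "catt"]
truth = [["anna"], ["mack"], ["dogg", "catt"], ["joop"]]

# One red component: every pair is predicted to be the same actor.
one_component = [ids]
# Hypothetical refined grouping with a single wrongly merged pair.
refined = [["anna", "joop"], ["catt", "dogg"], ["mack"]]
```

With these inputs, `pair_accuracy(ids, one_component, truth)` gives 9 false positives and 0.1 accuracy, and `pair_accuracy(ids, refined, truth)` gives 1 false positive and 0.9 accuracy, matching the two slides.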
Future Work
• Real-life testing
• More sophisticated model
• Different types of users
• Short and overlapping online appearances
• Length of posts
• Algorithm improvements
• Statistical evaluation of the new models
• Lexical and semantic analysis