SybilSCAR: Sybil Detection in Online Social Networks via Local Rule based Propagation Binghui Wang, Le Zhang, Neil Zhenqiang Gong Department of Electrical and Computer Engineering
OUTLINE • Background • Algorithm • Evaluation • Conclusion
Online Social Networks (OSNs) are Popular • 1.86 billion monthly active users • 300 million monthly active users • 500 million tweets per day
Threats of Sybil Attacks • Sybils can be used to perform various malicious activities • Distribute spam and phishing attacks • Harvest private user data • Influence financial markets • Disrupt presidential elections? • … • Sybil detection is an urgent research problem
Existing Sybil Detection Methods • Various methods from multiple research communities • Networking, security, data mining, etc. • Feature-based methods: feature extraction + ML classifier • Features: side information (e.g., content, behavior), local structure (e.g., clustering coefficient, common neighbors), etc. • Classifier: support vector machine, logistic regression, etc. • Fundamental limitation: not adversarially robust (attackers can manipulate their content and behavior to evade the classifier) • Structure-based methods: leverage the global structure of OSNs • Use the edges between nodes to propagate information over the graph • More adversarially robust
Pros and Cons of Structure-based Methods • Random Walk (RW)-based methods • Pros • Efficient/scalable • Guaranteed to converge • Cons • Use either labeled benign nodes or labeled Sybils, but not both • Not robust to label noise • Loopy Belief Propagation (LBP)-based methods • Pros • Leverage both labeled benign nodes and labeled Sybils • Robust to label noise • Cons • Not scalable • Not guaranteed to converge
Our Contribution: SybilSCAR • A novel structure-based Sybil detection method • Maintains the advantages and addresses the limitations of existing methods • Compared with RW, it leverages both labeled benign nodes and labeled Sybils, and is robust to label noise • Compared with LBP, it is scalable and convergent • In a nutshell: scalable, convergent, accurate, and robust to label noise
OUTLINE • Background • Algorithm • Evaluation • Conclusion
Problem Definition • Input • Social graph G = (V, E) • Training set • Labeled Sybils • Labeled benign nodes • Output • Label of each remaining node
Our General Local Rule-based Framework • Learn prior knowledge q_u for each node u from the training set • Reputation score, e.g., in RW-based methods • Probability of being a Sybil, e.g., in LBP-based methods • E.g., if u is a labeled Sybil, q_u = 0.9; if labeled benign, q_u = 0.1; if unlabeled, q_u = 0.5 • Propagate the prior knowledge over the social graph to obtain posterior knowledge p_u • Iteratively apply a local rule to every node • Rank nodes by posterior knowledge to detect Sybils • Sybils are expected to have larger values than benign nodes
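The framework's three steps (priors, iterative propagation, ranking) can be sketched in Python as follows. This is a minimal sketch, not the authors' code: the helper names are hypothetical, updates are assumed to be synchronous, and `toy_rule` is a placeholder local rule invented only to exercise the loop; it is not the rule of SybilRank, SybilBelief, or SybilSCAR.

```python
from typing import Callable, Dict, Hashable, List, Set

Graph = Dict[Hashable, List[Hashable]]  # adjacency list of an undirected graph
# A local rule maps (node, graph, current posteriors p, priors q) to a new posterior.
LocalRule = Callable[[Hashable, Graph, Dict[Hashable, float], Dict[Hashable, float]], float]


def assign_priors(graph: Graph, sybils: Set[Hashable], benign: Set[Hashable]) -> Dict[Hashable, float]:
    """Prior q_u with the slide's example values: 0.9 Sybil, 0.1 benign, 0.5 unlabeled."""
    return {u: 0.9 if u in sybils else 0.1 if u in benign else 0.5 for u in graph}


def propagate(graph: Graph, q: Dict[Hashable, float], rule: LocalRule, iterations: int = 10) -> Dict[Hashable, float]:
    """Apply the local rule to every node (synchronously) for a fixed number of sweeps."""
    p = dict(q)  # posteriors start from the priors
    for _ in range(iterations):
        p = {u: rule(u, graph, p, q) for u in graph}
    return p


def rank(p: Dict[Hashable, float]) -> List[Hashable]:
    """Most Sybil-like nodes first."""
    return sorted(p, key=lambda u: p[u], reverse=True)


def toy_rule(u, graph, p, q):
    """Hypothetical placeholder rule: half prior, half mean of the neighbors' posteriors."""
    nbrs = graph[u]
    mean_nbr = sum(p[v] for v in nbrs) / len(nbrs) if nbrs else 0.5
    return 0.5 * q[u] + 0.5 * mean_nbr


# Two triangles (benign a, b, c and Sybil x, y, z) joined by one attack edge c-x.
G: Graph = {
    "a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b", "x"],
    "x": ["c", "y", "z"], "y": ["x", "z"], "z": ["x", "y"],
}
q = assign_priors(G, sybils={"x"}, benign={"a"})
p = propagate(G, q, toy_rule)
print(rank(p))  # ['x', 'y', 'z', 'c', 'b', 'a']
```

Swapping `toy_rule` for a different function is all it takes to plug in another local rule, which is the point of the framework: methods differ only in the rule.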
What is a Local Rule • Local rule: neighbor influences + prior knowledge => posterior knowledge • [Figure: node u with neighbors v, s, t sending neighbor influences f_vu, f_su, f_tu; u combines them with its prior q_u to obtain its posterior p_u] • Different methods use different neighbor influences • Different methods combine neighbor influences with prior knowledge differently
Existing Methods are Special Cases • RW-based methods: additive local rule • LBP-based methods: multiplicative local rule • w_uv: weight of the edge (u, v)
Our Local Rule: Neighbor Influence • Homophily strength w_vu: probability that u and v have the same label • Neighbor influence f_vu: probability that u is a Sybil, given information about its neighbor v and the homophily strength w_vu
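Reading the two definitions together, the neighbor influence can be written out with the law of total probability. This assumes that whether u and v share a label does not depend on v's own label, and writes p_v for the current probability that neighbor v is a Sybil; the slide states the definition but not a formula, so treat this as a reconstruction:

```latex
f_{vu} = \Pr(v \text{ is Sybil})\,\Pr(\text{same label}) + \Pr(v \text{ is benign})\,\Pr(\text{different labels})
       = p_v\, w_{vu} + (1 - p_v)(1 - w_{vu})
```

For example, w_vu = 0.9 and p_v = 0.9 give f_vu = 0.81 + 0.01 = 0.82, while w_vu = 0.5 gives f_vu = 0.5 for any p_v, so an edge with no homophily carries no information.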
Our Local Rule: Combine Neighbor Influence with Prior Knowledge Multiplicatively • Does not store neighbor influences => efficient • Leverages both labeled benign nodes and labeled Sybils => accurate • Multiplicative combination => robust to label noise • However, not guaranteed to converge
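A minimal Python sketch of a multiplicative local rule, assuming a naive-Bayes-style combination in which each neighbor influence is treated as independent evidence: p_u = q_u ∏ f_vu / (q_u ∏ f_vu + (1 − q_u) ∏ (1 − f_vu)). The slide does not show the exact formula, and the function names `neighbor_influence` and `multiplicative_rule` are hypothetical. As the slide notes, f_vu is recomputed from p_v on every call rather than stored per edge.

```python
import math


def neighbor_influence(p_v: float, w: float) -> float:
    """f_vu = w * p_v + (1 - w) * (1 - p_v): chance u is Sybil given neighbor v."""
    return w * p_v + (1.0 - w) * (1.0 - p_v)


def multiplicative_rule(u, graph, p, q, w):
    """Combine prior q_u with neighbor influences as independent evidence (assumed form).

    Assumes 0 < q_u < 1 and 0 < w < 1, which keeps every f_vu strictly
    inside (0, 1) whenever 0 <= p_v <= 1, so all logarithms are defined.
    """
    # Accumulate in log space so products over high-degree nodes do not underflow.
    log_sybil = math.log(q[u])
    log_benign = math.log(1.0 - q[u])
    for v in graph[u]:
        f = neighbor_influence(p[v], w)  # recomputed on the fly, never stored per edge
        log_sybil += math.log(f)
        log_benign += math.log(1.0 - f)
    # p_u = S / (S + B), evaluated as a numerically stable logistic of log(S / B).
    d = log_sybil - log_benign
    if d >= 0:
        return 1.0 / (1.0 + math.exp(-d))
    z = math.exp(d)
    return z / (1.0 + z)


G = {"u": ["v"], "v": ["u"]}
p = {"u": 0.5, "v": 0.9}
q = {"u": 0.5, "v": 0.9}
print(round(multiplicative_rule("u", G, p, q, w=0.9), 4))  # 0.82
```

An unlabeled node (q_u = 0.5) with one confident Sybil neighbor ends up at f_vu = 0.82, and a node whose neighbors are all uninformative (f_vu = 0.5) simply keeps its prior.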
Linearization to Guarantee Convergence • Use the approximation • Use residual vectors • The nonlinear multiplication reduces to linear residual addition
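One way to carry out this linearization, written as a derivation. It assumes the multiplicative rule has the naive-Bayes form p_u ∝ q_u ∏ f_vu (so it becomes additive in log-odds), uses residuals centered at 1/2, and takes "the approximation" to be the first-order expansion of the log-odds around 1/2; the slide names the approximation without showing it. Γ(u) denotes the set of u's neighbors.

```latex
% Residuals (values centered at 1/2)
\hat{p}_u = p_u - \tfrac{1}{2}, \qquad \hat{q}_u = q_u - \tfrac{1}{2}, \qquad \hat{w}_{vu} = w_{vu} - \tfrac{1}{2}

% The neighbor influence becomes bilinear in the residuals
f_{vu} - \tfrac{1}{2} = w_{vu} p_v + (1 - w_{vu})(1 - p_v) - \tfrac{1}{2} = 2\,\hat{w}_{vu}\,\hat{p}_v

% The multiplicative combination is additive in log-odds
\ln\frac{p_u}{1 - p_u} = \ln\frac{q_u}{1 - q_u} + \sum_{v \in \Gamma(u)} \ln\frac{f_{vu}}{1 - f_{vu}}

% First-order approximation, accurate when |x| is small
\ln\frac{\tfrac{1}{2} + x}{\tfrac{1}{2} - x} \approx 4x

% Substituting: 4\hat{p}_u \approx 4\hat{q}_u + \sum_v 8\,\hat{w}_{vu}\hat{p}_v, i.e. linear residual addition
\hat{p}_u \approx \hat{q}_u + \sum_{v \in \Gamma(u)} 2\,\hat{w}_{vu}\,\hat{p}_v
```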
Our Final Local Rule in Residual Form • Set equal homophily strength w_vu = w for all edges
OUTLINE • Background • Algorithm • Evaluation • Conclusion
Experimental Setup • Datasets • Facebook with synthesized Sybils • Small Twitter dataset with real Sybils • Large Twitter dataset with real Sybils • Compared methods • State-of-the-art RW-based method: SybilRank • State-of-the-art LBP-based method: SybilBelief
Ranking Accuracy • SybilSCAR is slightly better than SybilBelief • SybilSCAR and SybilBelief are more accurate than SybilRank
Robustness to Label Noise • SybilSCAR and SybilBelief have almost the same robustness to label noise • SybilSCAR and SybilBelief are much more robust to label noise than SybilRank
Scalability • SybilSCAR is as scalable as SybilRank • SybilSCAR is more scalable than SybilBelief
Convergence • SybilSCAR and SybilRank converge • SybilBelief does not converge
OUTLINE • Background • Algorithm • Evaluation • Conclusion
Conclusion • A general local rule-based framework that unifies existing Sybil detection methods • Our novel local rule integrates the advantages of existing methods while overcoming their limitations • Future work • Design local rules to detect other types of Sybils, e.g., web spam, fake reviews, and fake likes • Compare different local rules theoretically • Learn a homophily strength for each edge
Thanks & Questions Binghui Wang Email: binghuiw@iastate.edu Le Zhang Email: lezhang@iastate.edu Neil Zhenqiang Gong Email: neilgong@iastate.edu
Convergence Analysis • Lemma 1: A linear system y = b + My converges for any initial choice of y iff the spectral radius ρ(M) < 1 • Theorem 2 (necessary and sufficient condition): SybilSCAR converges iff … • This condition is hard to check in practice, so we seek a sufficient convergence condition • Theorem 3 (sufficient condition): SybilSCAR is guaranteed to converge if …
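Applied to a residual iteration p̂ = q̂ + M p̂, Lemma 1 says convergence hinges on ρ(M). Assuming equal homophily, so that M = 2ŵA with A the adjacency matrix and ŵ = w − 0.5 (our reconstruction of the iteration matrix), the sketch below estimates ρ(A) by power iteration and compares it with the cheaper bound ρ(A) ≤ d_max (the maximum degree), which yields the sufficient condition 2|ŵ| · d_max < 1. This bound is an illustration and may differ from the paper's Theorem 3; all function names are hypothetical.

```python
import math
from typing import Dict, Hashable, List

Graph = Dict[Hashable, List[Hashable]]


def spectral_radius(graph: Graph, iterations: int = 200) -> float:
    """Estimate rho(A) for an undirected graph by power iteration on A + I.

    The +I shift keeps the dominant eigenvalue unique even for bipartite
    graphs, whose adjacency spectra contain both +rho and -rho.
    """
    if not graph:
        return 0.0
    nodes = list(graph)
    x = {u: 1.0 for u in nodes}  # positive start vector
    for _ in range(iterations):
        y = {u: x[u] + sum(x[v] for v in graph[u]) for u in nodes}
        norm = math.sqrt(sum(val * val for val in y.values()))
        x = {u: y[u] / norm for u in nodes}
    # Rayleigh quotient of the unit vector x for A + I, minus the shift.
    return sum(x[u] * (x[u] + sum(x[v] for v in graph[u])) for u in nodes) - 1.0


def converges_spectral(graph: Graph, w: float) -> bool:
    """Estimated necessary-and-sufficient test: 2|w - 1/2| * rho(A) < 1."""
    return 2.0 * abs(w - 0.5) * spectral_radius(graph) < 1.0


def converges_degree_bound(graph: Graph, w: float) -> bool:
    """Cheaper sufficient test using rho(A) <= maximum degree."""
    d_max = max((len(nbrs) for nbrs in graph.values()), default=0)
    return 2.0 * abs(w - 0.5) * d_max < 1.0


star: Graph = {"hub": ["l1", "l2", "l3", "l4"], "l1": ["hub"], "l2": ["hub"], "l3": ["hub"], "l4": ["hub"]}
print(round(spectral_radius(star), 6))       # 2.0
print(converges_spectral(star, w=0.65))      # True  (0.3 * 2 = 0.6)
print(converges_degree_bound(star, w=0.65))  # False (0.3 * 4 = 1.2)
```

The star graph shows why a sufficient condition is conservative: the degree bound rejects w = 0.65 even though the spectral test says the iteration converges. Its appeal is cost, since d_max is read off in one pass over the graph.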
Complexity Analysis • Space complexity • Time complexity • SybilSCAR has the same asymptotic space and time complexity as RW-based SybilRank and LBP-based SybilBelief • However, it is more space- and time-efficient than SybilBelief, as it does not store the neighbor influence on every edge