On Community Outliers and their Efficient Detection in Information Networks

Jing Gao, Feng Liang, Wei Fan, Chi Wang, Yizhou Sun, and Jiawei Han. SIGKDD, 2010. Presented by Hung-Yi Cai, 2011/2/10. Outline: Motivation, Objectives, Related Work, Methodology, Experiments, Conclusions.

Presentation Transcript


  1. On Community Outliers and their Efficient Detection in Information Networks Jing Gao, Feng Liang, Wei Fan, Chi Wang, Yizhou Sun, and Jiawei Han SIGKDD, 2010 Presented by Hung-Yi Cai 2011/2/10

  2. Outline • Motivation • Objectives • Related Work • Methodology • Experiments • Conclusions • Comments

  3. Motivation • Linked or networked data are ubiquitous in many applications. • Outlier detection in information networks can reveal important anomalous and interesting behaviors that are not obvious if community information is ignored.

  4. Objectives • To propose an efficient solution that models networked data as a mixture model composed of multiple normal communities and a set of randomly generated outliers. • The probabilistic model characterizes both data and links simultaneously by defining their joint distribution based on hidden Markov random fields (HMRF).
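
To make the idea of a joint distribution over data and links concrete, here is a minimal sketch (an illustration, not the authors' code) of the negative log joint probability, or energy, of one labeling Z. It assumes one continuous attribute per object, Gaussian communities, a constant density for outliers, label 0 reserved for outliers, and a link term that rewards linked objects sharing a community while ignoring links that touch an outlier. The names joint_energy, OUTLIER, comm_params, outlier_logdensity and lam are hypothetical. The MRF partition function is dropped because it does not depend on Z.

```python
import math

OUTLIER = 0  # hypothetical convention: label 0 = outlier, labels 1..K = communities


def gaussian_logpdf(x, mean, var):
    """Log density of a univariate Gaussian N(mean, var) at x."""
    return -0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)


def joint_energy(x, z, edges, comm_params, outlier_logdensity, lam):
    """Negative log of the joint P(X, Z), up to the MRF partition function.

    x: list of floats, one observation per object
    z: list of labels, z[i] in {0, 1, ..., K}
    edges: list of (i, j, w) undirected links with weight w
    comm_params: dict k -> (mean, var) for communities k >= 1 (assumed Gaussian)
    outlier_logdensity: constant log density assigned to any outlier observation
    lam: strength of the link (smoothness) term
    """
    # Data term: -log P(X | Z); objects are conditionally independent given labels.
    data_term = 0.0
    for xi, zi in zip(x, z):
        if zi == OUTLIER:
            data_term -= outlier_logdensity
        else:
            mean, var = comm_params[zi]
            data_term -= gaussian_logpdf(xi, mean, var)
    # Link term: -log of the unnormalized HMRF prior. Linked objects in the same
    # community lower the energy; links touching an outlier earn no reward
    # (an assumption of this sketch).
    link_term = 0.0
    for i, j, w in edges:
        if z[i] == z[j] and z[i] != OUTLIER:
            link_term -= lam * w
    return data_term + link_term
```

A labeling that keeps linked objects with similar values in the same community receives a lower energy than one that splits them, which is the preference the MAP inference step searches for.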

  5. Related Work

  6. Methodology

  7. Methodology • Community Outlier • Outlier Detection via HMRF • Modeling Continuous and Text Data • Fitting Community Outlier Detection Model • Inference • Parameter Estimation

  8. Community Outlier • Outlier Detection via HMRF:

  9. Community Outlier • Modeling Continuous and Text Data: • Continuous data • Text data
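
For the step that models continuous and text data, a minimal sketch of per-object log-likelihoods follows. It assumes a Gaussian density for a continuous attribute and a multinomial distribution over words for text, which are common choices for these data types; the uniform outlier model for text and all function names here are hypothetical and only illustrate how each object's data can be scored under a candidate label.

```python
import math
from collections import Counter


def gaussian_loglik(x, mean, var):
    """Continuous attribute: log N(x | mean, var)."""
    return -0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)


def multinomial_loglik(words, word_probs):
    """Text attribute: log-likelihood of a bag of words under a multinomial.

    word_probs must give a nonzero probability to every word in `words`
    (e.g. after smoothing). The multinomial coefficient is dropped because it
    depends only on the document, so it cancels when comparing labels.
    """
    counts = Counter(words)
    return sum(c * math.log(word_probs[w]) for w, c in counts.items())


def uniform_text_loglik(words, vocab_size):
    """Hypothetical outlier model for text: every vocabulary word equally likely."""
    return -len(words) * math.log(vocab_size)
```

For example, the document ["graph", "mining", "graph"] scores higher under a community whose word distribution favors "graph" and "mining" than under the uniform outlier model.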

  10. Fitting Community Outlier Detection Model • Inference: • We first assume that the model parameters in Θ are known, and discuss how to obtain an assignment of the hidden variables. • The objective is to find the configuration that maximizes the posterior distribution given Θ.

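  11. Fitting Community Outlier Detection Model • In general, we seek a labeling of the objects, Z = {z1, ..., zM}, that maximizes the posterior probability, i.e., the maximum a posteriori (MAP) estimate. • The Iterated Conditional Modes (ICM) algorithm is used to solve this MAP estimation problem. • ICM adopts a greedy strategy: it repeatedly sets each object's label to the one that minimizes its local posterior energy (equivalently, maximizes its local posterior probability given the current labels of its neighbors). It is guaranteed to converge to a local optimum, typically within a few iterations.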
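
The ICM procedure above can be sketched as follows. This is a minimal illustration under assumptions, not the authors' implementation: one continuous attribute per object, Gaussian communities, a constant outlier density, label 0 reserved for outliers, and outliers earning no reward from links. The names icm, local_energy, neighbors, comm_params, outlier_logdensity and lam are hypothetical.

```python
import math

OUTLIER = 0  # hypothetical convention: label 0 = outlier, labels 1..K = communities


def local_energy(i, k, x, z, neighbors, comm_params, outlier_logdensity, lam):
    """Posterior energy of giving object i label k while all other labels stay fixed."""
    if k == OUTLIER:
        # Assumption: outliers use a constant density and get no link reward.
        return -outlier_logdensity
    mean, var = comm_params[k]
    data = 0.5 * math.log(2 * math.pi * var) + (x[i] - mean) ** 2 / (2 * var)
    links = sum(w for j, w in neighbors.get(i, []) if z[j] == k)
    return data - lam * links


def icm(x, neighbors, comm_params, outlier_logdensity, lam, z, max_iter=50):
    """Greedy MAP labeling by Iterated Conditional Modes.

    neighbors must be symmetric: if (j, w) is in neighbors[i], then (i, w)
    is in neighbors[j]. Returns a new list; the initial labeling z is not modified.
    """
    z = list(z)
    labels = [OUTLIER] + sorted(comm_params)
    for _ in range(max_iter):
        changed = False
        for i in range(len(x)):
            energies = {
                k: local_energy(i, k, x, z, neighbors, comm_params, outlier_logdensity, lam)
                for k in labels
            }
            best = min(labels, key=energies.get)
            # Move only on a strict improvement, so the total energy never increases.
            if energies[best] < energies[z[i]]:
                z[i] = best
                changed = True
        if not changed:  # a full sweep with no change: a local optimum is reached
            break
    return z
```

With x = [0.0, 0.2, 5.0, 5.1, 20.0], links 0-1, 2-3 and 2-4 of weight 1, communities {1: (0.0, 1.0), 2: (5.0, 1.0)}, outlier_logdensity = math.log(0.01), lam = 1.0 and every object initially in community 1, icm returns [1, 1, 2, 2, 0]: the isolated value 20.0 is labeled an outlier even though it is linked to community 2. Because every accepted move strictly lowers the energy and there are finitely many labelings, the loop must terminate, which is the convergence guarantee mentioned on the slide.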

  12. Fitting Community Outlier Detection Model • Normal community…

  13. Fitting Community Outlier Detection Model • Outlier…

  14. Fitting Community Outlier Detection Model

  15. Fitting Community Outlier Detection Model • Parameter Estimation: • In this part, we consider the problem of estimating the unknown parameters Θ from the data. • Because the labels Z are hidden, we view it as an “incomplete-data” problem and use the expectation-maximization (EM) algorithm to solve it.
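
In EM, the M-step maximizes the expected complete-data log-likelihood given the posterior label probabilities from the E-step; for Gaussian components this is a responsibility-weighted mean and variance. The sketch below shows only such an M-step, under stated assumptions: one continuous attribute, Gaussian communities, an outlier component held fixed, and hypothetical names m_step_gaussian, resp and var_floor. It is not confirmed to match the paper's exact update, and the E-step, which must account for the HMRF links, is not shown.

```python
OUTLIER = 0  # hypothetical convention: label 0 = outlier


def m_step_gaussian(x, resp, var_floor=1e-6):
    """Re-estimate (mean, variance) for each Gaussian community from soft assignments.

    x: list of floats, one continuous observation per object
    resp: list of dicts; resp[i][k] = posterior probability that object i has
          label k (produced by an E-step that is not shown here)
    Returns a dict k -> (mean, var) for every community k other than OUTLIER.
    """
    # Effective number of objects assigned to each label.
    totals = {}
    for r in resp:
        for k, g in r.items():
            totals[k] = totals.get(k, 0.0) + g
    params = {}
    for k, nk in totals.items():
        if k == OUTLIER or nk == 0.0:
            # Assumption: the outlier component stays fixed; empty communities are skipped.
            continue
        mean = sum(r.get(k, 0.0) * xi for xi, r in zip(x, resp)) / nk
        var = sum(r.get(k, 0.0) * (xi - mean) ** 2 for xi, r in zip(x, resp)) / nk
        # Flooring the variance avoids a degenerate zero-width Gaussian.
        params[k] = (mean, max(var, var_floor))
    return params
```

Alternating such an update with an E-step until the parameters stop changing gives the usual EM loop; with hard 0/1 responsibilities taken from an ICM labeling it reduces to per-community sample means and variances.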

  16. Fitting Community Outlier Detection Model • The outlier component…

  17. Fitting Community Outlier Detection Model • The normal component…

  18. Fitting Community Outlier Detection Model

  19. Experiments • Experiments are conducted on synthetic data to compare detection accuracy against the baseline methods, and on real datasets to validate that the proposed algorithm detects community outliers effectively.

  20. Experiments - Synthetic Data • Baseline Method:

  21. Experiments - Synthetic Data • Empirical Results:

  22. Experiments - Synthetic Data • Sensitivity and Time Complexity:

  23. Experiments - DBLP • Sub-network of Conferences:

  24. Experiments - DBLP • Sub-network of Conferences: • The community outliers detected by the proposed algorithm include CVPR and CIKM.

  25. Experiments - DBLP • Sub-network of Authors: • Community outliers among DBLP co-authors.

  26. Conclusions • The paper discusses a new outlier detection problem in networks containing rich information, including data about each object and relationships among objects. • It proposes a generative model called CODA that unifies community discovery and outlier detection in a probabilistic formulation based on hidden Markov random fields.

  27. Comments • Advantages • The CODA algorithm outperforms the baseline methods on synthetic and DBLP data. • Applications • Outlier Detection • Community Discovery
