2014 Yahoo! DAIS Research Excellence Award Competition
Manish Gupta (gmanish@microsoft.com)

Quick Achievements Summary
• 1 book on “Outlier Detection for Temporal Data”
• PhD thesis on “Outlier Detection for Network Data” in May 2013
• Moved to India
• Applied Researcher at Microsoft
• Adjunct Faculty at IIIT-Hyderabad
• Taught “Web mining” course in the first semester
• 4 tutorials at SDM 2013, WWW 2014, CIKM 2013 and ASONAM 2013
• 12 research papers of which 8 are first author
• Worked in these research areas
  • Outlier Detection
  • Microblog Analysis
  • Entity Mining

[Figure captions: Tutorial at WWW 2014; Tutorial at CIKM 2013]

[Figure: system diagram for cross-market query-entity matching. Component labels: Entity Database; Click Logs; Candidate Generation (join entity-URL pairs from entity DB with query-URL pairs from click logs); Featurizer; Information Gain Analysis (Market-1 ... Market-N); Augmented Featurizer - CMFL (Market-1 ... Market-N); Augment Training Data - CMTDL (Market-1 ... Market-N); Market-wise Training Data; Classifier Model (Market-1 ... Market-N); Classifier Output: QE Pairs (Market-1 ... Market-N); Cross Market Output Data Leverage - CMODL]

Outlier Detection

Association-based Clique Outliers (ASONAM 2013)
• Motivated the idea of query-based outlier detection for heterogeneous information networks: ABCOutliers
• Proposed a methodology to compute the outlierness of a clique based on the association outlierness of the properties of nodes within the clique
• Ran experiments on several synthetic datasets and on the Wikipedia entity network

Community Distribution Outliers (PKDD 2013)
• Introduced outliers with respect to latent communities for heterogeneous networks
• Proposed a joint-NMF optimization framework to learn distribution patterns across multiple object types
• Proposed an iterative two-stage approach for outlier detection

Query-based Local Outliers (SDM 2014)
• Proposed the problem of identifying query-based subgraph outliers based on deviations in linkage compared to the neighborhood
• Discussed a methodology to compute the outlierness of a subgraph match based on a max-margin framework
• A local method outperforms a partition-wide approach, which in turn is more accurate than a global strategy

Query-Based Subgraph Outliers (ICDE 2014)
• Given: a typed unweighted query, a heterogeneous edge-weighted information network, and an edge interestingness measure
• Find the top-K interesting subgraphs
• Investigated a ranking-after-matching baseline
• Proposed three new graph indexes and exploited them for building a top-K solution

Context-Aware Anomaly Detection (SDM 2013)
• Combining event log information with OS performance metric observations is essential for effective anomaly detection.
• Context patterns were derived by clustering in the space of context variables.
• Metric patterns were discovered using a PCA-based similarity measure and a modified K-Means algorithm.

Microblog Analysis
• Entity Linking and Disambiguation
  • Problem: Associate entity name mentions in tweet text with the correct referent entities in Wikipedia
  • Linking Entities in #Microposts (WWW #Microposts Workshop 2014)
    • Approach
      • Identify mentions using POS taggers
      • Identify the referent entity using
        • Similarity between the mention and the corresponding Wikipedia entity pages
        • Similarity between the mention and the tweet text with the anchor text strings across multiple webpages
        • Popularity of the entity on Twitter at the time of disambiguation
  • EDIUM: Improving Entity Disambiguation via User Modeling (ECIR 2014)
    • Entities from a user’s previous tweets can help build interest models, which in turn help disambiguate new entity mentions.
• Entity Tracking in Real-Time using Sub-Topic Detection on Twitter (ECIR 2014)
  • Propose clustering techniques to discover sub-events
  • Keywords and keyphrases in tweets
  • Concepts in URLs and tweet text

Entity Mining

Cross Market Modeling for Query-Entity Matching (WWW 2014)
Problem: Given a query, the query-entity (QE) matching task involves identifying the best matching entity for the query in a particular market.
Classifier with features: (1) Click features (2) Query-entity features (3) Segment distribution features (4) Query features, etc.
Challenges: (1) Sparse features in global markets (2) Cost of labelled data for all markets
Proposed Solution:
(1) Cross Market Feature Leverage (CMFL): share feature values
(2) Cross Market Training Data Leverage (CMTDL): share training data
(3) Cross Market Output Data Leverage (CMODL): share classifier outputs
Details: Designed smart rules for CMFL, CMODL, CMTDL
System Diagram: Cross Market Analysis for QE Matching

References
[1] Manish Gupta, Jing Gao, Charu Aggarwal, Jiawei Han. Book: Outlier Detection for Temporal Data. Morgan & Claypool Publishers, 2014.
[2] Romil Bansal, Sandeep Panem, Priya Radhakrishnan, Manish Gupta, Vasudeva Varma. Linking Entities in #Microposts. #Microposts Workshop at The 23rd International World Wide Web Conference (WWW 2014), Seoul, Korea.
[3] Manish Gupta, Prashant Borole, Praful Hebbar, Rupesh Mehta, Niranjan Nayak. Cross Market Modeling for Query-Entity Matching. The 23rd International World Wide Web Conference (WWW 2014), Seoul, Korea.
[4] Romil Bansal, Sandeep Panem, Manish Gupta, Vasudeva Varma. EDIUM: Improving Entity Disambiguation via User Modeling. The 36th European Conference on Information Retrieval (ECIR 2014), Amsterdam, Netherlands.
[5] Sandeep Panem, Romil Bansal, Manish Gupta, Vasudeva Varma. Entity Tracking in Real-Time using Sub-Topic Detection on Twitter. The 36th European Conference on Information Retrieval (ECIR 2014), Amsterdam, Netherlands.
[6] Manish Gupta, Arun Mallya, Subhro Roy, Jason H. D. Cho, Jiawei Han. Local Learning for Mining Outlier Subgraphs from Network Datasets. The 2014 SIAM International Conference on Data Mining (SDM 2014), Philadelphia, Pennsylvania.
[7] Manish Gupta, Rui Li, Kevin Chen-Chuan Chang. Tutorial: Towards a Social Media Analytics Platform: Event Detection and User Profiling for Microblogs. The 23rd International World Wide Web Conference (WWW 2014), Seoul, Korea.
[8] Manish Gupta, Jing Gao, Charu Aggarwal, Jiawei Han. Outlier Detection for Temporal Data. IEEE Trans. on Knowledge and Data Engineering (TKDE), Jan 2014.
[9] Manish Gupta, Jing Gao, Xifeng Yan, Hasan Cam, Jiawei Han. Top-K Interesting Subgraph Discovery in Information Networks. The 30th IEEE International Conference on Data Engineering (ICDE 2014), Chicago, IL.
[10] Manish Gupta, Jing Gao, Charu Aggarwal, Jiawei Han. Tutorial: Outlier Detection for Temporal Data. The ACM International Conference on Information and Knowledge Management (CIKM 2013), San Francisco, CA.
[11] Manish Gupta, Jing Gao, Charu Aggarwal, Jiawei Han. Tutorial: Outlier Detection for Graph Data. Proc. of 2013 IEEE/ACM Int. Conf. on Social Networks Analysis and Mining (ASONAM'13), Niagara Falls, Canada.
[12] Manish Gupta, Jing Gao, Jiawei Han. Community Distribution Outlier Detection in Heterogeneous Information Networks. European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML-PKDD 2013), Prague, Czech Republic.
[13] Yizhou Sun, Jie Tang, Jiawei Han, Cheng Chen, Manish Gupta. Co-Evolution of Multi-Typed Objects in Dynamic Star Networks. IEEE Trans. on Knowledge and Data Engineering (TKDE), June 2013.
[14] Manish Gupta, Jing Gao, Xifeng Yan, Hasan Cam, Jiawei Han. On Detecting Association-Based Clique Outliers in Heterogeneous Information Networks. Proc. of 2013 IEEE/ACM Int. Conf. on Social Networks Analysis and Mining (ASONAM'13), Niagara Falls, Canada.
[15] Manish Gupta. PhD thesis: Outlier Detection for Information Networks.
[16] Manish Gupta, Abhishek Sharma, Haifeng Chen, Guofei Jiang. Context-Aware Time Series Anomaly Detection for Complex Systems. SDM 2013 Workshop on Data Mining for Service and Maintenance, Austin, TX.
[17] Manish Gupta, Jing Gao, Charu Aggarwal, Jiawei Han. Tutorial: Outlier Detection for Temporal Data. The 2013 SIAM International Conference on Data Mining (SDM 2013), Austin, TX.
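
Illustrative sketch: candidate generation for QE matching

The system diagram for query-entity matching names a candidate generation step that joins entity-URL pairs from the entity database with query-URL pairs from the click logs. The poster does not give implementation details, so the following is only a minimal Python sketch of such a join on the shared URL. The function name `generate_candidates`, the sample rows, and the choice to sum click counts per candidate pair are all assumptions made for illustration, not the method used in the paper.

```python
from collections import defaultdict


def generate_candidates(entity_urls, query_url_clicks):
    """Join (entity, url) pairs with (query, url, clicks) rows on the URL.

    Returns a dict mapping each candidate (query, entity) pair to the total
    number of clicks on URLs shared by the query and the entity.
    Summing clicks is an illustrative assumption, not the paper's feature set.
    """
    # Index the entity database by URL so each click-log row is a cheap lookup.
    entities_by_url = defaultdict(set)
    for entity, url in entity_urls:
        entities_by_url[url].add(entity)

    # Every click-log row whose URL belongs to an entity yields a candidate.
    candidates = defaultdict(int)
    for query, url, clicks in query_url_clicks:
        for entity in entities_by_url.get(url, ()):
            candidates[(query, entity)] += clicks
    return dict(candidates)


# Hypothetical sample data: entity names, queries, and URLs are made up.
entity_urls = [
    ("Entity_A", "http://example.com/a"),
    ("Entity_B", "http://example.com/b"),
]
query_url_clicks = [
    ("query one", "http://example.com/a", 12),
    ("query one", "http://example.com/b", 3),
    ("query two", "http://example.com/a", 5),
    ("query three", "http://example.com/unknown", 7),  # no entity owns this URL
]

candidates = generate_candidates(entity_urls, query_url_clicks)
for (query, entity), clicks in sorted(candidates.items()):
    print(query, entity, clicks)
```

In a pipeline shaped like the diagram, the resulting candidate pairs would then go to the featurizer and the per-market classifier; a query whose clicked URLs match no entity simply produces no candidates.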