Yahoo! DAIS Research Excellence Award Competition
Manish Gupta
Quick Achievements Summary
• 1 book on “Outlier Detection for Temporal Data”
• PhD thesis on “Outlier Detection for Network Data” in May 2013
• Moved to India
• Applied Researcher at Microsoft
• Adjunct Faculty at IIIT-Hyderabad
• Taught “Web mining” course in first semester
• 4 tutorials
• 12 research papers, of which 8 are first author
• Worked in 4 research areas
  • Outlier Detection
  • Microblog Analysis
  • Entity Mining
  • Community Analysis
Outlier Detection
• Community Distribution Outliers (PKDD 2013)
• Association-based Clique Outliers (ASONAM 2013)
• Query-based Local Outliers (SDM 2014)
• Query-Based Subgraph Outliers (ICDE 2014)
• Context-Aware Anomaly Detection (SDM 2013)
Microblog Analysis
• Entity Tracking in Real-Time using Sub-Topic Detection on Twitter (ECIR 2014)
  • Propose clustering techniques to discover sub-events
  • Keywords and keyphrases in tweets
  • Concepts in URLs and tweet text
• Linking Entities in #Microposts (WWW workshop 2014)
  • Associating entity name mentions in tweet text with the correct referent entities in Wikipedia
  • Identify mentions using POS taggers
  • Identify the referent entity using
    • Similarity between the mention and the corresponding Wikipedia entity pages
    • Similarity between the mention and the tweet text with the anchor text strings across multiple webpages
    • Popularity of the entity on Twitter at the time of disambiguation
• Tutorial at WWW 2014
• EDIUM: Improving Entity Disambiguation via User Modeling (ECIR 2014)
  • Entity disambiguation is the task of finding the correct entity referent in the knowledge base for a given mention.
  • Entities from a user’s previous tweets could help in creating interest models, which could in turn help in disambiguating new entity mentions.
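The three signals listed for identifying the referent entity can be combined into a single ranking score. The following is only a minimal sketch under stated assumptions, not the method from the paper: the Jaccard token overlap, the weights, the candidate dictionary layout (`title`, `page_text`, `anchor_texts`, `popularity`) and the function names are all hypothetical stand-ins chosen for illustration.

```python
def jaccard(a, b):
    """Token-overlap similarity between two strings (an assumed stand-in measure)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def rank_candidates(mention, tweet, candidates, weights=(0.4, 0.4, 0.2)):
    """Rank candidate entities for a mention; weights are illustrative assumptions."""
    w_page, w_anchor, w_pop = weights
    context = f"{mention} {tweet}"
    scored = []
    for cand in candidates:
        # Signal 1: mention vs. the candidate's Wikipedia page text.
        page_sim = jaccard(mention, cand["page_text"])
        # Signal 2: mention plus tweet text vs. anchor texts pointing to the candidate.
        anchor_sim = max(
            (jaccard(context, anchor) for anchor in cand["anchor_texts"]),
            default=0.0,
        )
        # Signal 3: popularity on Twitter at disambiguation time, assumed to lie in [0, 1].
        score = w_page * page_sim + w_anchor * anchor_sim + w_pop * cand["popularity"]
        scored.append((cand["title"], score))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy data, invented for this example only.
candidates = [
    {"title": "Apple Inc.", "page_text": "apple technology company iphone mac",
     "anchor_texts": ["apple iphone", "apple store"], "popularity": 0.9},
    {"title": "Apple (fruit)", "page_text": "apple fruit tree orchard",
     "anchor_texts": ["apple fruit", "apple pie"], "popularity": 0.2},
]
ranked = rank_candidates("apple", "new apple iphone launch today", candidates)
```

A user interest model in the spirit of EDIUM could plausibly enter as a fourth weighted term, for example the overlap between a candidate and entities from the user's earlier tweets; that extension is likewise an assumption here.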
Entity Mining (WWW 2014)
• Cross Market Modeling for Query-Entity Matching
• Problem: Given a query, the query-entity (QE) matching task involves identifying the best matching entity for the query in a particular market.
• Classifier with features
  (1) Click features
  (2) Query-entity features
  (3) Segment distribution features
  (4) Query features etc.
• Challenges
  (1) Sparse features in global markets
  (2) Labelled data cost for all markets
• Proposed Solution
  (1) Cross Market Feature Leverage (CMFL): Share feature values
  (2) Cross Market Training Data Leverage (CMTDL): Share train data
  (3) Cross Market Output Data Leverage (CMODL): Share classifier outputs
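To make two of the proposed ideas concrete, here is a small sketch of what sharing feature values (CMFL) and sharing classifier outputs (CMODL) could look like for a single query-entity pair. The slides do not describe the actual mechanism, so everything below is an assumption: the function names, the feature names, and the choice to represent a missing feature as `None`.

```python
def share_feature_values(target_feats, source_feats):
    """CMFL-style sketch: fill features missing in the target market
    with the values observed for the same query-entity pair in a source market."""
    merged = dict(target_feats)
    for name, value in source_feats.items():
        if merged.get(name) is None:  # missing or sparse in the target market
            merged[name] = value
    return merged


def share_classifier_output(target_feats, source_score):
    """CMODL-style sketch: expose the source-market classifier score
    as one extra feature for the target-market classifier."""
    augmented = dict(target_feats)
    augmented["source_market_score"] = source_score
    return augmented


# Toy feature vectors, invented for this example only.
target = {"click_rate": None, "query_length": 2}
source = {"click_rate": 0.31, "query_length": 2}

filled = share_feature_values(target, source)
augmented = share_classifier_output(filled, source_score=0.87)
```

In this sketch a value already present in the target market is never overwritten, so the source market only fills gaps. CMTDL would instead pool labelled examples across markets before training, which is not shown.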