Unifying Learning to Rank and Domain Adaptation -- Enabling Cross-Task Document Scoring
Mianwei Zhou, Kevin Chen-Chuan Chang
University of Illinois at Urbana-Champaign
Document Scorer Learning with Queries/Domains
Applications: Information Retrieval, Sentiment Analysis, Spam Detection.
Handle various user needs: queries. Tackle diverse sources of data: domains.
Document Scorer Learning for Different Queries: Learning to Rank
Example application: Entity-Centric Filtering (TREC-KBA 2012).
Difficulty: long Wikipedia pages serve as queries, and the queries contain noisy keywords.
(Figure: in the training phase, documents retrieved for a Wikipedia-page query are labeled relevant or irrelevant; in the testing phase, documents for a new Wikipedia-page query must be scored.)
Document Scorer Learning for Different Domains: Domain Adaptation
Example application: Cross-Domain Sentiment Analysis (Blitzer 2006).
Difficulty: different domains use different sentiment keywords.
(Figure: training on book reviews such as "I do not like this book, it is very boring ..." and "It is a very interesting book ..."; testing on kitchen appliance reviews such as "This coffee maker has high quality!" and "Do not buy this juice extractor, it is leaking ...")
Learning to Rank vs. Domain Adaptation
(Figure: the keywords that matter during training differ from those during testing -- e.g., "Bill Gates" and "Microsoft" vs. "Chicago Bulls" and "basketball" for ranking; "boring" and "interesting" in book reviews vs. "leaking" and "high quality" in kitchen reviews.)
Common challenge: keyword importance differs between the training and testing phases.
Problem: Cross-Task Document Scoring
Cross-Task Document Scoring: Unifying Learning to Rank and Domain Adaptation
1. Task: a query (learning to rank) or a domain (domain adaptation).
2. Cross-task: the training and testing phases tackle different tasks.
(Figure: as before, the important keywords shift between the training and testing phases in both settings.)
Challenge: Document Scoring across Different Tasks
Document Scoring Principle
(Figure: a scorer takes a query and a document and decides: relevant or not.)
Document Scoring Principle
The relevance of a document depends on how it contains keywords that are important for the query.
Q1: Which keywords are important for the query?
Q2: How are those keywords contained in the document?
(Figure: a query whose keywords include "Microsoft" and "Bill Gates", and a document containing "Microsoft", "talk", "Tuesday", "Bill Gates".)
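To make the two questions concrete, here is a minimal sketch of the principle in Python; importance() and containment() are hypothetical placeholders for Q1 and Q2, not the paper's code.

```python
# A minimal sketch of the scoring principle. The helpers importance() and
# containment() are hypothetical placeholders for Q1 and Q2.

def importance(word, query_keywords):
    # Q1: which keywords are important for the query?
    # Placeholder: uniform weight over the query's keywords.
    return 1.0 / len(query_keywords)

def containment(word, doc_tf):
    # Q2: how is the keyword contained in the document?
    # Placeholder: raw term frequency.
    return doc_tf.get(word, 0)

def score(query_keywords, document_tokens):
    doc_tf = {}
    for tok in document_tokens:
        doc_tf[tok] = doc_tf.get(tok, 0) + 1
    return sum(importance(w, query_keywords) * containment(w, doc_tf)
               for w in query_keywords)

print(score(["microsoft", "gates"], "bill gates founded microsoft".split()))
```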
Requirement of Traditional Learning to Rank: Manually Fulfill the Principle in Feature Design
Learning-to-rank models such as RankSVM, LambdaMART, and RankBoost abstract the problem as: input features (BM25, language model, vector space, ...) -> output score (document relevance).
Difficult to Manually Fulfill the Principle for Noisy Queries and Complex Documents
Traditional features such as BM25 and language models answer Q1 (which keywords are important?) with high IDF, and Q2 (how are they contained?) with TF. These fixed answers are insufficient for noisy queries and complex documents.
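For reference, the standard BM25 formula is exactly this fixed recipe -- IDF answers Q1 once and for all, and saturated TF answers Q2 (a textbook instantiation, not the paper's code):

```python
import math

def bm25(query_keywords, doc_tf, doc_len, avg_doc_len, df, n_docs,
         k1=1.2, b=0.75):
    # Q1 is answered by IDF; Q2 by length-normalized, saturated term
    # frequency. Neither answer can adapt to a noisy Wikipedia-page query.
    score = 0.0
    for w in query_keywords:
        idf = math.log((n_docs - df.get(w, 0) + 0.5) / (df.get(w, 0) + 0.5) + 1.0)
        tf = doc_tf.get(w, 0)
        score += idf * (tf * (k1 + 1.0)) / (tf + k1 * (1.0 - b + b * doc_len / avg_doc_len))
    return score
```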
Limitation of Traditional Learning to Rank: Leaving a Heavy Burden to Feature Designers
Models such as RankSVM, LambdaMART, and RankBoost leave the burden of fulfilling the principle to feature designers.
Proposal: Feature Decoupling
Feature Decoupling -- Towards Facilitating the Feature Design
Document scoring principle: how does the document contain keywords that are important for the query?
Decouple each traditional feature into two sets:
Meta-features -- Q1: which keywords are important?
Intra-features -- Q2: how are the keywords contained?
Feature Decoupling for Entity-Centric Document Filtering
Meta-features -- general: IDF, IsNoun, InEntity, ...; structural: PositionInPage, InInfobox, InOpenPara, ...
Intra-features -- different positions: TFInURL, TFInTitle, ...; different representations: LogTF, NormalizedTF, ...
(A sketch of such extractors follows below.)
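Here is a sketch of what the two decoupled extractors might look like for entity-centric filtering; the dictionary field names are our assumptions, and the feature choices are taken from the slide's examples rather than the authors' code.

```python
import math

def meta_features(keyword, wiki_page, idf):
    # Meta-features describe the keyword's role in the query (the entity's
    # Wikipedia page): general statistics and structural position.
    return {
        "idf": idf.get(keyword, 0.0),
        "in_infobox": float(keyword in wiki_page["infobox"]),
        "in_opening_paragraph": float(keyword in wiki_page["opening_paragraph"]),
    }

def intra_features(keyword, doc):
    # Intra-features describe how the keyword is contained in the candidate
    # document: where it occurs and under which TF representation.
    tf = doc["body"].count(keyword)
    return {
        "tf_in_title": float(doc["title"].count(keyword)),
        "tf_in_url": float(doc["url"].count(keyword)),
        "log_tf": math.log(1.0 + tf),
    }
```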
Feature Decoupling for Cross-Domain Sentiment Analysis
Meta-features -- correlation with pivot keywords such as "good", "interesting", "high quality", "bad", "boring", "tedious", "leaking", "broken" (e.g., Corr[keyword, pivot]).
Intra-features -- different positions: TFInURL, TFInTitle, ...; different representations: LogTF, NormalizedTF, ...
(A sketch of the pivot-correlation meta-features follows below.)
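A sketch of the pivot-correlation meta-features; Pearson correlation over unlabeled reviews from both domains is our choice of measure (the paper's exact correlation may differ), and PIVOTS is a hypothetical pivot list drawn from the slide.

```python
import numpy as np

PIVOTS = ["good", "interesting", "high quality", "bad", "boring", "leaking"]

def pivot_correlations(keyword, unlabeled_reviews):
    # Meta-features: how the keyword's occurrence correlates with
    # domain-independent pivot keywords across unlabeled reviews.
    occ = np.array([keyword in r for r in unlabeled_reviews], dtype=float)
    feats = {}
    for p in PIVOTS:
        pv = np.array([p in r for r in unlabeled_reviews], dtype=float)
        if occ.std() == 0.0 or pv.std() == 0.0:
            feats["corr_" + p] = 0.0
        else:
            feats["corr_" + p] = float(np.corrcoef(occ, pv)[0, 1])
    return feats
```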
To Learn a Ranker Given Decoupled Features, the Model Should 1. "Recouple" the Features
Document scoring principle: the relevance of a document depends on how it contains keywords that are important for the query.
(Figure: document relevance is the sum of per-keyword contributions; recoupling combines each keyword's meta-features and intra-features to form its contribution.)
To Learn a Ranker Given Decoupled Features, the Model Should 1. "Recouple" the Features; 2. Be Noise-Aware
(Figure: noisy query keywords such as "list", "Mexico", and "Jeff" should be recognized and discounted.)
Requirement for Noise-Aware Recoupling: Inferred Sparsity
For noisy query keywords such as "list", "Mexico", and "Jeff", the inferred contribution to the document score should be exactly 0, regardless of how they appear in the document.
To Achieve Inferred Sparsity: a Two-Layer Scoring Model
Layer 1: a keyword classifier decides which keywords are important.
Layer 2: important keywords contribute to the score; noisy keywords receive zero contribution (inferred sparsity).
(A sketch follows below.)
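A sketch of the two-layer idea, with hypothetical weight vectors u (keyword classifier over meta-features) and w (keyword contribution over intra-features); the names are ours, not the paper's.

```python
import numpy as np

def two_layer_score(keywords, meta, intra, u, w):
    # Layer 1: a classifier on meta-features gates each keyword; a negative
    # decision yields an exactly-zero weight (inferred sparsity).
    # Layer 2: gated keywords contribute through their intra-features.
    score = 0.0
    for k in keywords:
        gate = 1.0 if float(u @ meta[k]) > 0.0 else 0.0
        score += gate * float(w @ intra[k])
    return score

meta = {"gates": np.array([2.0, 1.0]), "list": np.array([-1.0, 0.2])}
intra = {"gates": np.array([3.0]), "list": np.array([5.0])}
print(two_layer_score(["gates", "list"], meta, intra,
                      u=np.array([1.0, 0.5]), w=np.array([0.7])))
# "list" is classified as noise, so its large intra-feature value is ignored.
```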
Realizing Such a Two-Layer Scoring Model Is Non-Trivial
How should the keyword classifier depend on meta-features, and each keyword's contribution depend on intra-features?
How can the classifier be learned without keyword-level labels?
Solution: Tree-Structured Restricted Boltzmann Machine
Overview of the Tree-Structured Restricted Boltzmann Machine (T-RBM)
d: the document, e.g., "Bill Gates, speaking as co-founder of Microsoft, will give a talk today."
y: the relevance of document d.
z_k: a hidden variable for the importance of keyword k (e.g., "Gates", "founder", "Microsoft").
Graph factors connect the keywords, their importance variables, and the document relevance.
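One way to write the model down in our own notation (a hedged reconstruction, not the paper's exact parameterization): z_k in {0, 1} gates keyword k, phi_k and psi_k are its meta- and intra-feature vectors, chi_d collects document-level features, and u, w, v are the weights to learn.

```latex
P(y, \mathbf{z} \mid d) \;\propto\; \exp\Big(
    \sum_k z_k\, \mathbf{u}^\top \boldsymbol{\phi}_k      % meta-feature factor: is keyword k important?
  + \sum_k y\, z_k\, \mathbf{w}^\top \boldsymbol{\psi}_k  % intra-feature factor: gated contribution
  + y\, \mathbf{v}^\top \boldsymbol{\chi}_d               % document-level factor
\Big)
```

A keyword with z_k = 0 drops out of the relevance term entirely, which is exactly the inferred sparsity required above; the star shape of y connected to each z_k is what makes the factor graph a tree.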
Meta-features define the factor linking each keyword to its importance variable z_k; intra-features define the factor linking z_k to the document relevance y.
Use a Document-Level Factor to Incorporate Traditional Learning-to-Rank Features
Document-level features: PageRank, document length, ...
Learning the Feature Weightings by Likelihood Maximization
Maximize the conditional likelihood of the observed relevance labels; compute the gradient by belief propagation, which is exact here because the factor graph is a tree.
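Writing the exponent of the distribution above as -E(y, z, d; theta), the conditional log-likelihood gradient takes the standard form for such models (our derivation, not transcribed from the paper):

```latex
\frac{\partial \log P(y \mid d)}{\partial \theta}
  = \mathbb{E}_{P(\mathbf{z} \mid y, d)}\!\left[-\frac{\partial E}{\partial \theta}\right]
  - \mathbb{E}_{P(y', \mathbf{z} \mid d)}\!\left[-\frac{\partial E}{\partial \theta}\right]
```

Both expectations require only single-variable and pairwise marginals, which belief propagation computes exactly on a tree-structured graph.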
Datasets for Two Different Applications • Entity-Centric Document Filtering • Dataset: TREC-KBA • 29 person entities, 52,238 documents • Wikipedia pages as ID pages • Cross-Domain Sentiment Analysis • Dataset released by Blitzer et al. • 4 domains, 8,000 documents.
T-RBM Outperforms Other Baselines on Both Applications
Baselines compared:
• Traditional learning-to-rank/classification frameworks without feature decoupling.
• A simple linear weighting model combining meta-features and intra-features.
• A boosting framework combining meta-features and intra-features.
• Structural Correspondence Learning (SCL), the domain adaptation model proposed by Blitzer et al.
Summary • Propose to solve learning to rank and domain adaptation as a unified cross-task document scoring problem. • Propose the idea of feature decoupling to facilitate feature design. • Propose a noise-aware T-RBM model to "recouple" the features.
Thanks! Q&A