Parameter Related Domain Knowledge for Learning in Bayesian Networks. Stefan Niculescu, PhD Candidate, Carnegie Mellon University. Joint work with Professor Tom Mitchell and Dr. Bharat Rao. April 2005
Domain Knowledge • In the real world, data is often too sparse to build an accurate model • Domain knowledge can help alleviate this problem • Several types of domain knowledge: • Relevance of variables (feature selection) • Conditional independences among variables • Parameter Domain Knowledge
Parameter Domain Knowledge • In a Bayes Net for a real world domain: • there can be a huge number of parameters • there is not enough data to estimate them all accurately • Parameter Domain Knowledge constraints: • reduce the number of parameters to estimate • reduce the variance of the parameter estimates
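A back-of-the-envelope check on the variance claim (my illustration, not from the slides): if the same Bernoulli parameter θ appears in k conditional distributions, each observed N times, tying them pools kN samples into one estimate:

$$\mathrm{Var}\big(\hat\theta_{\text{pooled}}\big) = \frac{\theta(1-\theta)}{kN} \approx \tfrac{1}{k}\,\mathrm{Var}\big(\hat\theta_{\text{single}}\big)$$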
Outline • Motivation • Parameter Related Domain Knowledge • Experiments • Related Work • Summary / Future Work
Parameters and Counts • Notation: $\theta_{ijk} = P(X_i = k \mid Pa(X_i) = j)$ is an entry in the CPT for variable $X_i$; $N_{ijk}$ is the number of training examples with $X_i = k$ and $Pa(X_i) = j$. Theorem. The Maximum Likelihood estimators are given by:

$$\hat\theta_{ijk} = \frac{N_{ijk}}{\sum_{k'} N_{ijk'}}$$
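A minimal sketch (mine, not from the slides) of the count-based estimator above, assuming discrete samples stored as dicts:

```python
from collections import Counter

def mle_cpt(samples, child, parents):
    """Maximum-likelihood CPT: P(child=k | parents=j) = N_ijk / sum_k' N_ijk'."""
    joint = Counter()   # N_ijk: count of (parent config j, child value k)
    marg = Counter()    # sum_k' N_ijk': count of parent config j alone
    for s in samples:
        j = tuple(s[p] for p in parents)
        joint[(j, s[child])] += 1
        marg[j] += 1
    return {(j, k): n / marg[j] for (j, k), n in joint.items()}

# Example: P(Alarm | Burglary) from three samples
data = [{"Burglary": 1, "Alarm": 1},
        {"Burglary": 1, "Alarm": 0},
        {"Burglary": 0, "Alarm": 0}]
print(mle_cpt(data, "Alarm", ["Burglary"]))
# {((1,), 1): 0.5, ((1,), 0): 0.5, ((0,), 0): 1.0}
```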
Parameter Sharing • Domain knowledge: certain CPT entries are constrained to be equal across several conditional distributions. Theorem. The Maximum Likelihood estimators pool the counts of the tied entries: for a parameter $\theta_s$ shared (once each) by the distributions in a set $D_s$,

$$\hat\theta_s = \frac{\sum_{d \in D_s} N_{d,s}}{\sum_{d \in D_s} N_d}$$

where $N_{d,s}$ is the count of the shared entry in distribution $d$ and $N_d$ is the total count for distribution $d$.
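A minimal sketch (mine; names are illustrative) of the pooled estimator — tied entries aggregate their counts and the totals of the distributions they belong to:

```python
def shared_mle(counts, shared):
    """ML estimates under parameter sharing.

    counts : dict mapping distribution name -> {value: count}
    shared : list of groups; each group is a set of (dist, value) pairs
             constrained to share a single parameter value
    Returns a dict mapping (dist, value) -> estimate.
    """
    totals = {d: sum(c.values()) for d, c in counts.items()}
    # unconstrained per-distribution MLE for entries that are not tied
    est = {(d, v): n / totals[d] for d, c in counts.items() for v, n in c.items()}
    for group in shared:
        pooled = sum(counts[d][v] for d, v in group)   # aggregate shared counts
        denom = sum(totals[d] for d, _ in group)       # aggregate distribution totals
        for d, v in group:
            est[(d, v)] = pooled / denom
    return est

# Two senders whose frequency of the word "meeting" is known to be equal:
counts = {"alice": {"meeting": 8, "other": 12}, "bob": {"meeting": 2, "other": 3}}
print(shared_mle(counts, [{("alice", "meeting"), ("bob", "meeting")}]))
# shared estimate: (8 + 2) / (20 + 5) = 0.4 for both senders
```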
Probability Mass Sharing • Domain knowledge: parameters of a given color (group) have the same sum across all distributions. [Figure: color-coded CPT columns illustrating the shared masses]
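In symbols (my notation): for each color group $G$ and any two conditional distributions $d$, $d'$,

$$\sum_{k \in G} \theta^{(d)}_k = \sum_{k \in G} \theta^{(d')}_k$$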
Probability Ratio Sharing • Domain knowledge: parameters of a given color (group) preserve their relative ratios across all distributions. [Figure: color-coded CPT columns illustrating the preserved ratios]
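In symbols (my notation): for all $k, l$ in a color group $G$ and any two distributions $d$, $d'$,

$$\frac{\theta^{(d)}_k}{\theta^{(d)}_l} = \frac{\theta^{(d')}_k}{\theta^{(d')}_l}$$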
Outline • Motivation • Parameter Related Domain Knowledge • Experiments • Related Work • Summary / Future Work
Datasets • Project World - CALO • 6 persons, ~200 emails • Manually labeled as About / Not About Meetings • Data: (Person, Email, Topic) • Artificial Datasets • Kept most of the characteristics of the real data, but new emails were generated in which the frequencies of certain words were shared across users • Purpose: • Domain Knowledge readily available • To study the effect of training set size (up to 5000 examples) • To compare our estimated distribution to the true distribution
Approach • Model email using a Naive Bayes model: • without parameter sharing (PSNB) • with parameter sharing (SSNB) • Also compare with a model that assumes the sender is irrelevant (GNB), where the frequencies of words within a topic are learned from all examples [Figure: two graphical models over Sender, Topic, and Word] A sketch of the unshared vs. pooled estimators follows below.
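A minimal sketch (mine; the model names and exact tying scheme are assumptions based on the slide) contrasting per-sender estimation with the fully pooled extreme. SSNB would pool counts only for the word frequencies the domain knowledge designates as shared, as in the earlier shared_mle sketch:

```python
from collections import Counter

def naive_bayes_word_freqs(emails, share_across_senders=False):
    """Estimate word frequencies by counting.

    emails : list of (sender, topic, words) tuples
    share_across_senders=False keeps one distribution per (sender, topic)
    pair (no sharing, PSNB-style); True pools all senders' counts per
    topic, the fully shared extreme (GNB-style).
    """
    counts, totals = Counter(), Counter()
    for sender, topic, words in emails:
        key = topic if share_across_senders else (sender, topic)
        for w in words:
            counts[(key, w)] += 1
        totals[key] += len(words)
    return {(key, w): n / totals[key] for (key, w), n in counts.items()}

emails = [("alice", "meeting", ["agenda", "room"]),
          ("bob",   "meeting", ["agenda", "agenda"])]
print(naive_bayes_word_freqs(emails))                             # per-sender
print(naive_bayes_word_freqs(emails, share_across_senders=True))  # pooled
```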
Effect of Training Set Size • As expected: • SSNB performs better than both other models • SSNB and PSNB tend to perform similarly as the training set grows, but SSNB is much better when data is sparse
Outline • Motivation • Parameter Related Domain Knowledge • Experiments • Related Work • Summary / Future Work
Dirichlet Priors in a Bayes Net • The domain expert specifies a prior belief: an assignment of the parameters • The prior leaves room for some error around that assignment (the spread) [Figure: prior belief and spread of a Dirichlet distribution]
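For reference (standard facts about Dirichlet priors, not reconstructed from the slide): with a Dirichlet prior over a multinomial's $K$ parameters,

$$P(\theta) \propto \prod_k \theta_k^{\alpha_k - 1}, \qquad \hat\theta_k^{\mathrm{MAP}} = \frac{N_k + \alpha_k - 1}{N + \alpha_0 - K}, \quad \alpha_0 = \sum_k \alpha_k$$

The means $\alpha_k / \alpha_0$ encode the expert's belief; the magnitude $\alpha_0$ controls how tight the spread is.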
HMMs and DBNs [Figure: a Dynamic Bayesian Network unrolled over time; the transition and observation parameters are repeated in every time slice, i.e., shared across slices]
Module Networks • In a Module: • Same parents • Same CPTs Image from “Learning Module Networks” by Eran Segal and Daphne Koller
Context Specific Independence [Figure: a network over Burglary, Set, and Alarm; when the alarm is not set, Alarm no longer depends on Burglary, so the corresponding CPT rows coincide]
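One way to write the constraint in this example (my notation):

$$P(\mathit{Alarm} \mid \mathit{Burglary} = b,\ \mathit{Set} = \text{false}) = P(\mathit{Alarm} \mid \mathit{Set} = \text{false}) \quad \text{for } b \in \{\text{true}, \text{false}\}$$

i.e., the two CPT rows with Set = false are tied, which is exactly an equality (parameter sharing) constraint.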
Outline • Motivation • Parameter Related Domain Knowledge • Experiments • Related Work • Summary / Future Work
Summary • Parameter Related Domain Knowledge is needed when data is scarce • Developed methods to estimate parameters: • for each of the four types of Domain Knowledge presented • from both complete and incomplete data • Markov Models, Module Networks, and Context Specific Independence are particular cases of our parameter sharing domain knowledge • Models using parameter sharing performed better than two classical Bayes Nets on synthetic data
Future Work • Automatically find Shared Parameters • Study interactions among different types of Domain Knowledge • Incorporate Domain Knowledge about continuous variables • Investigate Domain Knowledge in the form of inequality constraints
Probability Mass Sharing • Want to model P(Word | Language) • Two languages: English, Spanish • Different sets of words • Domain Knowledge: • The aggregate probability mass of nouns is the same in both languages • The same holds for adjectives, verbs, etc.
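In symbols (my notation, with each language's own set of nouns):

$$\sum_{w \in \text{Nouns}_{En}} P(w \mid \text{English}) = \sum_{w \in \text{Nouns}_{Es}} P(w \mid \text{Spanish})$$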
Probability Ratio Sharing • Want to model P(Word | Language) • Two languages: English, Spanish • Different sets of words • Domain Knowledge: • Word groups, e.g. about computers: computer, mouse, monitor, etc. • The relative frequency of "computer" to "mouse" is the same in both languages • The aggregate mass of a group can differ between languages [Figure: word groups T1 (computer words) and T2 (business words)]
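In symbols (my notation):

$$\frac{P(\text{computer} \mid \text{English})}{P(\text{mouse} \mid \text{English})} = \frac{P(\text{computer} \mid \text{Spanish})}{P(\text{mouse} \mid \text{Spanish})}$$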