Parameter Related Domain Knowledge for Learning in Bayesian Networks. Stefan Niculescu, PhD Candidate, Carnegie Mellon University. Joint work with Professor Tom Mitchell and Dr. Bharat Rao. April 2005
Domain Knowledge • In the real world, data is often too sparse to build an accurate model • Domain knowledge can help alleviate this problem • Several types of domain knowledge: • Relevance of variables (feature selection) • Conditional independences among variables • Parameter Domain Knowledge
Parameter Domain Knowledge • In a Bayes Net for a real world domain: • there can be a huge number of parameters • there is not enough data to estimate them all accurately • Parameter Domain Knowledge constraints: • reduce the number of parameters to estimate • reduce the variance of the parameter estimates
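A back-of-the-envelope check on the variance claim (my illustration, not from the slides): if the same Bernoulli parameter θ appears in k conditional distributions, each observed N times, tying them pools kN samples into one estimate:

$$\mathrm{Var}\big(\hat\theta_{\text{pooled}}\big) = \frac{\theta(1-\theta)}{kN} \approx \tfrac{1}{k}\,\mathrm{Var}\big(\hat\theta_{\text{single}}\big)$$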
Outline • Motivation • Parameter Related Domain Knowledge • Experiments • Related Work • Summary / Future Work
Parameters and Counts • Notation: $\theta_{ijk} = P(X_i = k \mid Pa(X_i) = j)$ is an entry in the CPT for variable $X_i$; $N_{ijk}$ is the number of training examples with $X_i = k$ and $Pa(X_i) = j$. Theorem. The Maximum Likelihood estimators are given by:

$$\hat\theta_{ijk} = \frac{N_{ijk}}{\sum_{k'} N_{ijk'}}$$
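A minimal sketch (mine, not from the slides) of the count-based estimator above, assuming discrete samples stored as dicts:

```python
from collections import Counter

def mle_cpt(samples, child, parents):
    """Maximum-likelihood CPT: P(child=k | parents=j) = N_ijk / sum_k' N_ijk'."""
    joint = Counter()   # N_ijk: count of (parent config j, child value k)
    marg = Counter()    # sum_k' N_ijk': count of parent config j alone
    for s in samples:
        j = tuple(s[p] for p in parents)
        joint[(j, s[child])] += 1
        marg[j] += 1
    return {(j, k): n / marg[j] for (j, k), n in joint.items()}

# Example: P(Alarm | Burglary) from three samples
data = [{"Burglary": 1, "Alarm": 1},
        {"Burglary": 1, "Alarm": 0},
        {"Burglary": 0, "Alarm": 0}]
print(mle_cpt(data, "Alarm", ["Burglary"]))
# {((1,), 1): 0.5, ((1,), 0): 0.5, ((0,), 0): 1.0}
```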
Parameter Sharing • Domain knowledge: certain CPT entries are constrained to be equal across several conditional distributions. Theorem. The Maximum Likelihood estimators pool the counts of the tied entries: for a parameter $\theta_s$ shared (once each) by the distributions in a set $D_s$,

$$\hat\theta_s = \frac{\sum_{d \in D_s} N_{d,s}}{\sum_{d \in D_s} N_d}$$

where $N_{d,s}$ is the count of the shared entry in distribution $d$ and $N_d$ is the total count for distribution $d$.
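A minimal sketch (mine; names are illustrative) of the pooled estimator — tied entries aggregate their counts and the totals of the distributions they belong to:

```python
def shared_mle(counts, shared):
    """ML estimates under parameter sharing.

    counts : dict mapping distribution name -> {value: count}
    shared : list of groups; each group is a set of (dist, value) pairs
             constrained to share a single parameter value
    Returns a dict mapping (dist, value) -> estimate.
    """
    totals = {d: sum(c.values()) for d, c in counts.items()}
    # unconstrained per-distribution MLE for entries that are not tied
    est = {(d, v): n / totals[d] for d, c in counts.items() for v, n in c.items()}
    for group in shared:
        pooled = sum(counts[d][v] for d, v in group)   # aggregate shared counts
        denom = sum(totals[d] for d, _ in group)       # aggregate distribution totals
        for d, v in group:
            est[(d, v)] = pooled / denom
    return est

# Two senders whose frequency of the word "meeting" is known to be equal:
counts = {"alice": {"meeting": 8, "other": 12}, "bob": {"meeting": 2, "other": 3}}
print(shared_mle(counts, [{("alice", "meeting"), ("bob", "meeting")}]))
# shared estimate: (8 + 2) / (20 + 5) = 0.4 for both senders
```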
Probability Mass Sharing • Domain knowledge: parameters of a given color (group) have the same sum across all distributions. [Figure: color-coded CPT columns illustrating the shared masses]
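In symbols (my notation): for each color group $G$ and any two conditional distributions $d$, $d'$,

$$\sum_{k \in G} \theta^{(d)}_k = \sum_{k \in G} \theta^{(d')}_k$$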
Probability Ratio Sharing • Domain knowledge: parameters of a given color (group) preserve their relative ratios across all distributions. [Figure: color-coded CPT columns illustrating the preserved ratios]
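In symbols (my notation): for all $k, l$ in a color group $G$ and any two distributions $d$, $d'$,

$$\frac{\theta^{(d)}_k}{\theta^{(d)}_l} = \frac{\theta^{(d')}_k}{\theta^{(d')}_l}$$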
Outline • Motivation • Parameter Related Domain Knowledge • Experiments • Related Work • Summary / Future Work
Datasets • Project World - CALO • 6 persons, ~200 emails • Manually labeled as About / Not About Meetings • Data: (Person, Email, Topic) • Artificial Datasets • Kept most of the characteristics of the real data, but new emails were generated in which the frequencies of certain words were shared across users • Purpose: • Domain Knowledge readily available • To study the effect of training set size (up to 5000 examples) • To compare our estimated distribution to the true distribution
Approach • Model email using a Naive Bayes model: • without parameter sharing (PSNB) • with parameter sharing (SSNB) • Also compare with a model that assumes the sender is irrelevant (GNB), where the frequencies of words within a topic are learned from all examples [Figure: two graphical models over Sender, Topic, and Word] A sketch of the unshared vs. pooled estimators follows below.
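A minimal sketch (mine; the model names and exact tying scheme are assumptions based on the slide) contrasting per-sender estimation with the fully pooled extreme. SSNB would pool counts only for the word frequencies the domain knowledge designates as shared, as in the earlier shared_mle sketch:

```python
from collections import Counter

def naive_bayes_word_freqs(emails, share_across_senders=False):
    """Estimate word frequencies by counting.

    emails : list of (sender, topic, words) tuples
    share_across_senders=False keeps one distribution per (sender, topic)
    pair (no sharing, PSNB-style); True pools all senders' counts per
    topic, the fully shared extreme (GNB-style).
    """
    counts, totals = Counter(), Counter()
    for sender, topic, words in emails:
        key = topic if share_across_senders else (sender, topic)
        for w in words:
            counts[(key, w)] += 1
        totals[key] += len(words)
    return {(key, w): n / totals[key] for (key, w), n in counts.items()}

emails = [("alice", "meeting", ["agenda", "room"]),
          ("bob",   "meeting", ["agenda", "agenda"])]
print(naive_bayes_word_freqs(emails))                             # per-sender
print(naive_bayes_word_freqs(emails, share_across_senders=True))  # pooled
```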
Effect of Training Set Size • As expected: • SSNB performs better than both other models • SSNB and PSNB tend to perform similarly as the training set grows, but SSNB is much better when data is sparse
Outline • Motivation • Parameter Related Domain Knowledge • Experiments • Related Work • Summary / Future Work
Dirichlet Priors in a Bayes Net • The domain expert specifies a prior belief: an assignment of the parameters • The prior leaves room for some error around that assignment (the spread) [Figure: prior belief and spread of a Dirichlet distribution]
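For reference (standard facts about Dirichlet priors, not reconstructed from the slide): with a Dirichlet prior over a multinomial's $K$ parameters,

$$P(\theta) \propto \prod_k \theta_k^{\alpha_k - 1}, \qquad \hat\theta_k^{\mathrm{MAP}} = \frac{N_k + \alpha_k - 1}{N + \alpha_0 - K}, \quad \alpha_0 = \sum_k \alpha_k$$

The means $\alpha_k / \alpha_0$ encode the expert's belief; the magnitude $\alpha_0$ controls how tight the spread is.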
HMMs and DBNs [Figure: a Dynamic Bayesian Network unrolled over time; the transition and observation parameters are repeated in every time slice, i.e., shared across slices]
Module Networks • In a Module: • Same parents • Same CPTs Image from “Learning Module Networks” by Eran Segal and Daphne Koller
Context Specific Independence [Figure: a network over Burglary, Set, and Alarm; when the alarm is not set, Alarm no longer depends on Burglary, so the corresponding CPT rows coincide]
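One way to write the constraint in this example (my notation):

$$P(\mathit{Alarm} \mid \mathit{Burglary} = b,\ \mathit{Set} = \text{false}) = P(\mathit{Alarm} \mid \mathit{Set} = \text{false}) \quad \text{for } b \in \{\text{true}, \text{false}\}$$

i.e., the two CPT rows with Set = false are tied, which is exactly an equality (parameter sharing) constraint.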
Outline • Motivation • Parameter Related Domain Knowledge • Experiments • Related Work • Summary / Future Work
Summary • Parameter Related Domain Knowledge is needed when data is scarce • Developed methods to estimate parameters: • for each of the four types of Domain Knowledge presented • from both complete and incomplete data • Markov Models, Module Networks, and Context Specific Independence are particular cases of our parameter sharing domain knowledge • Models using parameter sharing performed better than two classical Bayes Nets on synthetic data
Future Work • Automatically find Shared Parameters • Study interactions among different types of Domain Knowledge • Incorporate Domain Knowledge about continuous variables • Investigate Domain Knowledge in the form of inequality constraints
Probability Mass Sharing • Want to model P(Word | Language) • Two languages: English, Spanish • Different sets of words • Domain Knowledge: • The aggregate probability mass of nouns is the same in both languages • The same holds for adjectives, verbs, etc.
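In symbols (my notation, with each language's own set of nouns):

$$\sum_{w \in \text{Nouns}_{En}} P(w \mid \text{English}) = \sum_{w \in \text{Nouns}_{Es}} P(w \mid \text{Spanish})$$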
Probability Ratio Sharing • Want to model P(Word | Language) • Two languages: English, Spanish • Different sets of words • Domain Knowledge: • Word groups, e.g. about computers: computer, mouse, monitor, etc. • The relative frequency of "computer" to "mouse" is the same in both languages • The aggregate mass of a group can differ between languages [Figure: word groups T1 (computer words) and T2 (business words)]
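In symbols (my notation):

$$\frac{P(\text{computer} \mid \text{English})}{P(\text{mouse} \mid \text{English})} = \frac{P(\text{computer} \mid \text{Spanish})}{P(\text{mouse} \mid \text{Spanish})}$$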