COMP 538 Introduction to Bayesian Networks. Lecture 15: Hierarchical Latent Class Models. Based on N. L. Zhang (2002). Hierarchical latent class models for cluster analysis. Journal of Machine Learning Research, to appear.
Outline • Motivation • Application of LCA in medicine • Model-based clustering and TCM diagnosis • Need for more general models • Theoretical issues • Learning algorithm • Empirical results • Related work
Motivations/LCA in Medicine • In medical diagnosis, a gold standard sometimes exists • Example: lung cancer • Symptoms: persistent cough, hemoptysis (coughing up blood), constant chest pain, shortness of breath, fatigue, etc. • Information for diagnosis: symptoms, medical history, smoking history, X-ray, sputum. • Gold standard: • biopsy: the removal of a small sample of tissue for examination under a microscope by a pathologist
Motivations/LCA in Medicine • Sometimes no gold standard exists • Example: rheumatoid arthritis (RA) • Symptoms: back pain, neck pain, joint pain, joint swelling, morning joint stiffness, etc. • Information for diagnosis: • symptoms, medical history, physical exam, • lab tests, including a test for rheumatoid factor (an antibody found in the blood of about 80 percent of adults with RA) • No gold standard: • none of the symptoms, alone or in combination, is a clear-cut indicator of RA • the presence or absence of rheumatoid factor alone does not establish whether one has RA.
Motivations/LCA in Medicine • Questions: • How many diagnostic categories should there be? • What rules should be used when making a diagnosis? • Note: these questions cannot be answered using regression (supervised learning) because the true "disease type" is never directly observed; it is latent. • Ideas: • Each "disease type" should correspond to a cluster of people. • People in different clusters exhibit different symptom patterns (otherwise diagnosis is hopeless). • Possible solution: perform cluster analysis of symptom data to reveal the patterns.
Motivations/LCA in Medicine • Latent class analysis (LCA): cluster analysis based on the latent class (LC) model [Figure: latent variable X with observed children Y1, Y2, …, Yp] • Observed variables Y_j: symptoms • Latent variable X: "disease type" • Assumption: the Y_j are mutually independent given X • Given: data on the Y_j • Determine: • the number of states of X • the prevalence P(X) • the class-specific probabilities P(Y_j|X)
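To make the computation concrete, here is a minimal Python sketch of the marginal distribution an LC model defines over the symptoms; the parameter names (prevalence, class_probs) and the numbers are illustrative, not from the paper.

```python
import numpy as np

# Minimal latent class (LC) model: one latent variable X with K classes and
# p binary symptoms Y_1..Y_p that are conditionally independent given X.
# prevalence[k]     = P(X = k)
# class_probs[k][j] = P(Y_j = 1 | X = k)

def joint_prob(y, prevalence, class_probs):
    """P(Y = y) = sum_k P(X = k) * prod_j P(Y_j = y_j | X = k)."""
    prevalence = np.asarray(prevalence)
    class_probs = np.asarray(class_probs)
    y = np.asarray(y)
    cond = np.where(y == 1, class_probs, 1.0 - class_probs)  # shape (K, p)
    return float(prevalence @ cond.prod(axis=1))

# Two classes ("healthy", "diseased") and three symptoms:
prevalence = [0.7, 0.3]
class_probs = [[0.05, 0.10, 0.02],   # healthy: symptoms rare
               [0.80, 0.60, 0.70]]   # diseased: symptoms common
print(joint_prob([1, 1, 0], prevalence, class_probs))
```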
Motivations/LCA in Medicine • LC analysis of the Hannover Rheumatoid Arthritis data [Table: class-specific probabilities] • Cluster 1: "disease-free" • Cluster 2: "back-pain type" • Cluster 3: "joint type" • Cluster 4: "severe type"
Model-Based Clustering and TCM Diagnosis • Diagnosis in traditional Chinese medicine (TCM) • Example: kidney deficiency (肾虚) • Symptoms: lassitude in the loins (腰酸软而痛), tinnitus (耳鸣), dribbling urine (小便余沥不尽), etc. • Similar to rheumatoid arthritis: • diagnosis is based on symptoms • no gold standard exists
Model-Based Clustering and TCM Diagnosis • Current status • Researchers have been searching for laboratory indices that could serve as gold standards. • All such efforts have failed. • In practice, diagnosis is quite subjective and differs considerably between doctors. • This hinders practice and prevents international recognition.
Model-Based Clustering and TCM Diagnosis • How to lay TCM diagnosis on a scientific foundation? • Statistical methods, specifically model-based cluster analysis, might be the answer: • TCM diagnosis is based on experience (of contemporary practitioners and ancient doctors) • Experience is a summary of patient cases. • Summarizing patient cases with the human brain leads to subjectivity. • Summarizing patient cases with a computer avoids subjectivity.
Need for More General Models • Preliminary analysis of TCM data using LCA: • could not find models that fit the data well • Reason: latent class (LC) models are too simplistic • Local independence: observed variables are mutually independent given the latent variable • Need: more realistic models
Need for More General Models • Hierarchical latent class (HLC) models: • tree-structured Bayesian networks where • leaf nodes are observed and all other nodes are not • Manifest variables = observed variables • Possibly still too simplistic, but a good first step • More general than LC models • Nice computational properties • Task: • learn HLC models from data • learn latent structure from what we can observe
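As a data structure, an HLC model structure is just a tree whose leaves are the manifest variables. A minimal sketch, with hypothetical names (Node, is_hlc_structure) rather than the paper's implementation:

```python
# Minimal sketch of an HLC model structure: a rooted tree in which exactly
# the leaves are observed (manifest) and internal nodes are latent.

class Node:
    def __init__(self, name, cardinality, observed=False):
        self.name = name
        self.cardinality = cardinality  # number of states
        self.observed = observed        # True for manifest variables
        self.children = []

    def add(self, child):
        self.children.append(child)
        return child

def is_hlc_structure(node):
    """Leaves must be observed; internal nodes must be latent."""
    if not node.children:
        return node.observed
    return not node.observed and all(is_hlc_structure(c) for c in node.children)

# Example: X1 -> {X2 -> {Y1, Y2}, Y3}
x1 = Node("X1", 2)
x2 = x1.add(Node("X2", 2))
x2.add(Node("Y1", 2, observed=True))
x2.add(Node("Y2", 2, observed=True))
x1.add(Node("Y3", 2, observed=True))
assert is_hlc_structure(x1)
```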
Theoretical Issues • What latent structures can be learned from data? • An HLC model M is parsimonious if there does NOT exist another model M' that • is marginally equivalent to M, i.e., P(manifest vars|M) = P(manifest vars|M'), and • has fewer independent parameters than M. • Occam's razor prefers parsimonious models over non-parsimonious ones.
Theoretical Issues • Regular HLC models • An HLC model is regular if, for any latent node Z with neighbors X1, X2, …, Xk, |Z| ≤ (|X1| · |X2| · … · |Xk|) / max_i |X_i|, where strict inequality holds when Z has only two neighbors. • Irregular models are not parsimonious. (Operational characterization of parsimony.) • The set of all possible regular HLC models for a given set of manifest variables is finite. (Finite search space for the learning algorithm.)
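Assuming the bound stated above, checking regularity at a single latent node is straightforward. A minimal sketch with illustrative names:

```python
from math import prod

def regular_at(card_z, neighbor_cards):
    """Regularity at latent node Z with neighbor cardinalities |X_1|..|X_k|:
    |Z| <= prod_i |X_i| / max_i |X_i|, strict when Z has exactly two neighbors.
    """
    # prod/max is the product of the remaining cardinalities (an integer)
    bound = prod(neighbor_cards) // max(neighbor_cards)
    if len(neighbor_cards) == 2:
        return card_z < bound
    return card_z <= bound

# A latent node with three binary neighbors can have at most 2*2*2/2 = 4 states:
print(regular_at(4, [2, 2, 2]))  # True
print(regular_at(5, [2, 2, 2]))  # False
```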
Theoretical Issues • Model equivalence via root walking • M1: result of the root walking to X2 • M2: result of the root walking to X3 [model figures omitted] • Root walking leads to marginally equivalent models
Theoretical Issues • Unrooted HLC models • The root of an HLC model can walk to any latent node. • Unrooted model: an HLC model with undirected edges. • We can only learn unrooted models. • Question: which latent node should be the class node? • Answer: any, depending on the semantics and purpose of the clustering. Learn one model, obtain multiple clusterings.
Theoretical Issues • Measure of model complexity • With no latent variables: number of free parameters (standard dimension) • With latent variables: effective dimension instead • For n binary manifest variables, P(Y1, Y2, …, Yn) spans a (2^n - 1)-dimensional space S when unconstrained. • An HLC model imposes constraints on the joint, so the model spans a subspace of S. • Effective dimension of the model: the dimension of that subspace. HARD to compute.
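The standard dimension, by contrast, is easy: in a rooted tree-structured model, every node contributes (its cardinality minus one) parameters per state of its parent. A minimal sketch, with a hypothetical dict-based representation:

```python
def standard_dimension(cards, parent):
    """Free parameters of a rooted tree-structured Bayesian network.

    cards:  dict mapping each variable to its cardinality
    parent: dict mapping each variable to its parent (None for the root)
    Each node contributes (|node| - 1) * |parent| parameters;
    the root contributes |root| - 1.
    """
    total = 0
    for v, card in cards.items():
        p = parent[v]
        total += (card - 1) * (cards[p] if p is not None else 1)
    return total

# LC model: latent X with 4 states, 5 binary symptoms
cards = {"X": 4, "Y1": 2, "Y2": 2, "Y3": 2, "Y4": 2, "Y5": 2}
parent = {"X": None, "Y1": "X", "Y2": "X", "Y3": "X", "Y4": "X", "Y5": "X"}
print(standard_dimension(cards, parent))  # 3 + 5*4 = 23
```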
Theoretical Issues • Reduction theorem for regular HLC models (Kocka and Zhang 2002): • D(M) = D(M1) + D(M2) - number of shared parameters • The problem reduces to computing effective dimensions of LC models, for which good approximations exist.
Theoretical Issues • Example [model figure omitted] • Standard dimension: 110 • Effective dimension: 61
Learning HLC Models • Given: i.i.d. samples generated by some regular HLC model. • Task: reconstruct the HLC model from the data. • Hill-climbing algorithm • Scoring metrics: we experiment with AIC, BIC, CS (Cheeseman-Stutz), and holdout logarithmic score (experiments with the effective dimension are yet to be run). • Search space: • the set of all possible regular HLC models for the given manifest variables. • We structure the space into two levels according to two subtasks: • given a model structure, estimate the cardinalities of the latent variables; • find an optimal model structure.
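For reference, here are AIC and BIC written in higher-is-better form; d stands for the model dimension (standard, or effective once it can be computed), and the numbers in the example are made up:

```python
import math

def aic(loglik, d):
    """AIC score (higher is better): log-likelihood minus parameter count."""
    return loglik - d

def bic(loglik, d, n):
    """BIC score (higher is better): the penalty grows with sample size n."""
    return loglik - 0.5 * d * math.log(n)

# Example: two candidate models on the same data set of 7162 records
print(bic(-21000.0, 23, 7162))   # smaller model
print(bic(-20950.0, 47, 7162))   # larger model; the higher BIC wins
```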
Learning HLC Models • Estimate cardinalities of latent variables given a model structure • Search space: all regular models with the given model structure. • Hill-climbing (sketched below): • Start: all latent variables at minimum cardinality (usually 2) • Search operator: increase the cardinality of one latent variable by one
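A sketch of this first-level search. The helper estimate_and_score is hypothetical: it stands for fitting the parameters (e.g., by EM) and returning a penalized score such as BIC.

```python
def search_cardinalities(structure, latent_vars, data, estimate_and_score,
                         min_card=2):
    """Hill-climb over latent cardinalities with the model structure fixed."""
    cards = {z: min_card for z in latent_vars}       # start: all at minimum
    best_score = estimate_and_score(structure, cards, data)
    while True:
        best_move = None
        for z in latent_vars:
            trial = dict(cards)
            trial[z] += 1        # operator: raise one cardinality by one
            # (a fuller version would skip trials that violate regularity)
            s = estimate_and_score(structure, trial, data)
            if best_move is None or s > best_move[1]:
                best_move = (trial, s)
        if best_move is None or best_move[1] <= best_score:
            return cards, best_score                 # local maximum reached
        cards, best_score = best_move
```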
Learning HLC Models • Find an optimal model structure • Search space: the set of all regular unrooted HLC model structures for the given manifest variables. • Hill-climbing (sketched below): • Start: the unrooted LC model structure • Search operators: • node introduction, node elimination, neighbor relocation • Any two model structures can be reached from one another via these operators.
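A sketch of the second-level search. The helpers make_lc_structure, neighbors, and score are hypothetical: neighbors(m) stands for applying the three operators to m while discarding irregular results, and score would invoke the cardinality search sketched above.

```python
def search_structures(manifest_vars, data, make_lc_structure, neighbors, score):
    """Hill-climb over regular unrooted HLC model structures."""
    current = make_lc_structure(manifest_vars)   # start: unrooted LC structure
    current_score = score(current, data)
    while True:
        best, best_score = None, current_score
        for cand in neighbors(current):          # one operator application away
            s = score(cand, data)
            if s > best_score:
                best, best_score = cand, s
        if best is None:
            return current, current_score        # no improving move left
        current, current_score = best, best_score
```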
Learning HLC Models • Motivations for the search operators: • Node introduction: M1' → M2'. Deals with local dependence. • Opposite: node elimination. • Neighbor relocation: M2' → M3'. Result of a tradeoff. • Opposite: itself. • Operators are not allowed to yield irregular model structures.
Empirical Results • Synthetic data: • generative model randomly parameterized • all latent variables have 3 states • sample sizes: 5k, 10k, 50k, 100k • Logarithmic scores on test data: • close to those of the generative model • vary little across scoring metrics.
Empirical Results • Learned structures [Table: number of search steps from the learned structure to the true structure]
Empirical Results • Cardinalities of latent variables • Better results with more skewed generative parameters
Empirical Results • Hannover Rheumatoid Arthritis data: • 5 binary manifest variables: back pain, neck pain, joint swelling, … • 7,162 records • Analysis by Kohlmann and Formann (1997): 4-class LC model. • Our algorithm: exactly the same model. • Coleman data • 4 binary manifest variables, 3,398 records. • Analyses by Goodman (1974) and Hagenaars (1988): M1, M2 • Our algorithm: M3
Empirical Results • HIV data • 4 binary manifest variables, 428 records • Analysis by Uebersax (2000): [model figure omitted] • Our algorithm: [model figure omitted] • House Building data • 4 binary manifest variables, 1,185 records • Analysis by Hagenaars (1988): M2, M3, M4 • Our algorithm: a 4-class LC model, which fits the data poorly. A failure; the reason is a limitation of HLC models.
Related Work • Phylogenetic trees: • represent evolutionary relationships among a set of species [Figure: phylogenetic tree with aligned sequences, e.g. AAGGCCT, AAGACTT, …, at the leaves] • Probabilistic model: • taxa are aligned and sites evolve i.i.d. • conditional probabilities given by a character-evolution model; parameters are edge lengths, representing time • Restricted to one site, a phylogenetic tree is an HLC model where • the tree structure is binary and all variables share one state space, • the conditional probabilities are parameterized by edge lengths, and • the model is the same across sites.
Related Work • Tree reconstruction: • Given: current taxa. Find: tree topology and edge lengths. • Methods • Hill-climbing: • stepwise addition of taxa • star decomposition, similar to node introduction in HLC models • branch swapping, similar to neighbor relocation in HLC models • Structural EM (Friedman et al. 2002): • exploits the fact that all variables share one state space • Neighbor joining (Saitou & Nei, 1987): • exploits the facts that the parameters are edge lengths and that they are additive.
Related Work • Connolly (1993): • heuristic method for constructing HLC models • mutual information used to group variables • one latent variable introduced for each group • cardinalities of latent variables determined using conceptual clustering • Martin and VanLehn (1994): • heuristic method for learning two-level Bayesian networks where the top level is latent • Elidan et al. (2001): • learning latent variables for general Bayesian networks • aim: simplification; idea: structural signatures • Model-based hierarchical clustering (Hansen et al. 1991): • hierarchically structures the state space of ONE cluster variable.
Related Work • Diagnostics for local dependence in LC models: • Hagenaars (1988): standardized residuals • Espeland & Handelmann (1988): likelihood ratio statistics • Garrett & Zeger (2000): log odds ratios • Modeling local dependence in LC models: • joint variables (M2), multiple indicators (M3), loglinear models (M4)