
Lecture 15: Hierarchical Latent Class Models, Based on N. L. Zhang (2002)




  1. COMP 538 Introduction to Bayesian Networks. Lecture 15: Hierarchical Latent Class Models. Based on N. L. Zhang (2002). Hierarchical latent class models for cluster analysis. Journal of Machine Learning Research, to appear.

  2. Outline • Motivation • Application of LCA in medicine • Model-based clustering and TCM diagnosis • Need for more general models • Theoretical issues • Learning algorithm • Empirical results • Related work

  3. Motivations/LCA in Medicine • In medical diagnosis, a gold standard sometimes exists • Example: lung cancer • Symptoms: • Persistent cough, hemoptysis (coughing up blood), constant chest pain, shortness of breath, fatigue, etc. • Information for diagnosis: symptoms, medical history, smoking history, X-ray, sputum • Gold standard: • Biopsy: the removal of a small sample of tissue for examination under a microscope by a pathologist

  4. Motivations/LCA in Medicine • Sometimes no gold standard exists • Example: rheumatoid arthritis (RA) • Symptoms: back pain, neck pain, joint pain, joint swelling, morning joint stiffness, etc. • Information for diagnosis: • Symptoms, medical history, physical exam • Lab tests, including a test for rheumatoid factor (an antibody found in the blood of about 80 percent of adults with RA) • No gold standard: • No symptom, alone or in combination with others, is a clear-cut indicator of RA • The presence or absence of rheumatoid factor does not by itself determine whether one has RA

  5. Motivations/LCA in Medicine • Questions: • How many diagnostic categories should there be? • What rules should be used when making a diagnosis? • Note: these questions cannot be answered using regression (supervised learning) because • The true “disease type” is never directly observed; it is latent. • Ideas: • Each “disease type” must correspond to a cluster of people. • People in different clusters show different symptom patterns (otherwise diagnosis is hopeless) • Possible solution: perform cluster analysis of symptom data to reveal these patterns.

  6. Motivations/LCA in Medicine [Figure: LC model with latent variable X as the parent of manifest variables Y1, Y2, …, Yp] • Latent class analysis (LCA) • Cluster analysis based on the latent class (LC) model • Observed variables Y_j: symptoms • Latent variable X: “disease type” • Assumption: • The Y_j’s are mutually independent given X • Given: data on the Y_j • Determine: • Number of states of X • Prevalence: P(X) • Class-specific probabilities: P(Y_j|X)

  7. Motivations/LCA in Medicine • LC analysis of the Hannover rheumatoid arthritis data [Table: class-specific probabilities] • Cluster 1: “disease-free” • Cluster 2: “back-pain type” • Cluster 3: “joint type” • Cluster 4: “severe type”

  8. Model-Based Clustering and TCM diagnosis • Diagnosis in traditional Chinese medicine (TCM) • Example: kidney deficiency (肾虚) • Symptoms: lassitude in the loins (腰酸软而痛), tinnitus (耳鸣), dribbling urine (小便余沥不尽), etc. • Similar to rheumatoid arthritis: • Diagnosis is based on symptoms • No gold standards exist

  9. Model-Based Clustering and TCM diagnosis • Current status • Researchers have been searching for laboratory indices that can serve as gold standards. • All such efforts have failed. • In practice, diagnosis is quite subjective and differs considerably between doctors. • This hinders practice and prevents international recognition.

  10. Model-Based Clustering and TCM diagnosis • How can TCM diagnosis be put on a scientific foundation? • Model-based cluster analysis • Statistical methods might be the answer: • TCM diagnosis is based on experience (of contemporary practitioners and ancient doctors) • Experience is a summary of patient cases. • Summarizing patient cases in the human brain leads to subjectivity. • Summarizing patient cases by computer avoids subjectivity.

  11. Need for More General Models • Preliminary analysis of TCM data using LCA: • Could not find models that fit the data well • Reason: latent class (LC) models are too simplistic • Local independence: observed variables are mutually independent given the latent variable • Need: more realistic models

  12. Need for More General Models • Hierarchical latent class (HLC) models: • Tree-structured Bayesian networks where • Leaf nodes are observed and all other nodes are latent • Manifest variables = observed variables • Maybe still too simplistic, but a good first step • More general than LC models • Nice computational properties • Task: • Learn HLC models from data • Learn latent structures from what we can observe.
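One plausible reading of "nice computational properties" is exact inference. Because an HLC model is a tree, the probability of one complete observation of the manifest variables can be computed by passing messages from the leaves up to the root, in time linear in the number of nodes. The sketch below assumes discrete variables with states 0..card−1 and CPTs stored as nested lists `cpt[child][parent_state][child_state]`. The toy model and its numbers are invented for illustration, and `upward`/`marginal` are hypothetical names.

```python
def upward(node, children, card, cpt, evidence):
    """f[s] = P(manifest values below `node` | node = s)."""
    if node in evidence:  # manifest leaf: clamp to its observed value
        return [1.0 if s == evidence[node] else 0.0 for s in range(card[node])]
    f = [1.0] * card[node]
    for c in children.get(node, []):
        fc = upward(c, children, card, cpt, evidence)
        for s in range(card[node]):
            # sum out the child's state
            f[s] *= sum(cpt[c][s][t] * fc[t] for t in range(card[c]))
    return f

def marginal(root, prior, children, card, cpt, evidence):
    """P(manifest variables = evidence) under the HLC model rooted at `root`."""
    f = upward(root, children, card, cpt, evidence)
    return sum(prior[s] * f[s] for s in range(card[root]))

# Toy HLC model: latent root X -> Y1, Y2, Z; latent Z -> Y3, Y4 (all binary)
children = {"X": ["Y1", "Y2", "Z"], "Z": ["Y3", "Y4"]}
card = {v: 2 for v in ["X", "Z", "Y1", "Y2", "Y3", "Y4"]}
prior = [0.6, 0.4]
cpt = {
    "Y1": [[0.9, 0.1], [0.2, 0.8]],
    "Y2": [[0.7, 0.3], [0.1, 0.9]],
    "Z": [[0.8, 0.2], [0.3, 0.7]],
    "Y3": [[0.95, 0.05], [0.25, 0.75]],
    "Y4": [[0.6, 0.4], [0.15, 0.85]],
}
print(marginal("X", prior, children, card, cpt, {"Y1": 1, "Y2": 0, "Y3": 1, "Y4": 1}))
```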

  13. Theoretical Issues • What latent structures can be learned from data? • An HLC model M is parsimonious if there does NOT exist another model M' that • Is marginally equivalent to M, i.e., P(manifest vars|M) = P(manifest vars|M’), and • Has fewer independent parameters than M. • Occam’s razor prefers parsimonious models over non-parsimonious ones
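Parsimony compares models by their number of independent parameters. For a tree-structured model this count is easy to obtain. The root contributes $|root|-1$, and every other node $v$ contributes $|parent(v)|\,(|v|-1)$, one distribution per parent state. This count is the standard dimension of slide 17. The sketch below assumes the model is given as a child-to-parent map, and `standard_dimension` is a hypothetical name. With latent variables, the effective dimension can be smaller than this count.

```python
def standard_dimension(parent, card):
    """Number of free parameters of a tree-structured Bayesian network.

    parent maps each node to its parent (None for the root);
    card maps each node to its number of states.
    """
    dim = 0
    for v, pa in parent.items():
        if pa is None:
            dim += card[v] - 1                # P(root)
        else:
            dim += card[pa] * (card[v] - 1)   # P(v | parent), one row per parent state
    return dim

# LC model: latent X with 3 states, four binary symptoms
lc_parent = {"X": None, "Y1": "X", "Y2": "X", "Y3": "X", "Y4": "X"}
lc_card = {"X": 3, "Y1": 2, "Y2": 2, "Y3": 2, "Y4": 2}
print(standard_dimension(lc_parent, lc_card))  # 2 + 4 * 3 = 14
```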

  14. Theoretical Issues • Regular HLC models • An HLC model is regular if, for any latent node Z with neighbors X1, X2, …, Xk, $|Z| \le \frac{\prod_{i=1}^{k} |X_i|}{\max_{i=1}^{k} |X_i|}$, where strict inequality holds when Z has only two neighbors • Irregular models are not parsimonious (an operational characterization of parsimony) • The set of all possible regular HLC models for a given set of manifest variables is finite (a finite search space for the learning algorithm)
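Regularity requires, for every latent node Z with neighbors X1, …, Xk, that $|Z| \le \prod_i |X_i| / \max_i |X_i|$, with strict inequality when Z has exactly two neighbors. This condition can be checked directly on an unrooted model. The sketch below assumes the model is given as an adjacency map plus cardinalities. It compares $|Z|\cdot\max_i|X_i|$ with $\prod_i|X_i|$ in integers to avoid division, and the name `is_regular` is hypothetical.

```python
from math import prod

def is_regular(adj, card, latent):
    """True if every latent node satisfies the regularity bound on its cardinality."""
    for z in latent:
        sizes = [card[x] for x in adj[z]]
        bound_num, bound_den = prod(sizes), max(sizes)
        if len(sizes) == 2:
            if card[z] * bound_den >= bound_num:   # strict inequality required
                return False
        elif card[z] * bound_den > bound_num:
            return False
    return True

# LC model: latent X adjacent to three binary manifest variables (bound = 8 / 2 = 4)
adj = {"X": ["Y1", "Y2", "Y3"], "Y1": ["X"], "Y2": ["X"], "Y3": ["X"]}
print(is_regular(adj, {"X": 4, "Y1": 2, "Y2": 2, "Y3": 2}, ["X"]))  # True
print(is_regular(adj, {"X": 5, "Y1": 2, "Y2": 2, "Y3": 2}, ["X"]))  # False
```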

  15. Theoretical Issues • Model Equivalence • Root walking • M1: root walks to X2 • M2: root walks to X3 • Root walking leads to equivalent models
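Root walking is a local reparameterization. Moving the root from X1 to its neighbor X2 reverses only the edge X1–X2: the factors $P(X1)P(X2 \mid X1)$ are replaced by $P(X2)P(X1 \mid X2)$ via Bayes' rule, and every other conditional probability is kept. Because the product of these two factors is unchanged, so are the joint distribution and the distribution of the manifest variables. The sketch below assumes discrete X1 and X2 with P(X2 = b) > 0 for every state b. The numbers are invented, and `walk_root` is a hypothetical name.

```python
def walk_root(p_x1, p_x2_given_x1):
    """Return P(X2) and P(X1 | X2) giving the same joint P(X1, X2)."""
    n1, n2 = len(p_x1), len(p_x2_given_x1[0])
    joint = [[p_x1[a] * p_x2_given_x1[a][b] for b in range(n2)] for a in range(n1)]
    p_x2 = [sum(joint[a][b] for a in range(n1)) for b in range(n2)]
    # Bayes' rule: P(X1=a | X2=b) = P(X1=a, X2=b) / P(X2=b)
    p_x1_given_x2 = [[joint[a][b] / p_x2[b] for a in range(n1)] for b in range(n2)]
    return p_x2, p_x1_given_x2

p_x1 = [0.3, 0.7]
p_x2_given_x1 = [[0.9, 0.05, 0.05], [0.2, 0.5, 0.3]]
p_x2, p_x1_given_x2 = walk_root(p_x1, p_x2_given_x1)
```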

  16. Theoretical Issues • Unrooted HLC models • The root of an HLC model can walk to any latent node. • Unrooted model: an HLC model with undirected edges. • We can only learn unrooted models. • Question: which latent node should be the class node? • Answer: any, depending on the semantics and purpose of clustering. One learned model can serve multiple clusterings.

  17. Theoretical Issues • Measure of model complexity • Without latent variables: number of free parameters (standard dimension) • With latent variables: effective dimension instead • With no constraints, P(Y1, Y2, …, Yn) over binary variables spans a $2^n - 1$ dimensional space S. • An HLC model imposes constraints on the joint distribution • It spans a subspace of S • Effective dimension of the model: dimension of that subspace. HARD to compute.
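A standard way to compute effective dimension is as the rank of the Jacobian of the map from model parameters to the joint distribution $P(Y_1,\dots,Y_n)$, evaluated at a generic parameter value. The sketch below covers an LC model with binary manifest variables only, as an illustrative special case, and the function names are hypothetical. It evaluates the Jacobian analytically at random rational parameters and computes its rank exactly with `fractions.Fraction`. It keeps the maximum over a few random points, since the generic rank is the largest one.

```python
import random
from fractions import Fraction
from itertools import product
from math import prod

def lc_jacobian(prior, theta):
    """Rows: the 2^n probabilities P(y); columns: free parameters.

    Free parameters are prior[0..k-2] (prior[k-1] = 1 - their sum)
    and theta[x][j] = P(Y_j = 1 | X = x).
    """
    k, n = len(prior), len(theta[0])
    rows = []
    for y in product((0, 1), repeat=n):
        fac = [[theta[x][j] if y[j] else 1 - theta[x][j] for j in range(n)]
               for x in range(k)]
        lik = [prod(f) for f in fac]                       # P(y | X = x)
        row = [lik[x] - lik[k - 1] for x in range(k - 1)]  # dP(y) / d prior[x]
        for x in range(k):
            for j in range(n):
                sign = 1 if y[j] else -1
                rest = prod(fac[x][i] for i in range(n) if i != j)
                row.append(prior[x] * sign * rest)         # dP(y) / d theta[x][j]
        rows.append(row)
    return rows

def rank(rows):
    """Exact rank by Gaussian elimination over the rationals."""
    m = [list(r) for r in rows]
    r = 0
    for c in range(len(m[0])):
        piv = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if piv is None:
            continue
        m[r], m[piv] = m[piv], m[r]
        for i in range(r + 1, len(m)):
            f = m[i][c] / m[r][c]
            m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
    return r

def effective_dimension_lc(n, k, trials=3, seed=0):
    """Generic Jacobian rank of an LC model with n binary manifest vars, k classes."""
    rng = random.Random(seed)
    best = 0
    for _ in range(trials):
        w = [rng.randint(1, 9999) for _ in range(k)]
        prior = [Fraction(v, sum(w)) for v in w]
        theta = [[Fraction(rng.randint(1, 9999), 10000) for _ in range(n)]
                 for _ in range(k)]
        best = max(best, rank(lc_jacobian(prior, theta)))
    return best
```

For example, with two binary manifest variables and two classes, the standard dimension is 5. The effective dimension cannot exceed $2^2 - 1 = 3$.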

  18. Theoretical Issues • Reduction theorem for regular HLC models (Kocka and Zhang 2002): • D(M) = D(M1) + D(M2) – number of common parameters • The problem reduces to computing the effective dimension of LC models, for which good approximations exist.

  19. Theoretical Issues • Example • Standard dimension: 110 • Effective dimension: 61

  20. Learning HLC Models • Given: i.i.d. samples generated by some regular HLC model. • Task: reconstruct the HLC model from data. • Hill-climbing algorithm • Scoring metrics: we experiment with AIC, BIC, CS (Cheeseman-Stutz), and holdout logarithmic score (LS) (experiments with effective dimension are yet to be run) • Search space: • Set of all possible regular HLC models for the given manifest variables. • We structure the space into two levels according to two subtasks: • Given a model structure, estimate the cardinalities of the latent variables. • Find an optimal model structure.
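Of the scoring metrics listed, AIC and BIC have simple closed forms. Both penalize the maximized log-likelihood by a function of the number of parameters d, and BIC also uses the sample size N. The sketch below uses the "larger is better" convention, and d may be either the standard or the effective dimension. The holdout logarithmic score is sketched as the total log-probability of held-out records under a fitted model, represented here by a hypothetical `prob` callable. CS is omitted because it needs the fitted model's expected sufficient statistics.

```python
import math

def aic(loglik, d):
    """Akaike information criterion, larger is better: log L - d."""
    return loglik - d

def bic(loglik, d, n):
    """Bayesian information criterion, larger is better: log L - (d / 2) log N."""
    return loglik - 0.5 * d * math.log(n)

def holdout_log_score(test_records, prob):
    """Sum of log P(record) over held-out data; `prob` is any fitted model."""
    return sum(math.log(prob(y)) for y in test_records)
```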

  21. Learning HLC Models • Estimate cardinalities of latent variables given model structure • Search space: all regular models with the given model structure. • Hill-climbing: • Start: all latent variables have minimum cardinality (usually 2) • Search operator: increase the cardinality of one latent variable by one
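The cardinality search can be sketched as a greedy loop. Every latent variable starts at 2 states, and each single +1 move is tried. The best move is kept if it raises the score; otherwise the search stops. In this sketch, `score` and `is_regular` are hypothetical callables supplied by the caller. For example, `score` could be BIC after refitting the parameters.

```python
def climb_cardinalities(latents, score, is_regular, start=2):
    """Greedy search over latent cardinalities for a fixed model structure."""
    card = {z: start for z in latents}
    best = score(card)
    while True:
        moves = []
        for z in latents:
            cand = dict(card)
            cand[z] += 1                    # the single search operator
            if is_regular(cand):            # stay inside the search space
                moves.append((score(cand), cand))
        if not moves:
            return card, best
        top_score, top = max(moves, key=lambda m: m[0])
        if top_score <= best:               # no move improves the score: stop
            return card, best
        card, best = top, top_score

# Toy score peaking at |Z1| = 3, |Z2| = 4 (a stand-in for a real model score)
toy_score = lambda c: -(c["Z1"] - 3) ** 2 - (c["Z2"] - 4) ** 2
print(climb_cardinalities(["Z1", "Z2"], toy_score, lambda c: max(c.values()) <= 6))
# ({'Z1': 3, 'Z2': 4}, 0)
```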

  22. Learning HLC Models • Find optimal model structures • Search space: set of all regular unrooted HLC model structures for the given manifest variables. • Hill-climbing: • Start: unrooted LC model structure • Search operators: • Node introduction, node elimination, neighbor relocation • Any model structure can be reached from any other using these operators.
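The three structure operators are local edits of an unrooted tree. The sketch below works on an adjacency-set representation and assumes the following semantics:
- node introduction creates a new latent node between a latent node Z and two of its neighbors;
- node elimination merges a latent neighbor back into Z;
- neighbor relocation moves one of Z's neighbors to a latent neighbor of Z.

Cardinalities, parameter refitting, and the regularity check are left out, and the function names are hypothetical.

```python
def _copy(adj):
    return {v: set(ns) for v, ns in adj.items()}

def introduce_node(adj, z, w1, w2, new):
    """Insert latent node `new` between latent node z and its neighbors w1, w2."""
    g = _copy(adj)
    for w in (w1, w2):
        g[z].discard(w)
        g[w].discard(z)
        g[w].add(new)
    g[new] = {z, w1, w2}
    g[z].add(new)
    return g

def eliminate_node(adj, z, z2):
    """Remove latent neighbor z2 of z and attach z2's other neighbors to z."""
    g = _copy(adj)
    for w in g.pop(z2):
        g[w].discard(z2)
        if w != z:
            g[w].add(z)
            g[z].add(w)
    return g

def relocate_neighbor(adj, z, w, z2):
    """Move neighbor w of z so that it attaches to z's latent neighbor z2."""
    g = _copy(adj)
    g[z].discard(w)
    g[w].discard(z)
    g[w].add(z2)
    g[z2].add(w)
    return g

# Unrooted LC model structure: latent X linked to four manifest variables
lc = {"X": {"Y1", "Y2", "Y3", "Y4"},
      "Y1": {"X"}, "Y2": {"X"}, "Y3": {"X"}, "Y4": {"X"}}
m2 = introduce_node(lc, "X", "Y1", "Y2", "Z")   # X - Z, with Y1, Y2 attached to Z
m3 = relocate_neighbor(m2, "X", "Y3", "Z")       # Y3 moves from X to Z
```

Node elimination undoes node introduction: `eliminate_node(m2, "X", "Z")` gives back `lc`.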

  23. Learning HLC Models • Motivations for search operators: • Node introduction (M1’ → M2’): deals with local dependence. • Its opposite: node elimination. • Neighbor relocation (M2’ → M3’): result of a tradeoff. • Its opposite: itself. • Operators are not allowed to yield irregular model structures.

  24. Empirical Results • Synthetic data: • Generative model, randomly parameterized • All latent variables have 3 states. • Sample sizes: 5k, 10k, 50k, 100k • Log scores on test data: • Close to that of the generative model • Do not vary much across scoring metrics.

  25. Empirical Results • Learned structures: Numbers of steps to true structure

  26. Empirical Results • Cardinality of Latent variables • Better results with more skewed parameters

  27. Empirical Results • Hannover Rheumatoid Arthritis data: • 5 binary manifest variables: back pain, neck pain, joint swelling, … • 7,162 records • Analysis by Kohlmann and Formann (1997): 4 class LC model. • Our algorithm: exactly the same model. • Coleman data • 4 binary manifest variables, 3,398 records. • Analysis by Goodman (1974) and Hagenaars (1988): M1, M2 • Our algorithm: M3

  28. Empirical Results • HIV data • 4 binary manifest variables, 428 records • Analysis by Uebersax (2000):  • Our algorithm:  • House building data • 4 binary manifest variables, 1,185 records • Analysis by Hagenaars (1988): M2, M3, M4 • Our algorithm: 4-class LC model, which fits the data poorly. A failure. Reason: limitation of HLC models

  29. Related Work [Figure: phylogenetic tree over eight aligned taxa: AAGGCCT, AAGACTT, AGCACTT, AGCACAA, AGGGCAT, TAGACTT, AGCGCTT, TAGCCCA] • Phylogenetic trees: • Represent relationships among a set of species. • Probabilistic model: • Taxa aligned, sites evolve i.i.d. • Conditional probabilities: character evolution model. Parameters: edge lengths, representing time. • Restricted to one site, a phylogenetic tree is an HLC model where • The tree structure is binary and all variables share the same state space. • The conditional probabilities are parameterized by edge lengths • The model is the same for different sites
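The Jukes-Cantor (JC69) model of DNA evolution is one concrete example of conditional probabilities parameterized by edge lengths. It gives the probability that a site changes from base i to base j along an edge of length t, measured in expected substitutions per site. The slide does not name a specific character evolution model, so JC69 is used here only as an illustration, and `jc69`/`matmul` are hypothetical names. Composing two edges multiplies their matrices, and the result equals the matrix for the summed length.

```python
import math

def jc69(t):
    """4x4 matrix of P(child base = j | parent base = i) for edge length t."""
    decay = math.exp(-4.0 * t / 3.0)
    same = 0.25 + 0.75 * decay   # probability the base is unchanged
    diff = 0.25 - 0.25 * decay   # probability of each of the 3 other bases
    return [[same if i == j else diff for j in range(4)] for i in range(4)]

def matmul(a, b):
    """Product of two 4x4 transition matrices (two edges in sequence)."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

# Path of lengths 0.1 then 0.2 behaves like a single edge of length 0.3
print(matmul(jc69(0.1), jc69(0.2))[0][0], jc69(0.3)[0][0])
```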

  30. Related Work • Tree reconstruction: • Given: current taxa. Find: tree topology and edge lengths. • Methods: • Hill-climbing • Stepwise addition of taxa • Star decomposition, similar to node introduction in HLC models • Branch swapping, similar to neighbor relocation in HLC models • Structural EM (Friedman et al. 2002): • Uses the fact that all variables have the same state space • Neighbor joining (Saitou & Nei, 1987): • Uses the facts that the parameters are edge lengths and that edge lengths are additive.
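Neighbor joining exploits the additivity of edge lengths. It repeatedly joins the pair of nodes minimizing $Q(i,j) = (n-2)\,d(i,j) - \sum_k d(i,k) - \sum_k d(j,k)$ and replaces them with a new internal node. The sketch below is a minimal textbook-style version, not code from the lecture, and assumes a symmetric distance table given as a dict of dicts. The internal-node names `u0`, `u1`, … are arbitrary. When the distances are exactly additive on a tree, it recovers that tree and its edge lengths.

```python
def neighbor_joining(dist, names):
    """Return tree edges as (node, node, length) triples."""
    d = {a: dict(row) for a, row in dist.items()}
    nodes, edges, counter = list(names), [], 0
    while len(nodes) > 2:
        n = len(nodes)
        r = {a: sum(d[a][b] for b in nodes if b != a) for a in nodes}
        best = None
        for x in range(n):
            for y in range(x + 1, n):
                a, b = nodes[x], nodes[y]
                q = (n - 2) * d[a][b] - r[a] - r[b]
                if best is None or q < best[0]:
                    best = (q, a, b)
        _, a, b = best
        u = f"u{counter}"
        counter += 1
        # Branch lengths from a and b to their new parent u
        la = d[a][b] / 2 + (r[a] - r[b]) / (2 * (n - 2))
        edges += [(a, u, la), (b, u, d[a][b] - la)]
        d[u] = {}
        for c in nodes:
            if c not in (a, b):
                d[u][c] = d[c][u] = (d[a][c] + d[b][c] - d[a][b]) / 2
        nodes = [c for c in nodes if c not in (a, b)] + [u]
    a, b = nodes
    edges.append((a, b, d[a][b]))
    return edges

# Additive distances from the tree ((A:1, B:2):5, (C:3, D:4))
pairs = {("A", "B"): 3, ("A", "C"): 9, ("A", "D"): 10,
         ("B", "C"): 10, ("B", "D"): 11, ("C", "D"): 7}
dist = {v: {} for v in "ABCD"}
for (a, b), v in pairs.items():
    dist[a][b] = dist[b][a] = v
print(neighbor_joining(dist, list("ABCD")))
```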

  31. Related Work • Connolly (1993): • Heuristic method for constructing HLC models • Mutual information used to group variables • One latent variable introduced for each group • Cardinalities of latent variables determined using conceptual clustering • Martin and VanLehn (1994): • Heuristic method for learning two-level Bayesian networks where the top level is latent. • Elidan et al. (2001): • Learning latent variables for general Bayesian networks. • Aim: simplification. Idea: structural signature. • Model-based hierarchical clustering (Hansen et al. 1991): • Hierarchically structures the state space of ONE cluster variable.

  32. Related Work • Diagnostics for local dependence in LC models: • Hagenaars (1988): • Standardized residual • Espeland & Handelmann (1988): • Likelihood ratio statistic • Garrett & Zeger (2000): • Log odds ratio • Modeling local dependence in LC models: • Joint variable (M2), multiple indicator (M3), loglinear model (M4)
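A simple local-dependence check works in the spirit of the log-odds-ratio diagnostic listed above. For a pair of binary manifest variables, compare the log odds ratio observed in the data with the one implied by a fitted LC model. A large discrepancy suggests local dependence that the model misses. This sketch is not the exact procedure of the works cited, and the function names are hypothetical. Adding 0.5 to each observed cell is an assumption made to avoid log(0).

```python
import math

def observed_log_odds_ratio(data, i, j):
    """Log odds ratio of binary columns i and j, with 0.5 added to each cell."""
    n = [[0.5, 0.5], [0.5, 0.5]]
    for y in data:
        n[y[i]][y[j]] += 1
    return math.log(n[1][1] * n[0][0] / (n[1][0] * n[0][1]))

def model_log_odds_ratio(prior, theta, i, j):
    """Log odds ratio of Y_i and Y_j implied by an LC model.

    prior[x] = P(X=x); theta[x][j] = P(Y_j = 1 | X = x).
    """
    p = [[0.0, 0.0], [0.0, 0.0]]
    for px, th in zip(prior, theta):
        for a in (0, 1):
            for b in (0, 1):
                pa = th[i] if a else 1 - th[i]
                pb = th[j] if b else 1 - th[j]
                p[a][b] += px * pa * pb
    return math.log(p[1][1] * p[0][0] / (p[1][0] * p[0][1]))
```

A one-class model implies a log odds ratio of 0 for every pair, so any strong observed association signals misfit.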
