

THE HONG KONG UNIVERSITY OF SCIENCE & TECHNOLOGY. CSIT 5220: Reasoning and Decision under Uncertainty. L09: Model-Based Classification and Clustering. Nevin L. Zhang, Room 3504, phone: 2358-7015, Email: lzhang@cs.ust.hk





Presentation Transcript


  1. THE HONG KONG UNIVERSITY OF SCIENCE & TECHNOLOGY • CSIT 5220: Reasoning and Decision under Uncertainty • L09: Model-Based Classification and Clustering • Nevin L. Zhang, Room 3504, phone: 2358-7015, Email: lzhang@cs.ust.hk

  2. L09: Model-Based Classification and Clustering • Probabilistic Models (PMs) for Classification • PMs for Clustering

  3. Classification • The problem: • Given data, find a mapping (A1, A2, …, An) → C • Possible solutions: • ANN • Decision tree (Quinlan) • … • (SVM: continuous data)

  4. Probabilistic Approach to Classification

  5. Will Boss Play Tennis?

  6. Will Boss Play Tennis?

  7. Bayesian Networks for Classification • The Naïve Bayes model often performs well in practice • Drawbacks of Naïve Bayes: • It assumes the attributes are mutually independent given the class variable • This assumption is often violated, leading to double counting of evidence. • Fixes: • General BN classifiers • Tree-augmented Naïve Bayes (TAN) models • …
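As a rough illustration of the Naïve Bayes classifier mentioned above, the following sketch estimates P(C) and P(Ai | C) from categorical data and applies Bayes' rule under the conditional-independence assumption. The function names, the add-alpha smoothing, and the dictionary layout are assumptions made for this example; they are not taken from the slides.

```python
from collections import Counter, defaultdict
import math

def train_naive_bayes(rows, labels, alpha=1.0):
    """Count class and attribute-value frequencies (alpha = smoothing, an assumption)."""
    n_attrs = len(rows[0])
    # value_counts[i][c][v] = number of rows of class c whose attribute i equals v
    value_counts = [defaultdict(Counter) for _ in range(n_attrs)]
    domains = [set() for _ in range(n_attrs)]
    for row, c in zip(rows, labels):
        for i, v in enumerate(row):
            value_counts[i][c][v] += 1
            domains[i].add(v)
    return {"class_counts": Counter(labels), "value_counts": value_counts,
            "domains": domains, "alpha": alpha, "n": len(labels)}

def posterior(model, row):
    """Return P(C = c | A1 = a1, ..., An = an) for every class c."""
    log_scores = {}
    for c, n_c in model["class_counts"].items():
        score = math.log(n_c / model["n"])          # log P(C = c)
        for i, v in enumerate(row):
            count = model["value_counts"][i][c][v]
            k = len(model["domains"][i])
            # log P(Ai = v | C = c) with add-alpha smoothing
            score += math.log((count + model["alpha"]) / (n_c + model["alpha"] * k))
        log_scores[c] = score
    # Normalise in log space for numerical stability.
    top = max(log_scores.values())
    z = sum(math.exp(s - top) for s in log_scores.values())
    return {c: math.exp(s - top) / z for c, s in log_scores.items()}

def classify(model, row):
    """arg max_c P(C = c | attributes)."""
    post = posterior(model, row)
    return max(post, key=post.get)
```

Because each attribute contributes its own factor, two strongly correlated attributes contribute the same evidence twice, which is the double-counting problem noted on the slide.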

  8. Bayesian Networks for Classification • General BN classifier • Treat the class variable as just another variable • Learn a BN. • Classify the next instance based on the values of the variables in the Markov blanket of the class variable. • Often performs poorly: only attributes in the Markov boundary of the class variable affect the prediction, so information in the remaining attributes goes unused
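To make the Markov blanket concrete, here is a minimal sketch that computes it from a DAG given as a parent map: the blanket of a node consists of its parents, its children, and the other parents of its children. The `parents` dictionary format and the function name are assumptions for illustration.

```python
def markov_blanket(parents, target):
    """parents maps each node to the set of its parents in the DAG."""
    children = {node for node, ps in parents.items() if target in ps}
    spouses = {p for child in children for p in parents[child]}
    return (set(parents[target]) | children | spouses) - {target}
```

In a learned network where, say, an attribute is connected to the class variable only through another attribute, that attribute falls outside the blanket and plays no role in classification once the blanket variables are observed.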

  9. Bayesian Networks for Classification • Tree-Augmented Naïve Bayes (TAN) model • Capture dependence among attributes using a tree structure. • During learning, • First learn a tree among attributes: use Chow-Liu algorithm • Special structure learning problem, easy • Add class variable and estimate parameters • Classification • arg max_c P(C=c|A1=a1, …, An=an) • BN inference • Many other methods

  10. Chow-Liu Trees • Task: Find a tree model over observed variables that has maximum likelihood given data. • Maximized loglikelihood
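The formula on this slide did not survive extraction. The standard form of the maximized loglikelihood of a tree model T is sketched below; the notation is an assumption: N is the number of data cases, and the hatted quantities are the empirical distribution, mutual information, and entropy computed from the data.

```latex
\max_{\theta} \log P(\mathcal{D} \mid T, \theta)
  = N \sum_{(X_i, X_j) \in T} \hat{I}(X_i; X_j) \;-\; N \sum_{i} \hat{H}(X_i),
\qquad
\hat{I}(X_i; X_j) = \sum_{x_i, x_j} \hat{P}(x_i, x_j)
  \log \frac{\hat{P}(x_i, x_j)}{\hat{P}(x_i)\,\hat{P}(x_j)}
```

The entropy term does not depend on the tree structure, so maximizing the likelihood over trees amounts to maximizing the sum of mutual information over the tree's edges.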

  11. Chow-Liu Trees • Mutual Information • Task is equivalent to finding a maximum spanning tree of the following weighted and undirected graph:
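The edge weights of that graph are the pairwise empirical mutual information values. A minimal sketch of how they could be computed from discrete data follows; the column-dictionary input format and the function names are assumptions for this example.

```python
from collections import Counter
from itertools import combinations
import math

def mutual_information(xs, ys):
    """Empirical mutual information (in nats) between two discrete sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    # sum over observed pairs of P(x, y) * log( P(x, y) / (P(x) P(y)) )
    return sum((nxy / n) * math.log(nxy * n / (px[x] * py[y]))
               for (x, y), nxy in joint.items())

def chow_liu_weights(columns):
    """columns maps a variable name to its list of observed values."""
    return {(a, b): mutual_information(columns[a], columns[b])
            for a, b in combinations(sorted(columns), 2)}
```

Each unordered pair of variables gets one weight, so the result describes a complete undirected graph over the observed variables.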

  12. Maximum Spanning Trees

  13. Illustration of Kruskal’s Algorithm • http://www.cs.cmu.edu/~guestrin/Class/15781/recitations/r10/11152007chowliu.pdf
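For readers without access to the linked illustration, the following is a minimal sketch of Kruskal's algorithm adapted to the maximum spanning tree problem: edges are considered in order of decreasing weight, and an edge is kept only if it joins two different components. The union-find helper and the input format are assumptions for this example.

```python
def maximum_spanning_tree(nodes, weights):
    """Kruskal's algorithm on edges sorted by decreasing weight.

    weights maps (u, v) pairs to edge weights; returns a list of (u, v, w).
    """
    parent = {v: v for v in nodes}

    def find(v):
        # Follow parent pointers to the component root, halving the path as we go.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    tree = []
    for (u, v), w in sorted(weights.items(), key=lambda kv: kv[1], reverse=True):
        root_u, root_v = find(u), find(v)
        if root_u != root_v:      # adding (u, v) does not create a cycle
            parent[root_u] = root_v
            tree.append((u, v, w))
    return tree
```

Applied to mutual-information weights, the returned edges give the undirected skeleton of a Chow-Liu tree; directing the edges away from any chosen root yields a tree-structured Bayesian network.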

  14. L09: Probabilistic Models (PMs) for Classification and Clustering • Probabilistic Models (PMs) for Classification • PMs for Clustering

  15. A Medical Application • In medical diagnosis, sometimes a gold standard exists • Example: Lung Cancer • Symptoms: • Persistent cough, hemoptysis (coughing up blood), constant chest pain, shortness of breath, fatigue, etc. • Information for diagnosis: symptoms, medical history, smoking history, X-ray, sputum. • Gold standard: • Biopsy: the removal of a small sample of tissue for examination under a microscope by a pathologist

  16. A Medical Application • Sometimes a gold standard does not exist • Example: Rheumatoid Arthritis (RA) • Symptoms: back pain, neck pain, joint pain, joint swelling, morning joint stiffness, etc. • Information for diagnosis: • Symptoms, medical history, physical exam, • Lab tests including a test for rheumatoid factor. • (Rheumatoid factor is an antibody found in the blood of about 80 percent of adults with RA.) • No gold standard: • None of the symptoms or their combinations is a clear-cut indicator of RA • Neither the presence nor the absence of rheumatoid factor settles whether one has RA.

  17. LC Analysis of Hannover Rheumatoid Arthritis Data • Class-specific probabilities • Cluster 1: “disease-free” • Cluster 2: “back-pain type” • Cluster 3: “joint type” • Cluster 4: “severe type”

  18. To Cluster Continuous Data

  19. Learning Gaussian Mixture Models
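Gaussian mixture models are commonly learned with the EM algorithm. The sketch below fits a one-dimensional mixture: the E-step computes each component's responsibility for each point, and the M-step re-estimates weights, means, and variances from those responsibilities. The quantile-based initialisation, the variance floor, and the fixed iteration count are assumptions chosen to keep the example short; they are not taken from the slides.

```python
import math

def normal_pdf(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def fit_gmm_1d(data, k=2, iters=100):
    """EM for a 1-D Gaussian mixture; returns (weights, means, variances)."""
    n = len(data)
    ordered = sorted(data)
    # Assumed initialisation: means at evenly spaced quantiles, shared variance.
    means = [ordered[int((j + 0.5) * n / k)] for j in range(k)]
    overall_mean = sum(data) / n
    overall_var = sum((x - overall_mean) ** 2 for x in data) / n
    variances = [overall_var] * k
    weights = [1.0 / k] * k
    for _ in range(iters):
        # E-step: responsibility of component j for each data point
        resp = []
        for x in data:
            p = [weights[j] * normal_pdf(x, means[j], variances[j]) for j in range(k)]
            total = sum(p)
            resp.append([pj / total for pj in p])
        # M-step: re-estimate parameters from the soft assignments
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var, 1e-6)   # floor avoids a collapsing component
    return weights, means, variances
```

Each data point can then be assigned to the component with the largest responsibility, which gives a model-based clustering of continuous data.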

  20. http://www.socr.ucla.edu/Applets.dir/MixtureEM.html
