View Learning: An Extension to SRL, with an Application in Mammography
Jesse Davis, Beth Burnside, Inês Dutra, Vítor Santos Costa, David Page, Jude Shavlik & Raghu Ramakrishnan
Presentation Transcript


  1. View Learning: An extension to SRL. An application in Mammography. Jesse Davis, Beth Burnside, Inês Dutra, Vítor Santos Costa, David Page, Jude Shavlik & Raghu Ramakrishnan

  2. Background • Breast cancer is the most common cancer among women • Mammography is the only proven screening test • At this time, approximately 61% of women have had a mammogram in the last 2 years • This translates into roughly 20 million mammograms per year

  3. The Problem • Radiologists interpret mammograms • Variability among radiologists stems from differences in training and experience • Experts achieve higher cancer detection rates and fewer benign biopsies • There is a shortage of experts

  4. Common Mammography findings • Microcalcifications • Masses • Architectural distortion

  5. Calcifications

  6. Mass

  7. Architectural distortion

  8. Other important features • Microcalcifications: shape, distribution, stability • Masses: shape, margin, density, size, stability • Associated findings • Breast density

  9. Other variables influence risk • Demographic risk factors • Family History • Hormone therapy • Age

  10. Standardization of Practice • Passage of the Mammography Quality Standards Act (MQSA) in 1992 • Requires tracking of patient outcomes through regular audits of mammography interpretations and cases of breast cancer • Standardized lexicon: BI-RADS was developed, incorporating 5 categories that include 43 unique descriptors

  11. BI-RADS

  Mass
  • Shape: round, oval, lobular, irregular
  • Margins: circumscribed, microlobulated, obscured, indistinct, spiculated
  • Density: high, equal, low, fat containing

  Calcifications
  • Typically benign: skin, vascular, coarse/popcorn, rod-like, round, lucent-centered, eggshell/rim, milk of calcium, suture, dystrophic, punctate
  • Intermediate: amorphous
  • Higher probability malignancy: pleomorphic, fine/linear/branching
  • Distribution: clustered, linear, segmental, regional, diffuse/scattered

  Architectural Distortion

  Special Cases: Tubular Density, Lymph Node, Asymmetric Breast Tissue, Focal Asymmetric Density

  Associated Findings: Skin Thickening, Skin Lesion, Skin Retraction, Nipple Retraction, Trabecular Thickening, Axillary Adenopathy

  12. Mammography Database • Radiologist interpretation of mammogram • Patient may have multiple mammograms • A mammogram may have multiple abnormalities • Expert-defined Bayes net for determining whether an abnormality is malignant

  13. Original Expert Structure

  14. Mammography Database

      Patient  Abnormality  Date  Mass Shape  …  Mass Size  Loc  Be/Mal
      P1       1            5/02  Spic        …  0.03       RU4  B
      P1       2            5/04  Var         …  0.04       RU4  M
      P1       3            5/04  Spic        …  0.04       LL3  B
      …        …            …     …           …  …          …    …

  15. Types of Learning • Hierarchy of ‘types’ of learning that we can perform on the Mammography database

  16. Level 1: Parameters [diagram: Be/Mal → Shape, Be/Mal → Size] Given: Features (node labels, or fields in the database), Data, Bayes net structure. Learn: probabilities. Note: the probabilities needed are Pr(Be/Mal), Pr(Shape|Be/Mal), Pr(Size|Be/Mal)
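As a concrete illustration of Level 1, the sketch below estimates Pr(Be/Mal) and Pr(Shape|Be/Mal) by relative-frequency counting over toy rows shaped like the slide-14 table; Pr(Size|Be/Mal) would be estimated the same way. The rows are placeholders, and a real system would add smoothing.

    from collections import Counter, defaultdict

    # Toy (shape, size, be_mal) rows echoing the slide-14 table
    rows = [("Spic", 0.03, "B"), ("Var", 0.04, "M"), ("Spic", 0.04, "B")]

    # Pr(Be/Mal): relative frequency of each class
    n = len(rows)
    class_counts = Counter(bm for _, _, bm in rows)
    pr_class = {c: k / n for c, k in class_counts.items()}

    # Pr(Shape | Be/Mal): per-class relative frequency of each shape
    shape_given = defaultdict(Counter)
    for shape, _, bm in rows:
        shape_given[bm][shape] += 1
    pr_shape = {bm: {s: k / sum(ctr.values()) for s, k in ctr.items()}
                for bm, ctr in shape_given.items()}

    print(pr_class)  # {'B': 0.67, 'M': 0.33} (approximately)
    print(pr_shape)  # e.g. Pr(Shape=Spic | Be/Mal=B) = 1.0 on this toy data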

  17. Level 2: Structure [diagram: Be/Mal → Shape, Be/Mal → Size, Shape → Size] Given: Features, Data. Learn: Bayes net structure and probabilities. Note: with this structure, we now need Pr(Size|Shape,Be/Mal) instead of Pr(Size|Be/Mal)

  18.–20. Mammography Database (the table from slide 14, repeated over three build slides)

  21. Level 3: Aggregates Given: Features, Data, Background knowledge (aggregation functions such as average, mode, max, etc.). Learn: useful aggregate features, a Bayes net structure that uses these features, and probabilities. New features may use other rows/tables. [diagram: the slide-17 network plus a new "Avg size this date" node]
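A minimal sketch of one such aggregate, "average size of this patient's abnormalities on this date", computed over rows like those of slide 14 and attached as a new per-row field; the tuple layout is illustrative, not the real schema.

    from collections import defaultdict

    # (patient, abnormality, date, size) rows from the example table
    rows = [("P1", 1, "5/02", 0.03), ("P1", 2, "5/04", 0.04), ("P1", 3, "5/04", 0.04)]

    # Group sizes by (patient, date), then average within each group
    groups = defaultdict(list)
    for patient, _, date, size in rows:
        groups[(patient, date)].append(size)
    avg_size = {k: sum(v) / len(v) for k, v in groups.items()}

    # Attach the aggregate as a new per-row feature
    extended = [row + (avg_size[(row[0], row[2])],) for row in rows]
    print(extended)  # each row gains an "avg size this date" field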

  22.–24. Mammography Database (the table from slide 14, repeated over three build slides)

  25. Level 4: View Learning Given: Features, Data, Background knowledge (aggregation functions and intensionally-defined relations such as "increase" or "same location"). Learn: useful new features defined by views (equivalent to rules or SQL queries), Bayes net structure, and probabilities. [diagram: the previous network plus new nodes "Shape change in abnormality at this location" and "Increase in average size of abnormalities"]
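The deck expresses these view features as ILP rules (slides 36-37 and 47-48). Purely for illustration, here is the "shape change in an abnormality at this same location" feature written as a Python predicate over toy rows; the row layout and date encoding are assumptions.

    # (patient, date, location, shape) rows; dates simplified to (year, month)
    rows = [("P1", (2002, 5), "RU4", "Spic"),
            ("P1", (2004, 5), "RU4", "Var"),
            ("P1", (2004, 5), "LL3", "Spic")]

    def shape_change(x, rows):
        """True if some earlier abnormality of the same patient at the
        same location has a different shape (cf. the sc/1 rule later)."""
        px, dx, lx, sx = x
        return any(py == px and ly == lx and dy < dx and sy != sx
                   for (py, dy, ly, sy) in rows)

    for r in rows:
        print(r, shape_change(r, rows))
    # Only the later RU4 abnormality (shape Var vs. earlier Spic) fires.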

  26. Structure Learning Algorithms • Three different algorithms • Naïve Bayes • Tree Augmented Naïve Bayes (TAN) • Sparse Candidate Algorithm

  27. Naïve Bayes Net [diagram: Class Value with an arc to each of Attr 1 … Attr N] • Simple, computationally efficient

  28. Example TAN Net [diagram: Class Value with an arc to each attribute, plus a tree of arcs among Attr 1 … Attr N] • Also computationally efficient [Friedman, Geiger & Goldszmidt '97]

  29. TAN • Arc from class variable to each attribute • Less restrictive than Naïve Bayes: each attribute is permitted at most one extra parent • Polynomial time bound on constructing the network: O((# attributes)^2 × |training set|) • Guaranteed to maximize LL(B_T | D)

  30. TAN Algorithm • Construct a complete graph over the attributes (excluding the class variable) • Each edge weight is the conditional mutual information between its endpoints, given the class • Find a maximum weight spanning tree over the graph • Pick a root in the tree and direct all edges away from it • Add the resulting directed edges to the network
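A runnable sketch of those steps, assuming networkx is available for the spanning-tree step; cond_mutual_info estimates I(Attr_i; Attr_j | Class) from data, and the toy rows are invented for illustration.

    import math
    from collections import Counter
    import networkx as nx

    def cond_mutual_info(data, i, j, c):
        """I(X_i; X_j | C) estimated from rows of (attributes..., class)."""
        n = len(data)
        p_xyc = Counter((r[i], r[j], r[c]) for r in data)
        p_xc = Counter((r[i], r[c]) for r in data)
        p_yc = Counter((r[j], r[c]) for r in data)
        p_c = Counter(r[c] for r in data)
        mi = 0.0
        for (x, y, cv), cnt in p_xyc.items():
            pxyc = cnt / n
            mi += pxyc * math.log((pxyc * (p_c[cv] / n)) /
                                  ((p_xc[(x, cv)] / n) * (p_yc[(y, cv)] / n)))
        return mi

    # Toy data: columns 0-2 are attributes, column 3 is the class
    data = [(0, 0, 1, "B"), (1, 1, 0, "B"), (0, 1, 1, "M"), (1, 0, 0, "M")]
    attrs, cls = [0, 1, 2], 3

    # Complete weighted graph over attributes, maximum spanning tree,
    # then direct all edges away from an arbitrary root (attribute 0)
    g = nx.Graph()
    for a in attrs:
        for b in attrs:
            if a < b:
                g.add_edge(a, b, weight=cond_mutual_info(data, a, b, cls))
    tree = nx.maximum_spanning_tree(g)
    directed = nx.bfs_tree(tree, attrs[0])
    print(sorted(directed.edges()))  # tree arcs; add class->attr arcs on top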

  31. General Bayes Net [diagram: arbitrary arcs among Class Value and Attr 1 … Attr N]

  32. Sparse Candidate • Friedman et al. '99 • No restrictions on the directionality of arcs for the class attribute • Limits the possible parents of each node to a small "candidate" set

  33. Sparse Candidate Algorithm • Greedy hill-climbing search with restarts • Initial structure is the empty graph • Score graphs using the BDe metric (Cooper & Herskovits '92, Heckerman '96) • Select each node's candidate set using an information metric • Re-estimate candidate sets after each restart
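A schematic version of the climb, with each node's parents restricted to a fixed candidate set. A BIC-style family score stands in for the BDe metric, restarts and candidate re-estimation are left out, and only 2-cycles are checked (a full implementation must test for longer cycles as well); data and candidate sets are toy values.

    import math
    from collections import Counter

    def family_bic(data, child, parents):
        """BIC-style family score (stands in for BDe, which the slide uses)."""
        n = len(data)
        joint = Counter(tuple(r[p] for p in parents) + (r[child],) for r in data)
        par = Counter(tuple(r[p] for p in parents) for r in data)
        ll = sum(c * math.log(c / par[k[:-1]]) for k, c in joint.items())
        n_child_vals = len({r[child] for r in data})
        n_params = (n_child_vals - 1) * max(len(par), 1)
        return ll - 0.5 * math.log(n) * n_params

    # Toy data; candidates maps each node to the columns it may take parents from
    data = [(0, 0, 1), (0, 1, 1), (1, 1, 0), (1, 0, 0)]
    candidates = {0: [1, 2], 1: [0, 2], 2: [0, 1]}

    parents = {v: [] for v in candidates}            # start from the empty graph
    improved = True
    while improved:                                   # greedy hill climbing
        improved = False
        for v, cands in candidates.items():
            for p in cands:
                if p in parents[v] or v in parents[p]:   # skip dups / 2-cycles
                    continue
                gain = (family_bic(data, v, parents[v] + [p])
                        - family_bic(data, v, parents[v]))
                if gain > 0:
                    parents[v].append(p)
                    improved = True
    print(parents)  # on this toy data, node 0 acquires parent 2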

  34. Sparse Candidate Algorithm • We looked at several initial structures • Expert structure • Naïve Bayes • TAN • Scored networks on tune-set accuracy

  35. Our Initial Approach for Level 4 • Use ILP to learn rules predictive of “malignant” • Treat the rules as intensional definitions of new fields • The new view consists of the original table extended with the new fields
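Concretely, each learned rule becomes a boolean column: the rule body is evaluated against a row (and, through relations such as prior_mammogram, against other rows), and its truth value is appended as a new field. A schematic sketch with hypothetical stand-in rules:

    # Each learned ILP rule acts as a predicate over an abnormality record;
    # the view is the original table plus one boolean column per rule.
    rows = [{"id": 1, "stability": "increasing", "birads": "b5"},
            {"id": 2, "stability": "stable", "birads": "b3"}]

    # Hypothetical stand-ins for learned rule bodies
    rules = [lambda r: r["stability"] == "increasing",
             lambda r: r["birads"] == "b5"]

    view = [dict(r, **{f"rule_{i}": rule(r) for i, rule in enumerate(rules)})
            for r in rows]
    print(view)  # original fields plus boolean rule_0, rule_1 fields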

  36. Using Views

      malignant(A) :-
          massesStability(A, increasing),
          prior_mammogram(A, B, _),
          H0_BreastCA(B, hxDCorLC).

  37. Sample Rule

      malignant(A) :-
          BIRADS_category(A, b5),
          MassPAO(A, present),
          MassesDensity(A, high),
          HO_BreastCA(A, hxDCorLC),
          in_same_mammogram(A, B),
          Calc_Pleomorphic(B, notPresent),
          Calc_Punctate(B, notPresent).

  38. Methodology • 10 fold cross validation • Split at the patient level • Roughly 40 malignant cases and 6000 benign cases in each fold
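Splitting at the patient level keeps all of a patient's abnormalities in the same fold, so no patient leaks across train and test. A sketch with scikit-learn's GroupKFold; the arrays are placeholders.

    import numpy as np
    from sklearn.model_selection import GroupKFold

    # One row per abnormality; `groups` holds each row's patient id, so
    # GroupKFold never puts one patient's rows into two different folds.
    X = np.arange(12).reshape(6, 2)        # placeholder feature rows
    y = np.array([0, 0, 1, 0, 1, 0])       # benign = 0 / malignant = 1
    groups = np.array(["P1", "P1", "P1", "P2", "P2", "P3"])

    for train_idx, test_idx in GroupKFold(n_splits=3).split(X, y, groups):
        print(set(groups[train_idx]), "vs", set(groups[test_idx]))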

  39. Methodology • Without the ILP rules: 6 folds for the training set, 3 folds for the tuning set • With ILP: 4 folds to learn ILP rules, 3 folds for the training set, 2 folds for the tuning set • TAN/Naïve Bayes don't require a tuning set

  40. Evaluation • Precision-recall curves • Why not ROC curves? With many negatives, ROC curves look overly optimistic: a large change in the number of false positives yields only a small change in the ROC curve • Results pooled over all 10 folds
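A small back-of-the-envelope showing why: with thousands of benigns per few dozen malignancies, a tenfold jump in false positives barely moves the ROC false-positive rate but collapses precision. The numbers below are invented for illustration, roughly matching the fold sizes on slide 38.

    tp, n_neg = 30, 6000          # fixed recall; vary the false positives
    for fp in (60, 600):
        fpr = fp / n_neg          # x-axis of an ROC curve
        precision = tp / (tp + fp)
        print(f"FP={fp}: FPR={fpr:.2f}, precision={precision:.2f}")
    # FPR moves only 0.01 -> 0.10, but precision falls 0.33 -> 0.05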

  41. ROC: Level 2 (TAN) vs. Level 1

  42. Precision-Recall Curves

  43. Related Work: ILP for Feature Construction • Pompe & Kononenko, ILP’95 • Srinivasan & King, ILP’97 • Perlich & Provost, KDD’03 • Neville, Jensen, Friedland and Hay, KDD’03

  44. Ways to Improve Performance • Learn rules to predict “benign” as well as “malignant.” • Use Gleaner (Goadrich, Oliphant & Shavlik, ILP’04) to get better spread of Precision vs. Recall in the learned rules. • Incorporate aggregation into the ILP runs themselves.

  45. Richer View Learning Approaches • Learn rules predictive of other fields. • Use WARMR or other first-order clustering approaches. • Integrate structure learning and view learning: score a rule by how much it helps the current model when added.

  46. Level 4: View Learning (repeated) Given: Features, Data, Background knowledge (aggregation functions and intensionally-defined relations such as "increase" or "same location"). Learn: useful new features defined by views (equivalent to rules or SQL queries), Bayes net structure, and probabilities. [diagram: network with "Shape change in abnormality at this location", "Increase in average size of abnormalities", and "Avg size this date" nodes]

  47. Integrated View/Structure Learning [diagram: network with "Increase in average size of abnormalities" and "Avg size this date" nodes]

      sc(X) :-
          id(X, P), id(Y, P),
          loc(X, L), loc(Y, L),
          date(Y, D1), date(X, D2),
          before(D1, D2),
          shape(X, Sh1), shape(Y, Sh2),
          Sh1 \= Sh2.

  48. Integrated View/Structure Learning [diagram: network with "Increase in average size of abnormalities" and "Avg size this date" nodes]

      sc(X) :-
          id(X, P), id(Y, P),
          loc(X, L), loc(Y, L),
          date(Y, D1), date(X, D2),
          before(D1, D2),
          shape(X, Sh1), shape(Y, Sh2),
          Sh1 \= Sh2,
          size(X, S1), size(Y, S2),
          S1 > S2.
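In Python terms, the refinement on this slide just tightens the sc/1 predicate of the previous slide with a size-increase condition. A hedged transliteration over toy tuple rows (row layout and date encoding are assumptions):

    # (patient, loc, date, shape, size) per abnormality; dates as (year, month)
    rows = [("P1", "RU4", (2002, 5), "Spic", 0.03),
            ("P1", "RU4", (2004, 5), "Var", 0.04)]

    def sc(x, rows):
        """Transliteration of the refined sc/1 clause: some abnormality of the
        same patient at the same location, on an earlier date, has a different
        shape and a smaller size."""
        p, l, d, sh, sz = x
        return any(p2 == p and l2 == l and d2 < d and sh2 != sh and sz > sz2
                   for (p2, l2, d2, sh2, sz2) in rows)

    print([sc(r, rows) for r in rows])  # only the later RU4 abnormality fires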
