View Learning: An Extension to SRL
An Application in Mammography
Jesse Davis, Beth Burnside, Inês Dutra, Vítor Santos Costa, David Page, Jude Shavlik & Raghu Ramakrishnan
Background
• Breast cancer is the most common cancer
• Mammography is the only proven screening test
• At this time approximately 61% of women have had a mammogram in the last 2 years
• This translates into 20 million mammograms per year
The Problem
• Radiologists interpret mammograms
• Variability among radiologists due to differences in training and experience
• Experts achieve higher cancer detection rates and fewer benign biopsies
• Shortage of experts
Common Mammography Findings
• Microcalcifications
• Masses
• Architectural distortion
Other Important Features
• Microcalcifications: shape, distribution, stability
• Masses: shape, margin, density, size, stability
• Associated findings
• Breast density
Other Variables Influence Risk
• Demographic risk factors:
  • Family history
  • Hormone therapy
  • Age
Standardization of Practice
• Passage of the Mammography Quality Standards Act (MQSA) in 1992
• Requires tracking of patient outcomes through regular audits of mammography interpretations and cases of breast cancer
• Standardized lexicon: BI-RADS was developed, incorporating 5 categories that include 43 unique descriptors
BI-RADS
Mass
• Margins: circumscribed, microlobulated, obscured, indistinct, spiculated
• Shape: round, oval, lobular, irregular
• Density: high, equal, low, fat containing
Calcifications
• Typically benign: skin, vascular, coarse/popcorn, rod-like, round, lucent-centered, eggshell/rim, milk of calcium, suture, dystrophic, punctate
• Intermediate: amorphous
• Higher probability malignancy: pleomorphic, fine/linear/branching
• Distribution: clustered, linear, segmental, regional, diffuse/scattered
Architectural Distortion
Special Cases
• Tubular density, skin lesion, lymph node, focal asymmetric density, asymmetric breast tissue
Associated Findings
• Skin thickening, trabecular thickening, skin retraction, nipple retraction, axillary adenopathy
Mammography Database
• Radiologist interpretation of mammogram
• A patient may have multiple mammograms
• A mammogram may have multiple abnormalities
• Expert-defined Bayes net for determining whether an abnormality is malignant
Mammography Database

Patient | Abnormality | Date | Mass Shape | … | Mass Size | Loc | Be/Mal
P1      | 1           | 5/02 | Spic       | … | 0.03      | RU4 | B
P1      | 2           | 5/04 | Var        | … | 0.04      | RU4 | M
P1      | 3           | 5/04 | Spic       | … | 0.04      | LL3 | B
…       | …           | …    | …          | … | …         | …   | …
Types of Learning
• A hierarchy of 'types' of learning that we can perform on the mammography database
Level 1: Parameters
Given: Features (node labels, or fields in the database), data, Bayes net structure
Learn: Probabilities
Note: the probabilities needed are Pr(Be/Mal), Pr(Shape|Be/Mal), Pr(Size|Be/Mal)
[Figure: Bayes net with Be/Mal as the parent of Shape and of Size]
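At this level, learning reduces to counting. A minimal Python sketch on made-up data (the field values and the Laplace-smoothing choice are illustrative assumptions, not the authors' implementation):

from collections import Counter

# Illustrative (shape, size_bucket, label) rows; 'B' = benign, 'M' = malignant
rows = [("Spic", "small", "B"), ("Var", "small", "M"), ("Spic", "small", "B"),
        ("Round", "large", "M"), ("Spic", "large", "M")]

label_counts = Counter(label for _, _, label in rows)
pr_label = {c: n / len(rows) for c, n in label_counts.items()}   # Pr(Be/Mal)

shape_counts = Counter((label, shape) for shape, _, label in rows)
shapes = {shape for shape, _, _ in rows}

def pr_shape_given_label(shape, label, alpha=1.0):
    # Pr(Shape | Be/Mal) with Laplace smoothing (pseudocount alpha)
    return ((shape_counts[(label, shape)] + alpha) /
            (label_counts[label] + alpha * len(shapes)))

print(pr_label)                           # {'B': 0.4, 'M': 0.6}
print(pr_shape_given_label("Spic", "M"))  # smoothed estimate of Pr(Spic | M)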
Level 2: Structure
Given: Features, data
Learn: Bayes net structure and probabilities
Note: with this structure, we now need Pr(Size|Shape,Be/Mal) instead of Pr(Size|Be/Mal)
[Figure: Bayes net with Be/Mal the parent of Shape, and both Be/Mal and Shape parents of Size]
Level 3: Aggregates
Given: Features, data, background knowledge – aggregation functions such as average, mode, max, etc.
Learn: Useful aggregate features, a Bayes net structure that uses these features, and probabilities
Note: new features may use other rows/tables
[Figure: Bayes net with an added "Avg size this date" node alongside Be/Mal, Shape, and Size]
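A minimal sketch of computing one such aggregate, "average mass size over all abnormalities of the same patient on the same date", and attaching it to each row. The table mirrors the example above, but the code is an illustration, not the authors' implementation:

from collections import defaultdict

# (patient, abnormality, date, mass_size) rows mirroring the table above
abnormalities = [("P1", 1, "5/02", 0.03),
                 ("P1", 2, "5/04", 0.04),
                 ("P1", 3, "5/04", 0.04)]

totals = defaultdict(lambda: [0.0, 0])           # (patient, date) -> [sum, n]
for patient, _, date, size in abnormalities:
    totals[(patient, date)][0] += size
    totals[(patient, date)][1] += 1

# Extend each row with the aggregate; the structure learner can then treat
# "avg size this date" like any other field
extended = [(p, a, d, s, totals[(p, d)][0] / totals[(p, d)][1])
            for p, a, d, s in abnormalities]
for row in extended:
    print(row)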
Level 4: View Learning
Given: Features, data, background knowledge – aggregation functions and intensionally-defined relations such as "increase" or "same location"
Learn: Useful new features defined by views (equivalent to rules or SQL queries), Bayes net structure, and probabilities
[Figure: Bayes net extended with view-defined nodes such as "Shape change in abnormality at this location" and "Increase in average size of abnormalities", alongside "Avg size this date", Be/Mal, Shape, and Size]
Structure Learning Algorithms
Three different algorithms:
• Naïve Bayes
• Tree Augmented Naïve Bayes (TAN)
• Sparse Candidate algorithm
Naïve Bayes Net
• Simple, computationally efficient
[Figure: class node with an arc to each of Attr 1 … Attr N]
Example TAN Net
• Also computationally efficient [Friedman, Geiger & Goldszmidt '97]
[Figure: class node with an arc to each of Attr 1 … Attr N, plus a tree of arcs among the attributes]
TAN
• Arc from class variable to each attribute
• Less restrictive than Naïve Bayes
• Each attribute permitted at most one extra parent
• Polynomial time bound on constructing the network: O((# attributes)² · |training set|)
• Guaranteed to maximize LL(B_T | D), the log-likelihood of the TAN network given the data
TAN Algorithm
• Construct a complete graph over the attributes (excluding the class variable)
• Edge weight is the conditional mutual information between the two attributes, given the class
• Find the maximum-weight spanning tree over this graph
• Pick a root in the tree and direct all edges away from it
• Add the directed tree's edges to the network
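A minimal sketch of these steps on toy discrete data: conditional mutual information I(Xi; Xj | C) is estimated from counts, then a maximum-weight spanning tree is grown with Prim's algorithm and rooted at attribute 0 (the class-to-attribute arcs are left implicit; the data and names are illustrative assumptions):

import math
from collections import Counter
from itertools import combinations

data = [((0, 1, 0), 'B'), ((1, 1, 0), 'M'), ((0, 0, 1), 'B'),
        ((1, 0, 1), 'M'), ((1, 1, 1), 'M'), ((0, 0, 0), 'B')]
n_attrs, N = 3, len(data)

def cmi(i, j):
    # I(Xi; Xj | C) estimated from empirical counts
    pxy, px, py, pc = Counter(), Counter(), Counter(), Counter()
    for x, c in data:
        pxy[(x[i], x[j], c)] += 1; px[(x[i], c)] += 1
        py[(x[j], c)] += 1; pc[c] += 1
    total = 0.0
    for (xi, xj, c), n in pxy.items():
        total += (n / N) * math.log((n * pc[c]) / (px[(xi, c)] * py[(xj, c)]))
    return total

# Prim's algorithm for a maximum-weight spanning tree over the attributes
w = {frozenset(e): cmi(*e) for e in combinations(range(n_attrs), 2)}
in_tree, edges = {0}, []          # root the tree at attribute 0
while len(in_tree) < n_attrs:
    u, v = max(((u, v) for u in in_tree for v in range(n_attrs)
                if v not in in_tree), key=lambda e: w[frozenset(e)])
    edges.append((u, v))          # directed away from the root
    in_tree.add(v)

print(edges)  # each attribute gets one attribute parent, plus the class arc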
General Bayes Net
[Figure: arbitrary arcs among the class node and Attr 1 … Attr N]
Sparse Candidate
• Friedman et al. '99
• No restrictions on the directionality of arcs for the class attribute
• Limits the possible parents of each node to a small "candidate" set
Sparse Candidate Algorithm
• Greedy hill-climbing search with restarts
• Initial structure is the empty graph
• Score graphs using the BDe metric (Cooper & Herskovits '92, Heckerman '96)
• Select each node's candidate set using an information metric
• Re-estimate candidate sets after each restart
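A high-level skeleton of one pass of that search loop, with each node restricted to its candidate-parent set. The score function is a toy stand-in for BDe, and the candidate sets are hand-picked rather than chosen by an information measure, so this illustrates the control flow only, not the authors' implementation:

def has_cycle(parents):
    # Reject arc additions that would make the graph cyclic
    seen, stack = set(), set()
    def visit(v):
        if v in stack: return True
        if v in seen: return False
        seen.add(v); stack.add(v)
        cyclic = any(visit(u) for u in parents[v])
        stack.remove(v)
        return cyclic
    return any(visit(v) for v in parents)

n_nodes = 4
candidates = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}   # toy candidate sets
TRUE_ARCS = {(0, 1), (1, 2), (2, 3)}                  # toy ground truth

def score(parents):
    arcs = {(u, v) for v, ps in parents.items() for u in ps}
    return 2 * len(arcs & TRUE_ARCS) - len(arcs)      # stand-in for BDe

parents = {v: set() for v in range(n_nodes)}          # empty initial graph
while True:
    current, moves = score(parents), []
    for v in range(n_nodes):
        for u in candidates[v]:
            if u in parents[v]: continue
            parents[v].add(u)                          # try adding arc u -> v
            if not has_cycle(parents) and score(parents) > current:
                moves.append((score(parents), u, v))
            parents[v].remove(u)
    if not moves: break
    _, u, v = max(moves)                               # take the best addition
    parents[v].add(u)

print(parents)   # {0: set(), 1: {0}, 2: {1}, 3: {2}} on this toy input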
Sparse Candidate Algorithm
• We looked at several initial structures:
  • Expert structure
  • Naïve Bayes
  • TAN
• Scored networks on tuning-set accuracy
Our Initial Approach for Level 4
• Use ILP to learn rules predictive of "malignant"
• Treat the rules as intensional definitions of new fields
• The new view consists of the original table extended with the new fields
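A minimal sketch of that extension step: each learned rule becomes a boolean function of a row (and, for relational rules, the rest of the table), and the view is the original table plus one binary field per rule. The field names, the sample rule, and the string-date comparison are illustrative assumptions; sample learned Prolog rules follow on the next slides:

rows = [{"id": 1, "patient": "P1", "date": "5/02", "shape": "Spic", "mal": 0},
        {"id": 2, "patient": "P1", "date": "5/04", "shape": "Var",  "mal": 1}]

def shape_change(row, table):
    # True if the same patient has an earlier abnormality with a
    # different mass shape (a relational, multi-row feature)
    return any(r["patient"] == row["patient"] and r["date"] < row["date"]
               and r["shape"] != row["shape"] for r in table)

RULES = {"shape_change": shape_change}

# Materialize the view: original fields plus one binary field per rule
view = [dict(row, **{name: int(rule(row, rows))
                     for name, rule in RULES.items()}) for row in rows]
for r in view:
    print(r)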
Using Views

% An abnormality A is predicted malignant if its mass stability is
% increasing and the patient had a prior mammogram B with a recorded
% history of breast cancer.
malignant(A) :-
    massesStability(A, increasing),
    prior_mammogram(A, B, _),
    'H0_BreastCA'(B, hxDCorLC).
Sample Rule

% An abnormality A is predicted malignant if it is BI-RADS category 5,
% a mass is present with high density, the patient has a history of
% breast cancer, and the same mammogram contains another abnormality B
% with neither pleomorphic nor punctate calcifications.
malignant(A) :-
    'BIRADS_category'(A, b5),
    'MassPAO'(A, present),
    'MassesDensity'(A, high),
    'HO_BreastCA'(A, hxDCorLC),
    in_same_mammogram(A, B),
    'Calc_Pleomorphic'(B, notPresent),
    'Calc_Punctate'(B, notPresent).
Methodology
• 10-fold cross-validation
• Split at the patient level
• Roughly 40 malignant cases and 6,000 benign cases in each fold
Methodology
• Without the ILP rules:
  • 6 folds for the training set
  • 3 folds for the tuning set
• With ILP:
  • 4 folds to learn ILP rules
  • 3 folds for the training set
  • 2 folds for the tuning set
• TAN/Naïve Bayes don't require a tuning set
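A minimal sketch of a patient-level split: folds partition patients, not abnormalities, so no patient's abnormalities straddle a train/test boundary. The patient ids and fold arithmetic are illustrative, shown here for the "with ILP" allocation (1 test, 4 ILP, 3 train, 2 tune):

import random

random.seed(0)
patients = [f"P{i}" for i in range(100)]        # illustrative patient ids
random.shuffle(patients)
folds = [patients[i::10] for i in range(10)]    # 10 disjoint patient sets

def rows_for(fold, table):
    # All abnormality rows whose patient falls in this fold
    members = set(fold)
    return [row for row in table if row["patient"] in members]

# "with ILP" allocation: 1 test fold, 4 for rule learning, 3 train, 2 tune
test = folds[0]
ilp = sum(folds[1:5], [])
train = sum(folds[5:8], [])
tune = sum(folds[8:], [])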
Evaluation
• Precision-recall curves
• Why not ROC curves?
  • With many negatives, ROC curves look overly optimistic
  • A large change in the number of false positives yields only a small change in the ROC curve
• Results pooled over all 10 folds
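A minimal sketch of a pooled precision-recall computation: merge the (probability, label) predictions from all ten test folds, sort by predicted probability of malignancy, and read off precision and recall at each threshold. The scores below are made up for illustration:

# pooled (probability, is_malignant) pairs from every fold
pooled = [(0.9, 1), (0.8, 0), (0.7, 1), (0.4, 0), (0.3, 1), (0.1, 0)]
pooled.sort(reverse=True)

n_pos = sum(y for _, y in pooled)
tp = fp = 0
for prob, y in pooled:
    tp += y
    fp += 1 - y
    print(f"threshold={prob:.1f}  precision={tp/(tp+fp):.2f}  "
          f"recall={tp/n_pos:.2f}")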
Related Work: ILP for Feature Construction
• Pompe & Kononenko, ILP '95
• Srinivasan & King, ILP '97
• Perlich & Provost, KDD '03
• Neville, Jensen, Friedland & Hay, KDD '03
Ways to Improve Performance
• Learn rules to predict "benign" as well as "malignant"
• Use Gleaner (Goadrich, Oliphant & Shavlik, ILP '04) to get a better spread of precision vs. recall in the learned rules
• Incorporate aggregation into the ILP runs themselves
Richer View Learning Approaches
• Learn rules predictive of other fields
• Use WARMR or other first-order clustering approaches
• Integrate structure learning and view learning: score a rule by how much it helps the current model when added (a sketch of this scoring loop follows below)
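A minimal skeleton of that integrated scoring idea: materialize the candidate rule as a new field, retrain the current model with it, and keep the rule only if tuning-set performance improves. The helpers train_model and evaluate are hypothetical stand-ins for Bayes net training and tuning-set scoring, not functions from the paper:

def score_rule(rule, base_fields, train, tune, train_model, evaluate):
    # Score a candidate view rule by the change in tuning-set performance
    # when it is added to the current model (hypothetical helpers)
    baseline = evaluate(train_model(base_fields, train), tune)
    candidate = evaluate(train_model(base_fields + [rule], train), tune)
    return candidate - baseline   # > 0 means the rule helps the current model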
Integrated View/Structure Learning

% "Shape change": abnormality X has an earlier abnormality Y, of the same
% patient at the same location, with a different mass shape.
sc(X) :-
    id(X, P), id(Y, P),
    loc(X, L), loc(Y, L),
    date(Y, D1), date(X, D2), before(D1, D2),
    shape(X, Sh1), shape(Y, Sh2), Sh1 \= Sh2.

[Figure: Bayes net with "Increase in average size of abnormalities", "Avg size this date", Be/Mal, Shape, and Size]
Integrated View/Structure Learning

% Refinement of the rule above: additionally require that the later
% abnormality X is larger than the earlier abnormality Y, i.e. the
% mass has grown.
sc(X) :-
    id(X, P), id(Y, P),
    loc(X, L), loc(Y, L),
    date(Y, D1), date(X, D2), before(D1, D2),
    shape(X, Sh1), shape(Y, Sh2), Sh1 \= Sh2,
    size(X, S1), size(Y, S2), S1 > S2.

[Figure: Bayes net with "Increase in average size of abnormalities", "Avg size this date", Be/Mal, Shape, and Size]