View Learning: An Extension to SRL
An Application in Mammography
Jesse Davis, Beth Burnside, Inês Dutra, Vítor Santos Costa, David Page, Jude Shavlik & Raghu Ramakrishnan
Background
• Breast cancer is the most common cancer
• Mammography is the only proven screening test
• At this time approximately 61% of women have had a mammogram in the last 2 years
• This translates into 20 million mammograms per year
The Problem
• Radiologists interpret mammograms
• Variability among radiologists due to differences in training and experience
• Experts achieve higher cancer detection rates and fewer benign biopsies
• Shortage of experts
Common Mammography Findings
• Microcalcifications
• Masses
• Architectural distortion
Other Important Features
• Microcalcifications: shape, distribution, stability
• Masses: shape, margin, density, size, stability
• Associated findings
• Breast density
Other Variables Influence Risk
• Demographic risk factors:
  • Family history
  • Hormone therapy
  • Age
Standardization of Practice
• Passage of the Mammography Quality Standards Act (MQSA) in 1992
• Requires tracking of patient outcomes through regular audits of mammography interpretations and cases of breast cancer
• Standardized lexicon: BI-RADS was developed, incorporating 5 categories that include 43 unique descriptors
BI-RADS
Mass
• Margins: circumscribed, microlobulated, obscured, indistinct, spiculated
• Shape: round, oval, lobular, irregular
• Density: high, equal, low, fat containing
Calcifications
• Typically benign: skin, vascular, coarse/popcorn, rod-like, round, lucent-centered, eggshell/rim, milk of calcium, suture, dystrophic, punctate
• Intermediate: amorphous
• Higher probability malignancy: pleomorphic, fine/linear/branching
• Distribution: clustered, linear, segmental, regional, diffuse/scattered
Architectural Distortion
Special Cases
• Tubular density, skin lesion, lymph node, focal asymmetric density, asymmetric breast tissue
Associated Findings
• Skin thickening, trabecular thickening, skin retraction, nipple retraction, axillary adenopathy
Mammography Database
• Radiologist interpretation of mammogram
• A patient may have multiple mammograms
• A mammogram may have multiple abnormalities
• Expert-defined Bayes net for determining whether an abnormality is malignant
Mammography Database

Patient | Abnormality | Date | Mass Shape | … | Mass Size | Loc | Be/Mal
P1      | 1           | 5/02 | Spic       | … | 0.03      | RU4 | B
P1      | 2           | 5/04 | Var        | … | 0.04      | RU4 | M
P1      | 3           | 5/04 | Spic       | … | 0.04      | LL3 | B
…       | …           | …    | …          | … | …         | …   | …
Types of Learning
• A hierarchy of 'types' of learning that we can perform on the mammography database
Level 1: Parameters
Given: Features (node labels, or fields in the database), data, Bayes net structure
Learn: Probabilities
Note: the probabilities needed are Pr(Be/Mal), Pr(Shape|Be/Mal), Pr(Size|Be/Mal)
[Figure: Bayes net with Be/Mal as the parent of Shape and of Size]
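At this level, learning reduces to counting. A minimal Python sketch on made-up data (the field values and the Laplace-smoothing choice are illustrative assumptions, not the authors' implementation):

from collections import Counter

# Illustrative (shape, size_bucket, label) rows; 'B' = benign, 'M' = malignant
rows = [("Spic", "small", "B"), ("Var", "small", "M"), ("Spic", "small", "B"),
        ("Round", "large", "M"), ("Spic", "large", "M")]

label_counts = Counter(label for _, _, label in rows)
pr_label = {c: n / len(rows) for c, n in label_counts.items()}   # Pr(Be/Mal)

shape_counts = Counter((label, shape) for shape, _, label in rows)
shapes = {shape for shape, _, _ in rows}

def pr_shape_given_label(shape, label, alpha=1.0):
    # Pr(Shape | Be/Mal) with Laplace smoothing (pseudocount alpha)
    return ((shape_counts[(label, shape)] + alpha) /
            (label_counts[label] + alpha * len(shapes)))

print(pr_label)                           # {'B': 0.4, 'M': 0.6}
print(pr_shape_given_label("Spic", "M"))  # smoothed estimate of Pr(Spic | M)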
Level 2: Structure
Given: Features, data
Learn: Bayes net structure and probabilities
Note: with this structure, we now need Pr(Size|Shape,Be/Mal) instead of Pr(Size|Be/Mal)
[Figure: Bayes net with Be/Mal the parent of Shape, and both Be/Mal and Shape parents of Size]
Level 3: Aggregates
Given: Features, data, background knowledge – aggregation functions such as average, mode, max, etc.
Learn: Useful aggregate features, a Bayes net structure that uses these features, and probabilities
Note: new features may use other rows/tables
[Figure: Bayes net with an added "Avg size this date" node alongside Be/Mal, Shape, and Size]
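A minimal sketch of computing one such aggregate, "average mass size over all abnormalities of the same patient on the same date", and attaching it to each row. The table mirrors the example above, but the code is an illustration, not the authors' implementation:

from collections import defaultdict

# (patient, abnormality, date, mass_size) rows mirroring the table above
abnormalities = [("P1", 1, "5/02", 0.03),
                 ("P1", 2, "5/04", 0.04),
                 ("P1", 3, "5/04", 0.04)]

totals = defaultdict(lambda: [0.0, 0])           # (patient, date) -> [sum, n]
for patient, _, date, size in abnormalities:
    totals[(patient, date)][0] += size
    totals[(patient, date)][1] += 1

# Extend each row with the aggregate; the structure learner can then treat
# "avg size this date" like any other field
extended = [(p, a, d, s, totals[(p, d)][0] / totals[(p, d)][1])
            for p, a, d, s in abnormalities]
for row in extended:
    print(row)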
Level 4: View Learning
Given: Features, data, background knowledge – aggregation functions and intensionally-defined relations such as "increase" or "same location"
Learn: Useful new features defined by views (equivalent to rules or SQL queries), Bayes net structure, and probabilities
[Figure: Bayes net extended with view-defined nodes such as "Shape change in abnormality at this location" and "Increase in average size of abnormalities", alongside "Avg size this date", Be/Mal, Shape, and Size]
Structure Learning Algorithms
Three different algorithms:
• Naïve Bayes
• Tree Augmented Naïve Bayes (TAN)
• Sparse Candidate algorithm
Naïve Bayes Net
• Simple, computationally efficient
[Figure: class node with an arc to each of Attr 1 … Attr N]
Example TAN Net
• Also computationally efficient [Friedman, Geiger & Goldszmidt '97]
[Figure: class node with an arc to each of Attr 1 … Attr N, plus a tree of arcs among the attributes]
TAN
• Arc from class variable to each attribute
• Less restrictive than Naïve Bayes
• Each attribute permitted at most one extra parent
• Polynomial time bound on constructing the network: O((# attributes)² · |training set|)
• Guaranteed to maximize LL(B_T | D), the log-likelihood of the TAN network given the data
TAN Algorithm
• Construct a complete graph over the attributes (excluding the class variable)
• Edge weight is the conditional mutual information between the two attributes, given the class
• Find the maximum-weight spanning tree over this graph
• Pick a root in the tree and direct all edges away from it
• Add the directed tree's edges to the network
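A minimal sketch of these steps on toy discrete data: conditional mutual information I(Xi; Xj | C) is estimated from counts, then a maximum-weight spanning tree is grown with Prim's algorithm and rooted at attribute 0 (the class-to-attribute arcs are left implicit; the data and names are illustrative assumptions):

import math
from collections import Counter
from itertools import combinations

data = [((0, 1, 0), 'B'), ((1, 1, 0), 'M'), ((0, 0, 1), 'B'),
        ((1, 0, 1), 'M'), ((1, 1, 1), 'M'), ((0, 0, 0), 'B')]
n_attrs, N = 3, len(data)

def cmi(i, j):
    # I(Xi; Xj | C) estimated from empirical counts
    pxy, px, py, pc = Counter(), Counter(), Counter(), Counter()
    for x, c in data:
        pxy[(x[i], x[j], c)] += 1; px[(x[i], c)] += 1
        py[(x[j], c)] += 1; pc[c] += 1
    total = 0.0
    for (xi, xj, c), n in pxy.items():
        total += (n / N) * math.log((n * pc[c]) / (px[(xi, c)] * py[(xj, c)]))
    return total

# Prim's algorithm for a maximum-weight spanning tree over the attributes
w = {frozenset(e): cmi(*e) for e in combinations(range(n_attrs), 2)}
in_tree, edges = {0}, []          # root the tree at attribute 0
while len(in_tree) < n_attrs:
    u, v = max(((u, v) for u in in_tree for v in range(n_attrs)
                if v not in in_tree), key=lambda e: w[frozenset(e)])
    edges.append((u, v))          # directed away from the root
    in_tree.add(v)

print(edges)  # each attribute gets one attribute parent, plus the class arc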
General Bayes Net
[Figure: arbitrary arcs among the class node and Attr 1 … Attr N]
Sparse Candidate
• Friedman et al. '99
• No restrictions on the directionality of arcs for the class attribute
• Limits the possible parents of each node to a small "candidate" set
Sparse Candidate Algorithm
• Greedy hill-climbing search with restarts
• Initial structure is the empty graph
• Score graphs using the BDe metric (Cooper & Herskovits '92, Heckerman '96)
• Select each node's candidate set using an information metric
• Re-estimate candidate sets after each restart
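A high-level skeleton of one pass of that search loop, with each node restricted to its candidate-parent set. The score function is a toy stand-in for BDe, and the candidate sets are hand-picked rather than chosen by an information measure, so this illustrates the control flow only, not the authors' implementation:

def has_cycle(parents):
    # Reject arc additions that would make the graph cyclic
    seen, stack = set(), set()
    def visit(v):
        if v in stack: return True
        if v in seen: return False
        seen.add(v); stack.add(v)
        cyclic = any(visit(u) for u in parents[v])
        stack.remove(v)
        return cyclic
    return any(visit(v) for v in parents)

n_nodes = 4
candidates = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}   # toy candidate sets
TRUE_ARCS = {(0, 1), (1, 2), (2, 3)}                  # toy ground truth

def score(parents):
    arcs = {(u, v) for v, ps in parents.items() for u in ps}
    return 2 * len(arcs & TRUE_ARCS) - len(arcs)      # stand-in for BDe

parents = {v: set() for v in range(n_nodes)}          # empty initial graph
while True:
    current, moves = score(parents), []
    for v in range(n_nodes):
        for u in candidates[v]:
            if u in parents[v]: continue
            parents[v].add(u)                          # try adding arc u -> v
            if not has_cycle(parents) and score(parents) > current:
                moves.append((score(parents), u, v))
            parents[v].remove(u)
    if not moves: break
    _, u, v = max(moves)                               # take the best addition
    parents[v].add(u)

print(parents)   # {0: set(), 1: {0}, 2: {1}, 3: {2}} on this toy input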
Sparse Candidate Algorithm
• We looked at several initial structures:
  • Expert structure
  • Naïve Bayes
  • TAN
• Scored networks on tuning-set accuracy
Our Initial Approach for Level 4
• Use ILP to learn rules predictive of "malignant"
• Treat the rules as intensional definitions of new fields
• The new view consists of the original table extended with the new fields
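A minimal sketch of that extension step: each learned rule becomes a boolean function of a row (and, for relational rules, the rest of the table), and the view is the original table plus one binary field per rule. The field names, the sample rule, and the string-date comparison are illustrative assumptions; sample learned Prolog rules follow on the next slides:

rows = [{"id": 1, "patient": "P1", "date": "5/02", "shape": "Spic", "mal": 0},
        {"id": 2, "patient": "P1", "date": "5/04", "shape": "Var",  "mal": 1}]

def shape_change(row, table):
    # True if the same patient has an earlier abnormality with a
    # different mass shape (a relational, multi-row feature)
    return any(r["patient"] == row["patient"] and r["date"] < row["date"]
               and r["shape"] != row["shape"] for r in table)

RULES = {"shape_change": shape_change}

# Materialize the view: original fields plus one binary field per rule
view = [dict(row, **{name: int(rule(row, rows))
                     for name, rule in RULES.items()}) for row in rows]
for r in view:
    print(r)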
Using Views

% An abnormality A is predicted malignant if its mass stability is
% increasing and the patient had a prior mammogram B with a recorded
% history of breast cancer.
malignant(A) :-
    massesStability(A, increasing),
    prior_mammogram(A, B, _),
    'H0_BreastCA'(B, hxDCorLC).
Sample Rule

% An abnormality A is predicted malignant if it is BI-RADS category 5,
% a mass is present with high density, the patient has a history of
% breast cancer, and the same mammogram contains another abnormality B
% with neither pleomorphic nor punctate calcifications.
malignant(A) :-
    'BIRADS_category'(A, b5),
    'MassPAO'(A, present),
    'MassesDensity'(A, high),
    'HO_BreastCA'(A, hxDCorLC),
    in_same_mammogram(A, B),
    'Calc_Pleomorphic'(B, notPresent),
    'Calc_Punctate'(B, notPresent).
Methodology
• 10-fold cross-validation
• Split at the patient level
• Roughly 40 malignant cases and 6,000 benign cases in each fold
Methodology
• Without the ILP rules:
  • 6 folds for the training set
  • 3 folds for the tuning set
• With ILP:
  • 4 folds to learn ILP rules
  • 3 folds for the training set
  • 2 folds for the tuning set
• TAN/Naïve Bayes don't require a tuning set
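A minimal sketch of a patient-level split: folds partition patients, not abnormalities, so no patient's abnormalities straddle a train/test boundary. The patient ids and fold arithmetic are illustrative, shown here for the "with ILP" allocation (1 test, 4 ILP, 3 train, 2 tune):

import random

random.seed(0)
patients = [f"P{i}" for i in range(100)]        # illustrative patient ids
random.shuffle(patients)
folds = [patients[i::10] for i in range(10)]    # 10 disjoint patient sets

def rows_for(fold, table):
    # All abnormality rows whose patient falls in this fold
    members = set(fold)
    return [row for row in table if row["patient"] in members]

# "with ILP" allocation: 1 test fold, 4 for rule learning, 3 train, 2 tune
test = folds[0]
ilp = sum(folds[1:5], [])
train = sum(folds[5:8], [])
tune = sum(folds[8:], [])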
Evaluation
• Precision-recall curves
• Why not ROC curves?
  • With many negatives, ROC curves look overly optimistic
  • A large change in the number of false positives yields only a small change in the ROC curve
• Results pooled over all 10 folds
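A minimal sketch of a pooled precision-recall computation: merge the (probability, label) predictions from all ten test folds, sort by predicted probability of malignancy, and read off precision and recall at each threshold. The scores below are made up for illustration:

# pooled (probability, is_malignant) pairs from every fold
pooled = [(0.9, 1), (0.8, 0), (0.7, 1), (0.4, 0), (0.3, 1), (0.1, 0)]
pooled.sort(reverse=True)

n_pos = sum(y for _, y in pooled)
tp = fp = 0
for prob, y in pooled:
    tp += y
    fp += 1 - y
    print(f"threshold={prob:.1f}  precision={tp/(tp+fp):.2f}  "
          f"recall={tp/n_pos:.2f}")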
Related Work: ILP for Feature Construction
• Pompe & Kononenko, ILP '95
• Srinivasan & King, ILP '97
• Perlich & Provost, KDD '03
• Neville, Jensen, Friedland & Hay, KDD '03
Ways to Improve Performance
• Learn rules to predict "benign" as well as "malignant"
• Use Gleaner (Goadrich, Oliphant & Shavlik, ILP '04) to get a better spread of precision vs. recall in the learned rules
• Incorporate aggregation into the ILP runs themselves
Richer View Learning Approaches
• Learn rules predictive of other fields
• Use WARMR or other first-order clustering approaches
• Integrate structure learning and view learning: score a rule by how much it helps the current model when added (a sketch of this scoring loop follows below)
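A minimal skeleton of that integrated scoring idea: materialize the candidate rule as a new field, retrain the current model with it, and keep the rule only if tuning-set performance improves. The helpers train_model and evaluate are hypothetical stand-ins for Bayes net training and tuning-set scoring, not functions from the paper:

def score_rule(rule, base_fields, train, tune, train_model, evaluate):
    # Score a candidate view rule by the change in tuning-set performance
    # when it is added to the current model (hypothetical helpers)
    baseline = evaluate(train_model(base_fields, train), tune)
    candidate = evaluate(train_model(base_fields + [rule], train), tune)
    return candidate - baseline   # > 0 means the rule helps the current model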
Integrated View/Structure Learning

% "Shape change": abnormality X has an earlier abnormality Y, of the same
% patient at the same location, with a different mass shape.
sc(X) :-
    id(X, P), id(Y, P),
    loc(X, L), loc(Y, L),
    date(Y, D1), date(X, D2), before(D1, D2),
    shape(X, Sh1), shape(Y, Sh2), Sh1 \= Sh2.

[Figure: Bayes net with "Increase in average size of abnormalities", "Avg size this date", Be/Mal, Shape, and Size]
Integrated View/Structure Learning

% Refinement of the rule above: additionally require that the later
% abnormality X is larger than the earlier abnormality Y, i.e. the
% mass has grown.
sc(X) :-
    id(X, P), id(Y, P),
    loc(X, L), loc(Y, L),
    date(Y, D1), date(X, D2), before(D1, D2),
    shape(X, Sh1), shape(Y, Sh2), Sh1 \= Sh2,
    size(X, S1), size(Y, S2), S1 > S2.

[Figure: Bayes net with "Increase in average size of abnormalities", "Avg size this date", Be/Mal, Shape, and Size]