
Scott Doyle 1 , Michael Feldman 2 , John Tomaszewski 2 , Anant Madabhushi 1

Use of Active Learning for Selective Annotation of Training Data in a Supervised Classification System for Digitized Histology. Scott Doyle 1 , Michael Feldman 2 , John Tomaszewski 2 , Anant Madabhushi 1. 1 Department of Biomedical Engineering, Rutgers, The State University of New Jersey

Presentation Transcript


  1. Use of Active Learning for Selective Annotation of Training Data in a Supervised Classification System for Digitized Histology • Scott Doyle (1), Michael Feldman (2), John Tomaszewski (2), Anant Madabhushi (1) • (1) Department of Biomedical Engineering, Rutgers, The State University of New Jersey • (2) Department of Surgical Pathology, University of Pennsylvania • http://lcib.rutgers.edu

  2. Outline • Background • Digital Prostate Histopathology • Supervised Classification • Active Learning • Methodology • Active Learning • Data Description • Experimental Setup • Experimental Results • Concluding Remarks

  3. Prostate Cancer Detection • ~1 million biopsies per year in USA • 10-12 tissue samples per biopsy • 80% benign diagnosis • Large amount of data to analyze

  4. Computer-Aided Diagnosis • Supervised classification system • Identifies regions of interest / suspicion • Quantitative • Automated • Reduces variability • Doyle, S., Feldman, M., Tomaszewski, J., Madabhushi, A. “A Hierarchical Computer-aided Classification Scheme for Automated Detection of Prostatic Adenocarcinoma from Digitized Histology,” APIII 2006

  5. Supervised Classification • Expert segmentation for training • Histopathology: Expensive, time-consuming to annotate • Cost per training sample is high

  6. Supervised Classification • Random training inefficient • Possible redundancy with existing training • No guarantee of improved accuracy

  7. Solution: Active Learning • Choose training samples intelligently, not randomly • Increased accuracy per training sample • Forced choice of training, maximized accuracy • Useful where: • Large amount of unlabeled data • Annotations are expensive • Ideally suited for histopathology data

  8. Active Learning [Figure: Classifier Performance; Accuracy vs. # of Training Samples, with curves for Random Learning and Active Learning]

  9. Previous Work • Liu [2004], Vogiatzis and Tsapatsoulis [2006] • Gene microarray data • Yao et al. [2008] • Content-based image retrieval • Little work done in histopathology with Active Learning

  10. Outline • Background • Digital Prostate Histopathology • Supervised Classification • Active Learning • Methodology • Active Learning • Data Description • Experimental Setup • Experimental Results • Concluding Remarks

  11. Active Learning Methodology • Training data obtained from pathologist: Labeled, Unlabeled • Build classifier from labeled training • Classify unlabeled training • Classification outcomes: Cancer, Non-cancer, Uncertain Classification

  12. Active Learning Methodology • Uncertain classification: informative samples • Identify informative regions • Obtain expert labels • Combine with original set • Certain classification: uninformative • Eliminate, labeling these adds no information

  13. Active Learning Methodology • New training set • Generate new classifier
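The methodology on slides 11 to 13 can be sketched as a single round of uncertainty-based selection. This is only an illustrative sketch, not the authors' implementation: the toy one-dimensional classifier (`train`, `predict_proba`), the `oracle` callback standing in for the pathologist, and the 0.3 to 0.7 uncertainty band are all assumptions introduced here.

```python
import math


def train(labeled):
    """Toy 1-D classifier (assumption): boundary is the midpoint of the class means."""
    pos = [x for x, y in labeled if y == 1]
    neg = [x for x, y in labeled if y == 0]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2.0


def predict_proba(boundary, x, scale=1.0):
    """Logistic score: 0.5 at the boundary, near 0 or 1 far away from it."""
    return 1.0 / (1.0 + math.exp(-(x - boundary) / scale))


def select_uncertain(probs, low=0.3, high=0.7):
    """Indices of samples whose predicted probability falls in the uncertain band."""
    return [i for i, p in enumerate(probs) if low <= p <= high]


def active_learning_round(labeled, unlabeled, oracle, low=0.3, high=0.7):
    """One round: classify unlabeled data, label only the uncertain samples, retrain."""
    boundary = train(labeled)
    probs = [predict_proba(boundary, x) for x in unlabeled]
    chosen = select_uncertain(probs, low, high)
    # Confidently classified samples are dropped: labeling them adds no information.
    newly_labeled = [(unlabeled[i], oracle(unlabeled[i])) for i in chosen]
    new_training = labeled + newly_labeled
    return train(new_training), new_training


if __name__ == "__main__":
    labeled = [(0.0, 0), (4.0, 1)]
    unlabeled = [-3.0, 1.8, 2.5, 7.0]
    boundary, training = active_learning_round(
        labeled, unlabeled, oracle=lambda x: 1 if x > 2.2 else 0
    )
    print(boundary, training)
```

In practice the toy classifier would be replaced by whatever classifier is in use, and the width of the uncertainty band controls how many samples are sent for expert labeling.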

  14. Feature Extraction [Figure panels: Original Image, Cancer Region, Feature Images]

  15. Classification • Feature Images • C4.5 Decision Tree • “Random Forest” [Breiman, 2001] • Majority voting determines classification • Doyle, S., Madabhushi, A., Feldman, M., Tomaszewski, J.: A Boosting Cascade for Automated Detection of Prostate Cancer from Digitized Histology, MICCAI, Lecture Notes in Computer Science, Vol. 4191, pp. 504-511, 2006.
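The majority-voting step on this slide can be illustrated with a minimal sketch. It does not train C4.5 trees; each "tree" below is a hypothetical hand-written threshold rule, used only to show how individual votes are combined into one classification.

```python
from collections import Counter


def majority_vote(votes):
    """Return the label that receives the most votes."""
    return Counter(votes).most_common(1)[0][0]


def forest_predict(trees, sample):
    """Ask every tree for a label, then let the majority decide."""
    return majority_vote([tree(sample) for tree in trees])


# Hypothetical stand-ins for trained decision trees (1 = cancer, 0 = non-cancer).
# An odd number of trees avoids ties in the two-class case.
TREES = [
    lambda s: int(s[0] > 0.5),
    lambda s: int(s[1] > 0.2),
    lambda s: int(s[0] + s[1] > 1.5),
]

if __name__ == "__main__":
    print(forest_predict(TREES, (0.6, 0.1)))
    print(forest_predict(TREES, (0.9, 0.9)))
```

The fraction of trees voting for the winning label could also serve as a confidence score, which is one possible way to decide whether a classification counts as uncertain.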

  16. Image Data Description • 27 H&E stained digital biopsy samples • Data breakdown: • Initial Training Set • Unlabeled Training Set • Testing Set • Active Learning drawn from Unlabeled Training • Groups rotated so all images are tested
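The group rotation described on this slide can be sketched as follows. The slide does not give the size of each group, so the equal three-way split used here is purely an assumption for illustration, as are the function name `rotate_groups` and the integer image identifiers.

```python
def rotate_groups(image_ids):
    """Yield three role assignments so every image is in the test set exactly once.

    Assumption: images are split into three equal-sized groups; the actual
    group sizes are not stated in the slides.
    """
    n_groups = 3
    groups = [image_ids[i::n_groups] for i in range(n_groups)]
    roles = ("initial", "unlabeled", "test")
    for shift in range(n_groups):
        yield {role: groups[(k + shift) % n_groups] for k, role in enumerate(roles)}


if __name__ == "__main__":
    for assignment in rotate_groups(list(range(27))):
        print({role: len(ids) for role, ids in assignment.items()})
```

Each rotation keeps the three roles disjoint, so samples chosen by active learning always come from the unlabeled group and never from the test group.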

  17. Classification • Three training groups evaluated: • Initial set: Initial Training • Active Learning set: Initial Training + Active Learning • Random Learning set: Initial Training + Random Learning
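A minimal sketch of how the three training groups might be assembled for comparison. The function name `build_training_groups`, the `informative_idx` argument, and the choice to draw as many random samples as active-learning samples are assumptions made here; the slides do not state how many random samples were added.

```python
import random


def build_training_groups(initial, pool, informative_idx, seed=0):
    """Build the initial, active-learning, and random-learning training sets.

    initial: samples in the initial training set
    pool: samples in the unlabeled training set
    informative_idx: indices into pool chosen by active learning
    Assumption: the random set adds the same number of samples as the active set.
    """
    rng = random.Random(seed)
    active = list(initial) + [pool[i] for i in informative_idx]
    random_idx = rng.sample(range(len(pool)), len(informative_idx))
    rand = list(initial) + [pool[i] for i in random_idx]
    return {"initial": list(initial), "active": active, "random": rand}


if __name__ == "__main__":
    groups = build_training_groups(["a", "b"], ["p", "q", "r", "s"], [1, 3])
    for name, samples in groups.items():
        print(name, samples)
```

Matching the number of added samples keeps the comparison between active and random learning about which samples are labeled rather than how many.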

  18. Outline • Background • Digital Prostate Histopathology • Supervised Classification • Active Learning • Methodology • Active Learning • Data Description • Experimental Setup • Experimental Results • Concluding Remarks

  19. Results: Qualitative [Figure panels: Original Image, Random Learning, Active Learning]

  20. Results: Qualitative [Figure panels: Random Learning, Active Learning]

  21. Results: Qualitative [Figure panels: Original Image, Random Learning, Active Learning]

  22. Results: Qualitative [Figure panels: Random Learning, Active Learning]

  23. Quantitative Evaluation

  24. Quantitative Evaluation

  25. Quantitative Evaluation

  26. Outline • Background • Digital Prostate Histopathology • Supervised Classification • Active Learning • Methodology • Active Learning • Data Description • Experimental Setup • Experimental Results • Concluding Remarks

  27. Concluding Remarks • Maximize classification accuracy by choosing training intelligently • Efficiently obtain annotations • Make the most use of “training budget” • Build Active Learning into clinical applications • Online training correction / modification • User feedback

  28. Acknowledgements • The Coulter foundation (WHCF 4-29368) • New Jersey Commission on Cancer Research • The National Cancer Institute (R21CA127186-01, R03CA128081-01) • The US Department of Defense (427327) • The Society for Medical Imaging and Informatics
