Domain agnostic tools for multi-scale/integrative sensor data analysis

Joel Saltz MD, PhD, Stony Brook University.


Presentation Transcript


  1. Domain agnostic tools for multi-scale/integrative sensor data analysis. Joel Saltz MD, PhD, Stony Brook University

  2. Integrative Biomedical Informatics Analysis • Reproducible anatomic/functional characterization at fine level (Pathology) and gross level (Radiology) • High throughput multi-scale image segmentation, feature extraction, analysis of features • Integration of anatomic/functional characterization with multiple types of “omic” information (Diagram labels: Radiology Imaging, Patient Outcome, “Omic” Data, Pathologic Features)

  3. Overview • Pathology Computer Aided Diagnosis • Integrative analysis of tissue: pathology, radiology, ‘omics’ and outcome • Management, query, analysis of integrative data • High end Computing tools for multi-scale analysis • Electronic health data: analytics, tools for Clinical phenotype characterization, population health

  4. Pathology Computer Assisted Diagnosis (Gurcan, Shimada, Kong, Saltz)

  5. Neuroblastoma Classification (flowchart; CANCER 2003; 98:2274-81) • Categories shown: Ganglioneuroma (Schwannian stroma-dominant), with maturing and mature subtypes; Ganglioneuroblastoma, Intermixed (Schwannian stroma-rich); Ganglioneuroblastoma, Nodular (composite, Schwannian stroma-rich/stroma-dominant and stroma-poor); Neuroblastoma (Schwannian stroma-poor), with undifferentiated, poorly differentiated and differentiating subtypes • Branch criteria shown: Schwannian development (≥50%, none to <50%), microscopic neuroblastic foci (present/absent), grossly visible nodule(s), mitotic & karyorrhectic cells (<100, 100-200, ≥200 per 5,000 cells), age (<1.5 yr, ≥1.5 yr, <5 yr, ≥5 yr) • FH: favorable histology; UH: unfavorable histology; * marks variant forms

  6. Computerized Classification System for Grading Neuroblastoma • Background Identification • Image Decomposition (Multi-resolution levels) • Image Segmentation (EMLDA) • Feature Construction (2nd order statistics, Tonal Features) • Feature Extraction (LDA) + Classification (Bayesian) • Multi-resolution Layer Controller (Confidence Region) (Flowchart, TRAINING path: Training Tiles, Down-sampling, Segmentation, Feature Construction, Feature Extraction, Classifier Training. TESTING path: Image Tile, Background? check, Initialization I = L, Create Image I(L), Segmentation, Feature Construction, Feature Extraction, Classification, Within Confidence Region? check; if not within the confidence region and I > 1, set I = I - 1 and repeat; otherwise Label.)
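
The multi-resolution layer controller on this slide can be read as a loop that starts at the coarsest level and only moves to a finer level when the classifier is not confident. The following is a minimal sketch under that reading, not the authors' implementation: `classify` is a hypothetical callback standing in for segmentation, feature construction, LDA and Bayesian classification, `min_confidence` is an assumed scalar threshold standing in for the confidence region, and down-sampling is plain decimation.

```python
def downsample(tile, factor):
    """Keep every `factor`-th pixel in each direction (simple decimation)."""
    return [row[::factor] for row in tile[::factor]]


def classify_multiresolution(tile, levels, classify, min_confidence):
    """Classify a tile, refining resolution only while confidence is too low.

    tile:           2-D list of pixel values at full resolution (level 1)
    levels:         number of resolution levels L; level L is the coarsest
    classify:       hypothetical callback, image -> (label, confidence)
    min_confidence: assumed threshold standing in for the confidence region
    Returns (label, level_used).
    """
    level = levels  # I = L: start at the coarsest level
    while True:
        image = downsample(tile, 2 ** (level - 1))  # create image I(level)
        label, confidence = classify(image)
        # Stop when confident, or when the finest level has been reached.
        if confidence >= min_confidence or level == 1:
            return label, level
        level -= 1  # I = I - 1: move to the next finer level


def toy_classify(image):
    """Toy stand-in: confident only when the image has at least 4 rows."""
    return "class_a", (1.0 if len(image) >= 4 else 0.5)
```

With an 8 x 8 tile and three levels, the toy classifier is not confident on the 2 x 2 image at level 3 and stops at level 2; a threshold it can never meet falls through to level 1.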

  7. INTEGRATIVE ANALYSIS OF TISSUE: pathology, radiology, ‘omics’ and outcome

  8. Quantitative Feature Analysis in Pathology: Emory In Silico Center for Brain Tumor Research (PI = Dan Brat, PD = Joel Saltz)

  9. Using TCGA Data to Study Glioblastoma: Diagnostic Improvement, Molecular Classification, Predictors of Progression

  10. TCGA Network: Digital Pathology, Neuroimaging

  11. Morphological Tissue Classification: Whole Slide Imaging, Cellular Features, Nuclei Segmentation (Lee Cooper, Jun Kong)

  12. Millions of Nuclei Defined by n Features. Top-down analysis: use the features with existing diagnostic constructs

  13. TCGA Whole Slide Images. Step 1: Nuclei Segmentation • Identify individual nuclei and their boundaries (Jun Kong)

  14. Nuclear Analysis Workflow. Step 1: Nuclei Segmentation. Step 2: Feature Extraction • Describe individual nuclei in terms of size, shape, and texture
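
As an illustration of what size and shape descriptors for a segmented nucleus can look like, here is a minimal sketch that computes area, perimeter, circularity and eccentricity from a boundary polygon. The function names and the polygon representation are assumptions made for this example; the slides do not say how the features were computed, and texture features are not covered. Eccentricity here is derived from the second moments of the boundary points, which is an approximation.

```python
import math


def polygon_area(points):
    """Area of a simple polygon via the shoelace formula."""
    n = len(points)
    twice_area = 0.0
    for i in range(n):
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % n]
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0


def polygon_perimeter(points):
    """Sum of edge lengths, closing the polygon back to the first vertex."""
    n = len(points)
    return sum(math.dist(points[i], points[(i + 1) % n]) for i in range(n))


def circularity(points):
    """4*pi*area / perimeter^2: 1 for a circle, smaller for irregular shapes."""
    perimeter = polygon_perimeter(points)
    return 4.0 * math.pi * polygon_area(points) / (perimeter * perimeter)


def eccentricity(points):
    """Eccentricity of the ellipse matching the boundary points' second moments."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points) / n
    syy = sum((y - my) ** 2 for _, y in points) / n
    sxy = sum((x - mx) * (y - my) for x, y in points) / n
    # Eigenvalues of the 2x2 covariance matrix give the squared axis scales.
    mean = (sxx + syy) / 2.0
    spread = math.sqrt(((sxx - syy) / 2.0) ** 2 + sxy ** 2)
    major, minor = mean + spread, mean - spread
    return math.sqrt(max(0.0, 1.0 - minor / major))
```

A round nucleus gives circularity near 1 and eccentricity near 0; an elongated one gives lower circularity and higher eccentricity.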

  15. Step 3: Nuclei Classification (Figure: Nuclear Qualities scale from 1 to 10, labeled Astrocytoma and Oligodendroglioma)

  16. Survival Analysis (panels: Human, Machine)

  17. Gene Expression Correlates of High Oligo-Astro Ratio on Machine-based Classification • Oligo Related Genes: Myelin Basic Protein, Proteolipoprotein, HoxD1 • Nuclear features most associated with Oligo Signature Genes: Circularity (high), Eccentricity (low)

  18. Millions of Nuclei Defined by n Features. Bottom-up analysis: let nuclear features define and drive the analysis

  19. Direct Study of Relationship Between Image Features vs Clinical Outcome, Response to Treatment, Molecular Information (Lee Cooper, Carlos Moreno)

  20. Nuclear Features Used to Classify GBMs • Consensus clustering of morphological signatures • Study includes 200 million nuclei taken from 480 slides corresponding to 167 distinct patients • Each possibility evaluated using 2000 iterations of K-means to quantify co-clustering

  21. Clustering identifies three morphological groups • Analyzed 200 million nuclei from 162 TCGA GBMs (462 slides) • Named for functions of associated genes: Cell Cycle (CC), Chromatin Modification (CM), Protein Biosynthesis (PB) • Prognostically-significant (logrank p=4.5e-4)

  22. Associations

  23. Molecular and Pathology Correlates of MR Features Using TCGA Data • MRIs of TCGA GBMs reviewed by 3-6 neuroradiologists using VASARI feature set and In Vivo Imaging tools • MR Features compared to TCGA Transcriptional Classes, Genetic Alterations and Pathology • NCI/in silico group led by Adam Flanders

  24. VASARI Feature Set

  25. Emory CTD2 Center: High throughput protein-protein interaction interrogation in cancer • Emory Molecular Interaction Center for Functional Genomics (MicFG) • Principal Investigator and Director: Haian Fu • Co-Directors: Fadlo R. Khuri, Joel Saltz • Project Manager: Margaret Johns • Aim 1 Leader: Yuhong Du • Aim 2 Leader: Carlos Moreno • Aims: Genomics informatics and data integration; Cancer genomics-based HT PPI network discovery & validation • Participating units: Winship Cancer Institute, Center for Comprehensive Informatics, Emory Chemical Biology Discovery Center

  26. Multi-Scale Imaging: Integrated structure and molecular characterization. Rich morphological and molecular characterizations of macroscopic tissue samples at microscopic resolution

  27. Genomics: Excellent Molecular Resolution, Limited Spatial Resolution (1000’s of genes) • Imaging: Excellent Spatial Resolution, Limited Molecular Resolution • Techniques listed: Quantum Dot Immunohistochemistry, LCM + NGS, Imaging Mass Spec

  28. Integrative Multi-scale Biomedical Informatics • Quantitative analyses of the interplay between morphology and spatially mapped genetics and molecular data to be used in studies that predict outcome and response to treatment • Assemble, visualize and quantify detailed, multi-scale descriptions of tissue morphologic changes originating from a wide range of microscopy instruments • Create/adapt computational and pattern recognition tools to integrate these descriptions with corresponding genomic, proteomic, glycomic, and clinical signatures.

  29. Driving Biomedical Problems • Human: Lung Cancer Heterogeneity and Targeted Therapy (Khuri, Marcus) • Human: Gastrointestinal Cancer Risk Stratification and Prevention (Bostick, Baron) • Human and Mouse model: Glioma Microenvironment and Systems Biology (Brat, Mikkelsen) • Mouse model: Role of PTEN in the orchestrated sequence of events leading to tumor initiation (Leone) • Mouse model: Role of Tn, STn tumor antigens in cancer initiation and progression, the impact of tissue-type specific alterations in Cosmc and the impact of altered expression of T-synthase (Cummings)

  30. Tumor heterogeneity • Multiple definitions: genetic, epigenetic heterogeneity within tumor; differences in microenvironments within tumor; phenome differences within tumor; heterogeneity involving primary and metastases • Characterization: imaging phenotype (radiology, pathology, optical…); molecular phenotype; spatially characterized molecular phenotype (laser captured microdissection, imaging mass spec, molecular imaging); … (Correlating Imaging Phenotypes with Genomic Signatures: Scientific Opportunities)

  31. Clinical Approach and Use • Development of imaging+analysis methods to characterize heterogeneity: within a tumor at one time point; evolution over time; among different tumor types • Development of imaging metrics that: can predict and detect emergence of resistance? correlate with genomic heterogeneity? correlate with habitat heterogeneity? can identify more homogeneous sub-types

  32. Management, query, analysis of INTEGRATIVE DATA (Diagram labels: Radiology Imaging, Patient Outcome, “Omic” Data, Pathologic Features)

  33. Large Scale Spatial Query, Analysis and Data Management • Highly optimized spatial query and analyses • Hadoop/HDFS, IBM DB2, optimized CPU/GPU spatial algorithms • Supported by two NLM R01 grants (Saltz/Foran) PAIS Database • Implemented with IBM DB2 for large scale pathology image metadata (~million markups per slide) • Represented by a complex data model capturing multi-faceted information including markups, annotations, algorithm provenance, specimen, etc. • Support for complex relationships and spatial query: multi-level granularities, relationships between markups and annotations, spatial and nested relationships • Support for high-level data statistical analysis

  34. Spatial Centric: Pathology Imaging “GIS” • Point query: human marked point inside a nucleus • Window query: return markups contained in a rectangle • Containment query: nuclear feature aggregation in tumor regions • Spatial join query: algorithm validation/comparison (Fusheng Wang)
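
To make the query types on this slide concrete, here is a minimal in-memory sketch of point, window and containment queries over polygon markups. It is an illustration only: the dictionary-of-polygons layout and the function names are assumptions, it scans every markup instead of using a spatial index, and it is unrelated to how the database described in these slides implements such queries. The containment test checks vertices only, which is exact when the region is convex.

```python
def point_in_polygon(point, polygon):
    """Ray-casting test: True if `point` lies inside the simple polygon."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge crosses the horizontal line through y
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def bounding_box(polygon):
    """Axis-aligned bounding box as (xmin, ymin, xmax, ymax)."""
    xs = [x for x, _ in polygon]
    ys = [y for _, y in polygon]
    return min(xs), min(ys), max(xs), max(ys)


def point_query(markups, point):
    """Ids of markups (e.g. nuclei) that contain a human-marked point."""
    return [mid for mid, poly in markups.items() if point_in_polygon(point, poly)]


def window_query(markups, window):
    """Ids of markups whose bounding box lies entirely inside the rectangle."""
    wx1, wy1, wx2, wy2 = window
    result = []
    for mid, poly in markups.items():
        x1, y1, x2, y2 = bounding_box(poly)
        if wx1 <= x1 and wy1 <= y1 and x2 <= wx2 and y2 <= wy2:
            result.append(mid)
    return result


def containment_query(markups, region):
    """Ids of markups whose vertices all fall inside a region polygon."""
    return [
        mid for mid, poly in markups.items()
        if all(point_in_polygon(vertex, region) for vertex in poly)
    ]
```

A spatial join for algorithm comparison would pair markups from two result sets that overlap; that step needs polygon intersection and is left out of this sketch.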

  35. Spatial Query, Change Detection, Comparison, and Quantification (VLDB 2012, 2013)

  36. High end Computing tools for multi-scale analysis (also known as Big Data) • Partnership with Oak Ridge National Laboratory (collaborators: Scott Klasky, Jeff Vetter)

  37. Macroscopic 3-D Tissue at Micron Resolution: OSU BISTI NBIB Center Big Data (2005) • Associate genotype with phenotype • Big science experiments on cancer, heart disease, pathogen host response • Tissue specimen: 1 cm³ at 0.3 μ resolution, roughly 10¹³ bytes • Molecular data (spatial location) can add additional significant factor, e.g. 10² • Multispectral imaging, laser captured microdissection, Imaging Mass Spec, Multiplex QD • Multiple tissue specimens: another factor of 10³ • Total: 10¹⁸ bytes, an exabyte per big science experiment
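
The order-of-magnitude estimate on this slide can be checked with a few lines of arithmetic. The sketch below assumes one byte per voxel, which the slide does not state; the two multipliers are the slide's own factors for molecular data and for multiple specimens.

```python
def voxels_per_specimen(side_cm=1.0, resolution_um=0.3):
    """Number of voxels in a cube of tissue sampled at the given resolution."""
    side_um = side_cm * 10_000           # 1 cm = 10,000 micrometers
    per_side = side_um / resolution_um   # about 33,333 samples per edge
    return per_side ** 3                 # about 3.7e13 voxels


def experiment_bytes(bytes_per_voxel=1, molecular_factor=10**2,
                     specimen_factor=10**3):
    """Total size, assuming one byte per voxel (an assumption of this sketch)."""
    return (voxels_per_specimen() * bytes_per_voxel
            * molecular_factor * specimen_factor)
```

Under these assumptions a single specimen is a few times 10¹³ bytes and the full experiment a few times 10¹⁸ bytes, which agrees with the slide's rounded figures.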

  38. Integrate Information from Sensors, Images, Cameras • Multi-dimensional spatial-temporal datasets • Radiology and Microscopy Image Analyses • Oil Reservoir Simulation/Carbon Sequestration/Groundwater Pollution Remediation • Biomass monitoring and disaster surveillance using multiple types of satellite imagery • Weather prediction using satellite and ground sensor data • Analysis of Results from Large Scale Simulations • Square Kilometer Array • Google Self Driving Car • Correlative and cooperative analysis of data from multiple sensor modalities and sources • Equivalent from the standpoint of data access patterns: we propose an integrative sensor data mini-App

  39. Core Transformations • Data Cleaning and Low Level Transformations • Data Subsetting, Filtering, Subsampling • Spatio-temporal Mapping and Registration • Object Segmentation • Feature Extraction • Object/Region/Feature Classification • Spatio-temporal Aggregation • Change Detection, Comparison, and Quantification

  40. Runtime Support Objectives (similar to what is required for most applications discussed today!) • Coordinated mapping of data and computation to complex memory hierarchies • Hierarchical work assignment with flexibility capable of dealing with data dependent computational patterns, fluctuations in computational speed associated with power management, faults • Linked to comprehensible programming model: model targeted at abstract application class but not to application domain (in the sensor, image, camera case: Region Templates) • Software stack including coordinated compiler/runtime support/autotuning frameworks

  41. HPC Segmentation and Feature Extraction Pipeline (Tony Pan, George Teodoro, Tahsin Kurc and Scott Klasky)

  42. Electronic health data: analytics, tools for Clinical phenotype characterization, population health • Andrew Post, Sharath Cholleti, Doris Gao, Joel Saltz, Bill Bornstein (Emory) • David Levine, Sam Hohmann (UHC)

  43. Clinical Phenotype Characterization and the Emory Analytic Information Warehouse • Find hot spots in readmissions within 30 days • What fraction of patients with a given principal diagnosis will be readmitted within 30 days? • What fraction of patients with a given set of diseases will be readmitted within 30 days? • How do severity and time course of co-morbidities affect readmissions? • Geographic analyses • Compare and contrast with UHC Clinical Data Base • Repeat analyses across all 180+ UHC hospitals • Hospital to hospital differences • Ability to predict readmissions across hospitals • Need a repeatable process that we can apply identically to both local and UHC data
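
A question such as "what fraction of encounters with a given principal diagnosis are followed by a readmission within 30 days" can be expressed as a small, repeatable function. The sketch below is an assumption-laden illustration, not the warehouse's logic: the encounter fields (`patient_id`, `admit`, `discharge`, `principal_dx`) are hypothetical names, the fraction is computed per index encounter, not per patient, and planned readmissions are not excluded.

```python
from datetime import date


def is_readmitted(index, encounters, window_days=30):
    """True if the same patient is admitted again within `window_days` of discharge."""
    return any(
        other is not index
        and other["patient_id"] == index["patient_id"]
        and 0 <= (other["admit"] - index["discharge"]).days <= window_days
        for other in encounters
    )


def readmission_fraction(encounters, principal_dx, window_days=30):
    """Fraction of encounters with `principal_dx` followed by a readmission."""
    index_encounters = [e for e in encounters if e["principal_dx"] == principal_dx]
    if not index_encounters:
        return 0.0
    readmitted = sum(is_readmitted(e, encounters, window_days)
                     for e in index_encounters)
    return readmitted / len(index_encounters)


# Hypothetical encounters; "DX_A" and "DX_B" are placeholder diagnosis codes.
example_encounters = [
    {"patient_id": 1, "admit": date(2013, 1, 1), "discharge": date(2013, 1, 10),
     "principal_dx": "DX_A"},
    {"patient_id": 1, "admit": date(2013, 1, 25), "discharge": date(2013, 1, 28),
     "principal_dx": "DX_A"},
    {"patient_id": 2, "admit": date(2013, 1, 2), "discharge": date(2013, 1, 5),
     "principal_dx": "DX_A"},
    {"patient_id": 2, "admit": date(2013, 3, 1), "discharge": date(2013, 3, 4),
     "principal_dx": "DX_B"},
]
```

Because the function takes plain records, the same code can be run unchanged on local and consortium extracts, which is the kind of repeatability the slide asks for.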

  44. Analytic Information Warehouse: 5-year Datasets from Emory and University HealthSystem Consortium • EUH, EUHM and WW (inpatient encounters) • Removed encounter pairs with chemotherapy and radiation therapy readmit encounters (CDW data) • Encounter location (down to unit for Emory) • Providers (Emory only) • Discharge disposition • Primary and secondary ICD9 codes • Procedure codes • DRGs • Medication orders (Emory only) • Labs (Emory only) • Vitals (Emory only) • Geographic information (CDW only + US Census and American Community Survey)

  45. Analytic Information Warehouse: Geographic Analyses, UHC Medicine General Product Line (#15)

  46. Analytic Information Warehouse: Predictive Modeling for Readmission • Random forests (ensemble of decision trees) • Create a decision tree using a random subset of the variables in the dataset • Generate a large number of such trees • All trees vote to classify each test example in a training dataset • Generate a patient-specific readmission risk for each encounter • Rank the encounters by risk for a subsequent 30-day readmission
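
The steps on this slide (random variable subsets, many trees, voting, per-encounter risk, ranking) can be sketched in plain Python. To stay short, this illustration uses depth-one trees (decision stumps) trained on bootstrap samples, which is a simplification of a full random forest; the function names, the 0/1 label encoding and the use of the vote fraction as the risk score are assumptions of the sketch, not the model used in the warehouse.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)


def majority(labels, default):
    """Most common label, or `default` for an empty list."""
    return Counter(labels).most_common(1)[0][0] if labels else default


def fit_stump(rows, labels, features):
    """Best single split over the given feature indices (a depth-one tree)."""
    default = majority(labels, 0)
    best = None
    for f in features:
        values = sorted({row[f] for row in rows})
        for low, high in zip(values, values[1:]):
            threshold = (low + high) / 2.0
            left = [y for row, y in zip(rows, labels) if row[f] <= threshold]
            right = [y for row, y in zip(rows, labels) if row[f] > threshold]
            score = len(left) * gini(left) + len(right) * gini(right)
            if best is None or score < best[0]:
                best = (score, f, threshold,
                        majority(left, default), majority(right, default))
    if best is None:  # every candidate feature is constant in this sample
        return (features[0], float("inf"), default, default)
    return best[1:]


def fit_forest(rows, labels, n_trees=50, n_features=1, seed=0):
    """Train stumps on bootstrap samples, each with a random feature subset."""
    rng = random.Random(seed)
    n, d = len(rows), len(rows[0])
    forest = []
    for _ in range(n_trees):
        sample = [rng.randrange(n) for _ in range(n)]  # bootstrap sample
        features = rng.sample(range(d), n_features)    # random variable subset
        forest.append(fit_stump([rows[i] for i in sample],
                                [labels[i] for i in sample], features))
    return forest


def risk(forest, row):
    """Fraction of trees voting for readmission (label 1) on this encounter."""
    votes = [right if row[f] > t else left for f, t, left, right in forest]
    return sum(votes) / len(votes)


def rank_by_risk(forest, rows):
    """Encounter indices ordered from highest to lowest predicted risk."""
    return sorted(range(len(rows)), key=lambda i: risk(forest, rows[i]),
                  reverse=True)
```

In practice the risk scores would be computed on encounters held out from training, and the top of the ranking compared against observed readmissions.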

  47. Emory Readmission Rates for High and Low Risk Groups Generated with Random Forest

  48. Predictive Modeling Applied to 180 UHC Hospitals: Readmission fraction of top 10% high risk patients
