20 likes | 156 Views
g1. Patient Information. g2. Clinical and Pathology. g3. Geographical Information. Gene Expression. g4. g5. g6. Public Data (Literature etc). Clusters. Input Profiles. DATA WAREHOUSE FOR BIO-GEO HEALTH CARE INFORMATICS.
E N D
g1 Patient Information g2 Clinical and Pathology g3 Geographical Information Gene Expression g4 g5 g6 Public Data (Literature etc) Clusters Input Profiles DATA WAREHOUSE FOR BIO-GEO HEALTH CARE INFORMATICS Ranjit Ganta, Raj Acharya, Shruthi PrabhakaraDepartment of Computer Science and Engineering, Penn State University INTRODUCTION • Integration of Health-care records: • Privacy Violation • Distributed integration of health care records. • Integration within Health-care records: • Information Fusion: Combine multiple disparate sources of information such that the whole is more than the sum of it’s parts. Figure: Information Fusion Based Attack Prototype for Bio-geo Data Warehouse • Health-Care records: ‘Bio-geo’ Informatics • Patient identification information • Geographical Information. • Clinical Information: • Organ/Cellular level: Tumor, pathology. • Molecular level: DNA sequence, Microarray. • Laboratory data: Blood tests, diagnosis, prognosis. Cancer Research Grid Cancer Analysis Applications Global Statistics Information Fusion based Clustering Result Visualization CORRESPONDENCE ANALYSIS Sample Result: Sample Result: Example Data : Dhanasekharan et al. "Delineation of prognostic biomarkers in prostate cancer", Letters to Nature, Vol 412, August 2001, pages 822-826. Supplementary data (Fig 1C, pg 823,Commercial Pool) Gene expression (microarray data) in four clinical states of prostate-derived tissues • For the patient demographic data set this helps answer questions such as: • Which age/race profile(s) if any, define a typical profile of a prostate cancer patient? • Are middle-aged Caucasian males more prone to prostate cancer than Caucasians of other age groups? • Is there a close association between age and race groups? KL-CLUSTERING Genes To Co-regulated genes Common Motifs The Kullback-Leibler (KL) divergence measures the relative dissimilarity of the shapes of two gene profiles. 1-D SOM algorithm + KL Minimize D(Gene || SOM weight for each node) at each iteration step. Frequency of occurrence Down-regulated {g1} Up-regulated {g2, g4}; {g3}; {g5} • Motif: short segments of DNA that act as a binding site for a specific transcription factor • Typically 6-25bp in length • Statistically different in composite compared to the background • Often repeated within a sequence No change {g6} [Bioinformatics, Vol. 19, No. 4, 2003, 449-458] COMBINED CLUSTERING Combined clustering Alpha Factor Experiments • Clustering using more than one data source • aims at identifying clusters of genes with • similarproperties among all data. • Goal of combined clustering is to answer the following: • If genes have similar expression profile patterns, do they also share common motifs? • If genes have a set of motifs in common, do they also exhibit similar expression profile patterns? • Which genes share BOTH - that is, they have similar expression profile patterns AND share a set of common motifs? All genes in the cluster share the Transcription Factor MCBa Cluster on Motif vectors Cluster on Gene expression CONCLUSION • We have demonstrated the significance of information fusion based tools for bio-geo health care informatics. • As a data warehouse for various data sets involved in bio-geo health care informatics studies. • To provide and demonstrate a set of information fusion tools for disease research.