1 / 1

DATA WAREHOUSE FOR BIO-GEO HEALTH CARE INFORMATICS

g1. Patient Information. g2. Clinical and Pathology. g3. Geographical Information. Gene Expression. g4. g5. g6. Public Data (Literature etc). Clusters. Input Profiles. DATA WAREHOUSE FOR BIO-GEO HEALTH CARE INFORMATICS.

odeda
Download Presentation

DATA WAREHOUSE FOR BIO-GEO HEALTH CARE INFORMATICS

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. g1 Patient Information g2 Clinical and Pathology g3 Geographical Information Gene Expression g4 g5 g6 Public Data (Literature etc) Clusters Input Profiles DATA WAREHOUSE FOR BIO-GEO HEALTH CARE INFORMATICS Ranjit Ganta, Raj Acharya, Shruthi PrabhakaraDepartment of Computer Science and Engineering, Penn State University INTRODUCTION • Integration of Health-care records: • Privacy Violation • Distributed integration of health care records. • Integration within Health-care records: • Information Fusion: Combine multiple disparate sources of information such that the whole is more than the sum of it’s parts. Figure: Information Fusion Based Attack Prototype for Bio-geo Data Warehouse • Health-Care records: ‘Bio-geo’ Informatics • Patient identification information • Geographical Information. • Clinical Information: • Organ/Cellular level: Tumor, pathology. • Molecular level: DNA sequence, Microarray. • Laboratory data: Blood tests, diagnosis, prognosis. Cancer Research Grid Cancer Analysis Applications Global Statistics Information Fusion based Clustering Result Visualization CORRESPONDENCE ANALYSIS Sample Result: Sample Result: Example Data : Dhanasekharan et al. "Delineation of prognostic biomarkers in prostate cancer", Letters to Nature, Vol 412, August 2001, pages 822-826. Supplementary data (Fig 1C, pg 823,Commercial Pool) Gene expression (microarray data) in four clinical states of prostate-derived tissues • For the patient demographic data set this helps answer questions such as: • Which age/race profile(s) if any, define a typical profile of a prostate cancer patient? • Are middle-aged Caucasian males more prone to prostate cancer than Caucasians of other age groups? • Is there a close association between age and race groups? KL-CLUSTERING Genes To Co-regulated genes Common Motifs The Kullback-Leibler (KL) divergence measures the relative dissimilarity of the shapes of two gene profiles. 1-D SOM algorithm + KL Minimize D(Gene || SOM weight for each node) at each iteration step. Frequency of occurrence Down-regulated {g1} Up-regulated {g2, g4}; {g3}; {g5} • Motif: short segments of DNA that act as a binding site for a specific transcription factor • Typically 6-25bp in length • Statistically different in composite compared to the background • Often repeated within a sequence No change {g6} [Bioinformatics, Vol. 19, No. 4, 2003, 449-458] COMBINED CLUSTERING Combined clustering Alpha Factor Experiments • Clustering using more than one data source • aims at identifying clusters of genes with • similarproperties among all data. • Goal of combined clustering is to answer the following: • If genes have similar expression profile patterns, do they also share common motifs? • If genes have a set of motifs in common, do they also exhibit similar expression profile patterns? • Which genes share BOTH - that is, they have similar expression profile patterns AND share a set of common motifs? All genes in the cluster share the Transcription Factor MCBa Cluster on Motif vectors Cluster on Gene expression CONCLUSION • We have demonstrated the significance of information fusion based tools for bio-geo health care informatics. • As a data warehouse for various data sets involved in bio-geo health care informatics studies. • To provide and demonstrate a set of information fusion tools for disease research.

More Related