Università degli Studi di Ferrara • ENDIF – Dipartimento di Ingegneria • Artificial Intelligence Techniques in Bioinformatics (Tecniche di Intelligenza Artificiale in Bioinformatica) • Giacomo Gamberoni
Data Mining in Bioinformatics • Genetic data from comparative experiments (normal vs. cancer) • Data provided by the Dipartimento di Morfologia ed Embriologia – Università di Ferrara (Dr. Stefano Volinia) • Software used: • Weka • Matlab • MySQL
Microarray Experiments • The slide is prepared by fixing base sequences (ESTs) at specific points (spots) on the glass • Two mRNA samples from two cell populations, labelled with different fluorescent dyes, are hybridized to the slide • Scanning the slide, we measure the fluorescence intensities of the two channels in each spot
Dataset normalization • Keep only spots with good intensity in at least 75% of the samples • Compute the log ratio of the two channel intensities for each spot • Subtract the median of the ratios of each spot • Divide by the standard deviation (SD) of each spot • Keep only spots with at least one significantly expressed sample (log ratio > 1.5)
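The normalization steps listed above can be sketched in plain Python. This is only an illustration under stated assumptions, not the authors' actual pipeline (the slides mention Matlab and MySQL): the function name `normalize`, the `min_intensity` cutoff used to decide what counts as "good intensity", the base-2 logarithm, and the use of the absolute value when applying the 1.5 threshold are all assumptions that the slide does not specify.

```python
import math
import statistics


def normalize(spots, min_intensity=100.0, min_good_fraction=0.75, threshold=1.5):
    """Filter and scale two-channel spot data.

    spots maps a spot id to a list of (sample_signal, reference_signal)
    pairs, one pair per sample. Returns a dict of the surviving spots with
    scaled log ratios (None where the measurement was not good).
    min_intensity must be positive so that the ratio and its log are defined.
    """
    result = {}
    for spot_id, pairs in spots.items():
        # Step 1 (assumed criterion): a measurement is "good" when both
        # channels reach min_intensity; keep spots good in >= 75% of samples.
        is_good = [a >= min_intensity and b >= min_intensity for a, b in pairs]
        if sum(is_good) < min_good_fraction * len(pairs):
            continue
        # Step 2: log ratio of the two channels (base 2 is an assumption).
        logs = [math.log2(a / b) if ok else None
                for (a, b), ok in zip(pairs, is_good)]
        values = [v for v in logs if v is not None]
        if len(values) < 2:
            continue
        # Steps 3 and 4: subtract the spot median, divide by the spot SD.
        median = statistics.median(values)
        sd = statistics.stdev(values)
        if sd == 0:
            continue
        scaled = [None if v is None else (v - median) / sd for v in logs]
        # Step 5: keep the spot if at least one sample exceeds the threshold
        # (absolute value is an assumption; the slide only says "> 1.5").
        if any(v is not None and abs(v) > threshold for v in scaled):
            result[spot_id] = scaled
    return result
```

For example, a spot whose samples are flagged as poor in half of the arrays is dropped at the first step, while a spot with one strongly over-expressed sample survives all five steps.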
Datasets analyzed • Hepatocellular carcinoma • Reference: artificial mRNA pool • 7449 ESTs for 161 samples • 95 cancer • 82 HBV+, 3 HCV+, 10 with no hepatitis antibodies • 66 normal • 47 HBV+, 5 HCV+, 14 with no hepatitis antibodies • Larynx squamous cell carcinoma • Reference: normal larynx • 7626 ESTs for 22 samples • 11 lymph node negative (N0) • 11 lymph node positive (N+)
Supervised/unsupervised learning • Supervised learning • Decision tree • Support vector machines • Unsupervised learning • Hierarchical clustering
Results • Decision tree • Clustering dendrogram
358885 <= 0.719385542
|   740476 <= 0.856739394
|   |   626619 <= 0.552788235
|   |   |   451711 <= -0.84774
|   |   |   |   786690 <= -0.116917241: HBV+ (5.0)
|   |   |   |   786690 > -0.116917241: HBV- (4.0)
|   |   |   451711 > -0.84774: HBV+ (107.0/1.0)
|   |   626619 > 0.552788235
|   |   |   310406 <= -0.162467: HBV- (6.0)
|   |   |   310406 > -0.162467: HBV+ (12.0/1.0)
|   740476 > 0.856739394
|   |   344648 <= 0.051885057: HBV- (10.0)
|   |   344648 > 0.051885057: HBV+ (7.0/1.0)
358885 > 0.719385542: HBV- (10.0/1.0)
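To make the decision tree on this slide easier to read, it can be transcribed as nested conditions. The sketch below copies the thresholds and class labels from the tree as printed; the function name `classify_hbv` and the dictionary input are illustrative assumptions, and the numeric attribute names are taken to be the identifiers of the measured sequences, which the slide does not state explicitly.

```python
def classify_hbv(sample):
    """Apply the tree shown on the slide to one sample.

    sample maps each attribute name printed in the tree (as a string)
    to the sample's normalized expression value for that attribute.
    Only the attributes on the path actually followed are looked up.
    """
    if sample["358885"] > 0.719385542:
        return "HBV-"
    if sample["740476"] > 0.856739394:
        return "HBV-" if sample["344648"] <= 0.051885057 else "HBV+"
    if sample["626619"] > 0.552788235:
        return "HBV-" if sample["310406"] <= -0.162467 else "HBV+"
    if sample["451711"] > -0.84774:
        return "HBV+"
    return "HBV+" if sample["786690"] <= -0.116917241 else "HBV-"
```

The largest leaf, `451711 > -0.84774: HBV+ (107.0/1.0)`, corresponds to the fourth `if`: it covers 107 of the 161 samples, one of them misclassified.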
Gene correlation • Analysis of the correlation between the expression of different genes • Study of the expression of every possible pair of genes • Computational complexity • Integration with extra knowledge • Genetic annotation (Gene Ontology) • Chromosome location
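The complexity concern on this slide comes from the number of gene pairs, which grows as n(n-1)/2. A minimal sketch follows, assuming the Pearson coefficient as the correlation measure (the slide does not name one); the function names and the dictionary layout are illustrative.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (NaN if constant)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return float("nan")
    return cov / math.sqrt(var_x * var_y)


def pair_count(n_genes):
    """Number of unordered gene pairs: n * (n - 1) / 2."""
    return n_genes * (n_genes - 1) // 2


def all_pair_correlations(expression):
    """expression maps a gene id to its expression values across samples."""
    return {
        (g1, g2): pearson(expression[g1], expression[g2])
        for g1, g2 in combinations(sorted(expression), 2)
    }
```

With the 7449 ESTs of the hepatocellular carcinoma dataset, `pair_count(7449)` gives 27,740,076 pairs, which is why restricting the analysis with extra knowledge (annotation, chromosome location) is attractive.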
Intra-gene relations • Purpose • Studying intra-gene relations, we can obtain useful results for: • Quality control • Different ESTs from the same UniGene cluster (UGC) should be equally expressed • A poor correlation between these ESTs may be due to experimental error • Chromosomal aberration • We can highlight parts of genes that lose correlation
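The quality-control idea on this slide can be sketched as follows: group ESTs by their cluster, correlate every pair within a cluster, and report pairs whose correlation is poor. The Pearson coefficient, the 0.5 cutoff, the function name `flag_discordant_ests`, and the identifiers used in the usage note are assumptions chosen for illustration, not values from the slides.

```python
import math
from collections import defaultdict
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (NaN if constant)."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return float("nan")
    return cov / math.sqrt(var_x * var_y)


def flag_discordant_ests(expression, est_to_cluster, min_correlation=0.5):
    """Return (cluster, est1, est2, r) for same-cluster pairs with poor r.

    expression maps an EST id to its values across samples;
    est_to_cluster maps an EST id to its cluster id.
    """
    by_cluster = defaultdict(list)
    for est, cluster in est_to_cluster.items():
        if est in expression:
            by_cluster[cluster].append(est)
    flagged = []
    for cluster, ests in sorted(by_cluster.items()):
        for e1, e2 in combinations(sorted(ests), 2):
            r = pearson(expression[e1], expression[e2])
            # "not r >= cutoff" also flags NaN (constant profiles).
            if not r >= min_correlation:
                flagged.append((cluster, e1, e2, r))
    return flagged
```

Called with three ESTs mapped to one hypothetical cluster, two of which track each other while the third does not, the function returns the two pairs involving the third EST. Whether such a pair reflects experimental error or a chromosomal aberration still has to be decided by inspecting the data.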
Relations in Processes • Study relations between the genes involved in the same biological processes • Biological processes as defined by the Gene Ontology • Highlight differences in gene correlations between normal and cancer samples
Present Activities • Development of a web-based interface to make several algorithms available to biologists (PHP, Java) • Implementation of some algorithms as plug-ins for an open-source analysis suite (Java) • Extension of our algorithms to analyze other data sources: • SAGE data • Affymetrix data
Publications • Giacomo Gamberoni, Evelina Lamma, Sergio Storari, Diego Arcelli, Francesca Francioso and Stefano Volinia. Exploiting supervised and unsupervised learning techniques for profiling cancer data. Presented at the workshop Data Mining in Functional Genomics and Proteomics at ECAI 2004. • Giacomo Gamberoni and Sergio Storari. Supervised and unsupervised learning techniques for profiling SAGE results. Presented at the Discovery Challenge at ECML/PKDD 2004.
Publications • Giacomo Gamberoni, Evelina Lamma, Sergio Storari, Diego Arcelli, Francesca Francioso and Stefano Volinia. Correlation of expression between different IMAGE clones from the same UniGene Cluster. Presented at ISBMDA 2004; published in Biological and Medical Data Analysis, Lecture Notes in Computer Science 3337, Springer.