PRESENTATION - people
ISG (Intelligent Systems Group) Research Group
http://www.si.ehu.es/isg
Donostia-San Sebastián, Computer Science Faculty, University of the Basque Country
• Group leader: Pedro Larrañaga
• Ph.D.: Jose Lozano, Endika Bengoetxea, Iñaki Inza
• Ph.D. students: Rosa Blanco, Jose L. Flores, Cristina González, Aritz Pérez, Ramón Sagarna, Guzmán Santafé
• Collaborators: Jose M. Peña (Ph.D., Aalborg University), Rubén Armañanzas
RESEARCH TOPICS
• Machine Learning – Data mining:
  • Learning of Bayesian networks (learning the joint probability distribution)
  • Bayesian networks for supervised and unsupervised classification
  • Preprocessing tasks: feature subset selection, discretization, imputation of missing values...
• Optimization:
  • Genetic Algorithms
  • Estimation of Distribution Algorithms (EDAs): Bayesian networks for optimization in NP-hard problems
• Applications:
  • Medical applications (brain images, cirrhotic patients, breast cancer, skin melanoma, etc.)
  • Bioinformatics: classification in DNA microarrays
  • Software testing
SEVERAL RESEARCH PROJECTS
• Data mining in bioinformatics
• Software testing
• ELVIRA project:
  • Open-source code for building and managing Bayesian networks (building, inference, propagation, abduction, classification, explanation...)
  • Written in Java
  • Developed jointly by 5 Spanish universities
  • http://leo.ugr.es/~elvira/
DATA MINING IN BIOINFORMATICS
DNA microarrays
Human Genome Project (U.C. Santa Cruz) http://genome.ucsc.edu
A DNA MICROARRAY SAMPLE
• One of the developments within the Genome Project
• From the tissue to the scanned image:
  • Tissue, microarray chip, DNA, mRNA, hybridization on a microarray, fluorescent image scanning
• The scanned image reflects the expression level of thousands of genes at a time
A DNA MICROARRAY COLLECTION
• Rows: genes
• Columns: cases, samples, biopsies, tissues, 'cell lines'...
PROBLEM GOAL-TASK
• The usual approach for biologists:
  • Hierarchical clustering of genes
  • Hierarchical clustering of tissues
• Focusing on the specific nature of each tissue:
  • Building a supervised model that accurately predicts the specific nature or characteristic of future and doubtful tissues:
    • cancer vs. normal
    • benign vs. malignant tumor
    • specific type of cancer...
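The hierarchical clustering of tissues mentioned above can be illustrated with a minimal sketch. It assumes Euclidean distance and average linkage, neither of which is specified in the slides, and the toy expression values and the function names (`average_linkage`, `hierarchical_clustering`) are hypothetical.

```python
from itertools import combinations
from math import dist


def average_linkage(points, a, b):
    """Mean Euclidean distance over all cross-cluster pairs (assumed linkage)."""
    return sum(dist(points[i], points[j]) for i in a for j in b) / (len(a) * len(b))


def hierarchical_clustering(points):
    """Agglomerative clustering; returns the merges in the order they happen."""
    clusters = [(i,) for i in range(len(points))]
    merges = []
    while len(clusters) > 1:
        # Pick the pair of clusters that are closest under average linkage.
        a, b = min(combinations(clusters, 2),
                   key=lambda pair: average_linkage(points, *pair))
        clusters.remove(a)
        clusters.remove(b)
        clusters.append(a + b)
        merges.append((a, b))
    return merges


# Hypothetical toy data: 4 tissues, each described by 3 gene expression levels.
tissues = [
    [0.1, 0.2, 5.0],
    [0.2, 0.1, 5.1],
    [4.0, 4.1, 0.3],
    [4.2, 3.9, 0.2],
]
merges = hierarchical_clustering(tissues)
for left, right in merges:
    print(f"merge {left} with {right}")
```

Clustering genes instead of tissues would only require passing the transposed matrix (one row per gene). This quadratic-search version is meant for reading, not for arrays with thousands of genes.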
Our work: selection of relevant genes in DNA microarray SUPERVISED tasks
• Small area within bioinformatics
• Huge dimensionality (> 1,000 genes): the model cannot be learned directly, so gene selection is a crucial task
• Application goals:
  • Development of drugs that act on the relevant genes
  • Therapy development
  • Diagnostic purposes
• Supervised tasks (e.g., benign vs. malignant tumor)
• Literature: Golub et al. '99, Brazma '00, Friedman '00, Xing & Jordan '01...
• For a specific disease, 10-15 genes seem relevant
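As an illustration of reducing thousands of genes to a handful, here is a minimal sketch of a univariate gene ranking. It assumes a two-class problem and uses a signal-to-noise score, (mean1 - mean0) / (sd1 + sd0), in the style of Golub et al.; the slides do not state which score the group used, and the gene names, data, and function names are hypothetical.

```python
from statistics import mean, stdev


def signal_to_noise(values, labels):
    """Score one gene: class mean difference over summed class deviations."""
    pos = [v for v, y in zip(values, labels) if y == 1]
    neg = [v for v, y in zip(values, labels) if y == 0]
    # Assumes each class has at least 2 samples and non-zero spread.
    return (mean(pos) - mean(neg)) / (stdev(pos) + stdev(neg))


def rank_genes(expression, labels, top_k):
    """expression maps a gene name to its row of per-sample values."""
    scores = {g: abs(signal_to_noise(v, labels)) for g, v in expression.items()}
    return sorted(scores, key=scores.get, reverse=True)[:top_k]


# Hypothetical toy microarray: rows are genes, columns are 6 samples.
labels = [1, 1, 1, 0, 0, 0]  # e.g. 1 = malignant, 0 = benign
expression = {
    "gene_a": [5.0, 5.2, 4.8, 1.0, 1.1, 0.9],  # strongly class-dependent
    "gene_b": [2.0, 3.0, 1.0, 2.1, 2.9, 1.2],  # no visible class signal
    "gene_c": [1.0, 2.0, 3.0, 3.5, 2.5, 4.5],  # weak class signal
}
top = rank_genes(expression, labels, top_k=2)
print(top)
```

Each gene is scored independently of the others, so the ranking is cheap even with thousands of rows, at the cost of ignoring interactions between genes.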
OUR APPROACH TO GENE SELECTION
• Search algorithms: sequential (forward), EDAs...
• Wrapper and filter evaluation functions
• Classification algorithms: naive Bayes and Bayesian networks, K-NN, IF-THEN rules...
• In-house software and freeware (ELVIRA, WEKA, MLC++...)
• Our Achilles' heel (weak point):
  • Biological interpretation of induced models and selected genes, validity of obtained recognition accuracy...
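The combination of sequential forward search with a wrapper evaluation function can be sketched as follows. This is an illustrative sketch, not the group's implementation: it assumes a 1-nearest-neighbour classifier scored by leave-one-out accuracy as the wrapper, stops when no candidate gene improves the score, and uses hypothetical toy data and function names.

```python
from math import dist


def loo_accuracy(samples, labels, genes):
    """Leave-one-out accuracy of a 1-NN classifier restricted to `genes`."""
    hits = 0
    for i, x in enumerate(samples):
        xi = [x[g] for g in genes]
        nearest = min(
            (j for j in range(len(samples)) if j != i),
            key=lambda j: dist(xi, [samples[j][g] for g in genes]),
        )
        hits += labels[nearest] == labels[i]
    return hits / len(samples)


def forward_selection(samples, labels, max_genes):
    """Greedily add the gene that most improves the wrapper score."""
    selected, best = [], 0.0
    candidates = set(range(len(samples[0])))
    while candidates and len(selected) < max_genes:
        scored = {g: loo_accuracy(samples, labels, selected + [g])
                  for g in candidates}
        gene = max(sorted(scored), key=scored.get)  # ties: lowest gene index
        if scored[gene] <= best:
            break  # no candidate improves the estimated accuracy
        selected.append(gene)
        candidates.remove(gene)
        best = scored[gene]
    return selected, best


# Hypothetical toy data: 6 samples (rows) x 3 genes (columns), i.e. the
# microarray matrix transposed so that each row is one tissue.
samples = [
    [0.5, 1.0, 0.3],
    [0.1, 1.2, 0.9],
    [0.9, 0.9, 0.5],
    [0.4, 3.0, 0.4],
    [0.2, 3.1, 0.8],
    [0.8, 2.9, 0.6],
]
labels = [0, 0, 0, 1, 1, 1]
selected, accuracy = forward_selection(samples, labels, max_genes=2)
print(selected, accuracy)
```

A filter approach would replace `loo_accuracy` with a score computed from the data alone, without training a classifier. Note that an accuracy estimated on the same samples used to guide the search is optimistic, which relates to the validity concern listed as a weak point above.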
PUBLICATIONS IN BIOINFORMATICS
• R. Blanco, P. Larrañaga, I. Inza, B. Sierra (2004). "Gene selection for cancer classification using wrapper approaches". International Journal of Pattern Recognition and Artificial Intelligence.
• I. Inza, P. Larrañaga, R. Blanco, A. J. Cerrolaza (2003). "Filter versus wrapper gene selection approaches in DNA microarray domains". Artificial Intelligence in Medicine Journal. Special issue on "Data mining in Genomics and Proteomics".
• I. Inza, B. Sierra, R. Blanco, P. Larrañaga (2002). "Gene selection by sequential search wrapper approaches in microarray cancer class prediction". Journal of Intelligent and Fuzzy Systems. Special issue on Bioinformatics.
INTERESTING REFERENCES
• Conferences:
  • ISMB: Intelligent Systems for Molecular Biology
  • ECCB: European Conference on Computational Biology
  • CAMDA: Critical Assessment of Microarray Data Analysis
  • WABI: Workshop on Algorithms in Bioinformatics
• Reference journal: "Bioinformatics", plus special issues of machine learning journals on the topic
• Web sites:
  • Stanford Genomic Resources, Stanford Microarray Database
  • http://www.gene-chips.com/
  • Hebrew University (N. Friedman, D. Pe'er, I. Nachman...)
  • Tel Aviv University (R. Shamir)
  • Human Genome Working Draft: http://genome.ucsc.edu