
PRESENTATION - people

ISG (Intelligent Systems Group) research group, http://www.si.ehu.es/isg. Donostia-San Sebastián, Computer Science Faculty, University of the Basque Country. Group leader: Pedro Larrañaga. Ph.D.: Jose Lozano, Endika Bengoetxea, Iñaki Inza.


Presentation Transcript


  1. PRESENTATION - people
     ISG (Intelligent Systems Group) Research Group, http://www.si.ehu.es/isg
     Donostia-San Sebastián, Computer Science Faculty, University of the Basque Country
     • Group leader: Pedro Larrañaga
     • Ph.D.: Jose Lozano, Endika Bengoetxea, Iñaki Inza
     • Ph.D. students: Rosa Blanco, Jose L. Flores, Cristina González, Aritz Pérez, Ramón Sagarna, Guzmán Santafé
     • Collaborators: Jose M. Peña (Ph.D., Aalborg University), Rubén Armañanzas

  2. RESEARCH TOPICS
     • Machine Learning – Data mining:
        • Learning of Bayesian networks (learning the joint probability)
        • Bayesian networks for (supervised – unsupervised) classification
        • Preprocessing tasks: feature subset selection problem, discretization, imputation of missing values...
     • Optimization:
        • Genetic Algorithms
        • Estimation of Distribution Algorithms (EDAs) → Bayesian networks for optimization in NP-hard problems
     • Applications:
        • Medical applications (brain images, cirrhotic patients, breast cancer, skin melanoma, etc.)
        • Bioinformatics: classification in DNA microarrays
        • Software testing
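The EDA idea on this slide (replace crossover and mutation with a probabilistic model that is estimated from the best individuals and then sampled) can be sketched in a few lines. The example below is only an illustration: it uses the simplest EDA variant, UMDA, whose model is a set of independent per-bit probabilities rather than a Bayesian network, and it optimizes the toy OneMax function instead of an NP-hard problem. The function name `umda_onemax` and all parameter values are assumptions made for this sketch.

```python
import random

def umda_onemax(n_bits=20, pop_size=60, n_select=30, generations=40, seed=0):
    """Univariate marginal distribution algorithm (UMDA) on the OneMax toy problem."""
    rng = random.Random(seed)
    probs = [0.5] * n_bits  # independent Bernoulli model, one parameter per bit
    best = None
    for _ in range(generations):
        # Sample a population from the current probabilistic model.
        population = [[1 if rng.random() < p else 0 for p in probs]
                      for _ in range(pop_size)]
        # Rank by fitness (number of ones) and keep the best individuals.
        population.sort(key=sum, reverse=True)
        selected = population[:n_select]
        if best is None or sum(selected[0]) > sum(best):
            best = selected[0]
        # Re-estimate each marginal from the selected individuals,
        # clamped away from 0 and 1 so no bit value becomes impossible.
        for i in range(n_bits):
            freq = sum(ind[i] for ind in selected) / n_select
            probs[i] = min(max(freq, 0.05), 0.95)
    return best, probs
```

An EDA based on Bayesian networks would replace the list `probs` with a learned network that captures dependencies between variables; the sample, select, re-estimate loop stays the same.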

  3. SEVERAL RESEARCH PROJECTS
     • Data mining in bioinformatics
     • Software testing
     • ELVIRA project:
        • Open source code for building and managing Bayesian networks (building, inference, propagation, abduction, classification, explanation...)
        • Written in Java
        • Developed jointly by 5 Spanish universities
        • http://leo.ugr.es/~elvira/

  4. DATA MINING IN BIOINFORMATICS
     DNA microarrays
     Human Genome Project (U.C. Santa Cruz): http://genome.ucsc.edu

  5. A DNA microarray sample
     • One of the developments within the Genome Project
     • From the tissue → to the scanned image
     • Tissue → microarray chip → DNA → mRNA → hybridization on a microarray → fluorescent image → scanning → reflecting the expression level of thousands of genes at a time

  6. A DNA MICROARRAY COLLECTION
     Rows → genes; columns → cases, samples, biopsies, tissues, ‘cell lines’...
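The layout described on this slide (genes as rows, samples as columns) is the transpose of what most classifiers expect, namely one feature vector per sample. A minimal sketch of the conversion follows; the gene names, class labels and expression values are made up for illustration and do not come from any dataset in the talk.

```python
# Toy expression matrix: rows are genes, columns are samples (all values are made up).
gene_names = ["gene_A", "gene_B", "gene_C"]
sample_labels = ["cancer", "normal", "cancer", "normal"]
expression = [
    [2.1, 0.3, 1.9, 0.4],  # gene_A across the four samples
    [0.5, 0.6, 0.4, 0.7],  # gene_B
    [1.0, 3.2, 1.1, 2.9],  # gene_C
]

def to_sample_rows(matrix):
    """Transpose genes-by-samples into samples-by-genes (one feature vector per sample)."""
    return [list(column) for column in zip(*matrix)]

samples = to_sample_rows(expression)
```

After the transpose, `samples[i]` holds the expression of every gene in sample `i`, and `sample_labels[i]` is its class.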

  7. SEVERAL MICROARRAY DATASETS

  8. PROBLEM GOAL-TASK
     • The usual approach for biologists:
        • Hierarchical clustering of genes
        • Hierarchical clustering of tissues
     • Focusing on the specific nature of each tissue:
        • Building a supervised model that accurately predicts the specific nature or characteristic of future and doubtful tissues:
           • cancer vs. normal
           • benign vs. malignant tumor
           • specific type of cancer, ...

  9. Our work: selection of relevant genes in DNA microarray SUPERVISED tasks
     • Small area within bioinformatics.
     • Huge dimensionality (> 1,000 genes) → a model cannot be learned directly → gene selection is a crucial task
     • Application goals:
        • Development of drugs that act on the relevant genes
        • Therapy development
        • Diagnostic purposes
     • Supervised tasks (e.g., benign vs. malignant tumor)
     • Literature: Golub et al.’99, Brazma’00, Friedman’00, Xing & Jordan’01...
     • For a specific disease → 10-15 genes seem relevant
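To make the gene selection step concrete, here is a minimal sketch of a filter approach: each gene is scored on its own, independently of any classifier, and only the top-scoring genes are kept. The score used here (absolute difference of class means divided by the sum of class standard deviations) is one simple signal-to-noise style choice picked for illustration; it is not claimed to be the score used by the group. The function names and the epsilon constant are assumptions of this sketch.

```python
from statistics import mean, pstdev

def signal_to_noise(values, labels, positive):
    """|difference of class means| / (sum of class standard deviations) for one gene."""
    pos = [v for v, y in zip(values, labels) if y == positive]
    neg = [v for v, y in zip(values, labels) if y != positive]
    spread = pstdev(pos) + pstdev(neg)
    return abs(mean(pos) - mean(neg)) / (spread + 1e-9)  # epsilon avoids division by zero

def rank_genes(matrix, labels, positive, top_k):
    """Return the indices of the top_k genes by filter score (rows of matrix are genes)."""
    scores = [signal_to_noise(row, labels, positive) for row in matrix]
    order = sorted(range(len(matrix)), key=lambda g: scores[g], reverse=True)
    return order[:top_k]
```

Calling `rank_genes(matrix, labels, "cancer", 15)` on a genes-by-samples matrix would return the indices of the 15 highest-scoring genes, in line with the 10-15 relevant genes mentioned on the slide.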

  10. OUR APPROACH TO GENE SELECTION
     • Search algorithms: sequential (forward), EDAs...
     • Wrapper and filter evaluation functions
     • Classification algorithms: naive Bayes and Bayesian networks, K-NN, IF-THEN rules...
     • In-house software and freeware (ELVIRA, WEKA, MLC++...)
     • Our Achilles' heel (weak point):
        • Biological interpretation of induced models and selected genes, validity of obtained recognition accuracy...
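The combination of a sequential forward search with a wrapper evaluation function can be sketched as follows. In a wrapper, each candidate gene subset is scored by the estimated accuracy of a classifier trained on just those genes. This sketch assumes a 1-nearest-neighbour classifier (the K = 1 case of the K-NN mentioned above) and leave-one-out accuracy as the evaluation; both choices, the function names and the stopping rule are assumptions for illustration, not the group's actual setup.

```python
def loo_accuracy_1nn(samples, labels, genes):
    """Leave-one-out accuracy of a 1-nearest-neighbour classifier restricted to `genes`."""
    correct = 0
    for i, query in enumerate(samples):
        best_dist, best_label = None, None
        for j, other in enumerate(samples):
            if i == j:
                continue  # hold out the query sample
            dist = sum((query[g] - other[g]) ** 2 for g in genes)
            if best_dist is None or dist < best_dist:
                best_dist, best_label = dist, labels[j]
        correct += best_label == labels[i]
    return correct / len(samples)

def forward_selection(samples, labels, max_genes):
    """Greedy sequential forward search; stops when no candidate gene improves accuracy."""
    n_genes = len(samples[0])
    selected, best_acc = [], 0.0
    while len(selected) < max_genes:
        candidates = [g for g in range(n_genes) if g not in selected]
        if not candidates:
            break
        # Wrapper step: score every one-gene extension of the current subset.
        scored = [(loo_accuracy_1nn(samples, labels, selected + [g]), g)
                  for g in candidates]
        acc, gene = max(scored, key=lambda t: t[0])
        if acc <= best_acc:
            break  # no improvement, keep the current subset
        selected.append(gene)
        best_acc = acc
    return selected, best_acc
```

Here `samples` is a samples-by-genes list of feature vectors. Each step costs one leave-one-out run per remaining gene, which is why wrappers are far more expensive than filters on microarrays with thousands of genes.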

  11. PUBLICATIONS IN BIOINFORMATICS
     • R. Blanco, P. Larrañaga, I. Inza, B. Sierra (2004). “Gene selection for cancer classification using wrapper approaches”. International Journal of Pattern Recognition and Artificial Intelligence.
     • I. Inza, P. Larrañaga, R. Blanco, A. J. Cerrolaza (2003). “Filter versus wrapper gene selection approaches in DNA microarray domains”. Artificial Intelligence in Medicine Journal. Special issue on “Data mining in Genomics and Proteomics”.
     • I. Inza, B. Sierra, R. Blanco, P. Larrañaga (2002). “Gene selection by sequential search wrapper approaches in microarray cancer class prediction”. Journal of Intelligent and Fuzzy Systems. Special issue on Bioinformatics.

  12. INTERESTING REFERENCES
     • Conferences:
        • ISMB: Intelligent Systems for Molecular Biology
        • ECCB: European Conference on Computational Biology
        • CAMDA: Critical Assessment of Microarray Data Analysis
        • WABI: Workshop on Algorithms in Bioinformatics
     • Reference journal: “Bioinformatics”, and special issues of machine learning journals on the topic
     • Web sites:
        • Stanford Genomic Resources → Stanford Microarray Database
        • http://www.gene-chips.com/
        • Hebrew University (N. Friedman, D. Pe’er, I. Nachman...)
        • Tel Aviv University (R. Shamir)
        • Human Genome Working Draft: http://genome.ucsc.edu
