CSCI5461: Functional Genomics, Systems Biology and Bioinformatics (Spring 2012). Course Overview. Chad Myers, Department of Computer Science and Engineering, University of Minnesota, cmyers@cs.umn.edu
Welcome to CSci 5461 • Course: Functional Genomics, Systems Biology and Bioinformatics • Instructor: Chad Myers, Assistant Professor • Teaching Assistant: Benjamin VanderSluis (bvander@cs.umn.edu) • Instructor contact: Office: 5-217 EE/CS Building; Phone: (612) 624-8306; Email: cmyers@cs.umn.edu; Web: http://www.cs.umn.edu/~cmyers/ • Research: Computational and Systems Biology • Education: Ph.D. 2007, Computer Science, Princeton University
Lectures and office hours • Lecture: Keller Hall, 3-125; 11:15AM-12:30PM, T/Th • Office hours (also by appointment): • Chad: 4:00-5:30pm Tuesday (Keller Hall, 5-217) • Ben: 10-11:30am Wed. (Keller Hall, 5-216)
Textbooks • Advanced Analysis of Gene Expression Microarray Data, Aidong Zhang, World Scientific, 2006, ISBN 981-256-645-7 • Analysis of Biological Networks, Björn H. Junker and Falk Schreiber, Wiley Series in Bioinformatics, 2008, ISBN 978-0470041444
Course Design • Homework (50%): 3 hands-on assignments, each requiring implementation/application of a method to a real genomic dataset (~17% each) • Course project (35%): small research project on a related topic of your choice (group work is encouraged) • Class participation (15%): engagement in discussion/questions during class; one ~10-min. presentation and leading the discussion of a research paper • Grading: A ≥ 93%, A- ≥ 90%, B+ ≥ 85%, B ≥ 80%, B- ≥ 75%, C+ ≥ 70%, C ≥ 65%, C- (= S) ≥ 60%, D+ ≥ 55%, D ≥ 50%, F < 50%
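As a worked example of the grading scheme above, the following minimal sketch combines the three components and maps the result to a letter grade. The weights and cutoffs come from the slide; the function names and the assumption that each component is scored on a 0-100 scale are illustrative only.

```python
# Component weights in percent, as listed on the Course Design slide.
WEIGHTS = {"homework": 50, "project": 35, "participation": 15}

# Letter-grade cutoffs from the slide, highest first.
CUTOFFS = [
    (93, "A"), (90, "A-"), (85, "B+"), (80, "B"), (75, "B-"),
    (70, "C+"), (65, "C"), (60, "C-"), (55, "D+"), (50, "D"),
]


def final_score(scores):
    """Weighted average of component scores (each assumed to be 0-100)."""
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / 100


def letter_grade(score):
    """Return the first letter whose cutoff the score meets, else F."""
    for cutoff, letter in CUTOFFS:
        if score >= cutoff:
            return letter
    return "F"


if __name__ == "__main__":
    example = {"homework": 90, "project": 80, "participation": 100}
    total = final_score(example)
    print(total, letter_grade(total))
```

Keeping the weights as integer percentages avoids small floating-point surprises near a cutoff.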
Course Homepage • URL: http://www-users.cselabs.umn.edu/classes/Spring-2012/csci5461/index.php • Check website frequently for announcements. • Schedule page: slides and other materials • Homework: • can be submitted via email to bvander@cs.umn.edu before the indicated due date • late submissions will receive 20% off per day late • Other policies: read carefully
Three Homework Assignments • Statistical methods for identifying differentially expressed genes • Clustering and classification of microarray data • Analysis of protein-protein interaction networks
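To give a flavor of the first assignment topic, here is a minimal sketch of one common statistic for comparing a gene's expression between two groups of samples: Welch's t-statistic. This is an assumption about the kind of method involved, not the assignment's required approach, and the expression values are made up for illustration.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(group_a, group_b):
    """Welch's t-statistic for two samples with possibly unequal variances."""
    var_a, var_b = variance(group_a), variance(group_b)  # sample variances
    standard_error = sqrt(var_a / len(group_a) + var_b / len(group_b))
    if standard_error == 0:
        return 0.0  # both groups are constant; no evidence of a difference
    return (mean(group_a) - mean(group_b)) / standard_error


if __name__ == "__main__":
    # Hypothetical log-expression values for one gene in two conditions.
    tumor = [8.1, 7.9, 8.4, 8.0]
    normal = [6.0, 6.2, 5.9, 6.1]
    print(welch_t(tumor, normal))
```

In a genome-wide setting the statistic would be computed for every gene, and the resulting p-values would need a multiple-testing correction.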
Paper Discussion • Paper list: http://www-users.cselabs.umn.edu/classes/Spring-2012/csci5461/index.php?page=paperlist • Pick your favorite 3 out of the ~15 papers, rank them, and email me your preferences by Jan. 23 • I will team you up into groups of 2-3 students
Paper Discussion Guidelines • Everyone must read the paper before class • Discussion leaders prepare ~5 slides for a 10-min. talk, with 3-4 questions to encourage class discussion • For at least 5 papers, you must submit a 1-paragraph summary of the paper with 3 questions/comments/critiques (~0.5 page), submitted before class starts or no credit • The 15% class-participation grade is based on your participation throughout the semester
Today: Introduction to Functional Genomics, Systems Biology and Bioinformatics First: a short primer on molecular biology
More about the Cell • Multicellular organisms typically begin life as a single cell. The single cell has to grow, divide and differentiate into different cell types to produce tissues and, in higher eukaryotes, organs. • Cells consist of a set of molecules (DNA, RNA, proteins, lipids); these molecules carry out the biological processes that drive life.
Important Molecules of the Cell DNA: cell’s genome carrying heritable information. RNA: (mainly) the template for translation of genes into proteins. Protein: amino acid sequences; drive biological processes in cell.
Organization of Genetic Information (DNA) • Genome – the entire collection of genetic information for an organism (1 or more copies per cell). • Chromosomes – storage units of genes. (eukaryotes) • Genes – basic unit of genetic information inherited from one generation to the next. Human chromosomes (male): http://www.accessexcellence.org/RC/VL/
Example: Tissues in the stomach If cells all have the same genetic material, how is this variety encoded?
The Central Dogma of Molecular Biology (eukaryotic cell) Gene expression determines cellular function!
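The flow of information in the central dogma (DNA to RNA to protein) can be sketched in a few lines. This is a simplified illustration, assuming the input is the coding strand of a gene with no introns, and using the standard genetic code; the function names and the example sequence are hypothetical.

```python
from itertools import product

# Standard genetic code, with codons enumerated in U, C, A, G order.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def transcribe(coding_dna):
    """Transcription, simplified: the mRNA matches the coding strand with U for T."""
    return coding_dna.upper().replace("T", "U")


def translate(mrna):
    """Translation: read codons from the first AUG until a stop codon ('*')."""
    start = mrna.find("AUG")
    if start == -1:
        return ""
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


if __name__ == "__main__":
    mrna = transcribe("ATGGCCTTTTAA")
    print(mrna, translate(mrna))
```

Real eukaryotic gene expression adds steps this sketch omits, such as splicing and regulation of when and how much each gene is transcribed.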
A survey of genome size • E. coli: 4 million bp (~1.36 mm), 3000 genes • S. cerevisiae (baker's yeast): 13.5 million bp, 6000 genes • Mouse: ~2.5 billion bp, ~20,000 genes • Human: 3 billion bp, ~20,000 genes • Salamander: ~90 billion bp • Water lily: ~120 billion bp
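The figures on this slide can be checked with simple arithmetic. The sketch below assumes a rise of about 0.34 nm per base pair (the usual textbook value for B-form DNA, not stated on the slide) to convert base pairs to physical length, and computes gene density from the slide's numbers.

```python
NM_PER_BP = 0.34  # assumed rise per base pair in B-form DNA, in nanometers

# (genome size in bp, approximate gene count), as given on the slide.
GENOMES = {
    "E. coli": (4_000_000, 3_000),
    "S. cerevisiae": (13_500_000, 6_000),
    "Mouse": (2_500_000_000, 20_000),
    "Human": (3_000_000_000, 20_000),
}


def length_mm(base_pairs):
    """Stretched-out length of the DNA in millimeters."""
    return base_pairs * NM_PER_BP * 1e-6  # 1 nm = 1e-6 mm


def genes_per_mbp(base_pairs, genes):
    """Gene density in genes per million base pairs."""
    return genes / (base_pairs / 1e6)


if __name__ == "__main__":
    for name, (bp, genes) in GENOMES.items():
        print(f"{name}: {length_mm(bp):.2f} mm, "
              f"{genes_per_mbp(bp, genes):.1f} genes/Mbp")
```

Under that assumption the E. coli genome comes out at about 1.36 mm, matching the slide, and gene density falls sharply from bacteria to mammals: genome size does not track gene count.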
Timeline of discovery • 1676: van Leeuwenhoek describes single-celled organisms • 1735: Carl Linnaeus: hierarchical classification of species • 1859: Charles Darwin: "The Origin of Species" • 1862: Louis Pasteur: microorganisms are responsible for contamination; heating kills microorganisms • 1866: Gregor Mendel: phenotype is determined by inheritable units • 1944: Avery, MacLeod, McCarty: DNA is the genetic material • 1953: James Watson and Francis Crick solve the structure of DNA • 1955: Frederick Sanger: complete sequence of insulin
Timeline of discovery (continued) • 1984: 1st public discussion of the Human genome sequencing project • Human genome sequencing begins • 1995: 1st bacterial genome sequenced; microarrays invented • 1996: Yeast genome sequenced • 1997: E. coli • 1998: C. elegans (roundworm) • 2000: Human, fly finished • 2002: Mouse • 2011: > 2000 organisms sequenced! ~10,000s of gene expression studies published!
An explosion of genome sequences • ~2000 organisms completed since 1996 • Genome projects underway/complete: Archaeal: 150; Bacterial: 2739; Eukaryal: 168 • 2001 finished • Source: http://www.genomesonline.org; partial list: http://www.genomenewsnetwork.org/
What’s left to do? With rapid sequencing technology, and many complete genomes sequenced, are we done? NO! • The sequence is really just a “parts list” • Understanding the cell requires learning what each part does (e.g. which other parts it interacts with, which function(s) it carries out) • [Figure: EBI GOA Biological Process annotations (non-IEA, RCA)]
Sequencing enabled many more technologies • Gene and protein expression • Protein-protein interactions • Tissue/cellular localization • Genetic interactions • … (adapted from Charles Mallery, http://fig.cox.miami.edu/~cmallery/150/gene/sf13x1.jpg)
What else can we measure? Gene expression (microarrays) • Sequence • Gene and protein expression • Protein-protein interactions • Tissue/cellular localization • Genetic interactions • … [Figure: expression heatmap; axes: Genes, Samples] Garber, Troyanskaya et al. Diversity of gene expression in adenocarcinoma of the lung. PNAS 2001, 98(24):13784-9.
What else can we measure? Protein-protein interactions • [Figure: yeast PPI network] Jeong, H., Mason, S. P., Barabási, A.-L. & Oltvai, Z. N. Lethality and centrality in protein networks. Nature 411, 41-42 (2001).
What else can we measure? Protein localization • [Figure panels: Nucleus, Nuclear Periphery, Endoplasmic Reticulum, Bud Neck, Mitochondria, Lipid particles] Huh, W-K et al. Nature 425, 686-691 (2003).
Why are we generating all of these data? What is “Systems Biology”? • “Molecular biology has uncovered a multitude of biological facts, such as genome sequences and protein properties, but this alone is not sufficient for interpreting biological systems. Cells, tissues, organs, organisms and ecological webs are systems of components whose specific interactions have been defined by evolution; thus a system-level understanding should be the prime goal of biology” - Hiroaki Kitano, Nature 420, 206-210 (14 November 2002). • “Organisms are clearly much more than the sum of their parts, and the behavior of complex physiological processes cannot be understood simply by knowing how the parts work in isolation. Systems biology has emerged in the wake of genome sequencing as the successor to reductionism.” - Kevin Strange, Am J Physiol Cell Physiol 288: C968-C974, 2005
A pathway example: p53 signaling pathway • [Figure labels: protein-DNA binding, gene expression, phosphorylation of protein] • The cell’s response to environmental stress is very complex • The p53 protein is the central player, but carries out its function with the help of several other genes/proteins • There are many points where we can quantitatively measure aspects of this process • Understanding the biochemical function of components does not always mean we understand the system • TP53 mutations are directly or indirectly responsible for ~50-60% of cancers!
Understanding complex biological systems • [Figure: Keith Haring, Untitled, 1986, and Ursus Wehrli, Tidying Up Art, 2003; labeled “inference?”] • Analogy: genome sequencing, gene expression studies and phenotypic studies give us “organized parts lists”, some dynamics, and local interactions • From these, we need to reconstruct the system
Key questions in systems biology/functional genomics • What are all of the parts? • How do the various parts interact to carry out biological processes? • How does systems-level organization relate to cellular/tissue/organismal phenotypes (e.g. dynamic response to environmental stimuli, robustness)? • How do biological systems evolve across larger time scales? (comparative genomics)
Where does computation come in? • Computation enables critical aspects of systems biology: • Large-scale, comprehensive investigation • Quantitative modeling/measurement of biological phenomena • Integration of complementary data types • Inference of unobservable system characteristics • Predictive models/synthesis of biological systems • Kitano, H. Systems biology. Science, 1 March 2002: Vol. 295, no. 5560, pp. 1662-1664
Examples of the five major biological networks • Yeast protein-protein interaction network • Yeast transcription factor-binding network • Yeast phosphorylation network • Yeast genetic interaction network • E. coli metabolic network • Zhu X et al. Genes Dev. 2007;21:1010-1024. ©2007 by Cold Spring Harbor Laboratory Press
Yeast interaction network • Jeong, H., Mason, S. P., Barabási, A.-L. & Oltvai, Z. N. Lethality and centrality in protein networks. Nature 411, 41-42 (2001).
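The paper cited on this slide relates how connected a protein is in the interaction network to how important it is for survival. The simplest connectivity measure is degree, the number of distinct interaction partners. The sketch below computes degrees from an edge list and ranks the most connected proteins; the protein names and the toy network are hypothetical, and the paper's actual analysis is not reproduced here.

```python
from collections import defaultdict


def degrees(edges):
    """Map each protein to its number of distinct interaction partners."""
    neighbors = defaultdict(set)
    for a, b in edges:
        if a == b:
            continue  # ignore self-interactions in this simple sketch
        neighbors[a].add(b)
        neighbors[b].add(a)
    return {protein: len(partners) for protein, partners in neighbors.items()}


def top_hubs(edges, k=1):
    """The k highest-degree proteins; ties are broken alphabetically."""
    degree = degrees(edges)
    return sorted(degree, key=lambda protein: (-degree[protein], protein))[:k]


if __name__ == "__main__":
    # Hypothetical toy network; ("P2", "P1") repeats an existing interaction.
    toy_edges = [("P1", "P2"), ("P1", "P3"), ("P1", "P4"),
                 ("P2", "P3"), ("P4", "P5"), ("P2", "P1")]
    print(degrees(toy_edges))
    print(top_hubs(toy_edges, k=2))
```

Storing partners in sets means an interaction reported twice, or in both directions, is counted once.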
Microarray example: Studying lung cancer Samples Genes Garber, Troyanskaya et al. Diversity of gene expression in adenocarcinoma of the lung. PNAS 2001, 98(24):13784-9.
Data clustering is clinically important: patient survival for lung cancer subgroups • [Figure: cumulative survival (0 to 1) vs. time (0-60 months) for Groups 1, 2 and 3] Garber, Troyanskaya et al. Diversity of gene expression in adenocarcinoma of the lung. PNAS 2001, 98(24):13784-9.
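Cumulative survival curves like the ones on this slide are commonly estimated with the Kaplan-Meier estimator, which multiplies, at each observed death time, the fraction of at-risk patients who survive. The sketch below is a minimal version of that estimator; whether it matches the exact procedure in the cited paper is an assumption, and the follow-up data are invented.

```python
def kaplan_meier(times, observed):
    """Kaplan-Meier survival estimate.

    times:    follow-up time for each patient (e.g. months)
    observed: True if death was observed at that time, False if censored
    Returns a list of (time, survival probability) at each death time.
    """
    records = sorted(zip(times, observed))
    at_risk = len(records)
    survival = 1.0
    curve = []
    i = 0
    while i < len(records):
        t = records[i][0]
        deaths = 0
        leaving = 0
        # Group all patients whose follow-up ends at time t.
        while i < len(records) and records[i][0] == t:
            deaths += int(records[i][1])
            leaving += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving  # deaths and censored patients both leave the risk set
    return curve


if __name__ == "__main__":
    # Hypothetical follow-up for one patient subgroup.
    months = [5, 10, 10, 20, 30]
    died = [True, True, False, True, False]
    for t, s in kaplan_meier(months, died):
        print(t, round(s, 3))
```

Running the estimator separately on each expression-defined cluster of patients gives one curve per group, which is how a clustering result can be compared against clinical outcome.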
Course Content • Molecular biology background • Overview of genomic technologies • Statistical hypothesis testing • Gene expression analysis • Unsupervised/supervised learning applied to gene expression data • Gene function classification/prediction • Mapping and inference of transcriptional regulatory networks • Mapping and analysis of protein-protein interaction networks • Perturbation/intervention analysis • Genomic/proteomic data integration • Cutting edge topics in systems biology
Course Objectives We expect you to: • Acquire a solid background in fundamental concepts of functional genomics and systems biology. • Learn the state-of-the-art computational methods for biological data analysis. • Develop a general understanding of the current state of the functional genomics field, and learn how to formulate and solve current biological questions with advanced computational methods. • Develop skills for critical evaluation of computational biology literature.
For next time… • For biology/genomics background, read: • DOE primer on human genetics: http://www.ornl.gov/sci/techresources/Human_Genome/publicat/primer2001/primer11.pdf • Molecular Biology for Computer Scientists: http://www-users.cselabs.umn.edu/classes/Spring-2012/csci5461/files/Hunter.pdf • Send me your top 3 paper requests (ranked) by Monday, Jan. 23 http://www-users.cselabs.umn.edu/classes/Spring-2012/csci5461/index.php?page=paperlist