BI 83201: The Literature of Computational Genomics
• Instructor: Prof. Jeffrey Chuang
• Meeting Time: Fridays 10:30-11:45, Higgins 465
• Requirements: Read and discuss 1-2 papers per week. Grading will be based on participation. Attendance at all sessions is mandatory.
• Course website: bioinformatics.bc.edu/chuanglab/courses.htm
• All papers will be available online at least 1 week before discussion. Students will be assigned sections/figures, for which they will be expected to lead the discussion, including asking other students questions.
• Office Hours by arrangement: Contact Jeff at chuangj@bc.edu, Phone: 2-0804, Higgins 444B (soon to be moving to Higgins 420).
Changing perspectives in yeast research nearly a decade after the genome sequence
Kara Dolinski and David Botstein
BI 83201: Literature of Computational Genomics
January 27, 2006
I. Introduction
The yeast Saccharomyces cerevisiae was the first eukaryote to have its genome sequenced (1996). The genome is 12 million base pairs long (the human genome is 3 billion), spread over 16 chromosomes. Yeast was chosen because it has an extensive history as a model organism, along with the worm C. elegans and the fly D. melanogaster.
Major Benefits of Sequencing the Yeast Genome
• Ability to identify clones via sequencing, rather than genetic or physical mapping methods.
• Creation of yeast strains, each with a deletion of one gene, for every gene in the genome.
• Whole genome expression assays.
• A "grand unification," showing that protein sequence similarity persists between yeast, mouse, human, fly, and worm, i.e. functional similarity often also means sequence similarity.
From the parts list to the system level: Goals of post-genome-sequence yeast research
• Understand and annotate every functional feature in the genome.
• Understand the interactions of every feature ("systems biology").
A central goal of yeast research remains the determination of the biological role of every sequence feature in the yeast genome. The most remarkable change has been the shift in perspective from focus on individual genes and functionalities to a more global view of how the cellular networks and systems interact and function together to produce the highly evolved organism we see today.
Genes and their biological roles
• 1995: The number of characterized genes was 1000-2000.
• 2006: 5773 genes in the genome, of which 4299 are characterized.
• Annotation of individual functions remains challenging.
List of the major sources of yeast functional genomics data; in addition to the main SGD site, yeast genome data are also distributed via SGD Lite (http://sgdlite.princeton.edu), a lightweight yeast genome database, which is built from GMOD components and can be downloaded and installed locally.
Gene expression technology and the emergence of system-level biology
• Two major expression technologies were developed:
  • SAGE (Serial Analysis of Gene Expression)
  • mRNA microarrays
SAGE: Serial Analysis of Gene Expression
Example of an mRNA Expression Microarray
Figure 1. Yeast genome microarray. The actual size of the microarray is 18 mm by 18 mm.
DeRisi et al., Science, 24 October 1997, Vol. 278, no. 5338, pp. 680-686
Defining functional or regulatory subsystems, or "modules"
• Study all the genes that respond to certain stresses, e.g. temperature change, starvation, radiation.
• Study genes that are active in "natural" behaviors: cell cycle, sporulation, pheromone response.
• Identify genes that are often co-expressed and/or co-regulated, such as ribosomal genes.
Distinct temporal patterns of induction or repression help to group genes that share regulatory properties. (C) Seven members of a class of genes marked by early induction with a peak in mRNA levels at 18.5 hours. Each of these genes contains STRE motif repeats in its upstream promoter region.
Science, 24 October 1997, Vol. 278, no. 5338, pp. 680-686
Expression Levels Are Highly Condition Dependent
It is quite rare for genes to have unchanging expression levels across different experiments. For example, expression of the yeast actin (ACT1) gene, which was traditionally used as a control in Northern blots to ensure that equivalent levels of RNA were loaded in each well, changes significantly in several diverse types of microarray experiments.
Analysis and Display of Genome-scale Data
How can such a vast amount of expression data be analyzed, managed, and presented? Clustering algorithms group genes with similar expression profiles over different experiments.
Clustering of Gene Expression Profiles Eisen et al. (1998) PNAS 95:14863
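To make "group genes with similar expression profiles" concrete, here is a minimal sketch of average-linkage hierarchical clustering using Pearson correlation as the similarity. This is one common choice, not necessarily the exact procedure behind the figure. The gene names and values are made up for illustration.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)  # assumes neither profile is constant

def average_linkage(profiles, n_clusters):
    """Repeatedly merge the two clusters with the highest mean pairwise correlation."""
    clusters = [[gene] for gene in profiles]
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                sims = [pearson(profiles[a], profiles[b])
                        for a in clusters[i] for b in clusters[j]]
                score = sum(sims) / len(sims)
                if best is None or score > best[0]:
                    best = (score, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [sorted(c) for c in clusters]

# Hypothetical log-ratio profiles over four time points (illustrative only).
profiles = {
    "geneA": [0.1, 0.9, 2.0, 3.1],
    "geneB": [0.0, 1.1, 1.9, 2.8],
    "geneC": [3.0, 2.1, 1.0, 0.2],
    "geneD": [2.9, 1.8, 1.2, 0.1],
}
clusters = average_linkage(profiles, 2)
print(clusters)
```

The two induced genes end up in one cluster and the two repressed genes in the other. The nested loops are fine for a toy example but far too slow for a whole genome.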
Gene Ontology
• A functional annotation system that allows one to search for biases in clusters of genes.
• Broad terms are the parents of more specific terms.
• Consistent annotation system across species.
A Clustered Group of Genes and Its Functional Annotation
The Gene Ontology allows one to assess the statistical significance of a cluster's bias toward particular functional categories.
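One standard way to score such a bias is a hypergeometric tail probability: the chance of seeing at least this many genes from a category in a cluster drawn at random from the genome. The sketch below assumes that test. The slide does not say which statistic was used, and every count in the example except the genome size (5773, from the earlier slide) is hypothetical.

```python
from math import comb

def enrichment_p_value(genome_size, annotated_in_genome,
                       cluster_size, annotated_in_cluster):
    """P(X >= annotated_in_cluster) when cluster_size genes are drawn at random."""
    upper = min(cluster_size, annotated_in_genome)
    total = comb(genome_size, cluster_size)
    tail = sum(
        comb(annotated_in_genome, k)
        * comb(genome_size - annotated_in_genome, cluster_size - k)
        for k in range(annotated_in_cluster, upper + 1)
    )
    return tail / total

# Hypothetical counts: 100 genes carry the term genome-wide,
# and 8 of the 20 genes in the cluster carry it.
p = enrichment_p_value(5773, 100, 20, 8)
print(p)
```

In practice many categories are tested at once, so the resulting p-values would also need a multiple-testing correction.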
Insights into the global transcriptional network
• Co-regulated genes should share a common transcription factor binding site.
• Computational methods search for motifs shared among co-regulated genes (REDUCE, AlignACE, MODEM).
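The idea that co-regulated genes share a binding site can be illustrated with a deliberately naive search: list the k-mers that occur in every promoter of a co-regulated set. This is not how REDUCE, AlignACE, or MODEM work. It is only a sketch of the underlying intuition, and the promoter sequences are invented.

```python
def shared_kmers(promoters, k):
    """Return the k-mers present in every sequence (exact matches only)."""
    kmer_sets = [
        {seq[i:i + k] for i in range(len(seq) - k + 1)}
        for seq in promoters
    ]
    return set.intersection(*kmer_sets)  # requires at least one sequence

# Invented promoter fragments that all contain TGACTC.
promoters = ["AAATGACTCGG", "CCTGACTCAAT", "GTTGACTCTTA"]
motifs = shared_kmers(promoters, 6)
print(motifs)
```

Real motif finders allow mismatches and compare against background sequence, since short exact k-mers are often shared by chance.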
Comparative Genomic Approaches to Finding Transcription Factor Binding Sites
Labeled sites: GCN4, BAS1, PHO2, RAP1, GCN4, TATA. Gene: YCL030C (HIS4).

SCer GCAGTCGAACTGACTCTAATAGTGACTCCGGTAAATTAGTTAATTAATTGCTAAACCCATGCACAGTGACTCACGTTTTTTTATCAGTCATTCGA
SPar GCAGTCGAACTGACTCTAATAGTGACTCCGGTAAATTAGTTAATTAATTGCTAAACCCATGCACAGTGACTCATGTTTTTT-ATCAGTCATTCGA
SMik GCGGTCAAACTGACTCTAATAGTGACTCCGGTAAATTAGTTAATTAATTGCTAAACCCATGCACAGTGACTCATGCTTTCT-ATCAGTCATTCGA
SBay -TGAACGAACTGACTCTAATAGTGACTCTGGTAAATTAGTTAATTAATTTCTAAACCCATGCACAGTGACTCATGTTTTGTTATCAGTCATTCGT
          * ********************* ******************** *********************** * *** * ************

SCer TATAGAAGGTAAGAAAAGGATATGACT----ATGAACAGTAGTATACTGTGTATATAATAGATATGGAACGTTATATTCACCTCCGATGTGTGTT
SPar TAGAGAAGGTAAGAAAAGGATATGACT----ATGAACAGTAATATACTATGTATATAATAGATAAGGAACGTTATATTCACCTTGGATGTGTGTT
SMik TACAGA-GGTAAGAAAAGCGAACTACT----AAGAACAGTGGTACATGGTGTATATAATAGATAAGGAACAT-GTATTCACTTTTAATGTGAGTT
SBay TAAAGA-AGAAAGAGAGGAAGATGACTCAAAATAAATACTAGTGTATTGTGTATATAACAGAGATGGAACACTGGATTC-CACCTAATGTGTGTT
     ** ***  * **** * *   *  ***    *  ** * *  *  *   ********* *** * *****     **** *     ***** ***

SCer GTACATACATAAAAATATCATAGCACAACTGCGCTGTGTAA---TAGTAATACAATAGTTTACAAAATTTTTTTTCTGAATA---
SPar GTACATACATAAGAATATCATACTACAAGTGCGCTGTGTAA---TAGTAACATAATAGTTAACAA-----TTTTTTTGAATA---
SMik GTCTATA-AGAAGAATAGTATACCACAAGCGTGCTGTGTAACGATAATAATATAACAATTTACAAGATT-TTTTTTTGAATA---
SBay GTCCATACATAGAATTAGTATACCACAATTGCGCTGTGTAA---TAATAACATAATAGATTACAAAA---TTTTGGAAAAAAAAA
     **  *** * *  * **  ***  ****  * *********   ** *** * ** *  * ****     ****    ** *

Alignments of 4-13 yeast species are used to find unusually conserved motifs.
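The "*" row under each alignment block marks columns where all four species carry the same base. A minimal sketch of computing that row, and of pulling out long conserved runs as candidate motifs, is below. The function names and the run-length threshold are illustrative assumptions, and the input is the first 20 columns of the alignment above.

```python
def conservation_line(aligned):
    """Return '*' for columns identical in every sequence (gaps never count)."""
    if len({len(s) for s in aligned}) != 1:
        raise ValueError("aligned sequences must have equal length")
    return "".join(
        "*" if len(set(col)) == 1 and col[0] != "-" else " "
        for col in zip(*aligned)
    )

def conserved_runs(line, min_len):
    """Return (start, end) index pairs of '*' runs at least min_len long."""
    runs, start = [], None
    for i, ch in enumerate(line + " "):  # trailing space closes a final run
        if ch == "*" and start is None:
            start = i
        elif ch != "*" and start is not None:
            if i - start >= min_len:
                runs.append((start, i))
            start = None
    return runs

# First 20 alignment columns for SCer, SPar, SMik, SBay.
block = [
    "GCAGTCGAACTGACTCTAAT",
    "GCAGTCGAACTGACTCTAAT",
    "GCGGTCAAACTGACTCTAAT",
    "-TGAACGAACTGACTCTAAT",
]
line = conservation_line(block)
runs = conserved_runs(line, 6)
print(repr(line), runs)
```

Here the single run covers columns 8-20 (0-based slice 7:20), which contains the TGACTC sequence seen throughout the alignment. Whether a run is "unusually" conserved depends on the background conservation rate, which this sketch does not model.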
Integration of Data Sources
• Harbison and colleagues (2004) used a combination of experimental (ChIP-chip), comparative genomics, and motif discovery methods to identify putative DNA binding sites for >200 transcription factors in yeast.
• A Bayesian network takes as input different properties of sequence elements upstream of a gene and outputs the likelihood of that gene exhibiting a particular expression pattern.
Interaction Networks
• Synthetic lethal interactions
• Protein-DNA interactions
• Protein-protein interactions
Synthetic Lethal Interactions
Genetic interaction network representing the synthetic lethal/sick interactions determined by SGA analysis. Genes are represented as nodes, and interactions are represented as edges that connect the nodes. Up to 1000 genes and 4000 interactions.
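The node-and-edge description maps directly onto an undirected graph. The sketch below stores interactions as adjacency sets and lists the most connected genes. The gene names and pairs are placeholders, not SGA results.

```python
from collections import defaultdict

def build_network(pairs):
    """Build an undirected adjacency map from (gene, gene) interaction pairs."""
    network = defaultdict(set)
    for a, b in pairs:
        network[a].add(b)
        network[b].add(a)
    return network

def hubs(network, min_degree):
    """Return genes with at least min_degree interaction partners."""
    return sorted(g for g, partners in network.items()
                  if len(partners) >= min_degree)

# Placeholder synthetic lethal/sick pairs (illustrative only).
pairs = [("gene1", "gene2"), ("gene1", "gene3"),
         ("gene1", "gene4"), ("gene2", "gene3")]
network = build_network(pairs)
print(hubs(network, 3))
```

Using sets means a pair reported twice, or in both orders, is stored only once.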
Protein-DNA interactions
Figure labels: transcription factor, binding site
Motifs in the E. coli Transcriptional Regulatory Network
Nature Genetics 31, 64-68 (2002)
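A network motif is a small wiring pattern that recurs in a regulatory network. As an illustration, the sketch below counts one such pattern, the feed-forward loop, in which X regulates Y and both regulate Z. The edge list is invented, and the sketch does not test whether the pattern is more frequent than in randomized networks, which a real motif analysis would require.

```python
def feed_forward_loops(edges):
    """Return sorted (X, Y, Z) triples with edges X->Y, Y->Z, and X->Z."""
    targets = {}
    for regulator, target in edges:
        targets.setdefault(regulator, set()).add(target)
    loops = []
    for x, x_targets in targets.items():
        for y in x_targets:
            if y == x:
                continue  # ignore autoregulation
            for z in targets.get(y, ()):
                if z != x and z != y and z in x_targets:
                    loops.append((x, y, z))
    return sorted(loops)

# Invented regulator -> target edges (illustrative only).
edges = [("X", "Y"), ("Y", "Z"), ("X", "Z"), ("X", "W")]
loops = feed_forward_loops(edges)
print(loops)
```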
Protein-Protein interactions
Problem: Experiments are not robust.
• Verification by checking for co-expression of orthologs in other species.
• Check for "joint" sequence conservation of orthologs.
• Other data integration methods.
Protein-Protein Interactions
Outline of the comprehensive two-hybrid analysis. We cloned almost all yeast ORFs individually as a DNA-binding domain fusion (bait) in a MATa strain and as an activation domain fusion (prey) in a MATα strain, and subsequently divided them into pools, each containing 96 clones. These bait and prey clone pools were systematically mated with each other, and the diploid cells formed were selected for the simultaneous activation of three reporter genes (ADE2, HIS3, and URA3) followed by sequence tagging to obtain ISTs.
PNAS, April 10, 2001, vol. 98, no. 8, pp. 4569-4574
Conclusions and some thoughts about the Future
• Most new understanding has come from comparative genomics.
• Genome-scale data has provided new goals.
• Other important areas: allelic effects, gene localization, metabolism dynamics, how selection operates on networks.
• Philosophy: how should large-scale data be used to generate and test hypotheses?