Genome Biology and Biotechnology Functional Genomics
Functional Genomics – the Paradigm Shift • Large-scale genome sequencing generates • “parts lists” • complete inventories of genes and functional elements • A new challenge • to understand the function of the many genes predicted • In general 90% to 95% of the genes have unknown functions • Genome sequencing has triggered a transition • from vertical (reductionist) approaches • to horizontal (large-scale) approaches • Each approach has its own strengths and weaknesses Reprinted from: Vidal M., Cell, 104, 333 (2001)
Vertical Versus Horizontal Approach Reprinted from: Vidal M., Cell, 104, 333 (2001)
The vertical approach • The vertical or reductionist approach • Studies one or a few proteins or genes at a time by applying different experimental tools to test hypotheses • Well proven by decades of research • The reductionist approach is based on the principle of • “understanding the whole by studying selected parts” • The reductionist approach has severe limitations • Lacks efficiency • In well-studied model organisms, decades of hypothesis-driven research have discovered only 5 to 10% of the genes • Fails to give a comprehensive picture of biology • The study of Gal4p provided a useful model of how transcription factors work but • gives no insight into global transcriptional responses
The horizontal approach • The horizontal or large-scale approach • Studies large numbers of genes or proteins in parallel using • high-throughput tools • Microarrays, systematic gene knock-outs… • Instrumentation for automated and high-throughput analysis • Robots: automated liquid handlers • Automated data acquisition instruments: e.g. sequencers • Well suited for massively parallel studies • The large-scale approach has limitations • Does not provide conclusive evidence • Noisy data with high rates of false positives or false negatives • Observations must be confirmed
Functional gene maps [Figure: matrix of genes 1 to n against “conditions”, one matrix per omics approach] • Functional genomics can be regarded as • functional mapping within two-dimensional matrices • One axis corresponds to all genes of an organism • The other axis represents a set of conditions to which the organism is exposed • Experimental conditions, various mutant backgrounds • Each “omics” approach represents a different map
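The two-dimensional matrix described on this slide can be sketched as a small data structure. This is only an illustration: the gene names, condition names, and values are hypothetical and are not taken from the lecture.

```python
# Hypothetical gene and condition names, used only for illustration.
genes = ["gene1", "gene2", "gene3"]
conditions = ["heat_shock", "starvation"]

def empty_map(genes, conditions):
    """Build a genes x conditions matrix with no measurements yet."""
    return {g: {c: None for c in conditions} for g in genes}

def record(fmap, gene, condition, value):
    """Store one measurement in the functional map."""
    fmap[gene][condition] = value

def profile(fmap, gene, conditions):
    """Return the row for one gene as a list ordered by condition."""
    return [fmap[gene][c] for c in conditions]

# One such matrix per "omics" approach; here a toy transcriptome map.
transcriptome = empty_map(genes, conditions)
record(transcriptome, "gene1", "heat_shock", 2.5)
record(transcriptome, "gene1", "starvation", 0.4)
print(profile(transcriptome, "gene1", conditions))
```

Each omics approach would fill its own matrix over the same gene axis, which is what makes the maps comparable later on.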
Functional Maps or “-omes” [Figure: matrix of genes or proteins 1 to n against “conditions”, one layer per map] • ORFeome: genes • Phenome: mutational phenotypes • Transcriptome: expression profiles • Localizome: cellular, tissue location • DNA interactome: protein-DNA interactions • Interactome: protein interactions • Proteome: proteins After: Vidal M., Cell, 104, 333 (2001)
The basic rationale of functional genomics • Functionally related genes share common properties • Are likely to be coregulated at the transcriptional level • Transcriptome maps consist of “expression clusters” of coregulated genes • Loss-of-function mutations should confer similar or opposite phenotypes • Phenome maps consist of sets of genes giving similar phenotypes or “pheno-clusters” • Their protein products are likely to interact physically • Interactome maps consist of networks of interacting proteins or “interaction clusters” • Their protein products are likely to localize in similar cellular compartments • Have similar locations in the localizome
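The coregulation argument can be illustrated with a short sketch that flags gene pairs whose expression profiles are strongly correlated. Pearson correlation and the 0.9 threshold are assumptions chosen for the example, and the gene names and values are hypothetical.

```python
from itertools import combinations
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length profiles (0.0 if either is flat)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)

def coregulated_pairs(profiles, threshold=0.9):
    """Return gene pairs whose profiles correlate at or above the threshold."""
    pairs = []
    for g1, g2 in combinations(sorted(profiles), 2):
        if pearson(profiles[g1], profiles[g2]) >= threshold:
            pairs.append((g1, g2))
    return pairs

# Hypothetical expression values across four conditions.
profiles = {
    "known_A": [1.0, 2.0, 3.0, 4.0],
    "unknown_B": [2.1, 3.9, 6.2, 8.0],
    "known_C": [4.0, 3.0, 2.0, 1.0],
}
print(coregulated_pairs(profiles))
```

In this toy data the uncharacterized gene pairs with `known_A`, which is the kind of hypothesis an expression cluster is meant to generate.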
Integration of Functional Maps • Each functional map alone provides only a rough indication of gene function • Integration of functional maps in a biological atlas overcomes this limitation by • overlaying sets of functional characteristics Reprinted from: Vidal M., Cell, 104, 333 (2001)
Functional Genomics • Functional genomics provides the tools for • Identifying the function of “all genes” • by overlaying sets of functional characteristics • Functional maps provide lists of clusters that contain both characterized and uncharacterized genes • Provides hypotheses for the function of uncharacterized genes • Functional genomics provides approaches for • the ultimate understanding of life at the molecular level based on the description of • Each protein individually and • The interactions between the proteins involved in particular biological processes
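One simple way to picture the overlay is to count how many maps link the same pair of genes. The sketch below assumes each map has already been reduced to a list of linked gene pairs; that representation, the map names, and the gene names are all hypothetical.

```python
from collections import Counter

def overlay(maps):
    """Count, for each gene pair, how many functional maps link it."""
    support = Counter()
    for pairs in maps.values():
        # De-duplicate within a map so (a, b) and (b, a) count once.
        for pair in {frozenset(p) for p in pairs}:
            support[pair] += 1
    return support

def supported_pairs(maps, min_maps=2):
    """Return gene pairs linked in at least min_maps different maps."""
    support = overlay(maps)
    return sorted(tuple(sorted(p)) for p, n in support.items() if n >= min_maps)

# Hypothetical links taken from three different maps.
maps = {
    "transcriptome": [("g1", "g2"), ("g2", "g3")],
    "interactome": [("g2", "g1")],
    "phenome": [("g1", "g2"), ("g3", "g4")],
}
print(supported_pairs(maps))
```

A pair seen in only one map stays a rough indication; a pair supported by several maps is a stronger candidate for follow-up.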
Genome Biology and Biotechnology 6. The ORFeome
The ORFeome: Genes in the Genome • The genome represents • the basic compendium of “all genes” that make up an organism • The ORFeome represents • the basic compendium of all protein coding genes as defined by their Open Reading Frames (ORFs) • Predicted ORFs must be validated • In higher organisms gene identification is complicated by • Intron / exon structure • The ORFeome platforms provide • Large scale approach for validating predicted genes • high throughput recombinational cloning technology • Resources for functional genomics projects
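To make the notion of an ORF concrete, here is a minimal scan for ATG-to-stop reading frames. It is a simplified sketch: it looks at the forward strand only and assumes an intron-free sequence such as a cDNA, which is exactly why gene identification in higher organisms needs more than this.

```python
STOPS = {"TAA", "TAG", "TGA"}

def find_orfs(seq, min_codons=3):
    """Return (start, end) pairs, 0-based and end-exclusive, of ATG...stop ORFs.

    Only the three forward-strand frames are scanned. min_codons counts
    the stop codon as well.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first ATG opens the ORF
            elif start is not None and codon in STOPS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None  # look for the next ORF in this frame
    return sorted(orfs)

# Hypothetical sequence with one short ORF: ATG AAA TTT TAG.
print(find_orfs("CCATGAAATTTTAGGG"))
```

Predicted ORFs of this kind are what the ORFeome platforms then try to validate experimentally.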
Recombinational Cloning • One-step cloning technology • Site-specific recombination instead of restriction/ligation • Not dependent on availability of restriction sites • “100%” efficient: only one recombinant DNA product without byproducts • No cloning step needed: no need to assay independent clones • Fully automatable: simple pipetting in microtiter plates • Very precise recombination system • allowing high-fidelity DNA engineering • Versatile cloning technology • Genes can be easily transferred into a range of vector systems • Expression, gene fusion, RNAi… • GATEWAY Recombinational Cloning • Based on the bacteriophage lambda integration & excision system Reprinted from: Walhout et al., Science 287: 116 (2000)
Phage lambda integration & excision system [Figure: phage attP recombines with attB in the bacterial genome; integration produces attL and attR flanking the integrated phage; excision reverses the reaction]
Recombinational Cloning of ORFs [Figure: the ORF is PCR-amplified from cDNA, from start to stop codon, with designer oligos that add attB1 and attB2; the attB1-ORF-attB2 product is recombined with a vector carrying a toxic gene (TG) to give the Entry Vector] • Phage lambda integration: Integrase & bacterial IHF • TG: toxic gene Reprinted from: Walhout et al., Science 287: 116 (2000)
Recombinational Cloning of ORFs [Figure: attB1 and attB2 on the PCR product recombine with attP1 and attP2 on the vector, giving attL1 and attL2 flanking the ORF] • Phage lambda integration: Integrase & bacterial IHF Reprinted from: Walhout et al., Science 287: 116 (2000)
Recombinational Cloning of Gene Fusions [Figure: the Entry clone (attL sites) recombines with Destination vectors (attR sites) to fuse the ORF to a DNA binding domain or an activation domain] • Phage lambda excision: Integrase, IHF & Excisionase Reprinted from: Walhout et al., Science 287: 116 (2000)
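The site conversions on the last three slides can be summarized in a toy model. It tracks only the att site names, not DNA sequence, and it assumes that sites react only with partners carrying the same number (attB1 with attP1, not with attP2). The function names are made up for this sketch.

```python
# Site conversions for the two reactions named on the slides.
# BP corresponds to lambda integration (Integrase + IHF),
# LR to lambda excision (Integrase + IHF + Excisionase).
REACTIONS = {
    "BP": (("attB", "attP"), ("attL", "attR")),
    "LR": (("attL", "attR"), ("attB", "attP")),
}

def split_site(site):
    """Split 'attB1' into its type 'attB' and its specificity label '1'."""
    return site[:4], site[4:]

def recombine(reaction, site_a, site_b):
    """Return the two product sites, or raise ValueError if the sites cannot react."""
    substrates, products = REACTIONS[reaction]
    kind_a, label_a = split_site(site_a)
    kind_b, label_b = split_site(site_b)
    if {kind_a, kind_b} != set(substrates):
        raise ValueError(f"{reaction} needs one {substrates[0]} and one {substrates[1]} site")
    if label_a != label_b:
        raise ValueError(f"{site_a} and {site_b} have different specificities")
    return tuple(kind + label_a for kind in products)

print(recombine("BP", "attB1", "attP1"))  # ORF entering the Entry clone
print(recombine("LR", "attL2", "attR2"))  # transfer into a Destination vector
```

Because the numbered sites do not cross-react, the ORF keeps its orientation and reading frame when it moves from one vector to the next.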
GATEWAY Recombinational Cloning • First generation cloning technology • DNA Cloning Using In Vitro Site-Specific Recombination • Hartley et al., Genome Research 10, 1788-1795 (2000) • Designed for large-scale cloning of ORFs • High-throughput platform for generating ORFeome libraries • Second generation technology • Concerted Assembly and Cloning of Multiple DNA Segments Using In Vitro Site-Specific Recombination • Cheo et al., Genome Research 14:2111-2120 (2004) • Designed for large-scale production of multi-segment expression clones
Second generation att sites and BP cloning [Figure: synthetic attB and attP sites with the Int cut sites, left arms and right arms marked; BP cloning with 4 simultaneous reactions] Reprinted from: Cheo et al., Genome Research 14:2111-2120 (2004)
Multi-segment recombination cloning [Figure panels: three-segment cloning; two-segment cloning] Reprinted from: Cheo et al., Genome Research 14:2111-2120 (2004)
Multi-Segment Expression Clones • The expanded repertoire of recombination sites allows • Concerted cloning of multiple DNA segments in a predefined order, orientation, and reading frame • Generation of collections of functional elements in a combinatorial fashion • Applications • linkage of promoters to genes • generation of fusion proteins • assembly of multiple protein domains • The technology has broad implications for • gene function analysis • expression of multidomain proteins Reprinted from: Cheo et al., Genome Research 14:2111-2120 (2004)
The ORFeome of C. elegans version 1.0 • ORFeome cloning was first demonstrated in C. elegans • Predicted ORFs are amplified by PCR from a highly representative cDNA library using • ORF-specific primers • Cloned by GATEWAY recombination cloning • The C. elegans genome sequence predicted 18,959 ORFs • 9,503 identified genes • 9,888 untouched genes Reprinted from: Reboul et al., Nat. Genet. 27, 332 (2001)
PCR amplification of C. elegans ORFs [Figure panels: PCR products of identified genes; PCR products of untouched genes] Reprinted from: Reboul et al., Nat. Genet. 27, 332 (2001)
Successful PCR for the ORFs analyzed Reprinted from: Reboul et al., Nat. Genet. 27, 332 (2001)
Conclusions • The ORFeome strategy provides experimental evidence for • the structure of genes in C. elegans • ORFeome resource for large-scale functional genomics: version v1.1 • Attempted PCR amplification of the 19,477 ORFs • cloned 10,623 (55%) in-frame ORFs • ORF Sequence Tags improved C. elegans gene annotations • corrected the internal gene structure of 20% of the ORFs Reprinted from: Reboul et al., Nat. Genet. 27, 332 (2001)
C. elegans ORFeome Version 3.1 • Gene prediction improvements [Figure: classification of the 4,232 repredicted and new ORFs] Reprinted from: Lamesch et al., Genome Research 14:2064-2069 (2004)
The C. elegans ORFeome is an evolving resource Reprinted from: Lamesch et al., Genome Research 14:2064-2069 (2004)
Conclusions • Cloning of a complete ORFeome is an iterative process • requires multiple rounds of experimental validation together with gradually improving gene predictions (bioinformatics) • the ORFeome resource provides further verification of the predicted gene structures • Note that the procedure will not reveal alternatively spliced transcripts unless individual GATEWAY clones are isolated and analyzed separately • ORFeome projects now underway • Human • Arabidopsis • Drosophila Reprinted from: Lamesch et al., Genome Research 14:2064-2069 (2004)
Versatile Gene-Specific Sequence Tags for Arabidopsis Functional Genomics Hilson et al., Genome Research 14:2176-2189 (2004) • Paper presents • The creation of a collection of gene-specific sequence tags (GSTs) • representing 21,500 Arabidopsis genes • Gene-specific sequence tags (GSTs) • Correspond to short (150 bp to 500 bp) segments of ORFs • selected to have no significant similarity with any other region in the genome • Synthesized by PCR amplification from genomic DNA • The GSTs provide a resource for large-scale gene function studies in multicellular eukaryotes • RNA interference • Microarray transcript profiling
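The selection criterion for a GST can be sketched as a search for a window of allowed length that shares nothing with other sequences. Shared k-mers are used here as a crude stand-in for a real similarity search, so this is not the procedure used in the paper; the function names and the value of `k` are assumptions made for the example.

```python
def kmers(seq, k):
    """Return the set of all substrings of length k."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def pick_gst(orf, others, min_len=150, max_len=500, k=20):
    """Return (start, end) of the longest window of orf, within the length
    limits, that shares no k-mer with any sequence in others, or None.
    Coordinates are 0-based and end-exclusive; ties go to the first window.
    """
    background = set()
    for s in others:
        background |= kmers(s, k)
    # Positions in orf where a shared k-mer starts.
    bad = [i for i in range(len(orf) - k + 1) if orf[i:i + k] in background]
    # Sentinels so that every clean stretch lies between two "bad" positions.
    bounds = [-1] + bad + [len(orf) - k + 1]
    best = None
    for left, right in zip(bounds, bounds[1:]):
        start = left + 1
        end = min(right + k - 1, start + max_len)  # cap at the upper size limit
        if end - start >= min_len and (best is None or end - start > best[1] - best[0]):
            best = (start, end)
    return best

# Toy example with small limits; the sequences are hypothetical.
orf = "AAAAAAAAAACGTGCCCCCCCCCC"
print(pick_gst(orf, ["TTCGTGTT"], min_len=5, max_len=50, k=4))
```

With real data the ORF would be compared against the rest of the genome, and the default limits of 150 bp and 500 bp follow the tag sizes given on the slide.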
Graphical representation of GSTs [Figure: position of each GST relative to its predicted gene] Reprinted from: Hilson et al., Genome Research 14:2176-2189 (2004)
GST production [Figure panels: high-throughput PCR; high-throughput verification] Reprinted from: Hilson et al., Genome Research 14:2176-2189 (2004)
GST cloning in GATEWAY vectors Reprinted from: Hilson et al., Genome Research 14:2176-2189 (2004)
The Caenorhabditis elegans Promoterome Dupuy et al., Genome Research 14:2169-2175 (2004) • Paper presents • The development of a genome-wide resource of C. elegans promoters • to characterize the expression patterns of all predicted genes • by expressing localization markers such as the green fluorescent protein (GFP) • “Localizome” maps should provide information on • where (in what cells or tissues) genes are expressed • when (at what stage of development or under what conditions) genes are expressed • in what cellular compartments the corresponding proteins are localized
The C. elegans promoterome • “Promoters” correspond to upstream intergenic regions (IGRs) • region from the ATG of the ORF to the end of the preceding ORF • PCR fragment upper size limit of 2 kb to ensure high cloning efficiency Reprinted from: Dupuy et al., Genome Research 14:2169-2175 (2004)
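The promoter definition on this slide translates into a short coordinate calculation. The sketch assumes a forward-strand gene and assumes that an intergenic region longer than 2 kb is trimmed to the 2 kb closest to the ATG; the function name and coordinates are hypothetical.

```python
def promoter_region(atg, prev_orf_end, max_len=2000):
    """Return (start, end) of the upstream intergenic region of a forward-strand gene.

    atg is the position of the ORF start codon and prev_orf_end the end of the
    preceding ORF, both 0-based; the returned end is exclusive and equals atg.
    """
    if prev_orf_end > atg:
        raise ValueError("preceding ORF must end before the ATG")
    start = max(prev_orf_end, atg - max_len)  # apply the 2 kb upper size limit
    return (start, atg)

print(promoter_region(5000, 4200))  # short intergenic region, kept whole
print(promoter_region(5000, 1000))  # long intergenic region, trimmed to 2 kb
```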
Overview of promoterome cloning procedure [Figure panels: analysis of PCR products; large-scale cloning of the promoterome] Reprinted from: Dupuy et al., Genome Research 14:2169-2175 (2004)
Applications of recombinational cloning Reprinted from: Dupuy et al., Genome Research 14:2169-2175 (2004)
Genome Biology and Biotechnology The phenome
The phenome: genome-wide phenotypic analysis • Classical (forward) genetic screens • Saturation mutagenesis to identify all the genes that exhibit a specific phenotype • Drawback • characterization of the gene through positional cloning is slow and laborious • Phenomics platforms: reverse genetics • Systematic alteration of gene function to identify the functions of predicted genes • Advantage • Identity of the gene is known beforehand • Phenomics platforms • Transposon-based mutant libraries • Extensively used in yeast and Arabidopsis • RNA interference (RNAi)-based mutant libraries • the technology of choice for gene knock-downs
Large-scale analysis of the yeast genome by transposon tagging and gene disruption Ross-Macdonald et al., Nature 402: 413 (1999) • Paper presents • a transposon-tagging strategy to perform large-scale analysis of gene function in yeast to simultaneously study • phenotypes • gene expression • protein localization • a large collection (>11,000 strains) of yeast mutants carrying a transposon inserted in genes • Tagged 30% of all yeast genes
Transposon-based Method for Large-scale Functional Genomics [Figure labels: haemagglutinin tag; no ATG: gene fusions] • Minitransposon (mTn) • Derived from the bacterial transposable element Tn3 • lacZ reporter gene lacking an initiator methionine and upstream promoter sequence • β-galactosidase (β-gal) is produced when lacZ is fused in-frame to the protein-coding sequence • Haemagglutinin (3xHA) epitope tag • Recombination of the lox sites produces epitope-tagged proteins Reprinted from: Ross-Macdonald et al., Nature 402: 413 (1999)
Minitransposon mTn–3xHA/lacZ • Insertion gives a Gene-lacZ fusion protein • Cre-mediated recombination gives a Gene-3xHA fusion protein
High Throughput Insertion Mutagenesis • Yeast genomic DNA library • mutagenized with mTn • plasmids were digested with NotI • transformed into a diploid yeast strain • Integrated by homologous recombination • Transformants were assayed for β-gal activity Reprinted from: Ross-Macdonald et al., Nature 402: 413 (1999)
Analysis of the mTn Insertion Strains • Identified 11,232 strains expressing lacZ • Sequenced the site of insertion in 6,358 strains • 5,442 in or within 200 bp of an annotated ORF • Insertions affect 1,917 different ORFs (~30%) • Identified 328 previously non-annotated ORFs • 52% overlap an ORF in the antisense direction • 33% are in intergenic regions: small ORFs • 15% overlap an ORF in the same orientation in a different frame • Genes are missed in the annotation because of • An arbitrary lower size limit of 100 amino acids • Partially overlapping ORFs not being annotated Reprinted from: Ross-Macdonald et al., Nature 402: 413 (1999)
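The "in or within 200 bp of an annotated ORF" criterion can be sketched as a simple interval test. The example works on a single hypothetical chromosome with made-up coordinates and ORF names; it does not reproduce the paper's analysis pipeline.

```python
def orfs_hit(insertion, orfs, margin=200):
    """Names of ORFs whose span, extended by margin on both sides, contains the insertion."""
    return [name for name, (start, end) in orfs.items()
            if start - margin <= insertion <= end + margin]

def summarize(insertions, orfs, margin=200):
    """Return (sorted names of affected ORFs, insertions not near any annotated ORF)."""
    hit = set()
    unassigned = []
    for pos in insertions:
        names = orfs_hit(pos, orfs, margin)
        if names:
            hit.update(names)
        else:
            unassigned.append(pos)  # candidate for a non-annotated ORF
    return sorted(hit), unassigned

# Hypothetical ORF coordinates and insertion sites on one chromosome.
orfs = {"ORF_A": (1000, 2000), "ORF_B": (5000, 5600)}
insertions = [900, 1500, 2300, 5700]
print(summarize(insertions, orfs))
```

Several insertions can fall in the same ORF, which is why the number of affected ORFs is smaller than the number of sequenced insertion sites.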
Analysis of Mutant Phenotypes • Phenotypes of essential genes • 14.1% of the insertions are nonviable in haploid strains • Represent genes that are essential for viability • Large-scale scoring of “other” phenotypes • growth under 20 different growth conditions • “phenotypic macroarrays” (96-well format) • Insertions in 407 genes (20%) result in a phenotype different from the wild type • The majority (80%) of the insertions exhibit no phenotype! • Expand the range of phenotypic assays • Utilize more precise criteria for phenotypic analysis • Growth rate Reprinted from: Ross-Macdonald et al., Nature 402: 413 (1999)
Phenotypic Macroarray Analysis of Yeast Mutants [Figure panels: mutants deficient in oxidative phosphorylation; mutants deficient in cell-wall maintenance] Reprinted from: Ross-Macdonald et al., Nature 402: 413 (1999)
Genomic Scale Analysis of Phenotypes • Phenotypes observed • Expected phenotypes • genes involved in microtubule functions: sensitive to benomyl • Unexpected phenotypes • Genes involved in cell wall biogenesis: stress-related responses • Pleiotropic phenotypes: observed in apparently unrelated assays • Sensitivity to hydroxyurea, benomyl and calcofluor • Pleiotropic mutants are the rule • Many mutants exhibit phenotypes in specific subsets of conditions • Mutants appear to “group” into discrete classes • “pheno-clusters” represent groups of mutants having common disruption phenotypes Reprinted from: Ross-Macdonald et al., Nature 402: 413 (1999)
Cluster Analysis of the Phenotypic Data [Figure: rows are transformants sorted by increasing distance from the cluster average; columns are growth conditions] Reprinted from: Ross-Macdonald et al., Nature 402: 413 (1999)
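The ordering used in this figure, transformants sorted by increasing distance from the cluster average, can be sketched as follows. Euclidean distance is an assumption (the slide does not name the metric), and the mutant names and phenotype scores are hypothetical.

```python
from math import sqrt

def cluster_average(profiles):
    """Average phenotype score per growth condition over all mutants in the cluster."""
    n = len(profiles)
    return [sum(col) / n for col in zip(*profiles.values())]

def sort_by_distance(profiles):
    """Return mutant names ordered by increasing Euclidean distance from the cluster average."""
    avg = cluster_average(profiles)

    def dist(name):
        return sqrt(sum((v - a) ** 2 for v, a in zip(profiles[name], avg)))

    return sorted(profiles, key=dist)

# Hypothetical scores over three growth conditions: 1 = growth defect, 0 = wild type.
profiles = {
    "m1": [1, 1, 0],
    "m2": [1, 1, 0],
    "m3": [1, 0, 0],
    "m4": [0, 1, 1],
}
print(sort_by_distance(profiles))
```

Mutants at the top of such a list define the core of a pheno-cluster, while those at the bottom fit the cluster average least well.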