Origins and Evolution of Novel Genes Aoife McLysaght Trinity College Dublin
Promoter exon intron exon
Bricolage Long et al., 2003
Duplication • Polyploidy – whole genome duplication • Aneuploidy – chromosomal duplication • Partial chromosome duplication • Gene duplication • Partial gene duplication
Gene Duplication • Create new genes • Generate multigene families / multidomain genes
Exon/domain shuffling • Gene structure and protein domains • Domain complexity increases with organismal complexity (Rubin et al., Science, 2000)
Duplicability • Survivorship/maintenance of gene duplicates may depend on: • protein function • higher duplicability of metabolic genes in yeast (Marland et al, 2004) • network centrality • more highly connected proteins have lower duplicability in yeast but higher duplicability in human • evolutionary rate • higher duplication of slowly evolving genes (Davis and Petrov, 2004) • dosage balance • dosage-balanced genes are retained after whole genome duplication (WGD) but unlikely to experience small-scale duplication (SSD)
Fate of Duplicated Genes: Examples • Neofunctionalisation • GLUD2 in primates has a new role in neurotransmitter flux • Thrombin (cleaves fibrinogen during clotting) and trypsin (digestive enzyme) are derived from a complete gene duplication • Lactate dehydrogenase can be converted into malate dehydrogenase with a single amino acid replacement (out of total protein length of 317 amino acids) • Subfunctionalisation • SIR3 and ORC1 gene pair in yeast • Have divergent functions, but single ancestral-type protein from another yeast has both functions • Dosage increase • Esterase B in mosquito • increased gene dosage confers greater pesticide resistance • Functional compensation • Many duplicated genes shelter the organism from deleterious mutations in the other copy (shown in yeast and worm)
Functional compensation of duplicate genes • [Figure: essentiality of duplicated genes in nematode; Liao BY and Zhang. Trends in Genetics (2007)] • Duplicate genes usually overlap in function. • Sequence divergence of duplicated genes correlates with their capacity for backup function. • Conant GC and Wagner A. Proc. R. Soc. Lond. B (2004)
Polyploidisation • Global increase in genome • Addition of one or more complete chromosome sets • 2 copies : diploid • 3 : triploid (sterile) • 4 : tetraploid • 6 : hexaploid
Examples of Paleopolyploids • Yeast • Arabidopsis • Wheat • Fish • Ancestral vertebrate (2R)
Loss or retention of genes duplicated by WGD (ohnologs) • Most duplicates are subsequently lost • Biased retention of certain classes of genes • Retained duplicates are enriched for: • Developmental genes • Transcription factors • Metabolic genes • Protein complex membership
Dosage-balance hypothesis • [Diagram: a pathway of Gene A, Gene B, Gene C, Gene D and Gene E, shown twice] • Dosage-balanced genes are not robust to gene loss and gene duplication.
Whole genome duplication and dosage-balanced genes • [Diagram: the pathway of Gene A to Gene E before and after WGD] • WGD duplicates all genes simultaneously and therefore does not perturb relative dosages. • Whereas SSD of dosage-balanced genes is likely to be deleterious, WGD should be neutral. • Furthermore, once duplicated by WGD, dosage-balanced genes are unlikely to be lost, because losing a single copy would itself disturb the balance.
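The contrast between WGD and SSD can be checked with a few lines of arithmetic. The sketch below is only an illustration: the five-gene pathway, the starting copy number of 2, and all function names are assumptions made for the example, not part of the talk.

```python
from fractions import Fraction


def relative_dosage(copies):
    """Return each gene's copy number as a fraction of the pathway total."""
    total = sum(copies.values())
    return {gene: Fraction(n, total) for gene, n in copies.items()}


def whole_genome_duplication(copies):
    """Double the copy number of every gene at once (WGD)."""
    return {gene: 2 * n for gene, n in copies.items()}


def small_scale_duplication(copies, gene):
    """Add one extra copy of a single gene (SSD)."""
    out = dict(copies)
    out[gene] += 1
    return out


# Hypothetical diploid pathway: two copies of each of five genes.
pathway = {gene: 2 for gene in ["A", "B", "C", "D", "E"]}

before = relative_dosage(pathway)
after_wgd = relative_dosage(whole_genome_duplication(pathway))
after_ssd = relative_dosage(small_scale_duplication(pathway, "C"))

print(after_wgd == before)  # True: relative dosages are unchanged by WGD
print(after_ssd == before)  # False: duplicating gene C alone shifts every ratio
```

Exact fractions are used so that the equality test is not affected by floating-point rounding.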
De novo origins • Conversion of 3’ UTR into coding sequence • Incorporation of transposable elements into coding sequence
De novo origin of whole protein-coding genes • Origin of an open reading frame (ORF) from ancestrally non-coding sequence • Single-base substitutions or small indels that remove a stop codon • Acquisition of expression activity • Considered to be very rare events
New genes in Drosophila • Levine et al. 2006, PNAS • Five de novo originated genes found in Drosophila melanogaster • Begun et al. 2007, Genetics • 11 genes that likely appeared in D. yakuba or the D. yakuba / D. erecta ancestor were identified using testis-derived ESTs • Testis biased expression • Often X-linked • Zhou et al., 2008, Genome Research • 9 genes (some overlap with previous papers) • Estimate 12% of new genes arose de novo
New genes in Saccharomyces • Cai et al. 2008, Genetics • BSC4 identified as a de novo gene in S. cerevisiae (132 aa) • DNA similarity but no ORF in the closely related yeasts S. paradoxus, S. mikatae and S. bayanus • Transcribed in these other yeast lineages • Origin of protein-coding gene from RNA gene • Deletion of DUN1 or RPN4 is lethal if BSC4 is also deleted • PeptideAtlas evidence supports translation • Purifying selection • Possibly involved in the DNA repair pathway
De novo origin of mouse-specific gene • Heinen et al., 2009, Current Biology • Non-coding RNA gene • 3 exons, alternatively spliced • Specifically expressed in post-meiotic cells of the testis • Indel mutations in 5’ regulatory region • Possible selective sweep
Human-Chimp Divergence • 99% identity of alignable sequence • High colinearity of gene order What is the genetic distinction? • Regulatory differences? • Differential gene duplication and loss? • 40-45Mb of species-specific euchromatic sequence • Unique genes?
Differential gene duplication and loss Demuth et al, 2006 Hahn et al, 2007
Genome Quality Issues • [Figure: genomic location of the human Chr10 genes in the EnsEMBL family containing the Centaurin Gamma 2 gene, within and out of synteny blocks, across human (Hs) and chimp (Pt) chromosomes; Hahn et al 2007 Genetics]
De novo origins of monkey genes • Toll-Riera et al., (2009) MBE • Examined “primate orphans” • Protein-coding • Present in human and macaque but absent in older lineages
This study: Have new genes arisen de novo recently in the human lineage?
Unique human genes? • All-against-all BLASTP search identified 644 human genes with no match in the chimp genome • Candidate novel genes • examine these in great detail
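A minimal sketch of the "no match" filter, assuming the BLASTP results are available in the standard 12-column tabular format (query ID in column 1, E-value in column 11). The E-value cutoff, the function name and the toy rows are assumptions for illustration; the slides do not state which threshold was used.

```python
def genes_without_hits(query_ids, blast_rows, max_evalue=1e-4):
    """Return the query genes that have no hit at or below max_evalue.

    blast_rows: iterable of tab-separated lines in BLAST tabular format
    (12 standard columns; column 1 = query ID, column 11 = E-value).
    max_evalue is an assumed cutoff, not one taken from the talk.
    """
    matched = set()
    for line in blast_rows:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment headers
        fields = line.split("\t")
        query, evalue = fields[0], float(fields[10])
        if evalue <= max_evalue:
            matched.add(query)
    return [q for q in query_ids if q not in matched]


# Toy input: geneA has a strong hit, geneB only a weak one, geneC none at all.
human_genes = ["geneA", "geneB", "geneC"]
rows = [
    "geneA\tchimp1\t98.5\t200\t3\t0\t1\t200\t1\t200\t1e-120\t400",
    "geneB\tchimp7\t31.0\t40\t25\t2\t5\t44\t10\t49\t2.5\t22.1",
]
print(genes_without_hits(human_genes, rows))  # ['geneB', 'geneC']
```

Genes returned by such a filter are only candidates; as the following slides note, many apparent absences have trivial causes.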
Genome Quality Issues • Several spurious/trivial causes of apparent gene gain • candidate novel gene is spurious (human genome annotation error) • sequence gaps – gene is present but unsequenced • Chimp genome annotation error – gene is sequenced but unannotated
Strategy • Synteny-based approach • Gene order is conserved between close taxa • Regions of conserved gene order are likely to be ancestral • The expected location of a gene can be identified and carefully examined • [Diagram: human region compared with chimp, expected location marked “?”]
Synteny Blocks • Blocks with conserved gene order built using unambiguous orthologs: • String of orthologs no more than 10 genes apart in either genome. • Small local gene order differences permitted.
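The block-building rule can be sketched as a greedy chaining pass. This is a simplified illustration, not the procedure used in the study: the input layout (chromosome name plus gene rank in each genome), the function name, and the choice to compare each ortholog only with the last member of the current block are all assumptions.

```python
def build_synteny_blocks(orthologs, max_gap=10):
    """Greedily chain unambiguous orthologs into blocks of conserved order.

    orthologs: list of (human_chr, human_idx, chimp_chr, chimp_idx), where
    idx is the gene's rank along its chromosome (hypothetical layout).
    A pair joins the current block if it lies on the same chromosomes and is
    no more than max_gap genes from the previous pair in either genome.
    """
    blocks = []
    for pair in sorted(orthologs):  # order by human chromosome, then rank
        h_chr, h_idx, c_chr, c_idx = pair
        if blocks:
            p_h_chr, p_h_idx, p_c_chr, p_c_idx = blocks[-1][-1]
            same_chroms = (h_chr, c_chr) == (p_h_chr, p_c_chr)
            # abs() on the chimp side tolerates small local order differences.
            close = h_idx - p_h_idx <= max_gap and abs(c_idx - p_c_idx) <= max_gap
            if same_chroms and close:
                blocks[-1].append(pair)
                continue
        blocks.append([pair])
    return blocks


toy = [
    ("10", 1, "10", 1),
    ("10", 3, "10", 2),
    ("10", 4, "10", 5),
    ("10", 30, "10", 31),  # too far from the previous ortholog
    ("10", 32, "7", 4),    # maps to a different chimp chromosome
]
blocks = build_synteny_blocks(toy)
print([len(b) for b in blocks])  # [3, 1, 1]
```

Within a block, a human gene flanked by orthologs has a well-defined expected position in chimp, which is the region to inspect for a missing gene.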
644 450 194 3
Novel human protein-coding genes • All short ORFs • No introns within coding sequence
ORF origins • Examine orthologous DNA from chimp and macaque • Identify “disablers” - sequence differences that obstruct the ORF • Single base differences that cause an early stop • Frame-shift inducing indels that result in an early stop codon • Absence of a start codon
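The three kinds of disabler can be detected with a simple scan of a pairwise alignment. The sketch below is a rough heuristic under stated assumptions: both sequences come from one alignment of equal length with "-" for gaps, the human row begins at the ATG with no leading gap, and the function name and toy sequences are invented for the example.

```python
import re

STOPS = {"TAA", "TAG", "TGA"}


def find_disablers(human_aln, other_aln):
    """List features of other_aln that would obstruct the human ORF.

    Both arguments are rows of one pairwise alignment (equal length, '-' for
    gaps). human_aln, with gaps removed, is assumed to be the intact ORF
    running from ATG to the stop codon.
    """
    assert len(human_aln) == len(other_aln)
    disablers = []

    # Columns aligned to the human start codon must also read ATG.
    if other_aln[:3] != "ATG":
        disablers.append("no start codon")

    # A gap run whose length is not a multiple of 3 shifts the reading frame.
    for row in (human_aln, other_aln):
        for gap in re.finditer(r"-+", row):
            if len(gap.group()) % 3 != 0:
                disablers.append(f"frameshift indel at column {gap.start()}")

    # Read the ungapped orthologous sequence in frame and report the first
    # stop codon if it ends before the human ORF does.
    other = other_aln.replace("-", "")
    human_len = len(human_aln.replace("-", ""))
    for i in range(0, len(other) - 2, 3):
        if other[i:i + 3] in STOPS:
            if i + 3 < human_len:
                disablers.append(f"early stop codon at base {i}")
            break
    return disablers


human = "ATGAAACCCGGGTAA"  # toy ORF: ATG AAA CCC GGG TAA
print(find_disablers(human, "ATGTAACCCGGGTAA"))  # ['early stop codon at base 3']
print(find_disablers(human, "ATGAA-CCCGGGTAA"))  # ['frameshift indel at column 5']
print(find_disablers(human, "ACGAAACCCGGGTAA"))  # ['no start codon']
```

Positions are 0-based. Running the same check against both chimp and macaque would show whether a disabler is shared by both, which bears on whether the ORF-enabling state arose on the human lineage.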
CLLU1 Start • Chronic Lymphocytic Leukemia Upregulated gene 1 (CLLU1) • Identified in a search for differentially expressed genes in chronic lymphocytic leukemia (Buhl et al 2006) • Located in an EST-dense region • Overlaps another gene, CLLU1OS, on the opposite strand
Human origin or parallel primate inactivation of ancient gene?
Are these ORFs actually genes? • The presence of an ORF does not guarantee that the gene is coding, i.e., that a protein is produced • PRIDE • PRoteomics IDEntifications is a public database for proteomics data • Peptide Atlas • Public database of peptides identified by mass spectrometry
CLLU1 DNAH10OS C22orf45
Human-chimp sequence divergence • Total (and nonsynonymous) base substitutions pooled over all three genes • 5 (2) 7 (3)
Human population polymorphism • ORF is present intact in all sequenced individuals (public data) • No convincing evidence for a selective sweep from published genome-wide scans of HapMap data.
How might these genes arise? • Sequence analysis traced the origin of the ORF, but these must also be expressed. • Expression of a new gene • ENCODE project indicated that much of the genome is transcribed • All three of these genes overlap other genes • CLLU1 is in a permissive expression environment
De novo genes: Summary • 3 identified cases under strict criteria • Estimate about 18 should exist • All have evidence of transcription and translation • ORF formation allowed by human specific mutation in all cases • No “re-use” of coding sequence of previously-existing genes, but perhaps re-use of regulatory sequences.
Gene duplication Innovation Robustness Neofunctionalisation Functional compensation
Defining essential genes • A gene is considered “essential” if its removal results in a lethal or sterile phenotype. • Essential genes: lethal or sterile • Non-essential genes: other phenotypes • [Figure: Fly, wild type vs eyeless and vestigial (http://www.exploratorium.edu; Kolodziej PA et al. Neuron (1995)); Mouse, wild type vs foxn1 (http://www.crj.co.jp; Garacia MU et al. PNAS (2005))] • Fly: 2540 essential and 5197 non-essential genes • Mouse: 2109 essential and 2969 non-essential genes
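As a worked example, the overall proportion of essential genes follows directly from the counts on this slide. The function name is an assumption; the numbers are the fly and mouse totals given above.

```python
def proportion_essential(essential, non_essential):
    """Fraction of tested genes whose removal is lethal or sterile."""
    return essential / (essential + non_essential)


# Counts taken from the slide: (essential, non-essential).
counts = {"fly": (2540, 5197), "mouse": (2109, 2969)}

for species, (ess, non) in counts.items():
    print(f"{species}: PE = {proportion_essential(ess, non):.3f}")
# fly: PE = 0.328
# mouse: PE = 0.415
```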
Evolutionary impact of gene duplications • PE: proportion of essential genes • Count lethal knockouts among singletons and among duplicates • Is PE singletons >> PE duplicates, or PE singletons = PE duplicates?
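One way to compare the two proportions is a standard two-proportion z statistic. This is a sketch of a generic test, not necessarily the analysis behind the slide, and the counts below are hypothetical toy values chosen only to show the calculation.

```python
import math


def compare_pe(lethal_single, n_single, lethal_dup, n_dup):
    """Return (PE singletons, PE duplicates, z) for a two-proportion z-test."""
    pe_single = lethal_single / n_single
    pe_dup = lethal_dup / n_dup
    # Pooled proportion under the null hypothesis that both PEs are equal.
    pooled = (lethal_single + lethal_dup) / (n_single + n_dup)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_single + 1 / n_dup))
    return pe_single, pe_dup, (pe_single - pe_dup) / se


# Hypothetical counts, NOT data from the talk.
pe_s, pe_d, z = compare_pe(lethal_single=300, n_single=1000,
                           lethal_dup=100, n_dup=1000)
print(f"PE singletons = {pe_s:.2f}, PE duplicates = {pe_d:.2f}, z = {z:.1f}")
```

A large positive z corresponds to the ">>" outcome (duplicates buffer gene loss), while a z near zero corresponds to "=".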