Alternative splicing: A playground of evolution. Mikhail Gelfand, Research and Training Center for Bioinformatics, Institute for Information Transmission Problems RAS, Moscow, Russia. October 2008
% of alternatively spliced human and mouse genes by year of publication [Plot legend: Human (genome / random sample); Human (individual chromosomes); Mouse (genome / random sample); All genes; Only multiexon genes; Genes with high EST coverage]
Roles of alternative splicing • Functional: • creating protein diversity • ~30,000 genes, >100,000 proteins • maintaining protein identity • e.g. membrane (receptor) and secreted isoforms • dominant negative isoforms • combinatorial (transcription factors, signaling domains) • regulatory • e.g. via channelling to NMD (nonsense-mediated decay) • Evolutionary
Plan • Evolution of alternative exon-intron structure • mammals: • human compared to mouse and dog • mouse and rat compared to human and dog • paralogs • dipteran insects • Drosophila melanogaster, D. pseudoobscura, Anopheles gambiae • many drosophilas • Evolutionary rates in constitutive and alternative regions • human and mouse • D. melanogaster and D. pseudoobscura • many drosophilas • human-chimpanzee vs. human SNPs • Alternative splicing and protein domains • Regulation of AS via conserved RNA structures
Elementary alternatives • Cassette exon • Alternative donor site • Alternative acceptor site • Retained intron
EDAS: a database of alternative splicing • Sources: • human and mouse genomes • GenBank • RefSeq • considers cassette exons and alternative splice sites • functionality: potentially translated vs. NMD-inducing elementary alternatives (in-frame stops, length not divisible by 3)
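The functionality criterion above (in-frame stops, length not divisible by 3) can be sketched in a few lines. This is a minimal illustration, not the EDAS implementation: the function name, the `phase` parameter and the two labels are assumptions, and real NMD prediction also depends on where the stop lies relative to the last exon junction.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def classify_cassette_exon(exon_seq: str, phase: int = 0) -> str:
    """Label a cassette exon 'translated' or 'NMD-inducing' (hypothetical labels).

    phase: number of leading exon bases (0, 1 or 2) that complete a codon
    started in the upstream exon, so in-frame codons begin at index `phase`.
    """
    seq = exon_seq.upper()
    # Inclusion of an exon whose length is not a multiple of 3 shifts the frame.
    if len(seq) % 3 != 0:
        return "NMD-inducing"
    # Scan complete in-frame codons for a stop.
    for i in range(phase, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            return "NMD-inducing"
    return "translated"
```

The same two checks apply to an alternative donor or acceptor site if `exon_seq` is taken to be the alternative (extended) region only.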
Alternative exon-intron structure in the human, mouse and dog genomes • Human-mouse-dog triples of orthologous genes • We follow the fate of human alternative sites and exons in the mouse and dog genomes • Each human AS isoform is splice-aligned to the mouse and dog genomes. Definition of conservation: • conservation of the corresponding region (a homologous exon is actually present in the considered genome); • conservation of splice sites (GT and AG)
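The second conservation criterion (GT and AG splice sites) can be checked once the spliced alignment has produced exon coordinates in the target genome. The sketch below assumes an internal exon on the forward strand given as a 0-based, half-open interval; the function name and arguments are hypothetical.

```python
def splice_sites_conserved(genome: str, exon_start: int, exon_end: int) -> bool:
    """True if the exon genome[exon_start:exon_end] is flanked by AG ... GT.

    The acceptor AG is expected immediately upstream of the exon and the
    donor GT immediately downstream (internal exon, forward strand).
    """
    g = genome.upper()
    if exon_start < 2 or exon_end + 2 > len(g):
        return False  # not enough flanking sequence to check both sites
    acceptor = g[exon_start - 2:exon_start]
    donor = g[exon_end:exon_end + 2]
    return acceptor == "AG" and donor == "GT"
```

For example, `splice_sites_conserved("ttttagATGGCCgtaagt", 6, 12)` checks the `ag` before and the `gt` after the upper-case exon.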
Caveats • we consider only the possibility of AS in mouse and dog: we do not require the actual existence of the corresponding isoforms in known transcriptomes • we do not account for situations when an alternative human exon (or site) is constitutive in mouse or dog • of course, functionality assignments (translated / NMD-inducing) are not very reliable
Gains/losses: loss in mouse [Tree diagram: common ancestor]
Gains/losses: gain in human (or noise) [Tree diagram: common ancestor]
Gains/losses: loss in dog (or possible gain in human+mouse) [Tree diagram: common ancestor]
Triple comparison [Chart categories: Conserved alternatives; Lost in mouse; Lost in dog; Human-specific alternatives: noise?]
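The four categories of the triple comparison follow directly from whether a human alternative is conserved in mouse and in dog. A minimal sketch, with a hypothetical function name and label strings taken from the slides:

```python
from collections import Counter

def classify_human_alternative(in_mouse: bool, in_dog: bool) -> str:
    """Assign a human alternative exon or site to a triple-comparison category."""
    if in_mouse and in_dog:
        return "conserved"
    if in_dog:
        # Present in human and dog, absent from mouse.
        return "lost in mouse"
    if in_mouse:
        # Present in human and mouse, absent from dog.
        return "lost in dog (or gained in human+mouse)"
    return "human-specific (gain or noise)"

def tally(observations):
    """Count categories for an iterable of (in_mouse, in_dog) pairs."""
    return Counter(classify_human_alternative(m, d) for m, d in observations)
```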
Translated and NMD-inducing cassette exons • Mainly included exons are highly conserved irrespective of function • Mainly skipped translated exons are more conserved than NMD-inducing ones • Numerous lineage-specific losses • more in mouse than in dog • more of NMD-inducing than of translated exons • ~40% of almost always skipped (<1% inclusion) exons are conserved in at least one lineage
Mouse+rat vs human and dog: a possibility to distinguish between exon gain and noise
The rate of exon gain: decreases with the exon inclusion rate; increases with the sequence evolutionary rate • Caveat: spurious exons still may seem to be conserved in the rodent lineage due to short time • Solution: estimate “FDR” by analysis of conservation of pseudoexons
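The slide does not give the formula behind the "FDR" estimate. One plausible reading, shown here purely as an assumption, is to treat the fraction of pseudoexons that look conserved as the background rate and divide it by the fraction of candidate exons that look conserved. All names below are hypothetical.

```python
def estimate_fdr(n_pseudo_conserved: int, n_pseudo_total: int,
                 n_cand_conserved: int, n_cand_total: int) -> float:
    """Rough FDR estimate from a pseudoexon control (assumed definition).

    background = share of pseudoexons that pass the conservation test
    observed   = share of candidate exons that pass the same test
    FDR is approximated as background / observed, capped at 1.
    """
    if n_pseudo_total <= 0 or n_cand_total <= 0:
        raise ValueError("both samples must be non-empty")
    background = n_pseudo_conserved / n_pseudo_total
    observed = n_cand_conserved / n_cand_total
    if observed == 0:
        return 0.0  # nothing was called conserved, so nothing can be false
    return min(1.0, background / observed)
```

With 10 of 100 pseudoexons and 40 of 100 candidate exons passing, the estimate is 0.25.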
Alternative donor and acceptor sites: same trends • Higher conservation of ~uniformly used sites • Internal sites are more conserved than external ones (as expected)
Source of innovation: Model of random site fixation • Plots: Fraction of exon-extending alternative sites as a function of exon length • Main site defined as the one in protein or in more ESTs • Same trends for the acceptor (top) and donor (bottom) sites • The distribution of alt. region lengths is consistent with fixation of random sites • Extend short exons • Shorten long exons
Genetic diseases • Mutations in splice sites yield exon skips or activation of cryptic sites • Exon skip or activation of a cryptic site depends on: • Density of exonic splicing enhancers (lower in skipped exons) • Presence of a strong cryptic site nearby
One more source of innovation: site creation • MAGE-A family of human CT-antigens • Retroposition of a spliced mRNA, then duplication • Numerous new (alternative) exons in individual copies arising from point mutations • Creation of donor sites
Alternative exon-intron structure in fruit flies and the malarial mosquito • Same procedure (AS data from FlyBase) • cassette exons, splicing sites • also mutually exclusive exons, retained introns • Follow the fate of D. melanogaster exons in the D. pseudoobscura and Anopheles genomes • Technically more difficult: • incomplete genomes • the quality of alignment with the Anopheles genome is lower • frequent intron insertion/loss (~4.7 introns per gene in Drosophila vs. ~3.5 introns per gene in Anopheles)
Conservation of D. melanogaster elementary alternatives in D. pseudoobscura genes [Chart legend: blue – exact; green – divided exons; yellow – joined exons; orange – mixed; red – non-conserved] • retained introns are the least conserved (are all of them really functional?) • mutually exclusive exons are as conserved as constitutive exons
Conservation of D. melanogaster elementary alternatives in Anopheles gambiae genes [Chart legend: blue – exact; green – divided exons; yellow – joined exons; orange – mixed; red – non-conserved] • ~30% joined, ~10% divided exons (fewer introns in A. gambiae) • mutually exclusive exons are conserved exactly • cassette exons are the least conserved
Evolution of (alternative) exon-intron structure in nine Drosophila spp. [Figure: phylogenetic tree] • D. melanogaster (Dmel) • D. sechellia (Dsec) • D. yakuba (Dyak) • D. erecta (Dere) • D. ananassae (Dana) • D. pseudoobscura (Dpse) • D. mojavensis (Dmoj) • D. virilis (Dvir) • D. grimshawi (Dgri) • Tree: D. Pollard, http://rana.lbl.gov/~dan/trees.html
Gain and loss of alternative segments and constitutive exons [Figure: Drosophila phylogenetic tree with per-branch counts; the assignment of counts to branches is not recoverable from the text] • Notation: patterns with single events / patterns with multiple events (Dollo parsimony) • Sample size: 397 / 452; 18596 / 18874 • Caveat: we cannot observe exon gain outside and exon loss within the D. melanogaster lineage
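Under Dollo parsimony a feature is gained once, at the smallest clade containing every species that has it, and every absence inside that clade is explained by a loss. The sketch below illustrates that rule for one presence/absence pattern. The tree topology is the commonly cited one for these nine species and is an assumption here, as are the function names; it is not the pipeline behind the slide.

```python
# Assumed topology, written as nested tuples (leaves are species codes).
TREE = (
    (((("Dmel", "Dsec"), ("Dyak", "Dere")), "Dana"), "Dpse"),
    (("Dmoj", "Dvir"), "Dgri"),
)

def leaves(tree):
    """Set of leaf names below a node."""
    if isinstance(tree, str):
        return {tree}
    out = set()
    for child in tree:
        out |= leaves(child)
    return out

def dollo_events(tree, present):
    """Return (leaves of the gain clade, number of losses) for one pattern."""
    present = set(present) & leaves(tree)
    if not present:
        raise ValueError("feature is absent from every leaf")
    # Descend while only one child carries the feature: the single gain
    # is placed at the smallest clade that contains all carriers.
    node = tree
    while not isinstance(node, str):
        holders = [c for c in node if leaves(c) & present]
        if len(holders) != 1:
            break
        node = holders[0]

    def losses(n):
        # A maximal subclade without any carrier counts as one loss.
        if not (leaves(n) & present):
            return 1
        if isinstance(n, str):
            return 0
        return sum(losses(c) for c in n)

    return frozenset(leaves(node)), losses(node)
```

A pattern with zero losses is explained by the gain alone; a pattern with several losses corresponds to the "multiple events" side of the notation.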
Evolutionary rate in constitutive and alternative regions • Human and mouse orthologous genes • D. melanogaster and D. pseudoobscura • Estimation of the dN/dS ratio: higher fraction of non-synonymous substitutions (changing amino acid) => weaker stabilizing (or stronger positive) selection
Human/mouse genes: non-symmetrical histogram of dN/dS(const) – dN/dS(alt) • Black: shadow of the left half. • In a larger fraction of genes dN/dS(alt) > dN/dS(const), especially for larger values
Concatenated regions: alternative regions evolve faster than constitutive ones [Plot: dN, dS and dN/dS, scale 0 to 1]
Weaker stabilizing selection (or positive selection) in alternative regions (insignificant in Drosophila) [Plot: dN, dS and dN/dS, scale 0 to 1]
Different behavior of terminal alternatives • Drosophila: synonymous substitutions prevalent in terminal alternative regions; non-synonymous substitutions, in internal alternative regions • Mammals: density of substitutions increases in the N-to-C direction [Plot: dN, dS and dN/dS, scale 0 to 1.5]
Many drosophilas: • dN in mutually exclusive exons same as in constitutive exons • dS lower in almost all alternatives: regulation?
Many drosophilas: relaxed (positive?) selection in alternative regions
The McDonald-Kreitman test: evidence for positive selection in (minor isoform) alternative regions • Human and chimpanzee genome substitutions vs. human SNPs • Exons conserved in mouse and/or dog • Genes with at least 60 ESTs (median number) • Fisher’s exact test for significance • Minor isoform alternative regions: • More non-synonymous SNPs: Pn(alt_minor) = 0.12% >> Pn(const) = 0.06% • More non-synonymous substitutions: Kn(alt_minor) = 0.91% >> Kn(const) = 0.37% • Positive selection (as opposed to weaker stabilizing selection): α = 1 – (Pn/Ps) / (Kn/Ks) ~ 25% of positions • Similar results for all highly covered genes or all conserved exons
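The α statistic and the significance test can be sketched with the standard library alone. The slide reports only rates, so the counts in the usage note are hypothetical; the function names are also assumptions. The 2x2 table is laid out as rows = polymorphism / divergence and columns = non-synonymous / synonymous.

```python
from math import comb

def mk_alpha(pn: float, ps: float, kn: float, ks: float) -> float:
    """alpha = 1 - (Pn/Ps) / (Kn/Ks), the estimated share of
    non-synonymous substitutions fixed by positive selection."""
    return 1 - (pn / ps) / (kn / ks)

def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact p-value for the table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more likely than the observed one.
    """
    row1, row2, col1, n = a + b, c + d, a + c, a + b + c + d

    def prob(x: int) -> float:
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    total = sum(p for p in (prob(x) for x in range(lo, hi + 1))
                if p <= p_obs * (1 + 1e-9))  # tolerance for float ties
    return min(1.0, total)
```

With hypothetical counts Pn = 20, Ps = 100, Kn = 60, Ks = 150, `mk_alpha(20, 100, 60, 150)` gives 0.5, and `fisher_exact_two_sided(20, 100, 60, 150)` tests whether the two ratios differ.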
What does alternative splicing do to proteins? • SwissProt proteins • PFAM domains • SwissProt feature tables
Alternative splicing avoids disrupting domains (and non-domain units) Control: fix the domain structure; randomly place alternative regions
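The control described above (fix the domain structure, place alternative regions at random) is a simple randomization test. The sketch below estimates the expected fraction of random placements that disrupt a domain. The definition of "disrupt" used here, partial overlap without removing the whole domain, is an assumption, as are all names and the uniform placement.

```python
import random

def disrupts(region, domain) -> bool:
    """True if the region overlaps the domain without containing all of it.

    Both arguments are 0-based, half-open (start, end) intervals in residues.
    """
    rs, re_ = region
    ds, de = domain
    overlap = rs < de and ds < re_
    contains = rs <= ds and de <= re_
    return overlap and not contains

def expected_disruption(protein_len, domains, region_len,
                        n_trials=10000, seed=0) -> float:
    """Fraction of uniformly placed regions that disrupt at least one domain."""
    if region_len > protein_len:
        raise ValueError("region longer than protein")
    rng = random.Random(seed)  # seeded so the estimate is reproducible
    hits = 0
    for _ in range(n_trials):
        start = rng.randint(0, protein_len - region_len)
        region = (start, start + region_len)
        if any(disrupts(region, d) for d in domains):
            hits += 1
    return hits / n_trials
```

Comparing this expectation with the observed fraction of real alternative regions that disrupt domains gives the kind of expected-versus-observed contrast the slide refers to.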
… and this is not simply a consequence of the (disputed) exon-domain correlation
Positive selection towards domain shuffling (not simply avoidance of disrupting domains)
Short (<50 aa) alternative splicing events within domains target protein functional sites [Chart: expected vs. observed counts of FT positions affected / unaffected and Prosite patterns affected / unaffected]
An attempt at integration • AS is often species-specific • young AS isoforms are often minor and tissue-specific • … but still functional • although species-specific isoforms may result from aberrant splicing • AS regions show evidence for decreased negative selection • excess non-synonymous codon substitutions • AS regions show evidence for positive selection • excess fixation of non-synonymous substitutions (compared to SNPs) • AS tends to shuffle domains and target functional sites in proteins • Thus AS may serve as a testing ground for new functions without sacrificing old ones
What next? • AS in one species, constitutive splicing in another (data from microarrays) • Changes in inclusion rates • Evolution of regulation of AS • Control for: • functionality: translated / NMD-inducing (frameshifts, stop codons) • exon inclusion (or site choice) level: major / minor isoform • tissue specificity pattern (?) • type of alternative – 1: N-terminal / internal / C-terminal • type of alternative – 2: cassette and mutually exclusive exon, alternative site
Acknowledgements • Discussions • Eugene Koonin (NCBI) • Igor Rogozin (NCBI) • Vsevolod Makeev (GosNIIGenetika) • Dmitry Petrov (Stanford) • Dmitry Frishman (GSF, TUM) • Data • King Jordan (NCBI) • Support • Howard Hughes Medical Institute • INTAS • Russian Academy of Sciences (program “Molecular and Cellular Biology”) • Russian Foundation for Basic Research
Authors • Andrei Mironov (Moscow State University) • Ramil Nurtdinov (Moscow State University) – human/mouse+rat/dog • Dmitry Malko (GosNIIGenetika, Moscow) – drosophila/mosquito • Ekaterina Ermakova (Moscow State University, IITP) – Kn/Ks • Vasily Ramensky (Institute of Molecular Biology, Moscow) – SNPs, McDonald-Kreitman test • Evgenia Kriventseva (now at U. of Geneva) and Shamil Sunyaev (now at Harvard U. Medical School) – protein structure • Irena Artamonova (Inst. of General Genetics, Moscow) – human/mouse, plots, MAGE-A • Alexei Neverov (GosNIIGenetika, Moscow) – functionality of isoforms
Bonus track: conserved secondary structures regulating (alternative) splicing in the Drosophila spp. • ~50,000 introns • 17% alternative, 2% with alt. polyA signals • >95% of D. melanogaster introns mapped to at least 7 of 12 other Drosophila genomes • Search for conserved complementary words at intron termini (within 150 nt of intron boundaries), then align • Restrictive search => 200 candidates • 6 tested in experiment (3 const., 3 alt.). All 3 alt. ones confirmed
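The first step of the search, finding complementary words near the two ends of an intron, can be sketched as a reverse-complement k-mer lookup. This is an illustration under simplifying assumptions: only Watson-Crick pairs are considered (no G-U wobble), the word length `k` is arbitrary, and the cross-species conservation and alignment steps are not shown. Function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string."""
    return seq.upper().translate(COMPLEMENT)[::-1]

def complementary_words(intron: str, k: int = 8, window: int = 150):
    """Find k-mers near the 5' end whose reverse complement occurs near the 3' end.

    Returns a list of (pos_5prime, pos_3prime, word) with 0-based positions
    in the intron. The two windows are kept from overlapping in short introns.
    """
    seq = intron.upper()
    head_end = min(window, len(seq))
    tail_start = max(len(seq) - window, window)
    # Index every k-mer in the 3' window by its position.
    tail_index = {}
    for j in range(tail_start, len(seq) - k + 1):
        tail_index.setdefault(seq[j:j + k], []).append(j)
    hits = []
    for i in range(0, head_end - k + 1):
        word = seq[i:i + k]
        for j in tail_index.get(revcomp(word), []):
            hits.append((i, j, word))
    return hits
```

A word at the 5' end and its reverse complement at the 3' end can pair to form a stem that brings the intron termini together, which is the kind of structure the candidates were screened for.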
CG33298 (phospholipid translocating ATPase): alternative donor sites
Nmnat (nicotinamide mononucleotide adenylyltransferase): alternative splicing and polyadenylation
Properties of regulated introns • Often alternative • Longer than usual • Overrepresented in genes linked to development