Alternative splicing: A playground of evolution Mikhail Gelfand Research and Training Center for Bioinformatics Institute for Information Transmission Problems RAS, Moscow, Russia October 2006
% of alternatively spliced human and mouse genes by year of publication [Figure. Legend: Human (genome / random sample); All genes; Human (individual chromosomes); Only multiexon genes; Genes with high EST coverage; Mouse (genome / random sample)]
Plan • Evolution of alternative exon-intron structure • mammals: human, mouse, dog • dipteran insects: Drosophila melanogaster, D. pseudoobscura, Anopheles gambiae • Evolutionary rate in constitutive and alternative regions • human / mouse • D. melanogaster / D. pseudoobscura • human-chimpanzee / human SNPs
Elementary alternatives: cassette exon, alternative donor site, alternative acceptor site, retained intron
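The four elementary alternatives can be illustrated by comparing the exon coordinates of two isoforms of one gene. The sketch below is only an illustration and not the procedure behind these slides: the function name `classify_alternatives`, the inclusive `(start, end)` plus-strand coordinates, and the one-directional comparison (events in `iso_a` relative to `iso_b`) are all assumptions. Alternative first and last exons are not handled separately.

```python
def classify_alternatives(iso_a, iso_b):
    """Label exons of iso_a that differ from iso_b (hypothetical helper).

    Both isoforms are sorted lists of (start, end) exon coordinates on the
    plus strand, so an exon's end is its donor side and its start is its
    acceptor side.
    """
    events = []
    for a_start, a_end in iso_a:
        # Exons of iso_b that share at least one position with this exon.
        overlapping = [(s, e) for s, e in iso_b if s <= a_end and a_start <= e]
        if not overlapping:
            # Present only in iso_a and lying between the first and last
            # exons of iso_b: treated as a cassette exon.
            if iso_b[0][1] < a_start and a_end < iso_b[-1][0]:
                events.append(("cassette exon", (a_start, a_end)))
        elif len(overlapping) >= 2:
            # One exon of iso_a covers two exons of iso_b and the intron
            # between them.
            events.append(("retained intron", (a_start, a_end)))
        else:
            b_start, b_end = overlapping[0]
            if a_start == b_start and a_end != b_end:
                events.append(("alternative donor site", (a_start, a_end)))
            elif a_end == b_end and a_start != b_start:
                events.append(("alternative acceptor site", (a_start, a_end)))
    return events


# Made-up coordinates, for illustration only.
reference = [(100, 200), (300, 400), (500, 600), (700, 800)]
variant = [(100, 200), (240, 260), (300, 420), (500, 600), (700, 800)]
print(classify_alternatives(variant, reference))
```

On the minus strand the donor and acceptor roles of the exon ends are swapped, so real coordinates would need a strand-aware version.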
Alternative exon-intron structure in the human, mouse and dog genomes • EDAS: a database of human alternative splicing (human genome + GenBank + EST data from RefSeq) • consider cassette exons and alternative splicing sites • functionality: potentially translated vs. NMD-inducing elementary alternatives • Human-mouse-dog triples of orthologous genes • We follow the fate of human alternative sites and exons in the mouse and dog genomes • Each human AS isoform is splice-aligned to the mouse and dog genomes. Definition of conservation: • conservation of the corresponding region (a homologous exon is actually present in the considered genome); • conservation of splicing sites (GT and AG)
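The two-part definition of conservation (homologous region present, GT and AG splicing sites intact) can be sketched as a simple check. This is a minimal sketch under stated assumptions, not the pipeline used for these slides: the function `is_conserved` is hypothetical, the exon is assumed to come from a spliced alignment as 0-based half-open plus-strand coordinates, and `None` stands for "no homologous region found".

```python
def is_conserved(genome, exon):
    """Return True if an aligned exon keeps canonical splicing sites.

    genome : target genome sequence (plus strand) as a string
    exon   : (start, end) 0-based half-open coordinates of the aligned
             exon in the target genome, or None if nothing aligned
    """
    if exon is None:
        # Criterion 1 fails: no homologous region in this genome.
        return False
    start, end = exon
    if start < 2 or end + 2 > len(genome):
        # Not enough flanking sequence to inspect the splicing sites.
        return False
    acceptor = genome[start - 2:start].upper()  # last 2 nt of upstream intron
    donor = genome[end:end + 2].upper()         # first 2 nt of downstream intron
    # Criterion 2: canonical AG acceptor and GT donor dinucleotides.
    return acceptor == "AG" and donor == "GT"


# Toy sequences, for illustration only (exon in upper case).
print(is_conserved("ccccagATGGCCgtaaaa", (6, 12)))  # intact AG ... GT
print(is_conserved("ccccagATGGCCggaaaa", (6, 12)))  # donor GT mutated to GG
```

As the caveats on the next slide note, passing such a check shows only that alternative splicing is possible in the other genome, not that the isoform is actually expressed there.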
Caveats • we consider only the possibility of AS in mouse and dog: we do not require that the corresponding isoforms actually exist in known transcriptomes • we do not consider cases where an alternative human exon (or site) is constitutive in mouse or dog • of course, functionality assignments (translated / NMD-inducing) are not very reliable
Translated cassette exons [Figure; includes a "constitutive" category]
Observations • Predominantly included exons are highly conserved irrespective of function • Predominantly skipped translated exons are more conserved than NMD-inducing ones • Numerous lineage-specific losses • more in mouse than in dog • Still, ~40% of skipped (<1% inclusion) exons are conserved in at least one lineage
Alternative donor and acceptor sites: same trends • Higher conservation of ~uniformly used sites • Internal sites are more conserved than external ones (as expected)
Alternative exon-intron structure in fruit flies and the malarial mosquito • Same procedure (AS data from FlyBase) • cassette exons, splicing sites • also mutually exclusive exons, retained introns • Follow the fate of D. melanogaster exons in the D. pseudoobscura and Anopheles genomes • Technically more difficult: • incomplete genomes • the quality of alignment with the Anopheles genome is lower • frequent intron insertion/loss (~4.7 introns per gene in Drosophila vs. ~3.5 introns per gene in Anopheles)
Conservation of D. melanogaster elementary alternatives in D. pseudoobscura genes [Figure. Legend: blue: exact; green: divided exons; yellow: joined exons; orange: mixed; red: non-conserved] • retained introns are the least conserved (are all of them really functional?) • mutually exclusive exons are as conserved as constitutive exons
Conservation of D. melanogaster elementary alternatives in Anopheles gambiae genes [Figure. Legend: blue: exact; green: divided exons; yellow: joined exons; orange: mixed; red: non-conserved] • ~30% joined, ~10% divided exons (fewer introns in Aga) • mutually exclusive exons are conserved exactly • cassette exons are the least conserved
CG1517: cassette exon in Drosophila (Dme, Dps), alternative acceptor site in Anopheles (Aga)
CG31536: cassette exon in Drosophila (Dme, Dps), shorter cassette exon and alternative donor site in Anopheles (Aga)
Evolutionary rate in constitutive and alternative regions • Human and mouse orthologous genes • Estimation of the dN/dS ratio: higher fraction of non-synonymous (changing amino acid) substitutions => weaker stabilizing (or stronger positive) selection
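To make the ratio concrete, the sketch below classifies codon differences between two aligned coding sequences as synonymous or non-synonymous and normalizes them by the number of sites of each kind. It is a deliberately simplified illustration, not the estimator used for these slides. It reports uncorrected proportions (pN, pS) rather than dN and dS proper, applies no multiple-hit correction, skips codon pairs that differ at more than one position, and counts mutations to stop codons as non-synonymous. The function names and the toy sequences are assumptions.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, ...).
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def synonymous_sites(codon):
    """Expected number of synonymous sites in a codon (between 0 and 3)."""
    sites = 0.0
    for pos in range(3):
        for base in BASES:
            if base == codon[pos]:
                continue
            mutant = codon[:pos] + base + codon[pos + 1:]
            if CODON_TABLE[mutant] == CODON_TABLE[codon]:
                sites += 1 / 3  # one of three possible changes at this position
    return sites


def pn_ps(seq_a, seq_b):
    """Return (pN, pS, syn_diffs, nonsyn_diffs) for two aligned, in-frame CDSs."""
    syn_sites = nonsyn_sites = 0.0
    syn_diffs = nonsyn_diffs = 0
    for i in range(0, min(len(seq_a), len(seq_b)) - 2, 3):
        ca, cb = seq_a[i:i + 3].upper(), seq_b[i:i + 3].upper()
        if ca not in CODON_TABLE or cb not in CODON_TABLE:
            continue  # gaps or ambiguous nucleotides
        if "*" in (CODON_TABLE[ca], CODON_TABLE[cb]):
            continue  # stop codons are excluded
        n_diff = sum(x != y for x, y in zip(ca, cb))
        if n_diff > 1:
            continue  # simplification: multi-hit codons are skipped entirely
        s = (synonymous_sites(ca) + synonymous_sites(cb)) / 2
        syn_sites += s
        nonsyn_sites += 3 - s
        if n_diff == 1:
            if CODON_TABLE[ca] == CODON_TABLE[cb]:
                syn_diffs += 1
            else:
                nonsyn_diffs += 1
    p_n = nonsyn_diffs / nonsyn_sites if nonsyn_sites else None
    p_s = syn_diffs / syn_sites if syn_sites else None
    return p_n, p_s, syn_diffs, nonsyn_diffs


# Toy alignment: GCT->GCC is synonymous (Ala), AAA->AGA is not (Lys->Arg).
p_n, p_s, n_syn, n_nonsyn = pn_ps("ATGGCTAAA", "ATGGCCAGA")
print(n_syn, n_nonsyn, p_n / p_s)
```

Computing the ratio separately for concatenated constitutive and alternative regions, as on the following slides, would then amount to calling such a function on each concatenate.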
Concatenates of constitutive and alternative regions in all genes: different evolutionary rates. Columns (left to right): (1) constitutive regions; (2–4) alternative regions: N-end, internal, C-end • Relatively more non-synonymous substitutions in alternative regions (higher dN/dS ratio) • Lower amino acid identity in alternative regions
Individual genes: the ratio of non-synonymous to synonymous substitution rates, dN/dS, tends to be larger in alternative regions (vertical axis) than in constitutive regions (horizontal axis)
Non-symmetrical histogram of dN/dS(const) – dN/dS(alt). Black: shadow of the left half. In a larger fraction of genes dN/dS(const) < dN/dS(alt), especially for larger values
The same effect is seen in: N-terminal, internal, C-terminal parts
Drosophila: less selection in alternative regions? [Figure. Categories: more mutations in alt. regions; similar level of mutations; more mutations in const. regions] In a majority of genes, both synonymous and non-synonymous mutation rates are higher in alternative regions than in constitutive regions
Different behavior of N-terminal, internal and C-terminal alternatives • N-terminal alternatives: most genes have a higher synonymous substitution rate in alternative regions; most genes have stronger stabilizing selection in alternative regions • Internal alternatives: intermediate situation • C-terminal alternatives: more non-synonymous substitutions and fewer synonymous substitutions => weaker stabilizing selection in alternative regions
The McDonald-Kreitman test: evidence for positive selection in (minor isoform) alternative regions • Human and chimpanzee genome mismatches vs. human SNPs • Exons conserved in mouse and/or dog • Genes with at least 60 ESTs (median number) • Fisher’s exact test for significance Minor isoform alternative regions: • More non-synonymous SNPs: Pn(alt_minor) = 0.12% >> Pn(const) = 0.06% • More non-synonymous mismatches: Dn(alt_minor) = 0.91% >> Dn(const) = 0.37% • Positive selection (as opposed to lower stabilizing selection): α = 1 – (Pn/Ps) / (Dn/Ds) ~ 25% of positions • Similar results for all highly covered genes or all conserved exons
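The test compares a 2x2 table of non-synonymous and synonymous counts for fixed differences (D, human-chimpanzee mismatches) and polymorphisms (P, human SNPs). The sketch below shows how the proportion α and a two-sided Fisher's exact p-value could be computed from such counts, writing Pn and Dn for the non-synonymous counts. It is a minimal sketch: the function names and the counts in the example are hypothetical and are not data from the slides, and the exact test is written with the standard library only.

```python
from math import comb


def mk_alpha(pn, ps, dn, ds):
    """alpha = 1 - (Pn/Ps) / (Dn/Ds), from raw counts."""
    return 1 - (pn / ps) / (dn / ds)


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more likely than the observed one.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2

    def prob(x):
        # Probability that the top-left cell equals x given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Small relative tolerance guards against floating-point ties.
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))


# Hypothetical counts, for illustration only.
pn, ps, dn, ds = 30, 60, 40, 60
print(mk_alpha(pn, ps, dn, ds))
print(fisher_exact_two_sided(dn, ds, pn, ps))
```

A positive α indicates an excess of non-synonymous fixed differences relative to the polymorphism ratio, which is the signature of positive selection that the slide contrasts with merely weaker stabilizing selection.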
An attempt at integration • AS is often genome-specific • young AS isoforms are often minor and tissue-specific • … but still functional • although unique isoforms may result from aberrant splicing • AS regions show evidence for decreased negative selection • excess non-synonymous codon substitutions • AS regions show evidence for positive selection • excess non-synonymous SNPs • AS tends to shuffle domains and target functional sites in proteins • Thus AS may serve as a testing ground for new functions without sacrificing old ones
What next? • Multiple genomes • many Drosophila spp. • ENCODE data for many mammals • Estimate not only the rate of loss, but also the rate of gain (as opposed to aberrant splicing) • Control for: • functionality: translated / NMD-inducing • exon inclusion (or site choice) level: major / minor isoform • tissue specificity pattern (?) • type of alternative: N-terminal / internal / C-terminal • Evolution of regulation of AS • Splicing errors and mutations: retained introns, skipped exons, cryptic sites
Acknowledgements • Discussions • Vsevolod Makeev (GosNIIGenetika) • Eugene Koonin (NCBI) • Igor Rogozin (NCBI) • Dmitry Petrov (Stanford) • Dmitry Frishman (GSF, TUM) • Shamil Sunyaev (Harvard University Medical School) • Data • King Jordan (NCBI) • Support • Howard Hughes Medical Institute • INTAS • Russian Academy of Sciences (program “Molecular and Cellular Biology”) • Russian Fund of Basic Research
Authors • Andrei Mironov (Moscow State University) • Ramil Nurtdinov (Moscow State University) – human/mouse/dog • Dmitry Malko (GosNIIGenetika) – drosophila/mosquito • Ekaterina Ermakova (Moscow State University, IITP) – Kn/Ks • Vasily Ramensky (Institute of Molecular Biology) – SNPs • Irena Artamonova (GSF/MIPS) – human/mouse, plots • Alexei Neverov (GosNIIGenetika) – functionality of isoforms