Intron number evolution and alternative splicing functioning as bridge in evolution. Kemin Zhou, Ph.D. April 22, 2011.
Splicing pre-mRNA [Diagram labels: 5’ splice site (5’SS), branch point (A), polypyrimidine tract, 3’ splice site (3’SS), intron, mRNA]
Input Data: 16 Fungal Genomes. Phyla: Ascomycota, Basidiomycota, Zygomycota, Chytridiomycota
[Figure: phylogenetic tree of the 16 genomes with numeric annotations on nodes and branches. Phyla: Ascomycota, Basidiomycota, Zygomycota, Chytridiomycota. Clade labels: Agaricomycotina, Agaricomycetes, Tremellomycetes, Pucciniomycotina, Ustilaginomycotina, Saccharomycotina, Pezizomycotina, Dothideomycetes, Sordariomycetes, Eurotiomycetes. Genomes: cryneo1, copci1, Lacbi1, Pospl1, Sporo1, Phchr1, Batde5, ustma1, Phybl1, Picst3, Mycfi1, Necha2, Mycgr1, Trire2, Trive1, Aspni1]
Conservative Estimated Number of Exons [Figure: phylum-level tree. Ancestor: 7.25. Ascomycota: 3.77 (change -3.48). Basidiomycota: 7.25 (change 0.0). Zygomycota: 6.18 (change -1.07). Chytridiomycota: 5.90]
Reverse Transcriptase. Reverse transcriptase (RNA-dependent DNA polymerase) is a DNA polymerase enzyme that transcribes single-stranded RNA into single-stranded complementary DNA (cDNA). It also helps form the double-helix DNA once the RNA has been reverse-transcribed into single-stranded cDNA. [Diagram: RNA to DNA]
Reverse Transcriptase Enzymology (Arkadiusz Bibillo and Thomas H. Eickbush, J. Biol. Chem. 2002)
Exponential distribution, fall-off rate = lambda
Template       R2 RT             AMV RT
1094-nt RNA    4.0 x 10^-3/nt    9.3 x 10^-3/nt
590-nt RNA     2.6 x 10^-3/nt    5.4 x 10^-3/nt
poly(rA)       0.4 x 10^-3/nt    1.5 x 10^-3/nt
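A minimal sketch of what an exponential fall-off implies, assuming "fall-off rate = lambda" means a constant per-nucleotide chance that the enzyme dissociates. Under that assumption the fraction of cDNAs extended at least x nt is exp(-lambda * x) and the mean extension is 1/lambda. The function names are illustrative and not from the cited paper; the rates are the 1094-nt RNA values listed on the slide.

```python
import math

# Fall-off rates (per nt) listed on the slide for the 1094-nt RNA template.
FALL_OFF_RATE = {"R2 RT": 4.0e-3, "AMV RT": 9.3e-3}


def fraction_reaching(length_nt, rate_per_nt):
    """Fraction of cDNAs extended at least length_nt (constant-hazard assumption)."""
    return math.exp(-rate_per_nt * length_nt)


def mean_extension(rate_per_nt):
    """Mean extension length in nt for an exponential distribution."""
    return 1.0 / rate_per_nt


if __name__ == "__main__":
    for name, rate in FALL_OFF_RATE.items():
        print(f"{name}: mean {mean_extension(rate):.0f} nt, "
              f"fraction reaching 1094 nt = {fraction_reaching(1094, rate):.4f}")
```

Because reverse transcription starts at the 3’ end of the mRNA, a higher fall-off rate would leave cDNAs that cover mostly the 3’ part of a gene.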
Intron Loss by Homologous Recombination [Diagram: genomic DNA, transcribed by RNA polymerase II into mRNA; mRNA copied by reverse transcriptase into partial cDNA; partial cDNA returned to genomic DNA by homologous recombination]
RT Foot Prints (RTFP) [Figure: one histogram per genome for the 16 genomes. x-axis: Length (Nucleotides), 0 to 6000; y-axis: Counts]
Intron Relative Location [Figure: one histogram per genome for the 16 genomes. x-axis: Percent Relative Location from 5’-End, 0 to 100; y-axis: Count]
Number of Exons of Ancestor [Figure: scatter plot of the 16 genomes. x-axis: Mean Relative Intron Location, 0.40 to 0.50; y-axis: Mean Number of Exons; annotated value: 7.66]
Exon Length As a Function of Intron Number [Figure: two panels, Average Exon Length and Total Exon Length, each plotted against Number of Introns, 0 to 70]
Fungal Exon Length Distribution [Figure: x-axis: Exon Length, 0 to 500, with 80 and 160 also labeled; y-axis: Count. Legend: 0: 280067, 1: 206910, 2: 206494]
Average Module Size: 25 aa
1996: Sandro Jose de Souza, Manyuan Long, Lloyd Schoenbach, Scott William Roy, and Walter Gilbert. Intron positions correlate with module boundaries in ancient proteins (intron evolution/introns-early). Proc. Natl. Acad. Sci. USA, Vol. 93, pp. 14632–14636, December 1996.
2003: Alexei Fedorov, Scott Roy, Xiaohong Cao, and Walter Gilbert. Phylogenetically Older Introns Strongly Correlate With Module Boundaries in Ancient Proteins.
Protein Length: 400 aa. F-test on log(protein length): p-value = 2.2e-16. exp(6) = 403.4
Ancestral number of exons: 400 aa = 1200 nt/gene; a 25 aa module = 75 nt/exon; (1200 nt/gene) / (75 nt/exon) = 16 exons/gene
2005: Daryi Wang, Mufen Hsieh, and Wen-Hsiung Li. A General Tendency for Conservation of Protein Length Across Eukaryotic Kingdoms.
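The arithmetic on this slide can be checked with a short sketch. It assumes that the 75 nt/exon figure is the 25 aa average module size multiplied by three nucleotides per codon; the function name is illustrative.

```python
import math

NT_PER_CODON = 3


def ancestral_exons(protein_len_aa, module_len_aa):
    """Exons per gene if every exon encodes one module of module_len_aa residues."""
    coding_nt = protein_len_aa * NT_PER_CODON  # 400 aa -> 1200 nt/gene
    exon_nt = module_len_aa * NT_PER_CODON     # 25 aa  -> 75 nt/exon
    return coding_nt / exon_nt


if __name__ == "__main__":
    print(ancestral_exons(400, 25))   # exons per gene
    print(round(math.exp(6), 1))      # protein length implied by a log-length of 6
```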
Intron number by RT effect [Figure: two linear fits. Fit 1: intercept 9.69 ± 1.99, slope -0.30 ± 0.16. Fit 2: intercept 4.04 ± 0.35, slope -0.11 ± 0.03]
Conserved Genes Have More Introns [Figure: scatter plot of the 16 genomes. x-axis: Exon Number Conserved in All Species, 2 to 8; y-axis: Exon Number Species-specific, 2 to 7. Fit: y = 0.503 x + 1.172; without Sporo1, p-value = 8.196e-07]
Number of Exons vs. Genome Size [Figure: scatter plot of the 16 genomes. x-axis: log (genome size), 16.5 to 18.0; y-axis: number of exons. Legend: all, between Phylum, Species. p-value: 0.0006588]
Introns are getting longer for less conserved genes except for genomes with very few introns. [Figure: Intron Length Difference (SSG-GCAS) per genome, -20 to 70. Genomes: Trive1, Trire2, Picst3, ustma1, Mycfi1, copci1, Sporo1, Phybl1, Pospl1, Lacbi1, Phchr1, Necha2, Mycgr1, Batde5, Aspni1, cryneo1]
Chi-square Test: Younger vs. Older Genes [Figure: difference of frequency (younger minus older genes) by relative location binned into tenths, bin = floor(reloc*10)]
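A minimal sketch of the binning and test named on this slide, assuming `reloc` is an intron's relative location in [0, 1] measured from the 5’ end and that the test is a Pearson chi-square on a 2 x 10 table of bin counts (younger vs. older genes). Function names are illustrative; the statistic is computed by hand so no statistics library is assumed, and both groups are assumed to be non-empty.

```python
from math import floor


def decile_bin(reloc):
    """Map a relative location in [0, 1] to bin 0..9 via floor(reloc * 10)."""
    return min(floor(reloc * 10), 9)  # reloc == 1.0 is placed in the last bin


def bin_counts(relocs):
    """Count introns per tenth of the gene."""
    counts = [0] * 10
    for r in relocs:
        counts[decile_bin(r)] += 1
    return counts


def chi_square_2xk(group_a, group_b):
    """Pearson chi-square statistic for a 2 x k table of counts."""
    total_a, total_b = sum(group_a), sum(group_b)
    total = total_a + total_b
    stat = 0.0
    for a, b in zip(group_a, group_b):
        column = a + b
        if column == 0:
            continue  # an empty bin contributes nothing
        expected_a = total_a * column / total
        expected_b = total_b * column / total
        stat += (a - expected_a) ** 2 / expected_a
        stat += (b - expected_b) ** 2 / expected_b
    return stat
```

With ten bins and two groups the statistic would be compared against a chi-square distribution with 9 degrees of freedom.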
Timing of Intron Loss
• Dramatic intron loss happened early in the evolution of the ancestor of Ascomycota.
• Basidiomycota:
  • Most genomes had little intron gain or loss since divergence from the common ancestor.
  • Lacbi1: younger genes have more introns located toward both ends, an indication of modern exon shuffling.
• Two yeasts, Picst3 and ustma1: younger genes have more introns near the 3’-end relative to older genes.
Number of exons in ancestor
• Previous results: about 5.8 exons/gene
• This study: 7.25, 7.66, 9.69, and 16
• The first three methods underestimate
• 16 is the least biased estimate
Gene Birth Big Bang
• Earlier evolution generated short modules of about 25 aa on average
• On a very short time scale, genes were formed by a large-scale exon-shuffling process
• This ancient gene pool had about 16 exons per gene on average
• Subsequent evolution is dominated by intron loss
Ancient nature and bridging function of alternative splicing
Evidence-based Alternative Splicing [Diagram labels: Genome, EST, COMBEST, Gene Model with AS]
Characteristics of Input EST Genome
Intron Length and Splice Site Frequency [Figure: x-axis: Intron length]
Distribution of Number of Alternatively Spliced Forms [Fig 1. x-axis: Number of Models per Gene; y-axis: Count]
Higher Coverage, Longer Assembly, More Alternative Splicing and Antisense. Aspergillus aculeatus: AS in 70% of multiexon genes; 338,255,050 EST. *Coverage is the normalized mapped total EST length over genomic length.
Alternative Splicing Correlates with number of exons, expression level, and length of longest intron. Linear regression analysis of number of AS against number of exons, profmaxh, max intron length, and mRNA length.
Restriction on Contributing Factors
• Intrinsic Property
  • Contribution from each intron is smaller for intron-rich genomes
  • Long introns are more predictive of AS in genomes with short average intron length
• External Measure
  • Exceedingly high EST coverage lowers the contribution from expression level (saturation effect)
Ancient Measure by Conservation Pattern [Diagram labels: Fungi Protein (P); Archaea; Bacteria; Eukaryota Without Fungi; U]. Ancient proteins should be conserved in all three kingdoms.
Top 10 Isoforms from Aspca3. The top 100 were examined manually, and this appears to be the case.
Genes with AS Tend to Be More Ancient [Figure: % Hits All Three Kingdoms for Non-AS Protein vs. AS Protein. BLAST search against nr minus Fungi]
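The "% Hits All Three Kingdoms" measure can be sketched as a set check, assuming each protein comes with the set of kingdoms in which its BLAST search found a hit. The kingdom names follow the earlier slide; the function names and input layout are hypothetical.

```python
# The three kingdoms named on the conservation-pattern slide.
KINGDOMS = frozenset({"Archaea", "Bacteria", "Eukaryota"})


def is_ancient(hit_kingdoms):
    """True if the protein has at least one hit in every kingdom."""
    return KINGDOMS <= set(hit_kingdoms)


def percent_ancient(hits_by_protein):
    """Percentage of proteins with hits in all three kingdoms."""
    if not hits_by_protein:
        return 0.0
    n_ancient = sum(is_ancient(k) for k in hits_by_protein.values())
    return 100.0 * n_ancient / len(hits_by_protein)
```

Running `percent_ancient` separately on the AS and non-AS protein sets would give the two values compared in the figure.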
Genes with AS Are More Likely to Be Conserved within Ascomycota. Protein sequence best hits between Aspca3 and Spoth1: chi-squared = 4280.4, df = 3, p-value < 2.2e-16
Genes with AS Are More Conserved between Ascomycota and Basidiomycota. Chi-squared = 2360.138, df = 3, p-value < 2.2e-16
AS Profile Not Conserved. AP: AS x AS pair. Alignment classes: 0: perfect, 1: almost perfect, 2: indels, 3: partial, and 4: no alignment
Basic Types of Alternative Splicing
• Alternative Donor (AD): 5’ splice site selection
• Alternative Acceptor (AA): 3’ splice site selection
• Cassette Exons (CE): exon skipping / retention, one or more
• Intron retention (RI)
• Composite Types
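A minimal sketch of how the donor and acceptor categories could be told apart, assuming two overlapping introns from different isoforms of the same gene, each given as (donor, acceptor) coordinates on the plus strand. This is an illustrative classifier, not the method used to build the gene models; cassette exons and intron retention need exon coordinates as well and are left out.

```python
def classify_intron_pair(intron_a, intron_b):
    """Compare two overlapping introns given as (donor, acceptor) tuples.

    Returns "identical", "AD" (alternative donor), "AA" (alternative
    acceptor), or "AD+AA" when both splice sites differ.
    """
    same_donor = intron_a[0] == intron_b[0]
    same_acceptor = intron_a[1] == intron_b[1]
    if same_donor and same_acceptor:
        return "identical"
    if same_acceptor:
        return "AD"   # 5' splice site differs, 3' splice site shared
    if same_donor:
        return "AA"   # 3' splice site differs, 5' splice site shared
    return "AD+AA"    # composite: both splice sites differ
```

For example, `classify_intron_pair((100, 200), (120, 200))` reports an alternative donor because only the 5’ splice site moved.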
Composite Types
• AP
• AD+AA
• CE+AD
• CE+AA
• CE variants: AD, AA, ADAA
• CE+AP
• AA or CE (between CE and AA): alternative 3’ exons, no overlap; 3’ ends for cassette exons; may not be the end for others
Intron Retention Variants. Retention of 2 or more introns: an indication of genomic contamination?
Mutually Exclusive Exons (ME) [Diagram labels: Significant Overlap of Middle; AA; AD]
AS Type Distribution [Figure: x-axis: AS Category; y-axis: Fraction of Total]
RI Types
• No RI: intron not retained
• Minor: retained in a minor isoform (a minor isoform is not the most abundant isoform)
• Major: retained in a major isoform (the major isoform is the most abundant isoform)
Intron length classes: { 3n, 3n+1, 3n+2 }; 3n+1 and 3n+2 are grouped separately from 3n
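A minimal sketch of the two properties tracked for retained introns: the length class (3n, 3n+1, 3n+2) and whether the intron is stopless when read in frame. It assumes the reading frame is set by the bases of the split codon immediately upstream of the intron; the function names and the `upstream_partial` parameter are illustrative.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def length_class(intron_len):
    """Return "3n", "3n+1", or "3n+2" for an intron length in nt."""
    return ("3n", "3n+1", "3n+2")[intron_len % 3]


def has_inframe_stop(intron_seq, upstream_partial=""):
    """Check a retained intron for a stop codon in the upstream reading frame.

    upstream_partial holds the 0 to 2 coding bases of the codon that the
    intron interrupts, so it also encodes the intron phase.
    """
    seq = (upstream_partial + intron_seq).upper().replace("U", "T")
    return any(seq[i:i + 3] in STOP_CODONS for i in range(0, len(seq) - 2, 3))
```

A retained 3n intron without an in-frame stop keeps the downstream frame intact, whereas 3n+1 and 3n+2 introns shift it.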
NRI or Minor RI favor PTC; Major RI avoids PTC [Figure: for Chlre4, Spoth1, Agabi2, and Aspca3, fractions of intron length classes 3n, 3n+1, and 3n+2 by RI type (NRI, Minor, Major) and by stop (No/Yes), annotated with the stopless fraction, the RI type fraction, and p-values of chi-square tests against 1/3]
Intron Phase [Diagram: phases 0, 1, and 2 marked on the sequence UUUGCAAUUCUAGAAGAC, which translates to F A I L E D]
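A minimal sketch of intron phase using the usual convention: phase 0 lies between two codons, phase 1 after the first base of a codon, and phase 2 after the second base. The sequence is the one on the slide; the function names are illustrative, and the codon table is the standard genetic code built from its conventional UCAG ordering.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code, codons ordered UUU, UUC, UUA, UUG, UCU, ...
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(rna):
    """Translate an RNA coding sequence, ignoring a trailing partial codon."""
    rna = rna.upper().replace("T", "U")
    return "".join(CODON_TABLE[rna[i:i + 3]] for i in range(0, len(rna) - 2, 3))


def intron_phase(coding_bases_upstream):
    """Phase of an intron inserted after this many coding bases."""
    return coding_bases_upstream % 3


if __name__ == "__main__":
    print(translate("UUUGCAAUUCUAGAAGAC"))
    print([intron_phase(n) for n in (6, 7, 8)])
```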
Stopless Introns Favor Phase 0 More [Figure: A. Fractions of three phases (ph0, ph1, ph2) by stop (No/Yes) for Agabi2, Aspca3, Chlre4, and Spoth1, with p-values from chi-square test against 1/3. B. Difference against population]
Stopless 3n favors Phase 1 [Figure: A. Fractions of ph0, ph1, and ph2 by intron length class (3n, 3n+1, 3n+2) for Agabi2, Aspca3, Chlre4, and Spoth1, with p-values from chi-square test against stopless population. B. Difference against population]