COMPARATIVE ANALYSIS of U12 INTRONS

Nikolai V. Ivanov, Zemin Ning and Richard Durbin
The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK

Table 3. An example of U12/U2-type loss/gain and true U12/U2-type conversion.

Species ID                  Exon    Splice junction (exon/intron/exon)                                  Length  Exon    Type
HUMAN   ENSG00000001497     6(47)   ACGgtaagaaagtgccctggacttggtg..........ctgatgggaccctctttgctggcagGTG  842     7(110)  U2
MOUSE   ENSMUSG00000057421  6(51)   AAGGTTAgtatccttggtgcgatatgct..........ctgattggactctttttgctgtcagGTG  863     7(110)  U12
CHICK   ENSGALG00000004713  3(47)   CAGgtaagtatagcctgatctgcttctc..........atgcttggatttttctttcactcagGTT  1354    4(110)  U2
ZFISH   ENSDARG00000009395  8(47)   CAGgtagtgggaccaatctcgcactacg..........tccattgttttggtgtatttggcagACA  1431    9(110)  U2
==========================================================================================================================
Paralogues
HUMAN   ENST00000289041     3(97)   CGTgtatcctttgcctgctggctgacca..........gaatgaccttaatctggggttctagCCA  1719    4(109)  U12
MOUSE   ENSMUST00000024866  3(97)   CGTgtatcctttgcctgctgcctggtgc..........aaatggccttaatctgtggttctagTCA  1180    4(106)  U12
CHICK   ENSGALT00000014160  3(97)   CCTgtatcctttgcagtctgaacccttc..........aaatgaccttaatctatcattttagCCA  314     4(109)  U12
ZFISH   ENSDART00000039772  3(97)   TATgtatctttttacattttcagctttt..........ttatatccttgattctctcttgaagTCA  2952    4(109)  U12
HUMAN   ENST00000260930     3(97)   AAGgtaccgtgcagcaaagtccagatat..........tgtgcttttcttttgcattctgaagGCA  2028    4(109)  U2
MOUSE   ENSMUST00000001027  3(97)   CAGgtaggtgcagccaagtccagttagg..........tgtgcttctcttttgcattctgaagGCA  3496    4(109)  U2
        ENSMUST00000040999  3(97)   CAGgtacctgccctcaccagcaggcttg..........ttacttcaccacatgaactttgaagTCA  786     4(109)  U2
        ENSMUST00000040442  3(97)   CCAgtatcctttgcactgcctggctatg..........ttctctttacataagcccatcacagCCA  2457    4(109)  ???
CHICK   ENSGALT00000013325  3(97)   ACGgtatccttaacaagaagtctggaaa..........ctttattcaccttaatgttccaaagACA  1199    4(109)  U12?
ZFISH   ENSDART00000043711  3(97)   CAGgtattacagtcttcattttactcca..........agaagcattttttttctctgtttagTCA  301     4(109)  U2
FUGU    SINFRUT00000175866  3(97)   CACgtatccttgcaacagctggtggcct..........agcaccgaccttcactcagcattagACA  140     4(109)  U12

Upper case: exon; lower case: intron; dots: omitted central part of the intron. Length: intron length. "???" and "U12?" mark uncertain type assignments.

[Figure 1. Selection of thresholds for U2-type and U12-type donor site definitions. Panels: U2 intron matrices (donor, branch point, acceptor) and U12 intron matrices (donor, branch point, acceptor).]

Introduction

In higher eukaryotes, pre-mRNA splicing is carried out by at least two distinct spliceosomes: the major (U2) and the minor (U12) spliceosome. Introns spliced by the U12 spliceosome are rare (<0.5% of all introns), so most gene prediction and annotation pipelines ignore them. However, some well-known disease-related genes, such as huntingtin and PTEN, contain one or more U12 introns. This makes it difficult to determine their precise gene structure. The slower processing rate of the U12 spliceosome is thought to contribute to the regulation of gene expression. The U12 spliceosome is composed of the U11, U12, U4atac, U5 and U6atac small nuclear ribonucleoproteins (snRNPs). It is surprisingly similar to the U2 spliceosome in structure and function, yet the two appear to evolve independently of each other. The U12 spliceosome was initially discovered to operate on AT-AC introns [1,2]. It was later shown that GT-AG introns are in fact its major substrate. Sequencing of U11 and U12 snRNAs confirmed that the U12 donor (5'-[AG]TATCCTT) and U12 branch point (TCCTTAAC) consensus sequences are remarkably distinct from the relatively variable U2 splice sites. The evolution of U12 and U2 introns is an interesting case study with implications for gene structure in general. Burge et al. [3] suggested that comparing orthologous genes from different species could reveal the following outcomes: intron conservation, GT-AG/AT-AC subtype conversion, U12/U2-type conversion, and intron loss. We focused mainly on U12-type introns and extended the analysis to U12/U2 introns in paralogous genes.
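The sketch below makes the two U12 consensus sequences concrete. It is a minimal Python example that checks an intron for two features. The first is the donor consensus 5'-[AG]TATCCTT at the intron's 5' end. The second is a near match to the branch point consensus TCCTTAAC within the last 50 nt. This is an illustrative exact/near-match scan under our own assumptions, not the matrix-based scoring used by ssahaEST. The function names, the two-mismatch limit and the 50-nt window are all hypothetical choices.

```python
import re
from typing import Optional, Tuple

# U12 consensus sequences quoted in the Introduction (intron strand, 5'->3').
U12_DONOR = re.compile(r"[AG]TATCCTT")  # re.match anchors it at intron position +1
U12_BRANCH = "TCCTTAAC"


def mismatches(a: str, b: str) -> int:
    """Count mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def best_branch_hit(intron: str, window: int = 50,
                    max_mismatch: int = 2) -> Optional[Tuple[int, int]]:
    """Find the closest match to TCCTTAAC in the last `window` nt.

    Returns (distance from the hit start to the intron's 3' end, mismatches)
    for the best hit with at most `max_mismatch` mismatches, else None.
    """
    seq = intron.upper()
    k = len(U12_BRANCH)
    best = None
    for i in range(max(0, len(seq) - window), len(seq) - k + 1):
        mm = mismatches(seq[i:i + k], U12_BRANCH)
        if mm <= max_mismatch and (best is None or mm < best[1]):
            best = (len(seq) - i, mm)
    return best


def looks_u12_like(intron: str) -> bool:
    """Crude test: U12 donor consensus at the 5' end plus a branch point hit."""
    return (U12_DONOR.match(intron.upper()) is not None
            and best_branch_hit(intron) is not None)
```

As an example, take the two intron ends shown in Table 3 for human ENST00000289041 and join them with a hypothetical run of 100 N characters standing in for the omitted middle. The result passes both checks. The best branch point hit is ACCTTAAT, with two mismatches, 20 nt from the 3' end. The U2-type intron of ENSG00000001497 fails the donor test.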
We mapped all available human, mouse, chicken and zebrafish ESTs/cDNAs to the corresponding genomes with high accuracy. To do this we used a new fast algorithm implemented in ssahaEST, which allows refined splice site analysis of gene structure. In this work we focused on the detection and evolution of U12 introns in these four eukaryotic genomes.

Materials and Methods

Expressed sequence tags (ESTs) were downloaded from the NCBI dbEST database (July 8th 2005 release, ftp://ftp.ncbi.nih.gov/repository/dbEST/) for Homo sapiens (~6.1 × 10^6), Mus musculus (~4.3 × 10^6), Danio rerio (~0.63 × 10^6) and Gallus gallus (~0.55 × 10^6). Files containing large numbers of FastA-formatted sequences were split into files of manageable size (~0.6 × 10^6 sequences each). The ESTs were aligned to the genomes of H. sapiens (NCBI35), M. musculus (NCBI_m34), D. rerio (WTSI Zv5) and G. gallus (WashU ver. 1) with the newly developed ssahaEST program. The alignments were run on an SGI Altix machine with 16 IA-64 1.6 GHz processors. ssahaEST combines four components:
- the fast k-mer positioning algorithm of the SSAHA program [4];
- the banded Smith-Waterman-Gotoh implementation from the phrap/cross_match package (Phil Green);
- clustering of high-scoring pairs (HSPs);
- accurately trained splice site models for U2-type and U12-type introns.

An intron was classified as U12-type using thresholds derived from training set 1. The thresholds apply to the individual donor, branch point and acceptor scores and to the branch point-to-acceptor distance (<50 nt). Training set 1 included introns experimentally confirmed to be spliced by the U12 spliceosome, together with orthologous genes from closely related genomes. Splice site matrices for U2 and U12 introns were estimated by maximum likelihood (ML) with pseudocounts.
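The matrix-and-threshold classification described above can be sketched as follows. The sketch estimates a position weight matrix by maximum likelihood (base frequencies at each position), with a pseudocount added to every base. It scores a site as a log2-odds ratio against a uniform background. An intron is called U12-type only when two conditions hold: the donor, branch point and acceptor scores all pass their thresholds, and the branch point lies fewer than 50 nt from the acceptor. The uniform background, the pseudocount of 1 and all threshold values are assumptions for illustration, not the trained parameters used in ssahaEST.

```python
import math
from collections import Counter

BASES = "ACGT"


def build_pwm(sites, pseudocount=1.0):
    """ML base frequencies per position, after adding `pseudocount` to each base.

    Assumes equal-length, upper-case sites containing only A, C, G and T.
    """
    length = len(sites[0])
    assert all(len(s) == length for s in sites), "sites must be aligned"
    total = len(sites) + pseudocount * len(BASES)
    pwm = []
    for pos in range(length):
        counts = Counter(s[pos] for s in sites)
        pwm.append({b: (counts[b] + pseudocount) / total for b in BASES})
    return pwm


def log_odds(pwm, site, background=0.25):
    """Log2-odds score of `site` against an (assumed) uniform background."""
    return sum(math.log2(col[b] / background) for col, b in zip(pwm, site))


def is_u12(donor_score, branch_score, acceptor_score, branch_to_acceptor,
           thresholds, max_distance=50):
    """U12 call: every score passes its threshold and the branch point
    lies fewer than `max_distance` nt upstream of the acceptor."""
    return (donor_score >= thresholds["donor"]
            and branch_score >= thresholds["branch"]
            and acceptor_score >= thresholds["acceptor"]
            and branch_to_acceptor < max_distance)
```

As a usage example, train on the five U12 donor 8-mers from the paralogue block of Table 3 (GTATCCTT four times, GTATCTTT once). A consensus-like donor then scores positive, while a U2-type donor such as GTAAGAAA scores negative. The threshold dictionary passed to `is_u12` would come from the training sets; any values used to try it out are placeholders.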
The score and length thresholds were derived from training set 2, compiled from the 404 human U12 introns described by Levine & Durbin [5]. We removed 36 of these U12-type introns, leaving 368. The removed introns did not fit our splice site model for the U12-type intron and showed patterns different from those in training set 1 (Figure 1), so we cannot be confident that they are true U12 introns. Similar U12-type intron definitions have been described previously [3, 5, 6]. For the comparative study of U12-type introns across the four eukaryotic genomes, we remapped 6883 homologous EnsEMBL (release 32) genes to the four genomes. We then analysed introns in one genome that are homologous to U12-type introns in another genome. We considered only introns adjacent to conserved exons.

Availability: http://www.sanger.ac.uk/Software/analysis/SSAHA2/

Discussion

Our approach to splice site analysis differs from that of the previous work [5]: we can now map every EST/cDNA to its best unique location in the genome, which avoids potential ambiguity in splice site confirmation. Note that U12-type introns occur at very low frequency. We therefore had to build highly specific matrices for the different U12 subtypes, which may lower the sensitivity of the method and lead to an underestimate. Despite this, the number of U12-type introns identified in the human genome has doubled compared with the previous work [5]. This is mainly because the number of human ESTs has grown and the quality of the human genome assembly has improved over the last four years.

Table 1 shows two major trends from the first part of the analysis. First, the total number of non-redundant introns correlates with the length of the sequenced portion of the genome. Second, U12-type introns make up ~0.3% of all introns in all four species. The fraction is significantly larger in chicken and zebrafish than in the mammalian genomes, indicating some turnover of intron types.
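The comparison of homologous introns described above reduces to a small decision procedure. The sketch below assigns a U12-type intron and its homologue in another genome to one of the outcomes listed by Burge et al. [3], plus the case of a loss and gain at a nearby position. It assumes, hypothetically, that each intron is already described by three attributes: its position in an alignment of the conserved flanking exons, its type and its subtype. The `Intron` class, the example positions and the zero-nucleotide position tolerance are illustrative choices, not part of the original pipeline.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass
class Intron:
    position: int  # hypothetical: offset within the aligned, conserved exons
    kind: str      # splicing type: "U12" or "U2"
    subtype: str   # terminal dinucleotides: "GT-AG" or "AT-AC"


def compare_homologues(u12: Intron, other: Optional[Intron],
                       tolerance: int = 0) -> str:
    """Classify what happened to a U12-type intron in a homologous gene."""
    if other is None:
        return "intron loss"
    if abs(u12.position - other.position) > tolerance:
        # An intron at a nearby but different position: loss plus gain.
        return "loss/gain at a different position"
    if other.kind != u12.kind:
        return "true U12/U2-type conversion"
    if other.subtype != u12.subtype:
        return "GT-AG/AT-AC subtype conversion"
    return "conserved U12-type intron"


def summarise(pairs):
    """Fraction of each outcome over (U12 intron, homologue) pairs."""
    counts = Counter(compare_homologues(a, b) for a, b in pairs)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}
```

The position test comes first because a U2-type intron at a different position is, in the terms used here, a loss/gain event rather than a conversion. Only introns that keep their exact position are candidates for true conversion.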
Comparing homologous human and mouse genes that contain U12-type introns showed that ~50% of U12 introns have been converted to a different type. This rate is significantly higher than that reported by Abril et al. [7]. However, the conversions produce introns at close but different positions, indicating a potential loss/gain mechanism as opposed to replacement. True conversion was observed in a few paralogues (Table 3), but the study is hampered by the lack of a reliable database of paralogous genes.

Results

I. Analysis of the overall splice site variation in four eukaryotic genomes

We used ssahaEST to map ~11.6 million ESTs/cDNAs from four organisms (human, mouse, chicken and zebrafish) to their corresponding genomes using U2 and U12 splice site models. Table 1 shows the outcome of this experiment. All intron counts refer to non-redundant introns uniquely mapped to the genome, with each distinct intron start and end counted only once. Not surprisingly, the vast majority of introns (>99%) are U2-type. We found no significant differences in splice site matrices between species. In all four sets, GT-AG introns were the dominant U12 subtype (~70%) and AT-AC introns the minor one (~30%). Of the 404 human U12-type introns reported previously [5], we identified 368 (260 GT-AG and all 108 AT-AC).

II. Cross-genome comparison of U12-type introns in four eukaryotic genomes

Mapping 6883 homologous transcripts to the four eukaryotic genomes identified 90 human, 115 mouse, chicken and zebrafish U12-type introns for which homologues could be examined. Table 2 compares the human and mouse genes containing these introns. Approximately half (53) of the introns were conserved in intron position and remained U12-type. In this set we found no examples of GT-AG/AT-AC subtype conversion. Surprisingly, most cases listed as U12/U2-type conversion are not conserved in intron position. They could therefore be considered as loss of a U12-type intron and gain of a U2-type intron at a different position. We found some interesting examples of true U12/U2-type conversion in paralogous genes (Table 3). However, these cases are hard to quantify because no appropriate database of paralogous genes exists.

Conclusion

1. We have developed a fast and accurate method for mapping ESTs/cDNAs to finished eukaryotic genomes and for studying gene structure.
2. We found ~800 U12-type introns in the human and mouse genomes and ~400 U12-type introns in the chicken and zebrafish genomes. U12 introns appear to constitute ~0.3% of all introns.
3. Our study shows that U12/U2-type conversion between homologous introns of the four eukaryotic genomes most likely occurs by a loss/gain mechanism with a change in intron position. True conversion was observed only in paralogous genes.

References

[1] Jackson IJ (1991) Nucleic Acids Res. 19:3795-8.
[2] Hall SL & Padgett RA (1994) J. Mol. Biol. 239:357-65.
[3] Burge C, Padgett RA & Sharp PA (1998) Mol. Cell 2:773-85.
[4] Ning Z, Cox AJ & Mullikin JC (2001) Genome Res. 11:1725-9.
[5] Levine A & Durbin R (2001) Nucleic Acids Res. 29:4006-13.
[6] Zhu W & Brendel V (2003) Nucleic Acids Res. 31:4561-72.
[7] Abril JF, Castelo R & Guigo R (2005) Genome Res. 15:111-9.
