On the biological significance of alternative splicing: a bioinformatics approach
Sandro J. de Souza
TDR, 07/05/2004
RNA 10:757-765, 2004
Genomics • Bioinformatics • Large-scale Biology
The Real Revolution
• Early 20th century: Mendel and the inheritance laws
• Mid 20th century: DNA as the genetic element (Avery)
• Mid 20th century: Watson and Crick and the structure of DNA
• 70’s and 80’s: Molecular biology/biotechnology
• 90’s and 21st century: Genomics and Bioinformatics
Paradigm in Biology: Evolution by means of natural selection (Darwin and Wallace, mid 19th century)
Bioinformatics • Development of tools • Gateway to explore new datasets • Processing of data derived from large-scale projects • A new way to do hypothesis-driven science
[Figure: gene structure. Labels: exons, introns, mRNA, coding, non-coding]
Splicing
Splicing depends on recognition of exon-intron boundaries.
Splice sites are generic and consist solely of:
• 5’ boundary (donor site)
• 3’ boundary (acceptor site, polypyrimidine tract)
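The boundary signals listed above can be sketched as a simple sequence check. This is only an illustration: it assumes the canonical GT (5’ end) and AG (3’ end) intron dinucleotides, which the slide does not spell out, and the tract length (20 nt) and pyrimidine threshold (0.6) are arbitrary values chosen for the example. The function names are hypothetical.

```python
DONOR = "GT"     # assumed canonical dinucleotide at the 5' end of the intron
ACCEPTOR = "AG"  # assumed canonical dinucleotide at the 3' end of the intron


def pyrimidine_fraction(seq: str) -> float:
    """Fraction of C/T bases in seq (0.0 for an empty string)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return sum(base in "CT" for base in seq) / len(seq)


def looks_like_intron(intron: str, tract_len: int = 20, min_pyr: float = 0.6) -> bool:
    """Heuristic: GT at the 5' end, AG at the 3' end, and a
    pyrimidine-rich stretch immediately upstream of the AG.
    tract_len and min_pyr are illustrative values, not from the slides."""
    intron = intron.upper()
    if len(intron) < len(DONOR) + len(ACCEPTOR):
        return False
    if not (intron.startswith(DONOR) and intron.endswith(ACCEPTOR)):
        return False
    # Bases just before the terminal AG (fewer than tract_len if the intron is short).
    tract = intron[-(tract_len + 2):-2]
    return pyrimidine_fraction(tract) >= min_pyr
```

Real splice-site recognition involves more context than this (for example the branch point), so the check is best read as a toy filter rather than a predictor.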
“...if they occur at the boundaries of the regions to be spliced out, can change the splicing pattern, resulting in the deletion or addition of whole sequences of amino acids.”
Walter Gilbert. Why genes in pieces. Nature 271:501, 1978.
At least half of all human genes undergo alternative splicing.
Biological significance or spurious events?
Alternative splicing (sex determination in Drosophila)
1. Chromosomal ratio activates transcription of Sxl in females only.
2. SXL controls splicing of tra mRNA.
3. Females: exon 2 (which has a stop codon) is removed via SXL. Males: exon 2 is not removed.
4. Males: no active TRA. Females: TRA is made.
5. TRA directs splicing of dsx mRNA in a specific manner; in males, default splicing occurs.
Alternative Splicing: Auditory Hair Cells
• Sound frequency → cytosolic Ca2+ concentration → K+ channel opens. Therefore Ca2+ concentration ‘decodes’ frequency.
• The Ca2+ concentration at which the K+ channel opens depends on alternative splicing of the K+ channel: 576 possible alternative splicing combinations.
[Figure: K+ channel spanning the plasma membrane (PM) into the cytosol. Dotted lines show regions of the protein dependent on splicing. Alternative sequences shown: AVSGRK and AVSGRKAMFARYVPEIAALILNRKKYGGTFNSTRGRK]
Picture of human cochlear hair cells from http://www.sickkids.on.ca/otolaryngology/Hearloss.asp
Types of alternative splicing:
• Exon skipping
• Alternative 5’ splice site
• Alternative 3’ splice site
• Intron retention
[Figure: schematic of each event type on an mRNA drawn 5’ to 3’]
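The four event types above can be illustrated by comparing the exon coordinates of a reference transcript with those of a variant. The sketch below is a simplified classifier written for this purpose, not the method used in the work presented: it assumes both transcripts are on the plus strand, uses inclusive genomic coordinates, and the function and type names are hypothetical.

```python
from typing import List, Tuple

Exon = Tuple[int, int]  # (start, end), inclusive genomic coordinates, plus strand


def overlaps(a: Exon, b: Exon) -> bool:
    """True if the two intervals share at least one base."""
    return a[0] <= b[1] and b[0] <= a[1]


def classify_events(ref: List[Exon], var: List[Exon]) -> List[str]:
    """List the alternative splicing events seen in var relative to ref."""
    ref, var = sorted(ref), sorted(var)
    events: List[str] = []

    # Intron retention: a single variant exon covers a whole reference intron.
    retaining = set()
    for left, right in zip(ref, ref[1:]):
        for v in var:
            if v[0] <= left[1] and v[1] >= right[0]:
                events.append("intron retention")
                retaining.add(v)

    # Exon skipping: an internal reference exon has no counterpart in the variant.
    for e in ref[1:-1]:
        if not any(overlaps(e, v) for v in var):
            events.append("exon skipping")

    # Alternative splice sites: matching exons whose internal boundaries differ.
    for i, e in enumerate(ref):
        for j, v in enumerate(var):
            if v in retaining or not overlaps(e, v):
                continue
            # Exon 3' end = 5' (donor) splice site; ignore transcript ends.
            if e[1] != v[1] and i < len(ref) - 1 and j < len(var) - 1:
                events.append("alternative 5' splice site")
            # Exon 5' end = 3' (acceptor) splice site; ignore transcript starts.
            if e[0] != v[0] and i > 0 and j > 0:
                events.append("alternative 3' splice site")
    return events
```

For example, with a hypothetical reference `[(1, 100), (201, 300), (401, 500)]`, a variant `[(1, 300), (401, 500)]` covers the first intron and is reported as an intron retention.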
Large-scale analysis of intron retention in the human transcriptome Pedro F.A. Galante, Noboru Jo Sakabe, Natanja Slager, Sandro J. de Souza
Examples of intron retention events with biological significance • Msl2 in Drosophila • P element in Drosophila • retroviruses
Immature B cells express membrane-bound Ig. Activation leads to production of the secreted form.
• In immature B cells, an intron containing an early translational stop signal is removed, yielding a long transcript. The additional sequence encodes a transmembrane region.
• This intron is not removed in activated B cells, giving rise to a truncated (secreted) product.
[Figure: Ig gene with stop codons, in an immature B cell and after activation. Labels: transmembrane domain, hydrophilic stretch, hydrophilic tail]
Intron retention and cancer
• CD44: several tumors
• Gastrin receptor: pancreas
• Ret tyrosine kinase: pheochromocytomas
• Fas receptor: T-cell lymphoma
[Figure labels: Known mRNAs; EST data; SAGE data; Genome Data; Transcriptome Database]
Genome-based cDNA clustering
[Figure: cDNAs aligned to genomic DNA and grouped into a cluster. Labels: Exon 1, Exon 2, Exon 3, DNA, mRNA, cluster]
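One way to picture genome-based clustering is to group cDNA/EST alignments whose genomic spans overlap. The sketch below makes that assumption explicitly: sequences are clustered when their aligned spans overlap on the same chromosome and strand. The actual criteria used to build the clusters are not given on the slide and may be stricter (for example, requiring shared exons). The `Alignment` record and its field names are hypothetical.

```python
from typing import Dict, List, NamedTuple, Optional, Tuple


class Alignment(NamedTuple):
    name: str    # hypothetical cDNA/EST identifier
    chrom: str
    strand: str
    start: int   # inclusive genomic start of the aligned span
    end: int     # inclusive genomic end of the aligned span


def cluster_alignments(alignments: List[Alignment]) -> List[List[str]]:
    """Group alignments whose spans overlap on the same chromosome and strand."""
    by_locus: Dict[Tuple[str, str], List[Alignment]] = {}
    for aln in alignments:
        by_locus.setdefault((aln.chrom, aln.strand), []).append(aln)

    clusters: List[List[str]] = []
    for key in sorted(by_locus):
        current: List[str] = []
        current_end: Optional[int] = None
        # Sweep left to right, extending the open cluster while spans overlap.
        for aln in sorted(by_locus[key], key=lambda a: (a.start, a.end)):
            if current and current_end is not None and aln.start <= current_end:
                current.append(aln.name)
                current_end = max(current_end, aln.end)
            else:
                if current:
                    clusters.append(current)
                current = [aln.name]
                current_end = aln.end
        if current:
            clusters.append(current)
    return clusters
```

Sorting by start position makes the grouping a single linear sweep per chromosome and strand.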
14% of all human genes show evidence of intron retention.
Kan, States & Gish (2002)
36% of RefSeq database!
After sample statistics: 5%
Distribution of events along transcripts
[Figure: two distribution plots, each annotated p << 0.005]
This bias can be a product of:
• Underreporting of sequences
• Nonsense-mediated decay (NMD)
2563 out of 3195 (80%) sequences with a retained intron had an exon/exon boundary downstream of the retention event.
Retained introns are shorter (p << 0.001)
Number of domains encoded by retained introns

                      Retained introns only   Exon-intron-exon
Entirely encoded      2                       31
Partially encoded     25                      10
Retained introns have a higher GC content (p << 0.001)
Did retained introns encode protein domains?
• Only retained introns in the CDS were used.
• Only retained introns defined by full-length mRNAs were used.
• Protein sequences were searched against the Pfam database.
Conservation of intron retention in mouse cDNA sequences
• 40%-57% of all retained introns present a mouse hit.
• Identity of orthologous retained introns is 84%; non-retained introns, 60%; exons, 87%.
• Mouse cDNA also corresponds to a retention variant: 26% - 10 out of 46
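The identity figures above are percentages of matching positions between aligned human and mouse sequences. A minimal sketch of such a calculation follows, assuming the two sequences are already aligned to equal length with "-" as the gap character and that gapped columns are excluded from the count. Conventions for handling gaps differ between tools, and the slides do not say which one was used.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent of identical bases over ungapped columns of a pairwise alignment."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == gap or y == gap:
            continue  # skip gapped columns (an assumption; conventions differ)
        compared += 1
        if x == y:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0
```

With this convention, an intron that aligns poorly and accumulates substitutions drifts toward lower values, while a sequence under constraint stays close to exon-like identity.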
Frequency of stop codons
[Figure: exon, retained intron, exon; mRNA with CDS and stop codon marked]
Example sequences:
TACTTGTGCGTAGTCCCCGCGATCTAACGCCACGATGGATGACAC
TACTTGTGCGTAGTCCCCGCGATCTAACGCCACGATGGATGACACTGTGA
Stop codons: TAG, TGA, TAA
Found: 651 stop codons. Expected: 1064. p-value << 0.005
88 cases where the retention generates a putative truncated protein.
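A comparison of observed and expected stop codon counts can be sketched as follows. The slide does not say how the expected value or the p-value was obtained, so everything here is an assumption made for illustration: the expectation is derived from the sequence's own base composition with independent positions, and the deviation is summarised with a rough Poisson-style z-score. All function names are hypothetical.

```python
import math
from collections import Counter

STOP_CODONS = {"TAG", "TGA", "TAA"}


def count_in_frame_stops(seq: str, frame: int = 0) -> int:
    """Count stop codons read in the given frame (0, 1 or 2)."""
    seq = seq.upper()
    return sum(seq[i:i + 3] in STOP_CODONS for i in range(frame, len(seq) - 2, 3))


def expected_stops(seq: str, frame: int = 0) -> float:
    """Expected in-frame stop count if bases were independent and drawn
    from the sequence's own base composition (an assumed null model)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    n_codons = max(0, (len(seq) - frame) // 3)
    freq = {base: n / len(seq) for base, n in Counter(seq).items()}

    def codon_prob(codon: str) -> float:
        return math.prod(freq.get(base, 0.0) for base in codon)

    return n_codons * sum(codon_prob(c) for c in STOP_CODONS)


def poisson_z(observed: float, expected: float) -> float:
    """Rough z-score treating the count as Poisson with mean `expected`."""
    return (observed - expected) / math.sqrt(expected)
```

Under this rough approximation, `poisson_z(651, 1064)` is strongly negative, which is at least in line with the depletion of stop codons reported on the slide.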
GC content for sequences upstream and downstream of the premature stop codon (88 cases)
[Figure: exon, retained intron, exon, drawn 5’ to 3’ with the premature stop marked. Values shown: GC 58% and GC 49%]
Interpretation: under selective pressure for coding potential.
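The upstream/downstream comparison can be sketched by splitting a sequence at its first in-frame stop codon and computing GC content on each side. This is a minimal illustration with hypothetical function names; it assumes the reading frame is known and excludes the stop codon itself from both parts, neither of which is stated on the slide.

```python
from typing import Optional, Tuple

STOP_CODONS = {"TAG", "TGA", "TAA"}


def gc_content(seq: str) -> float:
    """Fraction of G/C bases in seq (0.0 for an empty string)."""
    seq = seq.upper()
    return sum(base in "GC" for base in seq) / len(seq) if seq else 0.0


def first_in_frame_stop(seq: str, frame: int = 0) -> int:
    """Index of the first in-frame stop codon, or -1 if there is none."""
    seq = seq.upper()
    for i in range(frame, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            return i
    return -1


def gc_around_stop(seq: str, frame: int = 0) -> Optional[Tuple[float, float]]:
    """Return (GC upstream, GC downstream) of the first in-frame stop,
    or None when the sequence has no in-frame stop."""
    pos = first_in_frame_stop(seq, frame)
    if pos < 0:
        return None
    return gc_content(seq[:pos]), gc_content(seq[pos + 3:])
```

Averaging the two values over a set of retention events would give a pair of figures comparable in form to those on the slide.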
Why is the argument of ‘selection’ important?
• As noted originally by Gilbert (1978), mutations that affect splicing can allow the production of new proteins without the loss of the original one.
• Therefore, there should not be any “negative selection” on this variant.
• If, however, the new variant has some biological significance, selection will act to maintain the function of this variant.
Intron Retention in Tumors
Towards a reliable set of intron retention events * full-length vs full-length set and retained intron entirely in the CDS
Second International Conference on Bioinformatics and Computational Biology www.icobicobi.com.br 25-28/10/2004 Angra dos Reis
Group of Computational Biology
• Sandro J. de Souza, tennis player
• Helena Samaia, Research Assistant
• Ana C. Pereira, Admin. Assistant
• Maarten Leerkes, Ph.D. student
• Noboru Sakabe, Ph.D. student
• Maria Vibranovski, Ph.D. student
• Elza Helena, Ph.D. student
• Natanja Slager, Ph.D. student
• Pedro Galante, Ph.D. student
• Elisson C. Osorio, programmer
• Jorge E. de Souza, Ph.D. student
• Rodrigo Soares, programmer
• Andre Zaiats, system admin.