Application of Bioinformatics, Proteomics, and Genomics: Introns, Splicing, and Alternative Splicing
Out of the coding business
STATISTICS
• 92% of mammalian genes have exon/intron structures, while only 8% of genes are intron-free.
• The average segmented gene of these species contains between 8 and 9 introns.
• The total length of human introns exceeds one billion nucleotides, representing 35-40% of the euchromatic part of our genome.
• The average size of human introns is about 5,500 bp, while the median is approximately 1,500 bp.
Introns are notorious for controversies over the interpretation of their origin and function: a 25-year dispute between the introns-late and introns-early theories.
What do we know about introns? • Introns are the ubiquitous genomic elements of eukaryotes whose role is still poorly understood and appreciated.
Many human introns are extremely long:
• 1,234 human introns are longer than 100 kb
• 299 are longer than 200 kb
• 9 are longer than 500 kb
Largest human genes:
• cell recognition molecule Caspr2: 2.3 Mb (25 introns)
• Dystrophin (DMD): 2.2 Mb (78 introns)
• CUB and Sushi multiple domains: 2.1 Mb (70 introns)
Maximal number of introns in a human gene:
• titin isoform N2-A: 312 introns (cds = 80,870 bp)
Paradox with extra-large introns
Splicing junctions (intron 5'- and 3'-termini) must be brought close together by the spliceosome in order to remove an intron from the pre-mRNA. The larger the intron, the more remote its ends are from one another.
[Figure: removal of an intron during splicing, with the intron's 5'-end and 3'-end labeled]
Theoretically, the difficulty of bringing an intron's termini together in our 3-D world is proportional to the cube of its length, L^3, where L is the length of the intron. Therefore, for a 100,000 nt long intron, it is one million times harder to bring its ends together than for a 1,000 nt long intron.
[Figure: comparative size of a 100 kb intron]
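The scaling argument can be written out as a short worked equation. This is only a sketch of the slide's own assumption that the difficulty grows as the cube of intron length; the symbol D and the proportionality constant k are introduced here for illustration and do not appear in the original.

```latex
D(L) = k\,L^{3}
\quad\Longrightarrow\quad
\frac{D(100{,}000\ \text{nt})}{D(1{,}000\ \text{nt})}
= \left(\frac{100{,}000}{1{,}000}\right)^{3}
= 100^{3}
= 10^{6}
```

The constant k cancels in the ratio, so the "one million times" figure depends only on the assumed cubic scaling.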
The enormous intron size in mammals creates several drawbacks, such as:
1) considerable waste of energy during gene expression, which is "unwisely" spent on polymerizing extra-long intronic segments of pre-mRNA molecules;
2) delay in obtaining protein products (on average it takes about 45 min for RNA polymerase II to transcribe a 100,000 bp intron);
3) potential errors in normal splicing, since long introns contain numerous false splice sites (so-called pseudo-exons).
Some benefits must be associated with introns to compensate for these disadvantages. Different constructive roles for introns are described in two reviews:
Fedorova L., Fedorov A. Introns in gene evolution. Genetica 2003, 118: 123-131.
Fedorova L., Fedorov A. Puzzles of the human genome: why do we need our introns? Current Genomics 2005, Vol. 6, 589-595.
Intron functions
1) sources of non-coding RNA
2) carriers of transcription regulatory elements
3) actors in alternative and trans-splicing
4) enhancers of meiotic crossing over within coding sequences
5) substrates for exon shuffling
6) signals for mRNA export from the nucleus and nonsense-mediated decay
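The transcription delay in point 2 can be turned into a rough estimate for any gene length. The sketch below assumes a constant elongation rate derived from the slide's figure (45 min per 100,000 bp); the function names are hypothetical, and real elongation rates vary.

```python
# Assumption taken from the slide: RNA polymerase II needs about 45 min
# to transcribe a 100,000 bp intron. A constant elongation rate is assumed.
REFERENCE_LENGTH_BP = 100_000
REFERENCE_TIME_MIN = 45.0


def elongation_rate_nt_per_sec() -> float:
    """Average elongation rate (nt/s) implied by the slide's figure."""
    return REFERENCE_LENGTH_BP / (REFERENCE_TIME_MIN * 60.0)


def transcription_time_min(length_bp: int) -> float:
    """Estimated transcription time in minutes, scaling linearly with length."""
    return length_bp * REFERENCE_TIME_MIN / REFERENCE_LENGTH_BP


if __name__ == "__main__":
    print(f"Implied rate: {elongation_rate_nt_per_sec():.1f} nt/s")
    # 2.3 Mb is the size quoted in the slides for the largest human gene.
    hours = transcription_time_min(2_300_000) / 60.0
    print(f"2.3 Mb gene: about {hours:.0f} h")
```

Under this linear assumption a 2.3 Mb gene would occupy the polymerase for many hours, which illustrates why long introns delay protein production.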
• Removal of nuclear (spliceosomal) introns is an extremely complex process that requires up to 250 proteins and several small non-coding RNAs (U1, U2, U4, U5, U6).
http://www.ncbi.nlm.nih.gov/books/bv.fcgi?rid=mcb.figgrp.2890
Video: http://vcell.ndsu.edu/animations/mrnasplicing/movie-flash.htm
Reed R. Mechanisms of fidelity in pre-mRNA splicing. Curr. Opin. Cell Biol. 12, 340-345, 2000
There is a competition between SR-proteins and hnRNPs to build a net over pre-mRNA sequences. In cases of alternative splicing, the structure of the pre-mRNA-protein complex, and thus the ultimate processing of the pre-mRNA, depends on the concentrations of different SR-proteins and hnRNPs in the nucleus.
Group II introns in the bacterial world
F. Martinez-Abarca and N. Toro. Molecular Microbiology 2000, 38:917-926
http://www.fp.ucalgary.ca/group2introns/wherefound.htm
Group I intron Adams et al. RNA (2004)
Evolutionarily distant species share only a portion of common introns
Animals and plants have no more than 50% of common intron positions in orthologous genes.
Fedorov et al. PNAS 2002, 99:16128
Rogozin et al. Curr Biol 2003, 13:1512
Intron loss and gain during evolution
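How to find intron loss or gain
Analysis of closely related species (mouse, rat, human)
[Figure: presence (+) or absence (-) of an intron at the same position in mouse, rat, and human. Case 1 (gain): + in one species, - in the other two. Case 2 (loss): - in one species, + in the other two.]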
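The parsimony reasoning behind these two cases can be sketched in a few lines of Python. This is a minimal illustration, not the authors' pipeline: the function name, the dictionary input, and the choice of human as the outgroup to mouse and rat are assumptions made here. When the outgroup itself is the odd species, one gain and one loss explain the pattern equally well, so the sketch reports it as ambiguous.

```python
from typing import Dict


def classify_intron_event(presence: Dict[str, bool], outgroup: str = "human") -> str:
    """Classify one intron position by parsimony across three species.

    `presence` maps a species name to True (intron present, "+")
    or False (intron absent, "-").
    """
    if len(presence) != 3 or outgroup not in presence:
        raise ValueError("expected three species, one of them the outgroup")
    present = [sp for sp, has in presence.items() if has]
    absent = [sp for sp, has in presence.items() if not has]
    if not present or not absent:
        return "no change"
    # Exactly one species differs from the other two.
    odd = present[0] if len(present) == 1 else absent[0]
    if odd == outgroup:
        # Gain in the outgroup or loss in the ancestor of the other two:
        # both need a single event, so parsimony cannot decide.
        return "ambiguous"
    event = "gain" if len(present) == 1 else "loss"
    return f"{event} in {odd}"


if __name__ == "__main__":
    print(classify_intron_event({"mouse": False, "rat": True, "human": True}))
    print(classify_intron_event({"mouse": True, "rat": False, "human": False}))
```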
Results of intron comparison
Compared species   # genes   Total # introns   # gain   # loss
Mouse vs Rat       360       1,459             0        1 (Rat)
Human vs Mouse     1,560     10,020            0        5 (mouse)
Characterization of deleted introns
GENE NAME                             SPECIES   LENGTH OF CORRESPONDING INTRON (nt)
S-adenosylmethionine decarboxylase    Mouse     291 (Hs)
Ribosomal protein s18                 Mouse     81 (Hs)
Adaptor-related complex 1, mu 2 su    Mouse     113 (Hs); (? Rat)
Laminin alpha 5                       Mouse     107 (Hs)
Transcription factor usf              Mouse     245 (Hs); 223 (rat)
Tumor suppressor p53                  Rat       393 (mouse)
Conclusions A large-scale computational analysis of the human, D. melanogaster, C. elegans, and A. thaliana genomes has been performed. 147,796 human introns, 106,902 plant, 39,624 Drosophila and 6,021 C. elegans introns were examined. Different types of homologies between introns were found, but none showed evidence of simple intron transposition. No single case of homologous introns in non-homologous genes was detected. Thus we found no example of transposition of introns in the last 50 million years in humans, in 3 million years in Drosophila and C. elegans, or in 5 million years in Arabidopsis. Either new introns do not arise via transposition of other introns or intron transposition must have occurred so early in evolution that all traces of homology have been lost.
Genome Research 2003 For nearly 15 years, it has been widely believed that many introns were recently acquired by the genes of multicellular organisms. However, the mechanism of acquisition has yet to be described for a single animal intron. Here, we report a large-scale computational analysis of the human, Drosophila melanogaster, Caenorhabditis elegans, and Arabidopsis thaliana genomes. We divided 147,796 human intron sequences into batches of similar lengths and aligned them with each other. Different types of homologies between introns were found, but none showed evidence of simple intron transposition.
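The abstract says the introns were divided into batches of similar lengths before alignment but does not give the batching rule. The sketch below shows one possible way to do it; the function name, the `tolerance` parameter, and the rule itself (lengths within 10% of the shortest intron in the batch) are assumptions for illustration only.

```python
from typing import Dict, List


def batch_by_length(introns: Dict[str, str], tolerance: float = 0.1) -> List[List[str]]:
    """Group intron IDs into batches of similar length.

    An intron joins the current batch if its length is at most
    (1 + tolerance) times the length of the shortest intron in that batch.
    """
    ordered = sorted(introns, key=lambda name: len(introns[name]))
    batches: List[List[str]] = []
    batch_min_len = 0
    for name in ordered:
        length = len(introns[name])
        if batches and length <= batch_min_len * (1 + tolerance):
            batches[-1].append(name)
        else:
            # Start a new batch; this intron is its shortest member.
            batches.append([name])
            batch_min_len = length
    return batches


if __name__ == "__main__":
    toy = {"a": "g" * 100, "b": "g" * 105, "c": "g" * 200, "d": "g" * 120}
    print(batch_by_length(toy))
```

Only introns inside the same batch would then be aligned with each other, which keeps the number of pairwise comparisons manageable.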
Alternative splicing
Production of multiple mRNA isoforms from the same gene, often in a tissue-specific or development-stage-specific manner.
Mutually exclusive exons Isoform A Isoform B
Optional exons Isoform A Isoform B
Alternative 5`-sites Isoform A Isoform B
Retained introns Isoform A Isoform B
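Two of the splicing patterns named in these slides (optional exons and mutually exclusive exons) can be modeled as simple list operations. This is a toy sketch with hypothetical function names and exon labels; alternative 5'-sites and retained introns need coordinates or intron segments and are left out.

```python
from typing import List


def optional_exon_isoforms(exons: List[str], optional: str) -> List[List[str]]:
    """Isoform A keeps the optional exon; isoform B skips it."""
    isoform_a = list(exons)
    isoform_b = [e for e in exons if e != optional]
    return [isoform_a, isoform_b]


def mutually_exclusive_isoforms(exons: List[str], first: str, second: str) -> List[List[str]]:
    """Each isoform keeps exactly one of the two alternative exons."""
    isoform_a = [e for e in exons if e != second]
    isoform_b = [e for e in exons if e != first]
    return [isoform_a, isoform_b]


if __name__ == "__main__":
    print(optional_exon_isoforms(["E1", "E2", "E3"], optional="E2"))
    print(mutually_exclusive_isoforms(["E1", "E2a", "E2b", "E3"], "E2a", "E2b"))
```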
Why is alternative splicing important?
• Half of human genes express multiple alternative mRNA isoforms, many of which have important specific functions.
• Alternative splicing alone increases the number of different polypeptides in human cells to 2-3 times the number of human genes.
Alternative splicing in Drosophila melanogaster
The study of sex determination in Drosophila, which involves the alternatively spliced Sex-lethal (Sxl), male-specific-lethal-2 (msl2), transformer (tra), and doublesex (dsx) genes, led to the initial discovery of exonic splicing enhancers (ESE) (reviewed by Cline and Meyer 1996; MacDougall et al. 1995).
The best-known case of alternative splicing in invertebrates is the DSCAM gene of Drosophila. The estimated number of its alternative isoforms (~38,000) is almost three times the total number of fruit fly genes (Black, D.L., 2000. Protein diversity from alternative splicing: A challenge for bioinformatics and post-genome biology. Cell 103: 367-370).
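The ~38,000 figure is a product of independent choices among mutually exclusive exon variants. The per-cluster counts below are the values commonly reported in the literature for Drosophila Dscam; they are not given on the slide, so treat them as an assumption of this sketch.

```python
from math import prod
from typing import Dict

# Numbers of mutually exclusive variants in the four alternatively spliced
# exon clusters of Drosophila Dscam, as commonly reported in the literature
# (not stated on the slide; an assumption of this sketch).
DSCAM_EXON_VARIANTS: Dict[str, int] = {
    "exon 4": 12,
    "exon 6": 48,
    "exon 9": 33,
    "exon 17": 2,
}


def count_isoforms(variants_per_cluster: Dict[str, int]) -> int:
    """One variant is chosen per cluster, so the counts multiply."""
    return prod(variants_per_cluster.values())


if __name__ == "__main__":
    print(count_isoforms(DSCAM_EXON_VARIANTS))
```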
Alternative splicing in human
• Defects in alternative splicing are associated with several human diseases:
1) frontotemporal dementia with parkinsonism
2) amyotrophic lateral sclerosis
3) paraneoplastic neurological disorders
4) possibly some forms of schizophrenia
• Many types of cancer are linked to altered patterns of alternative splicing. For instance, alternative isoforms of the Bcl-2 family of apoptotic regulators have opposite apoptotic activities; anti-apoptotic isoforms are frequently overexpressed in lymphoma cells.
• "Neurexin: three genes and 1001 products" Missler M. and Sudhof T.C. TIG 14:20-26, 1998.
Comparative genomics
Up to 60% of splicing isoforms are conserved between human and mouse. What about plants?
For computational biology, the most efficient way to study alternative splicing is analysis of the EST database.
Expressed Sequence Tag (EST) database
dbEST (Nature Genetics 4:332-3; 1993) is a division of GenBank that contains sequence data and other information on "single-pass" cDNA sequences, or Expressed Sequence Tags, from a number of organisms. A brief account of the history of human ESTs in GenBank is available (Trends Biochem. Sci. 20:295-6; 1995). Also, consult the special "Genome Directory" issue of Nature (vol. 377, 28 1995).
Nov 2004: total number of all EST sequences 25,000,000; human 6,000,000.
Poor quality of ESTs
EST AA000974:       1 gtggttatcgcggcgcccaccaggcctctgatctccnantcttctccaac 50
                      ||||||||||||||||||||| |||||||||||||  | |||||||||||
gene "FLJ41131":   70 gtggttatcgcggcgcccacc-ggcctctgatctctgagtcttctccaac 118
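The quality of such an alignment can be quantified by counting identical columns. The sketch below rebuilds the match line and the percent identity from two gapped strings of equal length; the function names are hypothetical, and the two sequences are copied from the slide.

```python
def match_line(a: str, b: str) -> str:
    """Return '|' for identical aligned bases and ' ' for mismatches or gaps."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have the same length")
    return "".join("|" if x == y and x != "-" else " " for x, y in zip(a, b))


def percent_identity(a: str, b: str) -> float:
    """Identical columns as a percentage of all alignment columns."""
    return 100.0 * match_line(a, b).count("|") / len(a)


# Aligned sequences from the slide (EST AA000974 vs. gene FLJ41131).
EST = "gtggttatcgcggcgcccaccaggcctctgatctccnantcttctccaac"
GENE = "gtggttatcgcggcgcccacc-ggcctctgatctctgagtcttctccaac"

if __name__ == "__main__":
    print(EST)
    print(match_line(EST, GENE))
    print(GENE)
    print(f"identity: {percent_identity(EST, GENE):.0f}%")
```

Ambiguous bases ("n"), an extra base against a gap, and a substitution all lower the identity in this 50-column window, which is typical of single-pass reads.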
Human ESTs are derived from all tissues
149013 Organ: head
135512 Organ: breast
125277 Organ: colon
123119 Organ: placenta
121321 Organ: Liver and Spleen
112510 Organ: lung
94405 Organ: pooled
93693 Organ: nervous
82727 Organ: kidney
76447 Organ: uterus
72207 "adenocarcinoma"
69891 Organ: prostate
65607 "melanotic melanoma"
63537 "Purified pancreatic islet"
57908 "PLACENTA COT 25-NORMALIZED"
57891 "carcinoid"
53732 Organ: testis
53077 "pooled germ cell tumors"
51540 "germinal center B cell"
49768 "Soares_testis_NHT"
48706 Organ: whole brain
46973 "retinoblastoma"
45951 Organ: heart
45027 "686 (synonym: hlcc3)"
42857 Organ: brain
41852 Organ: stomach
41549 Organ: Liver
39495 "leiomyosarcoma"
39249 "Ascites"
38697 Organ: marrow
35082 "large cell carcinoma"
33777 "adenocarcinoma, cell line"
Detection of alternative splicing using the EST database
[Figure: ESTs aligned to an mRNA]
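One way to picture EST-based detection is to compare the exon blocks of an EST alignment with those of a reference mRNA on the same genomic coordinates. The sketch below flags reference exons that fall entirely inside an intron inferred from the EST, which suggests exon skipping. The data layout (half-open `(start, end)` tuples) and function names are assumptions for illustration; real pipelines work from aligner output and must handle sequencing errors.

```python
from typing import List, Tuple

# (start, end) genomic coordinates of one aligned exon block, end exclusive.
Block = Tuple[int, int]


def introns_from_blocks(blocks: List[Block]) -> List[Block]:
    """Gaps between consecutive aligned blocks are the inferred introns."""
    ordered = sorted(blocks)
    return [(left[1], right[0]) for left, right in zip(ordered, ordered[1:])]


def skipped_exons(mrna_blocks: List[Block], est_blocks: List[Block]) -> List[Block]:
    """Reference exons lying entirely inside an intron inferred from the EST."""
    est_introns = introns_from_blocks(est_blocks)
    return [
        exon
        for exon in sorted(mrna_blocks)
        if any(start <= exon[0] and exon[1] <= end for start, end in est_introns)
    ]


if __name__ == "__main__":
    mrna = [(0, 100), (200, 300), (400, 500)]
    est = [(50, 100), (400, 450)]  # joins exon 1 directly to exon 3
    print(skipped_exons(mrna, est))
```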
Useful information that could be obtained from the EST database • Different forms of alternative splicing • Correlation of alternative splicing with particular tissues, diseases, and stages of development
Alternative Splicing Database (ASD)
http://www.ebi.ac.uk/asd/index.html
EMBL-European Bioinformatics Institute
Study of alternative splicing of the human gene Caspase-4 (cysteine protease):
AltSplice -> Access -> Human -> Keyword (casp*)
EnsEMBL gene ENSG00000137756
RNA-Seq Bioinformatics tools http://en.wikipedia.org/wiki/List_of_RNA-Seq_bioinformatics_tools
This study, along with the following discussion, details the association of thousands of ncRNAs (snoRNA, miRNA, siRNA, piRNA, and long ncRNA) within human introns. We propose that such an association between human introns and ncRNAs has a pronounced synergistic effect with important implications for fine-tuning gene expression patterns across the entire genome.
It may be a "non-selfish" harmony between genes, introns, and ncRNAs
• Genes provide space for introns inside them
• Introns provide space for ncRNAs inside them
• ncRNAs provide expression regulation for genes
[Figure: symbiosis cycle linking genes, introns, and ncRNA through expression regulation]
HOMEWORK #1
Which intron is it? Does it contain functional elements?
> INTRON
gtatctctgtatctttatgttgtatcaaacacatgatatttcaca
acaagctgaaaagtaggattatgggcaatgccattgtcag
cttgttgggcgatatggcaacccactatataatcctctcttaa
cagcattgggagtgttgtcaaaaggtttgacagacggttcg
gagaactgttgctctaggaggagctgagagttcaagtctct
ccatttcccaaaacttttttctcattcacgtggctggcttgtgtcc
tgttccactttgaatatatggctaccccatttgctttcaactgat
gtatgatagttttgtcgctttatttcatttttatatattacaatattac
caatatctttgtcgttcaccag
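Before searching databases with a sequence like this, it can help to load it cleanly and run a basic sanity check. The sketch below parses FASTA text and tests whether a sequence begins with GT and ends with AG, the termini of most spliceosomal introns. It is a generic helper with hypothetical function names, shown on a toy sequence, and is not a solution to the homework.

```python
from typing import Dict, List, Optional


def parse_fasta(text: str) -> Dict[str, str]:
    """Parse FASTA text into {record name: lowercase sequence}."""
    parts: Dict[str, List[str]] = {}
    name: Optional[str] = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].strip()
            parts[name] = []
        elif name is not None:
            # Drop internal spaces left over from copy-and-paste.
            parts[name].append(line.replace(" ", "").lower())
    return {record: "".join(chunks) for record, chunks in parts.items()}


def has_canonical_termini(intron: str) -> bool:
    """True if the sequence starts with GT and ends with AG."""
    seq = intron.lower()
    return seq.startswith("gt") and seq.endswith("ag")


if __name__ == "__main__":
    toy = "> TOY\ngtaa gg\nttag\n"  # made-up sequence, not the homework intron
    records = parse_fasta(toy)
    print(records, has_canonical_termini(records["TOY"]))
```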
HOMEWORK #2
Why is there a difference between the exon-intron structure of the rat GABA-receptor gene given in the paper and the one in GenBank?