Distribution of Introns among Full Length cDNA

  1. Bioinformatics Capstone: Distribution of Introns among Full Length cDNA. By Xin Hong. Advisors: Dr. Michael Lynch and Dr. Sun Kim

  2. Main Points • Motivation • Background • Data sources • Method • Results and discussion

  3. Motivation • Genomic sequences • Full-length cDNA project • Gene prediction programs do not include UTR regions. • UTR structure and function, and NMD theory.

  4. Definition of UTRs and Introns • 5’UTR sequences were defined as the mRNA region spanning from the cap site to the start codon (excluded). • 3’UTR sequences were defined as the mRNA region spanning from the stop codon (excluded) to the poly(A) start site. • The coding region (CDS) begins with the initiation codon, normally ATG, and ends with one of three termination codons: TAA, TAG or TGA. [Figure: genomic sequence, pre-mRNA and mRNA, with the mRNA divided into 5’UTR, CDS and 3’UTR; start codon AUG and stop codon UAA marked]
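The region definitions on this slide can be illustrated with a short Python sketch. It is not the project's code: it assumes the CDS is given as 0-based, half-open cDNA coordinates (cds_start, cds_end) that include the stop codon, and the names split_cdna and region_of_junction are hypothetical. An intron is represented by the cDNA position of the exon–exon junction it leaves behind after splicing.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def split_cdna(seq: str, cds_start: int, cds_end: int) -> tuple[str, str, str]:
    """Split a full-length cDNA into (5'UTR, CDS, 3'UTR).

    Assumes 0-based, half-open CDS coordinates whose last codon is the stop
    codon, so the 5'UTR excludes the start codon and the 3'UTR excludes the
    stop codon. The start codon is not checked because it is only "normally" ATG.
    """
    cds = seq[cds_start:cds_end]
    if len(cds) % 3 != 0:
        raise ValueError("CDS length is not a multiple of 3")
    if cds[-3:].upper() not in STOP_CODONS:
        raise ValueError("CDS does not end with TAA, TAG or TGA")
    return seq[:cds_start], cds, seq[cds_end:]


def region_of_junction(junction: int, cds_start: int, cds_end: int) -> str:
    """Assign an exon-exon junction (a boundary position in the cDNA) to a region.

    Assumed convention: a junction exactly before the start codon counts as
    5'UTR, and one exactly after the stop codon counts as 3'UTR.
    """
    if junction <= cds_start:
        return "5'UTR"
    if junction >= cds_end:
        return "3'UTR"
    return "CDS"
```

For example, split_cdna("GGGATGAAATAACCC", 3, 12) returns ("GGG", "ATGAAATAA", "CCC"). Junctions that fall exactly on a region boundary have to be assigned by some convention; the slides do not say which one was used.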

  5. Function of UTRs • Translational control • mRNA subcellular localization • mRNA stability (Pesole, 2001)

  6. Nonsense-Mediated Decay (NMD) • NMD is an mRNA surveillance mechanism that selectively degrades transcripts containing a premature termination codon. • An mRNA is immune to NMD if translation terminates downstream of the 3′-most exon–exon junction (EEJ), or less than 50–55 nucleotides upstream of it. In a full-length cDNA, the 3′-most EEJ marks the position of the last intron. [Figure: transcription of the genomic sequence into pre-mRNA, post-transcriptional processing into mRNA with exon–exon junctions, and the 50–55 nt window between the stop codon (UAA) and the 3′-most EEJ]
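The 50–55 nt rule on this slide can be written as a small predicate. This is only a sketch: it assumes distances are measured from the position just after the stop codon (the CDS end) to the 3′-most junction, it uses 50 nt as the default threshold, and the name is_nmd_target is hypothetical. Published descriptions differ slightly on the exact reference point.

```python
def is_nmd_target(stop_end: int, junctions: list[int], threshold: int = 50) -> bool:
    """Predict NMD sensitivity with the 50-55 nt rule.

    stop_end  -- cDNA position just after the stop codon (0-based, i.e. the CDS end)
    junctions -- cDNA positions of exon-exon junctions, one per intron
    threshold -- 50 by default; the rule is usually quoted as 50-55 nt
    """
    if not junctions:
        return False  # no exon-exon junction, so this rule cannot trigger NMD
    last_junction = max(junctions)  # the 3'-most exon-exon junction
    # NMD is predicted only when translation stops more than `threshold` nt
    # upstream of the last junction; stopping downstream of it (negative
    # distance) or close upstream of it leaves the mRNA immune.
    return last_junction - stop_end > threshold
```

With the stop codon ending at position 100, a last junction at 200 predicts NMD, while a last junction at 140 (40 nt downstream of the stop) or no downstream junction at all does not.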

  7. Objectives • To explore introns in UTR regions. • To find the rules governing intron distribution within UTR regions. • To compare intron distribution between UTRs and CDS. • To compare intron distribution rules among different species.

  8. Data sources • Full-length cDNA sequences: MGC (Mammalian Gene Collection) for mammals; BDGP (Berkeley Drosophila Genome Project) for fruit fly; KOME for plants • Genomic sequences: GenBank; Ensembl • CDS prediction (Furuno et al. 2003): ProCrest; rsCDS; NCBI predictor; DECODER; experiment

  9. Method: Aligning cDNA Sequences to Genomic Sequence • How should gaps, overlaps and even polymorphisms be handled? • Candidate tools: BLAST, MegaBLAST, sim4, gap2, Spidey, BLAT and GeneSeqer [Image: Jim Kent, "The BLAT Rap"]

  10. Steps • Clean the full-length cDNA and genomic sequences. • Parse each cDNA into three parts: 5’UTR, CDS and 3’UTR. • Align the cDNA to the genomic sequence with BLAT. • Parse the BLAT output to obtain exon and intron locations. • Extract exon and intron sequences. • Check whether the exon lengths sum to the cDNA length, and remove suspect candidates that fail. • Calculate summary statistics such as the average cDNA length and the average number of introns per cDNA. • Compare the intron distributions of the 5’UTR, CDS and 3’UTR. • Compare intron distribution rules among different species.
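The BLAT-parsing and exon-sum steps listed above might look like the following sketch. It assumes BLAT's PSL output (21 tab-separated columns, header lines already removed) and calls an intron wherever the genome has a gap of at least 30 nt while the cDNA has none; that threshold and the names introns_from_psl and exons_cover_cdna are hypothetical choices, not taken from the project.

```python
from dataclasses import dataclass


@dataclass
class Intron:
    t_start: int   # intron start on the genome (0-based, target + strand)
    t_end: int     # intron end on the genome (half-open)
    junction: int  # exon-exon junction position in the cDNA, cDNA orientation


def _ints(field: str) -> list[int]:
    """Parse a PSL list column such as '100,120,80,'."""
    return [int(x) for x in field.rstrip(",").split(",")]


def _psl_blocks(line: str):
    """Return (strand, qSize, blockSizes, qStarts, tStarts) of one PSL record."""
    f = line.rstrip("\n").split("\t")
    return f[8], int(f[10]), _ints(f[18]), _ints(f[19]), _ints(f[20])


def introns_from_psl(line: str, min_intron: int = 30) -> list[Intron]:
    """Call an intron wherever the genome has a gap but the cDNA does not."""
    strand, q_size, sizes, q_starts, t_starts = _psl_blocks(line)
    introns = []
    for i in range(len(sizes) - 1):
        q_gap = q_starts[i + 1] - (q_starts[i] + sizes[i])
        t_gap = t_starts[i + 1] - (t_starts[i] + sizes[i])
        if q_gap == 0 and t_gap >= min_intron:
            junction = q_starts[i] + sizes[i]
            if strand.startswith("-"):
                # For minus-strand hits, qStarts are coordinates on the
                # reverse-complemented cDNA, so convert back.
                junction = q_size - junction
            introns.append(Intron(t_starts[i] + sizes[i], t_starts[i + 1], junction))
    return introns


def exons_cover_cdna(line: str) -> bool:
    """Filter for suspect candidates: aligned exon blocks must sum to the cDNA length."""
    _, q_size, sizes, _, _ = _psl_blocks(line)
    return sum(sizes) == q_size
```

Requiring a zero-length gap on the cDNA side is the simplest way to separate introns from alignment gaps; real data with indels or polymorphisms (the concerns raised on slide 9) would need a more tolerant rule.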

  11. Flow Chart

  12. Objectives • To explore introns in UTR regions. • To find the rules governing intron distribution within UTR regions. • To compare intron distribution between UTRs and CDS. • To compare intron distribution rules among different species.

  13. Introns Do Exist in UTRs • Introns do exist in UTRs. • However, in Arabidopsis, for example, 80% of 5’UTR sequences and 90% of 3’UTR sequences have no introns.

  14. Introns in CDS • 80% of CDS sequences have introns.

  15. Intron number: UTRs vs. CDS • Most CDS sequences have introns, but most UTR sequences do not. [Figure: number of sequences vs. number of introns]
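The per-region histograms on slides 13 to 15 (number of sequences vs. number of introns) and the "80% have no introns" style figures can be computed as below. The function names are hypothetical; the input is simply one intron count per sequence for one region (for example, all 5’UTRs of one species).

```python
from collections import Counter


def intron_count_histogram(intron_counts: list[int]) -> dict[int, int]:
    """Map number of introns -> number of sequences with that many introns."""
    return dict(sorted(Counter(intron_counts).items()))


def fraction_without_introns(intron_counts: list[int]) -> float:
    """Share of sequences in the region that contain no intron at all."""
    if not intron_counts:
        raise ValueError("no sequences given")
    return intron_counts.count(0) / len(intron_counts)
```

On a made-up input such as [0, 0, 0, 1, 2, 0, 1, 0, 0, 0], the histogram is {0: 7, 1: 2, 2: 1} and 70% of the sequences are intronless; these numbers only illustrate the calculation and are not results from the study.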

  16. Objectives • To explore introns in UTR regions. • To find the rules governing intron distribution within UTR regions. • To compare intron distribution between UTRs and CDS. • To compare intron distribution rules among different species.

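  17. Introns in UTR • Introns occur throughout both 5’UTRs and 3’UTRs, but they are not evenly or uniformly distributed. • If n introns were evenly distributed, the k-th intron would be expected at relative position k/(n+1) along the region, i.e. at 1/(n+1), 2/(n+1), …, n/(n+1). [Figure: intron location by intron number]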
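The even-distribution baseline above can be compared with observed intron positions roughly as follows. This is a sketch: it scales each junction to a 0 to 1 position within its region and uses the mean absolute difference from k/(n+1) as a simple, assumed measure of unevenness; the slides do not state which statistic, if any, was used. All names are hypothetical.

```python
def relative_positions(junctions: list[int], region_start: int, region_end: int) -> list[float]:
    """Junction positions scaled to 0..1 within a region (e.g. a 5'UTR)."""
    length = region_end - region_start
    return sorted((j - region_start) / length for j in junctions)


def expected_even_positions(n: int) -> list[float]:
    """If n introns were evenly spaced, the k-th would sit at k / (n + 1)."""
    return [k / (n + 1) for k in range(1, n + 1)]


def mean_abs_deviation(observed: list[float]) -> float:
    """Average distance of observed relative positions from the even spacing."""
    if not observed:
        return 0.0
    expected = expected_even_positions(len(observed))
    return sum(abs(o - e) for o, e in zip(observed, expected)) / len(observed)
```

For two introns at positions 20 and 80 of a 100 nt region, the relative positions are 0.2 and 0.8, the even-spacing expectation is 1/3 and 2/3, and the mean deviation is 2/15 (about 0.13).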

  18. Introns in UTR • The number of introns increases with sequence length. • In human 5’UTRs, there is on average one intron per 100 nt. • Introns in 3’UTRs tend to concentrate toward the center of the 3’UTR. [Figures: intron location; number of introns vs. sequence length]
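A figure such as "one intron per 100 nt" can be computed as a pooled density, and the length dependence can be summarized by binning sequences by length. The sketch below shows one way to do this. Whether the slide's value was pooled or averaged per sequence is not stated, and the function names and the 100 nt bin size are assumptions.

```python
def introns_per_100nt(records: list[tuple[int, int]]) -> float:
    """Pooled intron density; records are (region length in nt, number of introns)."""
    total_len = sum(length for length, _ in records)
    total_introns = sum(n for _, n in records)
    return 100 * total_introns / total_len


def mean_introns_by_length_bin(records: list[tuple[int, int]], bin_size: int = 100) -> dict[int, float]:
    """Mean intron count for sequences grouped into length bins of bin_size nt."""
    bins: dict[int, list[int]] = {}
    for length, n in records:
        bins.setdefault(length // bin_size * bin_size, []).append(n)
    return {start: sum(ns) / len(ns) for start, ns in sorted(bins.items())}
```

For the toy records [(50, 0), (150, 1), (250, 2), (350, 3)], the pooled density is 0.75 introns per 100 nt and the binned means rise from 0.0 to 3.0 with length.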

  19. Objectives • To explore introns in UTR regions. • To find the rules governing intron distribution within UTR regions. • To compare intron distribution between UTRs and CDS. • To compare intron distribution rules among different species.

  20. Introns in CDS • Introns occur throughout the CDS. • In human CDSs with more than one intron, adjacent introns are about 140 nt apart; in other words, the average CDS exon is about 140 nt long. • Introns are shifted toward the 5’ end.
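The "about 140 nt between introns" statistic corresponds to averaging the distances between consecutive exon–exon junctions, each of which is the length of the exon lying between two introns. A minimal sketch, with hypothetical names, that pools the intervals over all CDSs having more than one intron:

```python
def inter_intron_intervals(junctions: list[int]) -> list[int]:
    """Distances between consecutive junctions; each equals an internal exon length."""
    js = sorted(junctions)
    return [b - a for a, b in zip(js, js[1:])]


def mean_interval(all_junctions: list[list[int]]) -> float:
    """Average interval over all CDSs that contain more than one intron."""
    intervals = [d for js in all_junctions if len(js) > 1 for d in inter_intron_intervals(js)]
    return sum(intervals) / len(intervals) if intervals else float("nan")
```

For the toy input [[100, 250, 300], [50], [10, 140]], the single-intron CDS is skipped, the intervals are 150, 50 and 130 nt, and the mean is 110.0 nt.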

  21. Intron distribution: UTRs vs. CDS (human as an example) • Introns occur more frequently in the 5’UTR than in the CDS. • Introns occur more frequently in the CDS than in the 3’UTR. [Figures: number of introns]
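One way to compare intron frequency across regions is to pool lengths and intron counts per region and report introns per kilobase. The slide does not define "frequency", so this length-normalized measure is an assumption, and the Transcript record and introns_per_kb function are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Transcript:
    utr5_len: int
    cds_len: int
    utr3_len: int
    introns: dict[str, int]  # region name -> number of introns in that region


def introns_per_kb(transcripts: list[Transcript]) -> dict[str, float]:
    """Pooled intron density per 1,000 nt for the 5'UTR, CDS and 3'UTR."""
    lengths = {"5'UTR": 0, "CDS": 0, "3'UTR": 0}
    counts = dict.fromkeys(lengths, 0)
    for t in transcripts:
        lengths["5'UTR"] += t.utr5_len
        lengths["CDS"] += t.cds_len
        lengths["3'UTR"] += t.utr3_len
        for region, n in t.introns.items():
            counts[region] += n
    # Skip regions with no sequence at all to avoid dividing by zero.
    return {r: 1000 * counts[r] / lengths[r] for r in lengths if lengths[r] > 0}
```

Normalizing by length matters here: a region can be intronless in most sequences and still show a higher density per nucleotide than a longer region.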

  22. Intron distribution: UTRs vs. CDS

  23. Objectives • To explore introns in UTR regions. • To find the rules governing intron distribution within UTR regions. • To compare intron distribution between UTRs and CDS. • To compare intron distribution rules among different species.

  24. Different species: UTRs vs. CDS • The number of introns increases with sequence length in both UTRs and CDS. • In human, mouse, rat, Arabidopsis and fruit fly, 5’UTRs shorter than 100 nt have no introns. • CDSs shorter than 800 nt have no introns in human, mouse, Arabidopsis and fruit fly; for rat, this boundary is 500 nt. • In fruit fly, sequence length increases faster than in the other species, in both UTRs and CDS. [Figures: number of introns]
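Boundaries such as "5’UTRs shorter than 100 nt have no introns" can be read off the data as the length of the shortest sequence that still contains an intron: every shorter sequence is intronless by construction. A minimal sketch with a hypothetical function name, applied per species and per region:

```python
from typing import Optional


def shortest_intron_bearing_length(records: list[tuple[int, int]]) -> Optional[int]:
    """Length of the shortest sequence that has at least one intron.

    records: (sequence length in nt, number of introns). All sequences shorter
    than the returned value are intronless; None means no sequence has an intron.
    """
    lengths = [length for length, n in records if n > 0]
    return min(lengths) if lengths else None
```

For the toy records [(60, 0), (95, 0), (120, 1), (300, 2), (110, 0)], the function returns 120, so all sequences below 120 nt in that set are intronless. Because a single short intron-bearing sequence moves the boundary, such cut-offs are sensitive to outliers and annotation errors.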

  25. Different species: UTRs vs. CDS • In all five species, most UTRs have no introns. • In all five species, most CDSs have introns. • The intron distribution rules hold for human, mouse, rat, Arabidopsis and fruit fly. [Figures: number of sequences vs. number of introns]

  26. Summary • Introns do exist in UTRs. • Within the same organism, intron distributions differ among the 5’UTR, CDS and 3’UTR. • The intron distribution rules are shared by human, mouse, rat, Arabidopsis and fruit fly. • In these five species, 5’UTRs shorter than 100 nt have no introns. • CDSs shorter than 800 nt have no introns in human, mouse, Arabidopsis and fruit fly; for rat, the boundary is 500 nt. • In fruit fly, full-length cDNA sequence length increases faster than in the other species, in both UTRs and CDS.

  27. Future work • NMD is widespread among different species. • Why most UTRs have no introns. • Why intron frequency decreases from the 5’ end to the 3’ end of the full-length cDNA.

  28. References • Lynch M, Kewalramani A (2003) Messenger RNA surveillance and the evolutionary proliferation of introns. Mol Biol Evol 20(4):563-571. • Mignone F, Gissi C, Liuni S, Pesole G (2002) Untranslated regions of mRNAs. Genome Biology 3(3): reviews0004.1-0004.10. • Pesole G, Grillo G, Larizza A, Liuni S (2000) The untranslated regions of eukaryotic mRNAs: structure, function, evolution and bioinformatics tools for their analysis. Briefings in Bioinformatics 1(3):236-249. • Kent WJ (2002) BLAT: the BLAST-like alignment tool. Genome Res 12(4):656-664. • Furuno M, Kasukawa T, Saito R, Adachi J, Suzuki H, Baldarelli R, Hayashizaki Y, Okazaki Y (2003) CDS annotation in full-length cDNA sequence. Genome Res 13(6B):1478-1487. • Strausberg RL et al. (2002) Generation and initial analysis of more than 15,000 full-length human and mouse cDNA sequences. Proc Natl Acad Sci U S A 99(26):16899-16903. • http://www.ncbi.nlm.nih.gov

  29. Acknowledgements • Dr. Michael Lynch • Dr. Sun Kim • Dr. Douglas G. Scofield

  30. THE END