Work Presentation: Novel RNA genes in A. thaliana. Gaurav Moghe. Oct, 2008-Nov, 2008. Source: Nature (Commentary on ENCODE). Starting databases: Putative Unique Transcripts (PUTs), Expressed Sequence Tags (ESTs). ESTs vs PUTs.
Work Presentation: Novel RNA genes in A. thaliana. Gaurav Moghe. Oct, 2008-Nov, 2008
Starting databases • Putative Unique Transcripts (PUTs) • Expressed Sequence Tags (ESTs)
ESTs vs PUTs • 42% of the total EST sequences in GenBank assembled into PUTs • 82% of the ESTs can be mapped to a unique genomic region vs 72% of the PUTs
Filtering pipeline (flowchart; counts are given in the order they appear on the slide)
• Download PUT sequences: ~324,000
• Map them to the genome using GMAP: 236,011
• Yes? Map to AT RNA genes: 551
• Map to protein-coding regions: 3630
• No? Map to other AT features
• BLASTn against all known CDS sequences + GeneWise to confirm alignment on translated CDS sequences: 2023
• No match? BLASTx against all known proteins to verify absence of any protein in the sequences: 1849
• No match? BLASTn against Repetitive Sequence Database: 1739, 1453
• No match? Coding Index to double-verify absence of protein-like sequence: 1260
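The filtering flowchart is a cascade: each step keeps only the PUTs that did not match the database checked at that step, and the count shrinks as it goes. A minimal sketch of that pattern is below. It does not run GMAP, BLAST or GeneWise; the record fields (`hits_cds`, `hits_protein`, `hits_repeat`, `coding_index`), the cutoff value and the toy data are all hypothetical stand-ins for precomputed search results, not values from the slides.

```python
from typing import Any, Callable, Dict, List, Tuple

PUT = Dict[str, Any]
# A stage is a label plus a predicate that returns True when a PUT is KEPT,
# i.e. when it did NOT match whatever that stage screens against.
Stage = Tuple[str, Callable[[PUT], bool]]


def run_cascade(puts: List[PUT], stages: List[Stage]) -> Tuple[List[PUT], List[Tuple[str, int]]]:
    """Apply the stages in order and record how many PUTs survive each one."""
    remaining = list(puts)
    counts = [("input", len(remaining))]
    for label, keep in stages:
        remaining = [p for p in remaining if keep(p)]
        counts.append((label, len(remaining)))
    return remaining, counts


# Hypothetical cutoff: PUTs scoring below it are treated as non-coding.
CODING_INDEX_CUTOFF = 0.5

STAGES: List[Stage] = [
    ("no CDS hit (BLASTn + GeneWise)", lambda p: not p["hits_cds"]),
    ("no protein hit (BLASTx)", lambda p: not p["hits_protein"]),
    ("no repeat hit (BLASTn)", lambda p: not p["hits_repeat"]),
    ("low coding index", lambda p: p["coding_index"] < CODING_INDEX_CUTOFF),
]

# Toy records standing in for parsed search results (made-up values).
toy_puts: List[PUT] = [
    {"id": "PUT1", "hits_cds": False, "hits_protein": False, "hits_repeat": False, "coding_index": 0.1},
    {"id": "PUT2", "hits_cds": True, "hits_protein": False, "hits_repeat": False, "coding_index": 0.1},
    {"id": "PUT3", "hits_cds": False, "hits_protein": True, "hits_repeat": False, "coding_index": 0.1},
    {"id": "PUT4", "hits_cds": False, "hits_protein": False, "hits_repeat": True, "coding_index": 0.1},
    {"id": "PUT5", "hits_cds": False, "hits_protein": False, "hits_repeat": False, "coding_index": 0.9},
]

survivors, counts = run_cascade(toy_puts, STAGES)
for label, n in counts:
    print(f"{label}: {n}")
```

Recording the count after every stage makes it easy to compare a rerun against the numbers on the flowchart and to see which filter removes the most candidates.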
Issues
• PUT sequences are not of very good quality: use the genomic sequence of the region where these PUTs map, or use EST sequences?
• BLAST against a database does not return all hits: BLAST against a different database, of a different size.
• PUTs extremely close to genes may be part of extended UTR regions: remove those that are extremely close, and check the directions of the other PUTs.
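The last issue above (PUTs that sit right next to a gene may be unannotated UTR extensions) can be turned into a simple coordinate check. The sketch below is one possible reading, not the method used in the talk: it assumes 1-based inclusive coordinates, hypothetical thresholds `close_bp` and `near_bp`, and the assumption that a nearby PUT on the same strand as the gene is the more suspicious case.

```python
from typing import List, NamedTuple


class Feature(NamedTuple):
    chrom: str
    start: int   # 1-based, inclusive
    end: int     # 1-based, inclusive
    strand: str  # "+" or "-"


def gap_bp(a: Feature, b: Feature) -> int:
    """Number of bases strictly between two features (0 if they touch or overlap)."""
    return max(0, max(a.start, b.start) - min(a.end, b.end) - 1)


def classify_put(put: Feature, genes: List[Feature],
                 close_bp: int = 100, near_bp: int = 500) -> str:
    """Return 'remove', 'check' or 'keep' for one PUT.

    close_bp and near_bp are hypothetical thresholds chosen for illustration.
    """
    verdict = "keep"
    for gene in genes:
        if gene.chrom != put.chrom:
            continue
        gap = gap_bp(put, gene)
        if gap <= close_bp:
            return "remove"  # too close to a gene to call it independent
        if gap <= near_bp and gene.strand == put.strand:
            verdict = "check"  # same direction as a nearby gene: inspect manually
    return verdict


genes = [Feature("chr1", 1000, 2000, "+")]
puts = {
    "A": Feature("chr1", 2050, 2300, "+"),  # 49 bp from the gene
    "B": Feature("chr1", 2400, 2600, "+"),  # 399 bp away, same strand
    "C": Feature("chr1", 2400, 2600, "-"),  # 399 bp away, opposite strand
    "D": Feature("chr1", 5000, 5200, "+"),  # far away
    "E": Feature("chr2", 1500, 1600, "+"),  # different chromosome
}
labels = {name: classify_put(p, genes) for name, p in puts.items()}
print(labels)
```

For a genome-scale run, a linear scan over all genes per PUT would be slow; sorting the gene coordinates per chromosome or using an interval tree would be the usual replacement.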
What if… • A sequence passes through all filters… but still encodes a protein?
Issues
• Most of these PUTs do not show conservation. Does that mean they are non-functional?
• Most of these PUTs do not seem to have an RNA-like secondary structure. Does that mean they are not RNA genes?
Plans for the next month • Get the final list of novel PUTs • Assign them directionality and estimate assembly error rates using EST mapping • Conservation • Secondary structure