Which ORF? Jeltje – September 7 2005
Where to start?
gatgtc atg cgatgttattg    M R C Y
g atg tcatgcgatgttattg    M S C D V I
gatgtcatgcg atg ttattg    M L L
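The three readings on this slide can be reproduced with a short sketch: find every ATG in the sequence and translate from it until a stop codon or the end of the sequence. The function names (`translate_from`, `candidate_starts`) are made up for illustration, and the standard genetic code is assumed.

```python
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate_from(seq, start):
    """Translate seq from 0-based index start until a stop codon or the end."""
    seq = seq.upper().replace("U", "T")
    peptide = []
    for i in range(start, len(seq) - 2, 3):
        aa = CODON_TABLE[seq[i:i + 3]]
        if aa == "*":  # stop codon ends the peptide
            break
        peptide.append(aa)
    return "".join(peptide)


def candidate_starts(seq):
    """Return (0-based position, peptide) for every ATG in the sequence."""
    seq = seq.upper().replace("U", "T")
    return [(i, translate_from(seq, i))
            for i in range(len(seq) - 2) if seq[i:i + 3] == "ATG"]


for pos, peptide in candidate_starts("gatgtcatgcgatgttattg"):
    print(pos, peptide)
```

Each of the three ATGs sits in a different reading frame, which is why the same 20 bases give three unrelated peptides.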
Eukaryotes …A/GNNAUGG……
• The small ribosomal subunit binds at the methylated cap
• It scans along the mRNA to an AUG in the A/GNNAUGG context
• The large ribosomal subunit joins
• Translation starts with M
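Leaky scanning HIC …CNNAUGTGCGTTAUGG……
• The first AUG is in a weak context (C at -3, no G at +4), so the scanning subunit can pass over it
• The second AUG matches the A/GNNAUGG consensus (G at -3, G at +4)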
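As a rough illustration of the leaky-scanning example, the sketch below marks each AUG by whether it matches the A/GNNAUGG consensus (A or G at -3, G at +4). The function names are hypothetical, and treating the consensus as a strict yes/no test is a simplifying assumption: real initiation is probabilistic, and an AUG in a weak context is still used some of the time.

```python
def start_codon_contexts(mrna):
    """Return (0-based position, matches_consensus) for every AUG.

    An AUG too close to either end to have a -3 or +4 base is
    reported as not matching (an assumption of this sketch).
    """
    seq = mrna.upper().replace("U", "T")
    results = []
    for i in range(len(seq) - 2):
        if seq[i:i + 3] != "ATG":
            continue
        upstream_ok = i >= 3 and seq[i - 3] in "AG"            # A/G at -3
        downstream_ok = i + 3 < len(seq) and seq[i + 3] == "G"  # G at +4
        results.append((i, upstream_ok and downstream_ok))
    return results


def first_consensus_start(mrna):
    """Position of the first AUG in consensus context, or None."""
    for pos, ok in start_codon_contexts(mrna):
        if ok:
            return pos
    return None


print(start_codon_contexts("CNNAUGTGCGTTAUGG"))
print(first_consensus_start("CNNAUGTGCGTTAUGG"))
```

On the slide's sequence the first AUG fails the test and the second passes, which is the situation in which a scanning subunit may skip the upstream start.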
Skipping AUG
• In some cases translation is initiated but terminated on encountering the second AUG
• Internal Ribosome Entry Site (IRES): not sequence-specific; viral (only?)
MGC genes
• Tested 1000 MGC genes (skipped genes with the same ORF)
• Looked at the longest ORF, the first ORF, and the longest first ORF (picked the longest from three frames)
• ORFs must be >5 aa
• Compared to the ‘called’ ORF in GenBank
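The procedure above might be sketched as follows. This is not the script used for the slides; it rests on several assumptions: only the three forward frames are searched, an ORF runs from an ATG to the next in-frame stop codon (or to the end of the sequence if no stop is found), and "longest first ORF" is read as the longest of the first ORFs found in each of the three frames. All names are hypothetical.

```python
STOPS = {"TAA", "TAG", "TGA"}
MIN_AA = 6  # the slide requires ORFs of more than 5 aa


def find_orfs(seq, min_aa=MIN_AA):
    """Return (start, end, frame) for ORFs in the three forward frames.

    start is the index of the ATG; end is the index of the stop codon
    (or the last complete codon boundary if there is no stop).
    """
    seq = seq.upper().replace("U", "T")
    orfs = []
    for frame in range(3):
        i = frame
        while i + 3 <= len(seq):
            if seq[i:i + 3] == "ATG":
                j = i
                while j + 3 <= len(seq) and seq[j:j + 3] not in STOPS:
                    j += 3
                if (j - i) // 3 >= min_aa:
                    orfs.append((i, j, frame))
                i = j + 3  # resume after the stop; internal ATGs are skipped
            else:
                i += 3
    return orfs


def summarize_orfs(seq):
    """Pick the first ORF, the longest ORF, and the longest first ORF."""
    orfs = find_orfs(seq)
    if not orfs:
        return None
    length = lambda o: o[1] - o[0]
    first_per_frame = {}
    for orf in sorted(orfs):  # sorted by start position
        first_per_frame.setdefault(orf[2], orf)
    return {
        "first": min(orfs),
        "longest": max(orfs, key=length),
        "longest_first": max(first_per_frame.values(), key=length),
    }


# Made-up test sequence: a 7-aa ORF in frame 0, then an 11-aa ORF in frame 1.
toy = "ATG" + "GCC" * 6 + "TAA" + "C" + "ATG" + "GCC" * 10 + "TGA"
print(summarize_orfs(toy))
```

Comparing the results with the annotated GenBank ORF is then a matter of translating each picked ORF and matching it against the annotated protein.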
MGC genes
• Of 1000 genes:
  • For 887, the first large ORF is the largest ORF
  • Of those, only 388 have the A/GNNATGG consensus
• MGC ORFs:
  • 845 are the same as the first/largest ORF
  • 35 are a subset of the first/largest ORF (all skip the first M)
  • 6 pick another ORF (1 not found)
MGC genes
• Of 1000 genes (the remaining 113):
  • In 102 cases, the annotated ORF is the longest, not the first
  • In 3 cases, the annotated ORF is a subset of the longest ORF
  • In 6 cases, the annotated ORF is the first, not the longest
  • 1 annotated ORF cannot be found
  • 1 annotated ORF is neither the first nor the longest
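The categories counted above could be assigned with a comparison like the one sketched here. It works on protein strings and assumes that "subset" means the annotated protein starts at a downstream M of the same ORF, so that it is a proper suffix of the ORF's translation. The function names and the toy peptides are invented for illustration.

```python
def is_downstream_subset(annotated, orf):
    """True if annotated starts at an internal M of orf and shares its end."""
    return (annotated != orf
            and annotated.startswith("M")
            and orf.endswith(annotated))


def classify_annotation(annotated, first, longest):
    """Describe the annotated protein relative to the first and longest ORF."""
    if not annotated:
        return "not found"
    candidates = (("first", first), ("longest", longest))
    same = [name for name, orf in candidates if annotated == orf]
    if same:
        return "same as " + " and ".join(same)
    subset = [name for name, orf in candidates
              if is_downstream_subset(annotated, orf)]
    if subset:
        return "subset of " + " and ".join(subset)
    return "neither first nor longest"


# Toy peptides, not real MGC data
print(classify_annotation("MGWPDEK", "MGSSLQ", "MSKRRMGWPDEK"))
print(classify_annotation("MAAA", "MCCC", "MAAA"))
```

When the first and longest ORF are the same protein, both names are reported, which corresponds to the 887 genes on the previous slide.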
Examples: GenBank ORF is first
>longest
MSLSLVFRAASYFKLVPFHSSSSNQFLQPPGWVVLTQTLVLLHFERFSYQNVPKSAQGKGNLQPETNIHLFHFLTFPKQISRNLFNSLLCLMCLTYF
>first
MTNVYSLDGILVFGLLFVCTCAYFKKVPRLKTWLLSEKKGVWGVFYKAAVIGTRLHAAVAIACVVMAFYVLFIK
(Longest not found in mouse)
GenBank ORF is neither first nor longest
>longest
MESDPRICTMGNQEWPGWVPPPGPASSPPNCPHPMDEAGGTFGAKPACLPAPCLTRASFQLALPPAGPWAWPGPTGGYGLGSPSPLRGWRATSLGCYNLTPDSIGPLPLPRAPRSAALRLNMSARPCQCCGTPVRASDCVCRRDAGTRGCVCMCVCVRAACPPVCMVCGLGPHPWPEHFILWGRGADLVGGAPL
>first
MGGGRAPPERLGGCR
>GBprot
MRCLSSKKAGSTSVVKYIKTWRPRYFLLKSDGSFIGYKERPEAPDQTLPPLNNFSVAECQLMKTERPRPNTFVIRCLQWTTVIERTFHVDSPDEREEWMRAIQMVANSLQPHLCAQTRIWKTPPPAQAWAVGRLEIQVLIHTSPSEG
GenBank ORF is subset of longest
>longest
MSKRRMSVGQQTWALLCKNCLKKWRMKRQTLLEWLFSFLLVLFLYLFFSNLHQVHDTPQMSSMDLGRVDSFNDTNYVIAFAPESKTTQEIMNKVASAPFLKGRTIMGWPDEKSMDELDLNYSIDAVRVIFTDTFSYHLKFSWGHRIPMMKEHRDHSAHCQAVNEKMKCEGSEFWEKGFVAFQAAINAAIIEIATNHSVMEQLMSVTGVHMKILPFVAQGGVATDFFIFFCIISFSTFIYYVSVNVTQERQYITSLMTMMGLRESAFW
>first
MGSSLQELSQKMENEKTDLVGMALFISSGTVSVPIFLQFTSSS
>GBprot
MGWPDEKSMDELDLNYSIDAVRVIFTDTFSYHLKFSWGHRIPMMKEHRDHSAHCQAVNEKMKCEGSEFWEKGFVAFQAAINAAIIEIATNHSVMEQLMSVTGVHMKILPFVAQGGVATDFFIFFCIISFSTFIYYVSVNVTQERQYITSLMTMMGLRESAFW
(Longest found in mouse)