ENCODE pseudogene updates
Adam Frankish, HAVANA, 13/10/05
Not added - AK125808: Ral-GDS related protein Rgr (Rgr) pseudogene
[Figure: reverse strand mRNAs; translation]
The transcripts on which this pseudogene is based do not appear to have a valid translation (only BC007286.1 has a translation, and it looks spurious).
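Whether a transcript has a "valid translation" is ultimately an annotator's judgement, but a first pass usually asks what the longest open reading frame is. Below is a minimal six-frame scan, included because the supporting mRNAs here are on the reverse strand. `longest_orf` is a hypothetical helper, and an ORF is assumed to run from an ATG to the first in-frame stop codon.

```python
from typing import Optional, Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (ACGTN only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def longest_orf(seq: str) -> Optional[Tuple[int, int, str]]:
    """Longest ATG..stop ORF over all six frames as (start, end, strand).

    Coordinates are 0-based, end-exclusive, on the strand that was scanned.
    Returns None if no frame has an ATG followed by an in-frame stop.
    """
    best: Optional[Tuple[int, int, str]] = None
    for strand, s in (("+", seq.upper()), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None  # first ATG of the currently open reading frame
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i
                elif start is not None and codon in STOP_CODONS:
                    if best is None or i + 3 - start > best[1] - best[0]:
                        best = (start, i + 3, strand)
                    start = None  # close the ORF; look for the next ATG
    return best
```

Any minimum ORF length used to call a translation "spurious" would be an extra assumption; the slides do not give one.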
Not added - YalePgene_139
I have been able to reconstruct a coding gene with a full-length CDS at this locus (AC009892.1) and, as discussed previously, would not annotate both a coding gene and a pseudogene at the same locus. Most of the gene (from the 3' end of exon 3 to the final exon, exon 8) is supported by human EST (Em:DN998408.1, Em:BG743947.1) and mRNA (Em:BC033195.1) evidence matching at 100% identity (best hits in the genome). Together these support a structure with an ORF extending from the start to the final exon, although there is a small gap in support in exon 5. Using human ESTs that are not from this locus, e.g. Em:BM918119.1 (approx. 70% identity at this locus; its best genome hit by Ensembl SSAHA is 100% to the KIR2DL4 gene, also on chr19), the 5' end of exon 3 and two further upstream exons can be clearly identified, and all splice sites are clearly intact. The resulting structure contains a CDS that starts in exon 1 (which shares homology with the N-terminal sequence of several KIR2D family members), ends in the final exon and contains three immunoglobulin domains. Despite the lack of transcript evidence from the 5' end of the locus and the quite high divergence between this locus and other gene family members, these splice sites are preserved. This suggests that the structure is correct and that this is a coding gene rather than a pseudogene.
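The argument above rests on the splice sites of the reconstructed structure being intact. Below is a minimal check of that kind. It assumes the canonical GT...AG intron rule and 0-based, half-open, plus-strand exon coordinates. `check_splice_sites` is a hypothetical helper, and the minor GC-AG and AT-AC intron classes are simply reported as non-canonical here.

```python
from typing import List, Tuple


def check_splice_sites(
    genomic: str, exons: List[Tuple[int, int]]
) -> List[Tuple[int, str, str, bool]]:
    """For each intron between consecutive exons, report
    (intron number, donor dinucleotide, acceptor dinucleotide, is GT-AG).

    exons: sorted (start, end) pairs, 0-based half-open, plus strand.
    """
    results = []
    for n, ((_, prev_end), (next_start, _)) in enumerate(zip(exons, exons[1:]), start=1):
        intron = genomic[prev_end:next_start].upper()
        donor, acceptor = intron[:2], intron[-2:]  # first and last two intron bases
        results.append((n, donor, acceptor, donor == "GT" and acceptor == "AG"))
    return results
```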
Not added - YalePgene_139
[Figure: supporting evidence (protein, EST, mRNA)]
Not added - YalePgene_139
[Figure: dot plot of EST; splice donor]
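The dot plots on these slides compare an EST against the genomic sequence. As a rough illustration of what such a plot encodes, the hypothetical `dot_plot` function below marks every pair of positions that share an identical k-mer. This is a simple exact word-match version, not the scoring-window method that dedicated dot-plot tools use, and the word size of 3 is an arbitrary assumption. An EST spliced across an intron appears as diagonal runs offset along the genomic axis, and the end of a run is where a splice donor would be placed.

```python
from typing import Dict, List, Set, Tuple


def dot_plot(seq_a: str, seq_b: str, word: int = 3) -> Set[Tuple[int, int]]:
    """Return every (i, j) where the word-length k-mer starting at seq_a[i]
    is identical to the one starting at seq_b[j]."""
    a, b = seq_a.upper(), seq_b.upper()
    index: Dict[str, List[int]] = {}
    for j in range(len(b) - word + 1):
        index.setdefault(b[j:j + word], []).append(j)  # k-mer -> positions in b
    return {
        (i, j)
        for i in range(len(a) - word + 1)
        for j in index.get(a[i:i + word], [])
    }


def render(points: Set[Tuple[int, int]], len_a: int, len_b: int, word: int = 3) -> str:
    """Text rendering: one row per k-mer of seq_a, one column per k-mer of seq_b."""
    return "\n".join(
        "".join("*" if (i, j) in points else "." for j in range(len_b - word + 1))
        for i in range(len_a - word + 1)
    )
```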
Havana+, Yale-, UCSC- (HAVANA pseudogenes not predicted by Yale or UCSC): AC006326.4-001, AC006326.2-001, AC063976.2-001, AF277315.12-001, RP11-143H17.1-001, AC009892.5-001, Z84721.2-001, Z84721.4-001, AC103710.2-001, AC103710.4-001, AC129505.5-001, AC087380.10-001, AC087380.14-001, AC002456.1-001, AC009404.5-001, AC114812.7-001, AC011330.5-001, AC011330.8-001, AL162151.3-001
We think the annotation of these as pseudogenes can be supported.
ENm001 - AC006326.2, AC006326.4
[Figure: UCSC pseudo and Yale pseudo tracks]
NADH dehydrogenase 2 (MTND2) pseudogene; heterogeneous nuclear ribonucleoprotein A1 (Hnrpa1) pseudogene; NADH dehydrogenase 4 (MTND4) pseudogene; new: cytochrome b (CYTB) pseudogene
ENm002 - AC063976.2
[Figure: dot plot; alignment]
ENm004 - RP1-127L4.3
[Figure: UCSC pseudo, HAVANA pseudo and Yale pseudo tracks]
ENm006 - AF277315.12 olfactory receptor family pseudogene
ENm006 - RP11-143H17.1
[Figure: HAVANA pseudo; frameshift]
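Several of these slides mark a frameshift as the disabling feature. The slides do not say how frameshifts were called. As an illustration only, the hypothetical `frameshifting_indels` helper below scans a gapped nucleotide alignment of the pseudogene against its parent CDS and flags indel runs whose length is not a multiple of three.

```python
from typing import List, Optional, Tuple


def frameshifting_indels(pseudo_aln: str, parent_aln: str) -> List[Tuple[int, int, str]]:
    """Report indel runs whose length is not a multiple of 3.

    pseudo_aln / parent_aln are the two rows of a gapped nucleotide alignment
    ('-' = gap). Returns (alignment column, run length, kind), where kind is
    'deletion' (gap in the pseudogene) or 'insertion' (gap in the parent CDS).
    """
    if len(pseudo_aln) != len(parent_aln):
        raise ValueError("aligned rows must have equal length")
    events: List[Tuple[int, int, str]] = []
    run_kind: Optional[str] = None
    run_start = run_len = 0
    for col, (p, q) in enumerate(zip(pseudo_aln, parent_aln)):
        kind = "deletion" if p == "-" else "insertion" if q == "-" else None
        if kind != run_kind:
            # a run just ended: keep it only if it shifts the reading frame
            if run_kind is not None and run_len % 3:
                events.append((run_start, run_len, run_kind))
            run_kind, run_start, run_len = kind, col, 0
        if kind is not None:
            run_len += 1
    if run_kind is not None and run_len % 3:  # run reaching the alignment end
        events.append((run_start, run_len, run_kind))
    return events
```

A pair of compensating indels (e.g. +1 followed by -1) restores the frame downstream but is still reported as two events by this sketch. Whether that disables the gene remains an annotator's call.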
ENm007 - AC009892.5 HAVANA LIR pseudogene
ENm008 - Z84721.4 HAVANA hemoglobin, alpha pseudogene
ENm009 - AC103710.2 olfactory receptor, family 51, subfamily N, member 1 pseudogene
[Figure: frameshift]
ENm009 - AC103710.4 olfactory receptor, family 52, subfamily Y, member 1 pseudogene
ENm009 - AC129505.5 olfactory receptor, family 52, subfamily Z, member 1 pseudogene
[Figure: no Met; first possible Met]
ENm009 - AC087380.10 olfactory receptor, family 51, subfamily A, member 10 pseudogene
[Figure: frameshift]
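The "no Met / first possible Met" label refers to a pseudogene whose aligned start codon is not ATG. Below is a minimal sketch, assuming `frame_start` is the position where the parent's start codon aligns. The hypothetical `first_in_frame_met` helper finds the next ATG in that reading frame.

```python
from typing import Optional


def first_in_frame_met(seq: str, frame_start: int = 0) -> Optional[int]:
    """Return the position of the first ATG at or after frame_start that is in
    the same reading frame as frame_start, or None if there is none."""
    s = seq.upper()
    for i in range(frame_start, len(s) - 2, 3):  # step codon by codon
        if s[i:i + 3] == "ATG":
            return i
    return None
```

For "CTGAAAATGCCC" it returns 6. That means (6 - 0) // 3 = 2 codons would be lost from the N-terminus relative to the parent.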
ENm009 - AC087380.14 Novel pseudogene
ENm013 - AC002456.1 ribosomal protein L5 (RPL5) pseudogene
ENr121 - AC009404.5 5-hydroxytryptamine (serotonin) receptor 5B (HTR5B) pseudogene
[Figure: frameshift]
ENr131 - AC114812.7 UDP glycosyltransferase 1 family, polypeptide A2 pseudogene
[Figure: frameshift]
ENr233 - AC011330.5 novel pseudogene: 3' truncation (~350 aa missing, no stop codon)
ENr233 - AC011330.8 stereocilin (STRC) pseudogene: stop codon in exon 20
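To locate a stop like the one in exon 20 of AC011330.8, you read the spliced CDS in frame and map each stop codon back to its exon. Below is a minimal sketch, assuming the exon sequences are already trimmed to coding sequence, ordered 5' to 3', and in frame from the first base. `premature_stops` is a hypothetical name.

```python
from bisect import bisect_right
from itertools import accumulate
from typing import List, Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}


def premature_stops(exon_cds: List[str]) -> List[Tuple[int, int]]:
    """Find in-frame stop codons before the final codon of a spliced CDS.

    Returns (0-based codon index, 1-based exon number holding the codon's
    first base) for each premature stop.
    """
    cds = "".join(s.upper() for s in exon_cds)
    exon_ends = list(accumulate(len(s) for s in exon_cds))  # cumulative end offsets
    n_codons = len(cds) // 3
    hits = []
    for k in range(n_codons - 1):  # the last codon is the expected terminal stop
        if cds[3 * k:3 * k + 3] in STOP_CODONS:
            hits.append((k, bisect_right(exon_ends, 3 * k) + 1))
    return hits
```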
ENr322 - AL162151.3 pseudogene similar to part of ribosomal protein L3 (RPL3)
[Figure: mRNA dot plot; protein dot plot]
HAVANA pseudogene overlaps exon
• Non-coding locus: AC008984.4, AC008984.6, AC009892.8, AC006293.1, AC114812.6, AC114812.5, AC005538.2, AC018512.3, RP3-477O4.5
• Coding locus opposite strand: AC002543.2, RP11-143H17.1, AC010492.4, RP11-398K22.9, RP3-477O4.4
• Coding locus same strand: AC008984.5, Z84721.2, AC011330.5
We believe all these pseudogenes are valid.
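The three categories above depend on three things: whether the pseudogene overlaps an exon, whether that exon belongs to a coding locus, and the relative strand. Below is a minimal sketch using a hypothetical `Feature` record. It assumes 1-based inclusive coordinates, and the category strings simply mirror the list.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Feature:
    start: int  # 1-based, inclusive (assumption)
    end: int    # 1-based, inclusive
    strand: str  # "+" or "-"
    coding: bool


def overlaps(a: Feature, b: Feature) -> bool:
    """True if the two closed intervals share at least one base."""
    return a.start <= b.end and b.start <= a.end


def classify_overlaps(pseudogene: Feature, exons: List[Feature]) -> List[str]:
    """Sorted, de-duplicated categories of the exons the pseudogene overlaps."""
    labels = set()
    for exon in exons:
        if not overlaps(pseudogene, exon):
            continue
        if not exon.coding:
            labels.add("non-coding locus")
        elif exon.strand == pseudogene.strand:
            labels.add("coding locus same strand")
        else:
            labels.add("coding locus opposite strand")
    return sorted(labels)
```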
Non-coding locus
[Figure: aligned proteins (column collapsed); HAVANA sialyltransferase pseudogene; supporting EST; putative novel transcript]
Coding locus opposite strand: ENm001 pseudogene AC002543.2
[Figure: protein alignment; non-coding exon; HAVANA novel pseudogene]
Coding locus same strand
[Figure: LILR pseudogene (frameshift); LILRA3]
But not...
[Figure: in-frame stop codon; KIR2DL3 (coding gene)]