
ENCODE Pseudogene Annotation Subgroup: Summary of Thurs. 16-Sept Call



Presentation Transcript


  1. ENCODE Pseudogene Annotation Subgroup: Summary of Thurs. 16-Sept Call. Summarized by M Gerstein, 16-Sept. Participating groups: Havana, IMIM, UCSC, Yale, GIS, Affy

  2. Overall Goals of Pseudogene Subgroup
     • Create consensus ENCODE pseudogene annotation
     • Agree on defining elements of a pseudogene
     • What is the degree to which pseudogenes confound gene annotation? How many are close or distal to genes?
     • Cross-reference this annotation against ENCODE experiments
     • How many pseudogenes have some functional "activity"? How many are transcribed?
     • How many are associated with TARs & transfrags? CAGE & ditags? ChIP-chip binding sites?
     • Cross hybridization problem

  3. Intersection of Pseudogenes from 3 Groups
     Havana-Gencode: 167 pseudogenes
     Yale: 184 pseudogenes
     UCSC retrogenes: 15 expressed (7-8 pseudogenes) + 143 not expressed (all pseudogenes)
     [Venn diagram; region counts as extracted, positions not recoverable: 42 45 35 21 86 87 87 18 17 18 16 22]
     86 Havana pseudogenes overlap at least one Yale pseudogene, and 87 Yale pseudogenes overlap at least one Havana pseudogene (the same holds for retrogenes). This is a global result: in some loci three Havana pseudogenes may overlap only one Yale pseudogene, while in other loci several Yale pseudogenes overlap one Havana pseudogene.
     [Provided by France Denoeud (IMIM)]
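The asymmetric counts on this slide (86 in one direction, 87 in the other) follow from counting, per group, the entries that overlap at least one entry of the other group. A minimal sketch of that counting is given below; the function name `count_overlapping`, the tuple layout, and the toy coordinates are illustrative assumptions, not the procedure actually used by IMIM.

```python
from collections import defaultdict

def count_overlapping(a, b):
    """Count intervals in `a` that overlap at least one interval in `b`.

    Intervals are (region, start, end) tuples with inclusive coordinates.
    """
    by_region = defaultdict(list)
    for region, start, end in b:
        by_region[region].append((start, end))
    return sum(
        any(s <= end and start <= e for s, e in by_region[region])
        for region, start, end in a
    )

# Toy coordinates (hypothetical, not real annotations).
havana = [("ENm002", 100, 200), ("ENm002", 250, 300),
          ("ENm002", 350, 400), ("ENm007", 10, 50)]
yale = [("ENm002", 150, 380), ("ENm007", 500, 600)]

print(count_overlapping(havana, yale))  # 3 Havana entries hit one Yale entry
print(count_overlapping(yale, havana))  # 1 Yale entry hits any Havana entry
```

In the toy data three Havana entries overlap a single long Yale entry, so the two directions give different totals, which is the situation the slide describes.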

  4. "Yale-only" Pseudogenes: 5 Examples
     [Venn diagram; region counts as extracted, positions not recoverable: 48 49 30 15 87 87 87 17 18 16 11 29]

     >15 ENm002 831244 831480 237 IPI:IPI00442001 259..337
     pexons: 1 235
     FHALVVLSWPHVLELLPQRNPSLHVASLTRQLQHCMAGHQLLQFKGSTLALVIITLELERLMPGWCAPISDLLKKAQV
     FHALVVLSWPHVLELLPQRNPSLHVASLTRQLQHCMAGHQLLQFKGSTLALVIITLELERLMPGWCAPISDLLKKAQV
     FHALVVLSWPHVLELLPQRNPSLHVASLTRQLQHCMAGHQLLQFKGSTLALVIITLELERLMPGWCAPISDLLKKAQVFHALVVLSWPHVLELLPQRNPSLHVASLTRQLQHCMAGHQLLQFKGSTLALVIITLELERLMPGWCAPISDLLKKAQV
     No disablement, overlap exon

     >70 ENm007 381109 381518 410 IPI:IPI00448927 239..330 frameshift=1
     ENm007 381109 381518 pexons: -404 -147
     SKKPSLSVQPGPVMAPGESLTLHCVSDVGYDRFVLYKEGERDLRQLPGRQPQAGLSQANFTLGPVSRSYGGQYRCYGAHNLSSECSAPSDP
     SPQPSLSAQPGSPVLSGDSLTPQHHSEAGFDSSALTR-----TR!LPARQRLDGQHLLDVPLGHASHPPGGQHRCCGGHNASCPRSVPRRP
     PGVSKKPSLSVQPGPVMAPGESLTLHCVSDVGYDRFVL-YKEGERDLRQLPGRQPQAGLSQANFTLGPVSRSYGGQYRCYGAHNLSSECSAPG-SPQPSLSAQPGSPVLSGDSLTPQHHSEAGFDSSAL/YQD-----KGLPARQRLDGQHLLDVPLGHASHPPGGQHRCCGGHNASCPRSV
     PSDPLDILITGQIRGT-----PFISVQPG
     PRRPHPTSWL-QVRGPYPDPIPFSALDPG
     Frameshift

     >122 ENm009 367441 368389 949 IPI:IPI00465221 1..305
     ENm009 367441 368389 pexons: -949 -37
     MALPITNGTLFMPFVLTFIGIPGFESVQCWIGIPFCATYVIALI.........WILYPIICTYHLVQSLPTGPTIPQPLYLWVKDQTH
     MALPITNGTLFMPFVLTFIGIPGFESVQCWIGIPFCATYVIALI.........WILYPIICTYHLVQSLPTGPTIPQPLYLWVKDQTH
     MALPITNGTLFMPFVLTFIGIPGFESVQCWIGIPFCATYVIALI.........WILYPIICTYHLVQSLPTGPTIPQPLYLWVKDQTHMALPITNGTLFMPFVLTFIGIPGFESVQCWIGIPFCATYVIALI.........WILYPIICTYHLVQSLPTGPTIPQPLYLWVKDQTH
     No disablement, overlap exon

     Remove 12, but some tricky issues, i.e. 12, 99, 152, 169, 108

     >205 ENr223 185680 201963 16284 IPI:IPI00023543 110..588 2.78 intron=4 stop=2 frameshift=5
     pexons: 3383 3620 3826 4462 12565 12865 12917 13099 13459 13551
     Disablements, have introns, probably duplicated, overlap exon

     >177 ENr122 359278 362468 3191 IPI:IPI00029222 980..1118 0.87 intron=0 stop=0 frameshift=2
     ENr122 359278 362468 2768 3191 pexons: 2768 3191
     LGNTIQDIGMGKDFMTKTPKAMATKVKIDRWDLIKLKSFCTAKETTIRVNRQPTKWEKIFAIYSSDKGLISRIYNE---LKQIYKKKTNNPIKKWAKDMNRHPSKEDIYAAKKHMKKCSSSLAIREMQIKTTMRYHLTPVR
     LGNNILDTGFGKYFMTKMPKAIATETKIEIWDISKLK!FCRAKETINSVNRQPIEMEKIFANYASDRGLISRIY!KKTNLNLQAKTKQHNSIKKWPKDMDRHFSKDDICVANKPRKTLPTSLIIREIQIKTMMRYHLTPFR
     IKTLEKNLGNTIQDIGMGKDFMTKTPKAMATKVKIDRWDLIKLK-SFCTAKETTIRVNRQPTKWEKIFAIYSSDKGLISRIY---NELKQIYKKKT-NNPIKKWAKDMNRHPSKEDIYAAKKHMKKCSSSLAIREMQIKTTMRYHLTPVRVRLLYALLGNNILDTGFGKYFMTKMPKAIATETKIEIWDISKLK/SFCRAKETINSVNRQPIEMEKIFANYASDRGLISRIY*KKNKLKFTSKNQT\NNSIKKWPKDMDRHFSKDDICVANKPRKTLPTSLIIREIQIKTMMRYHLTPFR
     Multiple Frameshifts, overlap exon
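The record headers on this slide appear to follow a fixed layout: an entry number, an ENCODE region, start and end coordinates, a length, an IPI protein accession with a matched residue range, and optional `intron=`, `stop=` and `frameshift=` counts. The sketch below parses that layout under those assumptions; the field names and the helper `parse_header` are hypothetical, and the unlabeled decimal value (e.g. `2.78`) is deliberately ignored because the slide does not say what it measures.

```python
import re

# Assumed layout: >id region start end length IPI:accession pstart..pend [extras]
HEADER = re.compile(
    r">(?P<id>\d+)\s+(?P<region>EN[mr]\d+)\s+(?P<start>\d+)\s+(?P<end>\d+)"
    r"\s+(?P<length>\d+)\s+IPI:(?P<protein>IPI\d+)"
    r"\s+(?P<pstart>\d+)\s*\.\.\s*(?P<pend>\d+)"
)
COUNT = re.compile(r"(intron|stop|frameshift)=(\d+)")

def parse_header(line):
    """Parse one '>...' header line into a dict (field names are assumptions)."""
    m = HEADER.match(line.strip())
    if m is None:
        raise ValueError(f"unrecognized header: {line!r}")
    # Numeric fields become ints; region and accession stay strings.
    rec = {k: (int(v) if v.isdigit() else v) for k, v in m.groupdict().items()}
    rec["disablements"] = {name: int(n) for name, n in COUNT.findall(line)}
    # In every header on the slide, length equals end - start + 1.
    rec["length_consistent"] = rec["end"] - rec["start"] + 1 == rec["length"]
    return rec

rec = parse_header(
    ">205 ENr223 185680 201963 16284 IPI:IPI00023543 110 ..588 2.78 "
    "intron=4 stop=2 frameshift=5"
)
print(rec["region"], rec["disablements"], rec["length_consistent"])
```

The length check is a cheap sanity test when records from different groups are merged into one list.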

  5. "Havana-only" Pseudogenes: 5 Examples
     [Venn diagram; region counts as extracted, positions not recoverable: 48 49 30 15 87 87 87 17 18 16 11 29]

     >12 ENm002 242882 243044 163 IPI:IPI00017094 2359..2399 0.26
     ENm002 242882 243044
     FARASKEQKDKFLKNRGFSLLANQLYLHRGTQELLECFIE
     FSRPSKKQKDKFLK-YSFSLLANQLFLHQEIQELTDSFIK
     LDAYFARASKEQKDKFLKNRGFSLLANQLYLHRGTQELLECFI-EMFFGRHIGLDEFEA*FSRPSKKQKDKFLK-YSFSLLANQLFLHQEIQELTDSFI/EMFFG*CTGLDE

     >56 ENm006 1293946 1313338 19393 IPI:IPI00384823 1..1276 8.3 intron=0 stop=7 frameshift=9

     >125 ENm009 424525 425472 948 IPI:IPI00022766 1..282
     MYIVAVAGNIFLIFLIMTERSLHEPLYLFLSMLASANFLLAAAAAPEVLAILWFH.........KQIKDRVILLFSPISVCC
     MYIVAVAGNIFLIFLIMTERSLHEPMYLFLSMLASADFLLATAAAPKVLAILWFH.........KQIKDRVILLFSPISVCC
     MYIVAVAGNIFLIFLIMTERSLHEPLYLFLSMLASANFLLAAAAAPEVLAILWFH.........KQIKDRVILLFSPISVCCMYIVAVAGNIFLIFLIMTERSLHEPMYLFLSMLASADFLLATAAAPKVLAILWFH.........KQIKDRVILLFSPISVCC

     Similar discussion for "UCSC only"

     >103 ENm008 153121 155155 2035 IPI:IPI00217473 8..143 intron=2 stop=0 frameshift=0
     pexons: 25 96 1360 1566 1906 2033
     TIIVSMWAKISTQADTIGTETLE
     LFLSHPQTKTYFPHFDLHPGSAQLRAHGSKVVAAVGDAVKSI
     TIIVSMWAKISTQADTIGTETLE
     R:R[agg]
     LFLSHPQTKTYFPHFDLHPGSAQLRAHGSKVVAAVGDAVKSI
     DDIGGALSKLSELHAYILRVDPVNFK
     LLSHCLLVTLAARFPADFTAEAHAAWDKFLSVVSSVLTEKYR
     DDIGGALSKLSELHAYILRVDPVNFK
     LLSHCLLVTLAARFPADFTAEAHAAWAKFLSVVSSVLTEKYR
     RLFLSHPQTKTYFPHFDLHPGSAQLRAHGSKVVAAVGDAVKSIDDIGGALSKLSELHAYILRVDPVNFKLRLFLSHPQTKTYFPHFDLHPGSAQLRAHGSKVVAAVGDAVKSIDDIGGALSKLSELHAYILRVDPVNFKV

     >174 ENr121 322430 366341 43912 IPI:IPI00384823 1..1276 intron=0 stop=2 frameshift=2
     ENr121 322430 366341 38663 42482 pexons: 38663 42482
     MTGSNSHITILTLNINGLNSAIKRHRRASWIKSQDPSVCCIQET...

  6. Pseudogenes Overlapping Gencode Exons
     Havana-Gencode: 167 pseudogenes
     Yale: 184 pseudogenes
     Havana-Gencode exons: 17603
     [Venn diagram; region counts as extracted, positions not recoverable: 122 28 30 124 13 12 20 2]

  7. 49 GIS Pseudogenes, Not Yet Fully Compared
     • The 49 non-redundant ENCODE processed pseudogenes were used for comparison with pseudogenes from the Yale, Vega, and Ensembl groups.
     • 4 pseudogenes were uniquely found in the two libraries.
     [Venn diagram; labels and counts as extracted, positions not recoverable: GIS-PET (4), Yale (12), Vega (5), 20, Ensembl (3), 2, 2, 1]
     [From GIS]

  8. Browser Tracks [R Baertsch, UCSC]
     Pseudogene track: a processed pseudogene at chr21:33775699-33776428
     genome-test.cse.ucsc.edu/ENCODE/encode.html

  9. Overall short-term goal for next call: Come up with a consensus list of pseudogenes suitable for carefully checking for transcription (perhaps by RT-PCR)

  10. Immediate ToDo's for Next Call
     • Classify pseudogenes as processed & non-processed (with a third "not sure" category)
     • Venn diagrams in each category
     • Need to add to our current 87 consensus
     • Among duplicated pseudogenes: Determine Yale/Havana consensus, add to 87
     • Among processed pseudogenes:
       • Merge in 49 from GIS
     • Each group should determine which of its pseudogenes not in the consensus it still wants to keep and repost them to the list
     • Update list summary and UCSC browser
       • List summary web page (maintained by Deyou, http://homes.gersteinlab.org/people/zhengdy/cgi-bin/encode-pgene.cgi )
     • Flag truly tricky ones as questionable to be returned to later (e.g. #169, OR ex. truncated at 6TM)
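The "Venn diagrams in each category" step can be sketched as simple set bookkeeping once each group has labeled its pseudogenes. The example below assumes that loci have already been reconciled to shared identifiers across groups (the overlap matching itself is not shown); the function `per_category_venn`, the locus IDs, and the calls are all hypothetical.

```python
from collections import defaultdict

def per_category_venn(calls):
    """Split loci into Venn cells per category.

    calls: {group: {locus_id: category}}
    returns: {category: {frozenset_of_groups: set_of_locus_ids}}
    """
    support = defaultdict(lambda: defaultdict(set))  # category -> locus -> groups
    for group, loci in calls.items():
        for locus, category in loci.items():
            support[category][locus].add(group)
    regions = {}
    for category, loci in support.items():
        cells = defaultdict(set)
        for locus, groups in loci.items():
            cells[frozenset(groups)].add(locus)
        regions[category] = dict(cells)
    return regions

# Hypothetical calls; locus IDs are placeholders, not real annotations.
calls = {
    "Havana": {"pg1": "processed", "pg2": "non-processed", "pg3": "not sure"},
    "Yale": {"pg1": "processed", "pg2": "non-processed", "pg4": "processed"},
}
venn = per_category_venn(calls)
consensus = venn["processed"][frozenset({"Havana", "Yale"})]
print(sorted(consensus))  # loci both groups call processed
```

A locus that two groups place in different categories lands in a single-group cell of each category, which makes such disagreements easy to list for the "questionable" flag.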

  11. Browser ToDo's for Next Call
     • Send alignments to Rob so he can link to browser
     • A clear coloring scheme for differentiating processed vs non-processed pgenes
     • UCSC will index by names used by the different groups
     • Create an additional fourth sub-track for consensus pseudogenes
     • Perhaps an additional track for prominent disagreements, i.e. questionable pseudogenes (or another color)
     • Small fix on Gencode "pseudogene" track

  12. Remaining Issues
     • Of the consensus pseudogenes, determine unique sequences for RT-PCR or matching against probes
     • Remaining questions:
       • How are we going to arrive at agreed-upon boundaries for pseudogenes (start and stop)?
       • What is best for alignments, cDNA or protein?
       • (Given that complete cDNA info is not available for everything, perhaps best to stick to proteins initially.)
