
Discussion Points for 2nd Pseudogene Call

Mark Gerstein, 2005.09.22, 11:00 EST


Presentation Transcript


  1. Discussion Points for 2nd Pseudogene Call
     Mark Gerstein, 2005.09.22, 11:00 EST

  2. Intersection of Pseudogenes from Three Groups: Original
     [Venn diagram; region counts as extracted: 42, 45, 35, 21, 86, 87, 87, 18, 17, 18, 16, 22]
     • Havana-Gencode: 167 pseudogenes
     • Yale: 184 pseudogenes
     • UCSC retrogenes: 15 expressed (7-8 pseudogenes) + 143 not expressed (all pseudogenes)
     86 Havana pseudogenes overlap at least one Yale pseudogene, and 87 Yale pseudogenes overlap at least one Havana pseudogene (likewise for retrogenes). This is a global result: in some loci three Havana pseudogenes may overlap only one Yale pseudogene, while in other loci several Yale pseudogenes overlap one Havana pseudogene. Provided by France.
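The asymmetry noted on this slide (86 versus 87) follows from counting, for each group, how many of its own pseudogenes touch any pseudogene of the other group. A minimal sketch of that counting, assuming end-inclusive coordinates on a single chromosome; the function names and the toy coordinates are invented for illustration and are not ENCODE data:

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # (start, end), end-inclusive, one chromosome assumed


def overlaps(a: Interval, b: Interval) -> bool:
    """True if the two intervals share at least one base."""
    return a[0] <= b[1] and b[0] <= a[1]


def count_overlapping(query: List[Interval], target: List[Interval]) -> int:
    """Number of intervals in `query` that overlap at least one in `target`."""
    return sum(any(overlaps(q, t) for t in target) for q in query)


# Toy coordinates (hypothetical): three Havana calls fall inside one long Yale call.
havana = [(100, 200), (250, 300), (320, 400), (900, 950)]
yale = [(150, 380), (1000, 1100)]

havana_hits = count_overlapping(havana, yale)  # Havana calls touching any Yale call
yale_hits = count_overlapping(yale, havana)    # Yale calls touching any Havana call
print(havana_hits, yale_hits)
```

Because the count is taken per query set, the two directions need not agree, which is why a single locus can contribute three to one group's total and one to the other's.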

  3. Intersection of Pseudogenes from 4 Groups: Updated
     [Venn diagram; region counts as extracted: 52 (2), 14 (2), 16 (0), 82 (34), 15 (1), 17 (7), 33 (1)]
     • Havana-Gencode: 167 pseudogenes
     • Yale: 164 pseudogenes
     • UCSC retrogenes: 146 not expressed
     • The numbers in parentheses are pseudogenes from GIS.
     • All from http://pseudogene.org/ENCODE/cross-ref
     • Pseudo-exons were merged to form pseudogenes and used for this comparison (now a pseudogene has only a single start and end)
     • Strand information is ignored
     • There are a total of 229 pseudogenes in the union
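The preprocessing described on this slide (merge pseudo-exons into one span per pseudogene, drop strand, then count the union) could be sketched roughly as follows. The record layout, the function names, and the rule that overlapping spans collapse into one locus are assumptions made for illustration; the slide does not say how the 229 was actually computed.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical record layout: (pseudogene_id, chrom, strand, start, end)
Exon = Tuple[str, str, str, int, int]
Span = Tuple[str, int, int]  # (chrom, start, end); strand is dropped


def merge_pseudo_exons(exons: Iterable[Exon]) -> Dict[str, Span]:
    """Collapse each pseudogene's exons to a single start and end, ignoring strand."""
    bounds: Dict[str, Span] = {}
    for pg_id, chrom, _strand, start, end in exons:
        if pg_id in bounds:
            c, s, e = bounds[pg_id]
            bounds[pg_id] = (c, min(s, start), max(e, end))
        else:
            bounds[pg_id] = (chrom, start, end)
    return bounds


def count_union(spans: Iterable[Span]) -> int:
    """Count loci in the union; overlapping spans on a chromosome count once."""
    loci = 0
    current_chrom, current_end = None, -1
    for chrom, start, end in sorted(spans):
        if chrom != current_chrom or start > current_end:
            loci += 1  # a new locus begins
            current_chrom, current_end = chrom, end
        else:
            current_end = max(current_end, end)  # extend the current locus
    return loci


# Toy input (invented): a two-exon Yale call overlapping a Havana call, plus one UCSC call.
exons = [
    ("Y1", "chr1", "+", 100, 150),
    ("Y1", "chr1", "+", 300, 400),
    ("H1", "chr1", "-", 350, 500),
    ("U1", "chr2", "+", 100, 200),
]
merged = merge_pseudo_exons(exons)
union_size = count_union(merged.values())
print(merged["Y1"], union_size)
```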

  4. Intersection of Pseudogenes from 4 Groups: Non-processed Consensus
     [Venn diagram; region counts as extracted: 52 (2), 14 (2), 16 (0), 82 (34), 15 (1), 17 (7), 33 (1)]
     • Havana-Gencode: 167 pseudogenes
     • Yale: 164 pseudogenes
     • UCSC retrogenes: 146 not expressed
     • Rough agreement now is: 82 + 52 - 7 = 127 out of 229 total
     • What to do with the remaining 102?

  5. How to Pick Pseudogenes for RT-PCR?
     • Start with the intersection 127
     • Duplicated v processed: how many of each? (2:1?)
     • Rank pseudogenes:
        • By likelihood to be transcribed according to ENCODE evidence
           • ditag, then CAGE, then tiling array
        • By their uniqueness in genome
           • Good primers
           • Non cross-hybridizing probes
        • How to get a consistent rank?
     • Who will do RT-PCR?
     • What coordinates to use?
     • (Ignore 1 processed pseudogene already being sequenced by GIS group.)
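One possible answer to the "consistent rank" question is a fixed lexicographic sort key that follows the evidence priority listed on the slide (ditag, then CAGE, then tiling array), with uniqueness as the next criterion and the name as a final tie-breaker. This is only a sketch: the `Candidate` fields, the boolean encoding of evidence, and the example entries are hypothetical.

```python
from typing import List, NamedTuple


class Candidate(NamedTuple):
    name: str
    ditag: bool           # supported by ditag evidence
    cage: bool            # supported by CAGE evidence
    tiling: bool          # supported by tiling-array evidence
    unique_primers: bool  # unique enough for good primers / non-cross-hybridizing probes


def rank_key(c: Candidate):
    """Sort key: False sorts before True, so each flag is negated to rank support first."""
    return (not c.ditag, not c.cage, not c.tiling, not c.unique_primers, c.name)


# Invented example candidates, not real pseudogenes.
candidates: List[Candidate] = [
    Candidate("pg_A", ditag=False, cage=True, tiling=True, unique_primers=True),
    Candidate("pg_B", ditag=True, cage=False, tiling=False, unique_primers=False),
    Candidate("pg_C", ditag=True, cage=False, tiling=True, unique_primers=True),
    Candidate("pg_D", ditag=False, cage=False, tiling=False, unique_primers=True),
]

ranked = sorted(candidates, key=rank_key)
print([c.name for c in ranked])
```

Because the key is a plain tuple, every group applying it to the same evidence table gets the same order, which is the property the slide asks for. A weighted score would be an alternative if the evidence types should trade off against each other.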

  6. How to generate a consensus for remaining 102 pseudogenes?
     • Stick with the intersection 127
     • Develop consistent criteria for identifying pseudogenes and apply them uniformly to ENCODE
        • E.g. protein matches with disablements found from a pipeline
        • Ignores tricky cases flagged by manual annotation
     • Do a simple union of UCSC, Havana & Yale, giving 229
        • GIS is a subset of the other 3
        • Describe pseudogenes as being identified by multiple approaches and then explicitly flag each group's unique ones in final annotation
        • Easy but perhaps biases stats
     • Do a qualified union
        • Allow each group to "question" particular pseudogenes in another's set
        • Send questions around and then have a call to sort out differences
        • Need a way to arbitrate, e.g. we could demand an obvious disablement
        • We might learn something!
     • How do we represent this in the browser & in stats?

  7. Once we have consensus, how to agree on pseudogene boundaries?
     • Keep each group's boundaries unchanged
        • If pseudogenes overlap, take largest region (union) or smallest
     • Develop a uniform criterion for assigning pseudogene boundaries and apply it to each of the pseudogenes in the consensus set
        • Could just take each pseudogene in the consensus and have one group realign it against parent
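The "largest region (union) or smallest" choice for two overlapping calls can be written down in a few lines. This sketch assumes end-inclusive coordinates on one chromosome and reads "smallest" as the region shared by both calls, which is an interpretation and not something the slide states; the function names and coordinates are invented.

```python
from typing import Optional, Tuple

Interval = Tuple[int, int]  # (start, end), end-inclusive, same chromosome assumed


def largest_region(a: Interval, b: Interval) -> Interval:
    """Union-style boundary: from the leftmost start to the rightmost end."""
    return (min(a[0], b[0]), max(a[1], b[1]))


def smallest_region(a: Interval, b: Interval) -> Optional[Interval]:
    """Shared region of two calls, or None if they do not overlap."""
    start, end = max(a[0], b[0]), min(a[1], b[1])
    return (start, end) if start <= end else None


# Hypothetical boundaries for the same pseudogene as called by two groups.
havana_call = (1000, 1800)
yale_call = (1200, 2100)

union_bounds = largest_region(havana_call, yale_call)
shared_bounds = smallest_region(havana_call, yale_call)
print(union_bounds, shared_bounds)
```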
