Gencode Mar '10 Meeting: Pseudogene Project Update Mark Gerstein

Presentation Transcript


  1. Gencode Mar '10 Meeting: Pseudogene Project Update. Mark Gerstein. Illustration from Gerstein & Zheng (2006). Sci Am.

  2. Overall Approach. Overall Flow: Pipeline Runs, Coherent Sets, Annotation, Transfer to Sanger. Steps: the overall pipeline runs at Yale and UCSC, yielding raw pseudogenes; extraction of coherent subsets for further analysis and annotation; passing to Sanger for detailed manual analysis and curation; incorporation into the final GENCODE annotation; pipeline modification. Chronology of Sets: ENCODE Pilot 1%; Ribosomal Protein pseudogenes; Unitary pseudogenes (Hard); Glycolytic Pseudogenes; Polymorphic Pseudogenes; Pseudogenes Associated with SDs.

  3. Specific Pseudogene Assignments: Glycolytic Pseudogenes (completed)

  4. Number of pseudogenes for each glycolytic enzyme [Liu et al. BMC Genomics ('09)]. Large numbers of processed GAPDH pseudogenes in mammals comprise one of the biggest families, but the numbers are not obviously correlated with mRNA abundance. Figure labels: GAPDH; Processed/Duplicated.

  5. Number of pseudogenes for each glycolytic enzyme [Liu et al. BMC Genomics ('09)]. Large numbers of processed GAPDH pseudogenes in mammals comprise one of the biggest families, but the numbers are not obviously correlated with mRNA abundance. Figure labels: GAPDH; Processed/Duplicated; GAPDH: 60 Proc/2 Dup.

  6. Distribution of human GAPDH pseudogenes [Liu et al. BMC Genomics ('09, in press)]. Large numbers of processed GAPDH pseudogenes in mammals comprise one of the biggest families, but the numbers are not obviously correlated with mRNA abundance. Figure label: 60 Proc/2 Dup.

  7. Approximate Age of GAPDH pseudogenes: Burst of Retrotranspositional Activity. Age calculated based on the Kimura 2-parameter model of nucleotide substitution [Liu et al. BMC Genomics ('09)]
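
As an illustration of the Kimura 2-parameter distance named on this slide, here is a minimal sketch, assuming two pre-aligned sequences of equal length. The function name `k2p_distance` is hypothetical and is not taken from the cited paper; converting the distance to an age additionally requires an assumed substitution rate, which the slide does not state.

```python
import math


def k2p_distance(seq_a: str, seq_b: str) -> float:
    """Kimura 2-parameter distance (substitutions per site) for aligned sequences.

    Illustrative helper: columns with gaps or ambiguous bases are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    purines = {"A", "G"}
    pyrimidines = {"C", "T"}
    transitions = transversions = compared = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # skip gaps and ambiguity codes
        compared += 1
        if a == b:
            continue
        if {a, b} <= purines or {a, b} <= pyrimidines:
            transitions += 1  # A<->G or C<->T
        else:
            transversions += 1  # purine <-> pyrimidine
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared  # proportion of transitions
    q = transversions / compared  # proportion of transversions
    term_a = 1.0 - 2.0 * p - q
    term_b = 1.0 - 2.0 * q
    if term_a <= 0.0 or term_b <= 0.0:
        raise ValueError("sequences too divergent for the K2P correction")
    return -0.5 * math.log(term_a * math.sqrt(term_b))
```

For example, `k2p_distance(parent_cds, pseudogene_seq)` would give the corrected divergence between a parent gene and one of its pseudogenes, where both argument names are placeholders for aligned sequences.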

  8. Synteny of GAPDH pseudogenes. Synteny derived from local gene orthology [Liu et al. BMC Genomics ('09)]

  9. Specific Pseudogene Assignments: Unitary Pseudogenes (completed)

  10. Pseudogenes • Pseudogenes: nongenic DNA segments with high sequence similarity to functional genes. Origins shown in the diagram: duplicated pseudogenes (duplication); processed pseudogenes (transcription + transposition); unitary pseudogenes (in situ pseudogenization) • Unitary pseudogenes: unprocessed pseudogenes with no functional counterparts

  11. Identification pipeline. ~23k mouse proteins; ~16k human-mouse orthologs; ~6k mouse proteins without human orthologs; HG; ~600 candidate human unitary pseudogene loci; 76 human unitary pseudogenes [Zhang et al. Genome Biology (in press, '10)]
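
The first filtering step on this slide (mouse proteins that lack a human ortholog) amounts to a set difference. The sketch below shows only that step, under the assumption that orthologs are available as (human ID, mouse ID) pairs; the function name and the identifiers are hypothetical, and the later steps that narrow the candidates to ~600 loci and then to 76 pseudogenes are not shown because the slide gives no detail on them.

```python
from typing import Iterable, List, Tuple


def mouse_without_human_ortholog(
    mouse_proteins: Iterable[str],
    ortholog_pairs: Iterable[Tuple[str, str]],
) -> List[str]:
    """Return mouse protein IDs that never appear in a human-mouse ortholog pair.

    ortholog_pairs holds (human_id, mouse_id) tuples; this layout is an assumption.
    """
    with_ortholog = {mouse_id for _human_id, mouse_id in ortholog_pairs}
    return sorted(set(mouse_proteins) - with_ortholog)
```

The returned proteins would then be the queries for a search against the human genome, which is where candidate unitary pseudogene loci would come from.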

  12. Relativity of unitary pseudogenes [Zhang et al. Genome Biology (in press, '10)]

  13. Unitary Pseudogene Families

  14. Dating the pseudogenization events

  15. Specific Pseudogene Assignments: Polymorphic Pseudogenes (in process)

  16. 11 Polymorphic Pseudogenes

  17. Polymorphic pseudogenes (3 with allele frequency data) [Zhang et al. Genome Biology (in press, '10)]. The 3 SNPs were not found to be under recent positive selection...

  18. Fst hierarchical clustering for rs4940595 in SERPINB11. ...but the population structure at rs4940595 (the difference in allelic frequencies between populations) could be the result of different selective regimes acting on the same allele in different population subdivisions.
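
To make the Fst statistic on this slide concrete, here is a minimal sketch of Wright's Fst for one biallelic SNP, computed from the allele frequency in each population. It assumes equally weighted populations and ignores sample-size corrections; the slide does not say which estimator was used, so this is an illustration rather than the analysis behind the figure, and `fst_biallelic` is a hypothetical name.

```python
from typing import Sequence


def fst_biallelic(allele_freqs: Sequence[float]) -> float:
    """Wright's Fst = (H_T - H_S) / H_T for a single biallelic locus.

    allele_freqs: frequency of one allele in each population (equal weights assumed).
    """
    if len(allele_freqs) < 2:
        raise ValueError("need at least two populations")
    if any(p < 0.0 or p > 1.0 for p in allele_freqs):
        raise ValueError("allele frequencies must lie in [0, 1]")
    n = len(allele_freqs)
    p_bar = sum(allele_freqs) / n
    h_total = 2.0 * p_bar * (1.0 - p_bar)  # expected heterozygosity, pooled
    if h_total == 0.0:
        return 0.0  # allele fixed or absent everywhere: no differentiation
    h_within = sum(2.0 * p * (1.0 - p) for p in allele_freqs) / n
    return (h_total - h_within) / h_total
```

Computing this for every pair of populations gives a distance matrix of the kind that a hierarchical clustering could be built on.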

  19. Specific Pseudogene Assignments: SD-associated Pseudogenes (in process)

  20. Segmental duplications (SDs) • Regions of the genome with ≥90% sequence identity and ≥1 kb in length • Based on neutral divergence, correspond to the last ~40 million years of human evolution • Comprise ~5-6% of the human genome • Enriched with genes (~18%) and pseudogenes (duplicated ~45%, processed ~22%). Can the study of ψgenes in SDs provide information not obvious from the individual datasets? Bailey et al., Science, 2002
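
The SD definition on this slide can be written as a simple threshold test. This is a minimal sketch, assuming both thresholds are lower bounds (at least 90% identity and at least 1 kb, as in the cited Bailey et al. definition) and that each candidate is summarized by an identity fraction and an aligned length; the `Alignment` type and the function name are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alignment:
    """Summary of one pairwise genomic alignment (illustrative type)."""

    identity: float  # fraction of identical aligned bases, 0.0 to 1.0
    length_bp: int  # aligned length in base pairs


def is_segmental_duplication(
    aln: Alignment,
    min_identity: float = 0.90,
    min_length_bp: int = 1000,
) -> bool:
    """True if the alignment meets both SD thresholds (assumed inclusive)."""
    return aln.identity >= min_identity and aln.length_bp >= min_length_bp
```

Pseudogenes whose coordinates fall inside alignments passing this test would be the SD-associated set discussed on the following slide.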

  21. Nucleotide substitutions in ψgenes and the SDs containing them. Figure labels: Parent gene; Duplicated ψgene. K2m: nucleotide substitutions per site computed using Kimura's two-parameter model • Most ψgenes show the same number of substitutions as the larger SD region containing them: duplication accompanied by disablement, followed by a neutral rate of evolution

  22. Acknowledgements: Z Zhang, E Khurana, Y J Liu, YK Lam, S Balasubramanian, G Fang, N Carriero, R Robilotto, P Cayting, M Wilson, A Frankish, M Diekhans, R Harte, T Hubbard, J Harrow. Pseudogene.org
