Properties of pseudogenes (ψG) • Genomic DNA sequences • Homology to known genes • Non-functional copies • No natural selection pressure • Disablements: frameshifts & stop codons • Small indels • Inserted repeats (LINE/Alu)
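Stop codons, one of the disablements listed above, can be found by reading a candidate region codon by codon. Below is a minimal sketch in Python. It assumes the standard genetic code and a known reading frame. The helper name `premature_stops` is hypothetical and is not part of any pipeline described in these slides.

```python
# Stop codons of the standard genetic code (assumption: nuclear genes).
STOP_CODONS = {"TAA", "TAG", "TGA"}

def premature_stops(seq: str, frame: int = 0) -> list[int]:
    """Return 0-based offsets of in-frame stop codons before the last codon.

    Hypothetical helper: `frame` (0, 1 or 2) is the reading-frame offset.
    The final complete codon is skipped because a functional gene is
    expected to end in a stop; any earlier stop is a disablement.
    """
    seq = seq.upper()
    # Start positions of every complete codon in this frame.
    codon_starts = list(range(frame, len(seq) - 2, 3))
    return [i for i in codon_starts[:-1] if seq[i:i + 3] in STOP_CODONS]

print(premature_stops("ATGAAATAGCCCTGA"))  # [6]: TAG interrupts the reading frame
```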
Duplicated pseudogenes [Figure: original gene → gene duplication → mutations → pseudogene] • Retains intron/exon structure • e.g., globins, Hox cluster
Processed pseudogenes (retroelements) [Figure: original gene with poly-A tail AAAAAA → LINE-1 mediated retrotransposition → pseudogene, tail mutated to AACATA] • Mostly dead on arrival • Intronless, poly-A tail, short direct repeats
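The poly-A tail of a processed pseudogene is often partly decayed, as in the AACATA tail above, so an exact run of A's is too strict a test. The sketch below is one simple heuristic: a sliding window over the 3′ flank. The window size and A fraction are hypothetical defaults, not values taken from the slides.

```python
def has_polya_tail(flank: str, window: int = 10, min_a_fraction: float = 0.8) -> bool:
    """Heuristic check for an A-rich stretch in the 3' flank of a candidate.

    Hypothetical thresholds: any `window`-long stretch containing at least
    `min_a_fraction` adenines counts as a (possibly decayed) poly-A tail.
    """
    flank = flank.upper()
    if len(flank) < window:
        return False
    # Running count of A's in the current window.
    count = flank[:window].count("A")
    if count / window >= min_a_fraction:
        return True
    for i in range(window, len(flank)):
        # Slide by one base: add the incoming base, drop the outgoing one.
        count += (flank[i] == "A") - (flank[i - window] == "A")
        if count / window >= min_a_fraction:
            return True
    return False

print(has_polya_tail("GCGTC" + "AAAAACAAAA" + "GGC"))  # True: 9 of 10 bases are A
```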
Why study pseudogenes? • Contamination in sequence databases • Abundant: ~20k pseudogenes in human • ~8k processed, many of them ribosomal • ~80 human ribosomal protein genes identified by experiments, but a few hundred annotated in ENSEMBL • Interfere with the study of functional genes • Cross-hybridization in microarray experiments • Generate false positives in gene prediction • Pseudogenes are “genomic fossils” • Study the evolution of genes and genomes • Measure mutation/insertion rates
Human cytochrome c gene (Cyc) and pseudogenes (cyc); * = stop codon, / and \ = frameshift
Cyc  MGDVEKGKKIFIMKCSQCHTVEKGGK-HKTGPNLHG-LFG-RKTGQAPGYSYTAAN
cyc  --DVEKGKKIFIQKCVQWHTMEKEGK/HETGLNLHG/LLG/RKTGQVIGFSYTDSN
Cyc  KNKGIIWGEDTLMEYLENPKKYIPGTKMIFVG-IKKKEERADLIAY-LKKATNE
cyc  KNKGIT*GEDTLKEYLENLKKYIPGTK**YFL/VTKKAERADLITYL\EKATNE
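The disablements in the pseudogene rows of this alignment can be tallied directly from the text. This assumes that * marks a stop codon and that / or \ marks a frameshift, which is the convention the alignment appears to use. The row strings are copied from the alignment. The identity measure counts matching letters over columns where both rows have a residue. It is one simple choice among several.

```python
# Aligned rows copied from the cytochrome c alignment (functional gene vs. pseudogene).
GENE_ROWS = [
    "MGDVEKGKKIFIMKCSQCHTVEKGGK-HKTGPNLHG-LFG-RKTGQAPGYSYTAAN",
    "KNKGIIWGEDTLMEYLENPKKYIPGTKMIFVG-IKKKEERADLIAY-LKKATNE",
]
PSEUDOGENE_ROWS = [
    "--DVEKGKKIFIQKCVQWHTMEKEGK/HETGLNLHG/LLG/RKTGQVIGFSYTDSN",
    "KNKGIT*GEDTLKEYLENLKKYIPGTK**YFL/VTKKAERADLITYL\\EKATNE",
]

def count_disablements(aligned: str) -> dict[str, int]:
    """Count disablement symbols in one aligned pseudogene row.

    Assumed notation: '*' is a stop codon, '/' and '\\' are frameshifts.
    """
    return {
        "stop_codons": aligned.count("*"),
        "frameshifts": aligned.count("/") + aligned.count("\\"),
    }

def percent_identity(gene: str, pseudo: str) -> float:
    """Identity over columns where both rows carry an amino-acid letter."""
    pairs = [(g, p) for g, p in zip(gene, pseudo) if g.isalpha() and p.isalpha()]
    return 100.0 * sum(g == p for g, p in pairs) / len(pairs)

totals = {"stop_codons": 0, "frameshifts": 0}
for row in PSEUDOGENE_ROWS:
    for key, n in count_disablements(row).items():
        totals[key] += n
print(totals)  # {'stop_codons': 3, 'frameshifts': 5}

# Each gene/pseudogene row pair has equal length, so concatenation keeps columns aligned.
identity = percent_identity("".join(GENE_ROWS), "".join(PSEUDOGENE_ROWS))
print(f"{identity:.1f}% identity over aligned residues")
```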
Pipeline for pseudogene assignment • Raw genome sequence • Search protein databases for gene homology • Remove overlap with known gene annotations • Classify into 3 categories: processed, duplicated and fragments • Post to UCSC and pseudogene.org
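The classification step could be approximated with a few rules drawn from the features on the earlier slides. Duplicated pseudogenes retain intron/exon structure. Processed ones are intronless and carry a poly-A tail. Fragments cover only part of the parent. This is a hypothetical sketch: the `Candidate` fields and the 0.7 coverage cutoff are assumptions, not the pipeline's actual criteria.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    """Hypothetical summary of one pseudogene candidate."""
    parent_coverage: float       # fraction of the parent protein covered (0 to 1)
    parent_intron_count: int     # introns in the parent gene
    candidate_intron_count: int  # introns retained in the candidate copy
    has_polya: bool              # A-rich stretch downstream of the candidate

def classify(c: Candidate, min_coverage: float = 0.7) -> str:
    """Assign one of the 3 categories using assumed, illustrative rules."""
    # Partial matches are called fragments regardless of structure.
    if c.parent_coverage < min_coverage:
        return "fragment"
    # Parent had introns but the copy has none: retrotransposed (processed) copy.
    if c.parent_intron_count > 0 and c.candidate_intron_count == 0:
        return "processed"
    # Intron/exon structure retained: copy made by genomic duplication.
    if c.candidate_intron_count > 0:
        return "duplicated"
    # Single-exon parent: intron loss is uninformative, so fall back on the
    # poly-A signal (a hypothetical tie-break).
    return "processed" if c.has_polya else "duplicated"

print(classify(Candidate(0.95, 5, 0, True)))  # processed
```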
Schematic flowchart of pseudogene identification • Queries: genes (BLASTN) and proteins (TBLASTN) • Target: ENCODE regions masked with RepeatMasker (…GCTATTTNNNGGGCCAATTATGCG…) • BLAST hits • Link hits • Align with FASTA / GeneWise • Further processing: disablements, classification, other features
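The "Link hits" step joins BLAST hits from the same parent that lie close together on the genome. This keeps one pseudogene that is broken up by frameshifts or stop codons from being reported as several. Below is a minimal sketch that assumes the hits have already been filtered to a single query. The `Hit` record, the collinearity test and the `max_gap` default are all hypothetical.

```python
from typing import NamedTuple

class Hit(NamedTuple):
    """One BLAST hit on the genome (hypothetical minimal record)."""
    chrom: str
    strand: str    # "+" or "-"
    start: int     # genomic start (start < end on both strands)
    end: int       # genomic end
    q_start: int   # start position in the parent (query) sequence
    q_end: int     # end position in the parent (query) sequence

def link_hits(hits: list[Hit], max_gap: int = 60) -> list[list[Hit]]:
    """Group hits into chains that may represent one pseudogene candidate."""
    chains: list[list[Hit]] = []
    for h in sorted(hits, key=lambda x: (x.chrom, x.strand, x.start)):
        if chains:
            prev = chains[-1][-1]
            same_locus = prev.chrom == h.chrom and prev.strand == h.strand
            close = h.start - prev.end <= max_gap
            # Moving along the genome, query coordinates should increase on the
            # plus strand and decrease on the minus strand.
            if h.strand == "+":
                collinear = h.q_start > prev.q_start
            else:
                collinear = h.q_start < prev.q_start
            if same_locus and close and collinear:
                chains[-1].append(h)
                continue
        chains.append([h])
    return chains

hits = [
    Hit("chr22", "+", 1000, 1100, 1, 33),
    Hit("chr22", "+", 1130, 1250, 35, 75),    # 30 bp downstream, continues query
    Hit("chr22", "+", 50000, 50100, 1, 33),   # far away: separate candidate
    Hit("chr22", "-", 2000, 2100, 40, 73),
    Hit("chr22", "-", 2150, 2250, 5, 38),     # minus strand: query runs backwards
]
print([len(chain) for chain in link_hits(hits)])  # [2, 1, 2]
```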
Overlap between 3 pseudogene annotation sets [Figure: Venn diagrams of the Yale, Vega and ENSEMBL annotations; panels: 13 regions, all 44 regions]
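Counts like those in the Venn diagrams can be derived once each annotation set has been reduced to shared locus keys. One way to do that is by merging overlapping coordinates, and the sketch assumes this has already been done. The keys below are made up, and `venn_counts` is a hypothetical helper.

```python
from itertools import combinations

def venn_counts(sets: dict[str, set]) -> dict[tuple[str, ...], int]:
    """Count items in each exclusive region of a Venn diagram.

    A region is identified by the tuple of set names that contain its items;
    items also present in any other set are excluded from that region.
    """
    names = list(sets)
    counts = {}
    for r in range(1, len(names) + 1):
        for combo in combinations(names, r):
            inside = set.intersection(*(sets[n] for n in combo))
            outside = set().union(*(sets[n] for n in names if n not in combo))
            counts[combo] = len(inside - outside)
    return counts

# Made-up locus keys, for illustration only.
annotations = {
    "Yale": {"locA", "locB", "locC", "locD"},
    "Vega": {"locC", "locD", "locE"},
    "ENSEMBL": {"locD", "locE", "locF"},
}
for region, n in venn_counts(annotations).items():
    print(" & ".join(region), n)
```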
Different criteria for pseudogenes: features used for assignment vs. those surveyed [Table legend: used in assignment / used for survey / not used]
ENCODE analysis strategy • Identified processed pseudogenes genome-wide • Added duplicated pseudogenes for the ENCODE regions • Performed detailed analysis on chr22 • Interrelated different gene and pseudogene annotations with tiling array data • Will adopt a similar approach for ENCODE
Integration of different transcription data for genes and pseudogenes on chr22 [Figure tracks: genes and pseudogenes on chr22; PCR tiling array, probe expression in 3 cell lines; MAS oligonucleotide arrays; Affymetrix tiling arrays, expression in 11 cell lines; ESTs; CpG islands; transcription factor binding sites from ChIP-chip (CREB, NFκB, p53, etc.); sequence conservation in rat, mouse and chimp]
Intersection of exons with transcribed microarray tiles and other transcription evidence [Figure, panels A, B, C; tracks: exon, PCR tile (+), EST, CpG, NFκB]
Intersection of ψG with transcribed microarray tiles and other transcription evidence [Figure, panels A, B, C; tracks: ψG, PCR tile (+), EST, CpG, NFκB]
Intersection of ψG with transcribed microarray tiles and other transcription evidence [Figure, panels A, B, C; tracks: ψG, PCR tile (+) in 3 cell lines, Affymetrix microarray, EST, PCR tile (-) in 3 cell lines] • B: located within the Cat Eye Syndrome Critical Region
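Each of these intersections asks which evidence tracks overlap a given exon or ψG. Below is a minimal sketch using half-open [start, end) coordinates on a single chromosome. The track names mirror the slide labels, but the coordinates are invented for illustration. A linear scan is used for clarity. Genome-scale data would call for sorted intervals or an interval tree.

```python
def overlaps(a_start: int, a_end: int, b_start: int, b_end: int) -> bool:
    """Half-open intervals overlap if each one starts before the other ends."""
    return a_start < b_end and b_start < a_end

def evidence_for(feature: tuple[int, int],
                 tracks: dict[str, list[tuple[int, int]]]) -> list[str]:
    """Names of evidence tracks with at least one interval overlapping `feature`."""
    start, end = feature
    return sorted(name for name, intervals in tracks.items()
                  if any(overlaps(start, end, s, e) for s, e in intervals))

# Invented coordinates on one chromosome.
tracks = {
    "PCR tile (+)": [(100, 200), (500, 600)],
    "EST": [(150, 180)],
    "CpG": [(900, 1000)],
    "NFkB": [(190, 210)],
}
print(evidence_for((170, 195), tracks))  # ['EST', 'NFkB', 'PCR tile (+)']
```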
Pseudogenes in human-mouse synteny [Figure: genes and pseudogenes in syntenic human and mouse regions] • Fewer than half of the human processed ψG have a homologue in the syntenic mouse regions • ~60% of the human processed ψG were created after the human/mouse divergence (~75 Myr ago)
The human ψG HCP9 is a relic of a primordial gene that still functions in mouse
Reasons why pseudogene annotations differ • Pseudogene prediction is a low priority relative to gene annotation • Done by only a few disparate groups • No standard detection/assignment method • False negatives result from discounting repeat regions
Questions for discussion • Keep separate annotation tracks or merge them? • Establish an ontology & identification criteria? • How do pseudogenes confound gene annotation and prediction? • Particularly acute for duplicated pseudogenes • Might be worth devoting more attention to cataloging pseudogenes as a group • Alleviate ambiguities in transcript mapping • Identify prediction false positives
Examples of Functional Pseudogenes • In the snail L. stagnalis, expression of neuronal nitric oxide synthase (nNOS) is suppressed by an antisense RNA transcribed from a NOS pseudogene [Korneev, Park, O’Shea, J. Neuroscience, 1999] • In mouse, a pseudogene regulates expression of the Makorin1 gene by binding to a transcriptional repressor or an RNA-digesting enzyme [Hirotsune et al., Nature 423, 2003] [Figures: an ancestral NOS gene gives rise to the NOS gene and a NOS pseudogene; pseudogene RNA and NOS RNA form an RNA duplex, suppressing protein production from NOS; normal vs. transgenic mouse]
Pseudogene Challenges: Confidence Values • Pseudogenes cannot be verified experimentally • How do changes in computational parameters affect our level of confidence? • Can we quantify confidence? [Figure: a fragment set passes through a filter; accepting results with a 90% match to an existing gene gives Result Set 1, and accepting results with a 70% match gives Result Set 2]
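The effect of the acceptance threshold can be made concrete. Assume the filter simply thresholds percent match. Then every fragment that passes at 90% also passes at 70%, so Result Set 1 is a subset of Result Set 2, and the extra calls are the lower-confidence ones. The sketch below uses made-up identity values. The `stability` score, the fraction of thresholds at which a fragment is accepted, is one hypothetical way to put a number on confidence and is not a method from the slides.

```python
def accept(fragments: dict[str, float], min_identity: float) -> set[str]:
    """Keep fragments whose identity to an existing gene meets the threshold."""
    return {name for name, ident in fragments.items() if ident >= min_identity}

def stability(fragments: dict[str, float], thresholds: list[float]) -> dict[str, float]:
    """Fraction of thresholds at which each fragment is accepted (a crude confidence proxy)."""
    return {name: sum(ident >= t for t in thresholds) / len(thresholds)
            for name, ident in fragments.items()}

# Hypothetical fragment set: name -> fractional identity to the closest known gene.
fragments = {"frag1": 0.95, "frag2": 0.91, "frag3": 0.78, "frag4": 0.72, "frag5": 0.55}

result_set_1 = accept(fragments, 0.90)  # stringent
result_set_2 = accept(fragments, 0.70)  # permissive
print(sorted(result_set_1), sorted(result_set_2 - result_set_1))
print(stability(fragments, [0.70, 0.80, 0.90]))
```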