So you think you can model the genome… Jeltje van Baren – April 27 2005
Overview • Introduction: there’s more to the genome than genes • The formation of repetitive elements: LINEs • Genome rearrangements and evolution: the Hox genes • Genes without function: pseudogenes • Gene prediction and pseudogenes • Processed pseudogenes and gene prediction • Ways to get rid of pseudogenes in gene predictions • intron alignment method • conservation method
There’s more to the genome… • 1-3% of the human genome is coding • 50% of the sequence homology between mouse and human is coding • The rest…? Regulatory elements? [Figure: Shh and C7orf2 locus]
There’s more to the genome… Beta-globin cluster: the locus control region (LCR). Wijgerde et al. 1995. Nature 377:209-213. Insulators affect transcription factor ability
Junkyard: 1-3% of the human genome is coding; is the rest ‘junk’? A lot of it is repeats: 17% LINEs, 15% SINEs, 8% retrovirus/retroposon, 3% DNA transposon.
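Adding up the repeat classes listed above shows how much of the genome they cover together. A trivial sketch that uses only the percentages on this slide (the dictionary and function names are illustrative):

```python
# Share of the human genome per repeat class, exactly as listed on the slide (%).
REPEAT_PERCENT = {
    "LINEs": 17,
    "SINEs": 15,
    "Retrovirus/retroposon": 8,
    "DNA transposon": 3,
}


def total_repeat_percent(classes: dict[str, float]) -> float:
    """Add up the genome percentages of the listed repeat classes."""
    return sum(classes.values())


print(f"Listed repeats cover {total_repeat_percent(REPEAT_PERCENT)}% of the genome")
```

Together these classes account for 43% of the genome, more than ten times the 1-3% that is coding.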
Repetitive elements: LINEs. LINE stands for long interspersed element. [Figure: LINE structure: internal pol II promoter, ORF1, ORF2 (reverse transcriptase/endonuclease), poly(A) tail (AAAAA); life cycle: transcription, translation, then the ribonucleoprotein (RNP) migrates back into the nucleus.]
Rearrangement shapes the genome. In the course of evolution, genomes are constantly ‘shuffled’, for example by duplication of regions through unequal crossing over during meiosis.
Example of duplication: the Hox cluster. [Figure: mouse HoxA cluster, paralog groups 1 2 3 4 5 6 7 (8) 9 10 11 (12) 13, groups in parentheses absent from HoxA; detail of HoxA2 and its homeobox.] The homeobox encodes a 60-amino-acid protein domain that is a DNA-binding motif. DNA-binding motifs are often present in transcription factors.
Example of duplication: the Hox cluster. In evolution, first duplication of genes, then duplication of the whole region? In Fugu, HoxC1 and HoxC3 are pseudogenes.
Hox genes are important in development. [Figure: mouse HoxA cluster (1 2 3 4 5 6 7 (8) 9 10 11 (12) 13) shown alongside the proximal-distal axis of the developing mouse limb.]
Hox genes are important in development. [Figure: mouse HoxD cluster (1 (2) 3 4 (5) (6) (7) 8 9 10 11 12 13) shown alongside the anterior-posterior axis of the developing mouse limb.]
Pseudogenes in Hox cluster [Figure: Fugu HoxC cluster, 1 (2) 3 4 5 6 (7) 8 9 10 11 12 13]
Pseudogenes • How do we detect pseudogenes? • Mutations lead to putative frameshifts or premature stop codons • No transcripts of the gene can be detected • Some gene features (promoter, intron/exon boundaries, termination signal) may have been lost • Ka/Ks ratios (nonsynonymous vs. synonymous substitution rate) close to 1, indicating no selective constraint
Pseudogenes and gene prediction. Segmental duplications may contain complete genes. Mutations then result in either deterioration of a copy or generation of new gene family members.
Prediction of pseudogenes. A pseudogene will be predicted as a real gene if no stop codons or intron/exon boundary mutations are present. In longer genes (more exons), the predictor may simply skip the stop-containing exons.
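Skipping a stop-containing exon only yields a clean prediction if the skipped length is a multiple of 3; otherwise the downstream reading frame shifts. A hypothetical sketch of that behavior, assuming exons are given as spliced coding sequence starting in frame 0 (function names and the toy gene are invented for illustration):

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def stop_containing_exons(exons: list[str]) -> set[int]:
    """Indices of exons holding an in-frame premature stop in the spliced sequence.
    A stop codon straddling a splice site is assigned to the exon with its first base."""
    cds = "".join(exons).upper()
    owner = [i for i, exon in enumerate(exons) for _ in exon]  # nucleotide -> exon index
    last_codon = len(cds) - len(cds) % 3 - 3  # start of the terminal codon, not checked
    return {owner[p] for p in range(0, last_codon, 3) if cds[p:p + 3] in STOP_CODONS}


def skip_exons(exons: list[str], skip: set[int]) -> tuple[str, bool]:
    """Splice out the given exons, as a predictor might, and report whether the
    downstream reading frame is preserved (total skipped length divisible by 3)."""
    kept = "".join(e for i, e in enumerate(exons) if i not in skip)
    frame_ok = sum(len(exons[i]) for i in skip) % 3 == 0
    return kept, frame_ok


exons = ["ATGAAA", "CCCTGAGGG", "TTTGGCTAA"]  # toy gene; exon 1 carries an in-frame TGA
bad = stop_containing_exons(exons)
print(bad)                     # {1}
print(skip_exons(exons, bad))  # ('ATGAAATTTGGCTAA', True)
```

Because the 9-bp exon keeps the frame, the skipped version is a stop-free open reading frame, so the pseudogene copy would pass as a real gene.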
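SINEs [Figure: SINE structure: pol III promoter, poly(A) tail (AAAAAAAAAA), no ORF!] SINEs (e.g. Alu) use the RT/EN (reverse transcriptase/endonuclease) function of LINEs. [Figure: the ribonucleoprotein (RNP) migrates back into the nucleus.]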
Processed pseudogenes. Processed pseudogenes are generated from intronless (already spliced) mRNA using the same mechanism: [Figure: the ribonucleoprotein (RNP) migrates back into the nucleus and the mRNA is reverse transcribed into the genome.]
Pseudogenes and gene prediction • Pseudogene treated as a single-exon gene • Pseudogene treated as exons
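Because a processed pseudogene is a copy of spliced mRNA, it carries the parent's exon-exon junctions uninterrupted, whereas the parent's own genomic locus has introns at those points. A minimal sketch of that junction test, assuming the parent's exon sequences are known; the probe length `k`, the function names and the toy sequences are hypothetical:

```python
def junction_probes(parent_exons: list[str], k: int = 10) -> list[str]:
    """Last k bases of each exon joined to the first k bases of the next one,
    i.e. the sequence found at each splice junction of the mature mRNA."""
    return [a[-k:] + b[:k] for a, b in zip(parent_exons, parent_exons[1:])]


def junctions_spanned(candidate: str, parent_exons: list[str], k: int = 10) -> int:
    """Number of parent splice junctions found uninterrupted in the candidate."""
    return sum(probe in candidate for probe in junction_probes(parent_exons, k))


parent = ["ATGGCCAAGTTTGGA", "CCTGAAGGCATCAGC", "TTCGACAAGTAA"]  # toy exons
intron = "GTAAGTCCCTTTAG"                                         # toy intron
genomic_parent = intron.join(parent)              # parent locus: exons split by introns
processed_copy = "".join(parent) + "A" * 20       # spliced copy plus poly(A) tail

print(junctions_spanned(processed_copy, parent, k=5))  # 2
print(junctions_spanned(genomic_parent, parent, k=5))  # 0
```

A single-exon prediction that spans several parent junctions is therefore a strong processed-pseudogene candidate, while the parent locus itself scores zero.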
Finding nonprocessed pgenes • What can’t we use? • Frameshifts or stop codons in exon predictions (the predictor avoids them) • PolyA tails next to exon predictions • Things we can do: • Identify the parent gene • Look for non-conservation
Using known genes for pgene finding [Figure: predicted gene, BLAST against known gene] Align the prediction to the genomic region of the known gene and match intron locations. If the intron positions do not line up, the exon is a putative pseudogene.
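The intron-matching step can be sketched in coordinates alone. This assumes, hypothetically, that the prediction has already been aligned to the known gene so that both are described by exon lengths in the same gap-free coding-sequence coordinates. An exon is flagged if one of its internal boundaries misses the known intron positions, or if it runs straight across a known intron (as an intronless copy would):

```python
from itertools import accumulate


def intron_positions(exon_lengths: list[int]) -> set[int]:
    """Coding-sequence offsets where an intron interrupts the gene."""
    return set(list(accumulate(exon_lengths))[:-1])


def putative_pseudo_exons(pred_lengths: list[int], known_lengths: list[int]) -> list[int]:
    """Indices of predicted exons whose intron positions do not line up with the known gene."""
    known = intron_positions(known_lengths)
    bounds = [0, *accumulate(pred_lengths)]
    total = bounds[-1]
    flagged = []
    for i, (start, end) in enumerate(zip(bounds, bounds[1:])):
        # Internal exon boundaries should sit exactly on a known intron position...
        misplaced = any(b not in known for b in (start, end) if 0 < b < total)
        # ...and the exon should not run straight across a known intron.
        spans_intron = any(start < p < end for p in known)
        if misplaced or spans_intron:
            flagged.append(i)
    return flagged


known = [100, 150, 90]                         # hypothetical known gene: introns at 100 and 250
print(putative_pseudo_exons([100, 240], known))  # [1]
print(putative_pseudo_exons([340], known))       # [0]
```

The single-exon case shows why processed copies are caught: the whole prediction crosses both known introns.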
Limitations – intron method • Only works if the parent gene is known • Will not detect small parent exons • Some ‘known genes’ are really undetected pseudogenes
Finding processed pseudogenes. Method 2: conserved synteny
What is synteny? • Synteny: the occurrence of two or more genes on the same chromosome within one species • Conserved synteny: the occurrence of synteny of orthologous genes in two different organisms [Figure: conserved synteny between human chr7 and mouse chr5]
Conserved synteny [Figure: human and mouse]
Using conserved synteny in pseudogene finding • Take a gene model. • BLAST it to human genes. • Compare gene locations. • If there is a better hit elsewhere in the human genome than in the region of conserved synteny with mouse: possible pseudogene [Figure: human and mouse syntenic regions, ‘?’]
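The comparison in the last step can be sketched as follows. The `Hit` and `Region` types, the coordinates and the scores are hypothetical stand-ins for BLAST output and a precomputed region of conserved synteny:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """One BLAST hit of the gene model (hypothetical structure)."""
    chrom: str
    pos: int
    score: float


@dataclass(frozen=True)
class Region:
    """Region of conserved synteny expected to hold the true ortholog."""
    chrom: str
    start: int
    end: int

    def contains(self, hit: Hit) -> bool:
        return hit.chrom == self.chrom and self.start <= hit.pos <= self.end


def possible_pseudogene(hits: list[Hit], syntenic: Region) -> bool:
    """True if some hit outside the syntenic region beats every hit inside it."""
    best_inside = max((h.score for h in hits if syntenic.contains(h)), default=float("-inf"))
    best_outside = max((h.score for h in hits if not syntenic.contains(h)), default=float("-inf"))
    return best_outside > best_inside


region = Region("chr7", 1_000_000, 2_000_000)                      # hypothetical coordinates
hits = [Hit("chr7", 1_500_000, 120.0), Hit("chr3", 50_000, 400.0)]  # hypothetical scores
print(possible_pseudogene(hits, region))  # True
```

Here the strongest hit lies on a different chromosome than the syntenic region, so the gene model is flagged as a possible pseudogene.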
Example: pseudogene exons [Figure: exon not conserved in mouse]
Example: pseudogene exons. The parent gene is orthologous to a gene on a different mouse chromosome, which still gives a hit in mouse… Solution: remove all ‘second’ and ‘third’ orthology hits.
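One possible reading of this fix, sketched under assumptions: for each mouse target keep only the best-scoring human query, so a pseudogene exon that merely ranks second behind its parent loses its mouse hit and is left without an ortholog. The tuple layout, names and scores are hypothetical:

```python
Hit = tuple[str, str, float]  # (human query exon, mouse target gene, alignment score)


def best_hit_per_target(hits: list[Hit]) -> list[Hit]:
    """Keep only the top-scoring query for each target; 'second' and 'third' hits are dropped."""
    best: dict[str, Hit] = {}
    for hit in hits:
        _, target, score = hit
        if target not in best or score > best[target][2]:
            best[target] = hit
    return sorted(best.values())


def queries_without_ortholog(queries: list[str], kept: list[Hit]) -> list[str]:
    """Queries left with no mouse hit after filtering: pseudogene candidates."""
    matched = {query for query, _, _ in kept}
    return sorted(set(queries) - matched)


hits = [
    ("parent_exon", "mouse_gene_a", 500.0),
    ("pseudo_exon", "mouse_gene_a", 430.0),  # second-best hit to the same mouse gene
    ("other_exon", "mouse_gene_b", 300.0),
]
kept = best_hit_per_target(hits)
print(queries_without_ortholog(["parent_exon", "pseudo_exon", "other_exon"], kept))
```

After filtering, only `pseudo_exon` is left without a mouse ortholog, which is the outcome the slide's solution aims for.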
FBN3: Fibrillin 3 has no mouse homolog but is a real gene in human.
It all started with a transgene insertion… [Figure: Sex-lethal (Drosophila) transgene] S. Hirotsune et al., Nature 423:91-96 (2003)
…that resulted in really unhappy mice: ~80% of +/- mice die within 2 days of birth; the rest have bone deformities, renal and liver problems, and an incomplete epithelial eye cover at birth.
So what happens? [Figure: Mkrn1 and its pseudogene Mkrn1-p] Competition for a ‘destabilizing factor’?