Orthology predictions for whole mammalian genomes. Leo Goodstadt, MRC Functional Genomics Unit, Oxford University
Mammalian Genomes: How does our genome, and how do our genes, differ from those of other mammals and other vertebrates?
We did not appreciate how much functional sequence there would be. We did not appreciate how hard it would be to ‘read off’ functions from the human genome. We had no idea that individual human genomes could differ so much! So why is it taking so long to understand a simple genome? • How much? • Species-specific genes? • Human genomes
How do we find function in the genome? • Nothing in Biology Makes Sense Except in the Light of Evolution. Theodosius Dobzhansky (1900-1975).
The dawn of mammalian comparative genomics. Sanity checks for all mammalian projects: lessons from the mouse genome (2002)
Mouse-Human Orthologues: % Identity • sites not in domains: 64.4% • cSNP sites: 67.1% • all sites: 70.1% • sites in domains: 88.9% • disease sites: 90.3%
Large number of lineage-specific duplications: 10–20% of genes are lineage specific, depending on the comparison
20% of human genes have been duplicated or do not have a rodent orthologue. Family trees for genes:
• Human-specific genes missing from mouse (8%). In many cases, more distantly related mouse genes (homologues) can be found.
• 1 to 1 (80%)
• Gene families shared with mouse but which have expanded in human (9%)
• Shared orthologues: present as a single gene in the common ancestor of human and mouse
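The categories on this slide amount to a rule over how many human and mouse genes descend from each ancestral gene. The following is a minimal sketch of that rule, not the pipeline used in the talk; the function name, the labels, and the example counts are all hypothetical.

```python
def classify_family(n_human: int, n_mouse: int) -> str:
    """Label a gene family by its human:mouse gene counts (illustrative labels)."""
    if n_human < 1 or n_mouse < 0:
        raise ValueError("expected at least one human gene and a non-negative mouse count")
    if n_mouse == 0:
        return "human-specific (no mouse orthologue)"
    if n_human == 1 and n_mouse == 1:
        return "1:1 orthologues"
    if n_mouse == 1:
        return "expanded in human"
    if n_human == 1:
        return "expanded in mouse"
    return "expanded in both lineages"


# Hypothetical families: (number of human genes, number of mouse genes).
families = {"famA": (1, 1), "famB": (3, 1), "famC": (2, 0)}
labels = {name: classify_family(h, m) for name, (h, m) in families.items()}
for name, label in labels.items():
    print(name, label)
```

Counting the labels over all families and dividing by the total would give proportions of the kind shown on the slide.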
Where do new genes come from? • De novo (from non-coding) • Rapid sequence change • Gene duplication (M. Lynch and A. Force, The probability of duplicate gene preservation by subfunctionalisation. Genetics 154 (2000), pp. 459–473) • Pseudogenisation • Missing: horizontal gene transfer
Inparalogues: • Chemosensation (OR, V1R and V2R) • Reproduction (vomeronasal receptors, lipocalins, β-microseminoprotein (12:1)) • Immunity (IG chains, butyrophilins, leukocyte IG-like receptors, T-cell receptor chains and carcinoembryonic antigen-related cell adhesion molecules) • Pancreatic RNAses • Detoxification (hypoxanthine phosphoribosyltransferase homologues; nitrogen-poor diets) • KRAB zinc fingers
Reproduction clusters [figure; axis label: No. in cluster]
Rapid evolvers in protein-coding genes: • Reproduction • Chemosensation • KRAB zinc fingers • Immunity • Toxin degradation
Hypothesis: Darwinian evolution. Competition: • Inter-specific (pathogens, predators) • Intra-specific: mating; sub-speciation / kin-selection; gender conflict; clonal expansions in sperm
KRAB-zinc finger genes; cancer-testis antigen genes (e.g. PRAMEs). Regulate chromatin structure and therefore the timing of transcription. Rapidly changing developmental or transcriptional regulatory genes?
Detecting biological signals among inparalogues. Correlations with known annotations: • Biological annotations (gene descriptions / Gene Ontology) • Tissue specificity • Comparative changes across lineages (dating) • Chromosomal distribution • Positive selection • Genomic environment
Different genes duplicate at different times. Leo Goodstadt et al. Genome Res. 2007; 17: 969-981
Trends - Functions. Human - Chicken. CGSC (2005)
Trends - Tissues. Chicken - Human. CGSC (2005)
Exploring rapid evolution with protein structure: GENE FAMILIES
Positive selection: PRAME genes. Amino acid sites under positive selection in human (red), mouse (blue) and rat (purple) [or multiple species (yellow)] PRAME genes.
Gene Duplication Remodels Genome. Androgen-binding proteins: produced by Sertoli cells in the seminiferous tubules of the testes. Emes et al. (2004) Genome Res. 14(8):1516-29
Lipocalins: mouse major urinary proteins and rat α2u-globulin genes; sites subject to positive selection
VR2 olfactory receptor N-terminal domain: sites in dark blue, ligand (glutamate) in pink (other monomer)
MHC class Ib, M10s: sites in blue, peptide ligand in MHC structure in green
Finding disease candidates within model organisms: ORTHOLOGY AND DISEASE
Few Mendelian disease genes lack mouse orthologues • Kallmann syndrome gene: C. elegans orthologue • CETP (cholesteryl ester transfer protein): rabbit and hamster • Glycophorin E: primate specific; MN and Ss blood types
Mouse equivalents of human disease variants
Hs normal:  MAETLFWTPLLVVLLAGLGDTEAQQTTLHPLVGRVFVHTLDHETFLSLPEHVAVPPAVHI
Hs variant: MAETLFWTPLLVVLLAGLGDTEAQQTTLHLLVGRVFVHTLDHETFLSLPEHVAVPPAVHI
Mm normal:  MAAAVTWIPLLAGLLAGLRDTKAQQTTLHLLVGRVFVHPLEHATFLRLPEHVAVPPTVRL
Nick Dickens & Jörg Schultz
Disease mutations do not always lead to pathological phenotypes in mouse! 7293 SwissProt disease-associated variants: • 90.3% mouse residue = human wild-type residue • 7.5% mouse residue ≠ human wild-type residue (and ≠ human disease residue) • 2.2% mouse residue = human disease residue
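The three-way comparison behind these percentages can be sketched on the example sequences from the previous slide. This is an illustration only: it assumes the three sequences are already aligned residue by residue with no gaps, and the function name and labels are hypothetical.

```python
HS_NORMAL = "MAETLFWTPLLVVLLAGLGDTEAQQTTLHPLVGRVFVHTLDHETFLSLPEHVAVPPAVHI"
HS_VARIANT = "MAETLFWTPLLVVLLAGLGDTEAQQTTLHLLVGRVFVHTLDHETFLSLPEHVAVPPAVHI"
MM_NORMAL = "MAAAVTWIPLLAGLLAGLRDTKAQQTTLHLLVGRVFVHPLEHATFLRLPEHVAVPPTVRL"


def classify_variant_sites(normal: str, variant: str, model: str):
    """For each site where `variant` differs from `normal`, compare the model residue.

    Assumes equal-length, pre-aligned, ungapped sequences (an assumption of this sketch).
    Returns tuples of (1-based position, wild-type, variant, model residue, label).
    """
    if not (len(normal) == len(variant) == len(model)):
        raise ValueError("sequences must have the same length")
    results = []
    for pos, (wt, var, mod) in enumerate(zip(normal, variant, model), start=1):
        if wt == var:
            continue  # not a variant site
        if mod == wt:
            label = "mouse = human wild-type"
        elif mod == var:
            label = "mouse = human disease"
        else:
            label = "mouse differs from both"
        results.append((pos, wt, var, mod, label))
    return results


sites = classify_variant_sites(HS_NORMAL, HS_VARIANT, MM_NORMAL)
for site in sites:
    print(site)
```

For these sequences the only variant site is position 30 (P to L), where the mouse residue is L, so the example falls in the smallest category: mouse residue = human disease residue.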
Genomes are not a bag of genes. GENOMIC CONTEXT IS IMPORTANT: LESSONS FROM THE MONODELPHIS
Comparisons with a third genome • Australian marsupial, the silver-gray brushtail possum (Trichosurus vulpecula) • 8,237 orthologues from 111,634 ESTs • More closely related to Monodelphis. Median dS:
Homo-Monodelphis 1:1 orthologues
• dN/dS: 0.086
• dS: 1.02
• Amino acid sequence identity: 81.0%
• Pairwise alignment coverage: 94.2%
Per-species values (Homo sapiens / Monodelphis domestica):
• Number of exons: 9 / 9
• Sequence length (codons): 471 / 445
• Unspliced transcript length (bp): 27,241 / 25,365
• G+C content at 4D sites: 56.9% / 48.7%
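G+C content at 4D (fourfold degenerate) sites is measured at third codon positions where any base encodes the same amino acid. A minimal sketch for a single coding sequence follows, assuming the standard genetic code and an in-frame sequence; the function name and the toy sequence are hypothetical. Comparative analyses often also require a site to be fourfold degenerate in both aligned species, which this sketch does not check.

```python
# First two bases of the eight fourfold degenerate codon families in the
# standard genetic code: Leu (CTN), Val (GTN), Ser (TCN), Pro (CCN),
# Thr (ACN), Ala (GCN), Arg (CGN), Gly (GGN).
FOURFOLD_PREFIXES = {"CT", "GT", "TC", "CC", "AC", "GC", "CG", "GG"}


def gc_at_fourfold_sites(cds: str) -> float:
    """Return the fraction of G or C at fourfold degenerate third positions."""
    cds = cds.upper()
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    gc = total = 0
    for i in range(0, len(cds), 3):
        codon = cds[i:i + 3]
        # Skip codons outside the 4D families and ambiguous third bases.
        if codon[:2] in FOURFOLD_PREFIXES and codon[2] in "ACGT":
            total += 1
            if codon[2] in "GC":
                gc += 1
    if total == 0:
        raise ValueError("no fourfold degenerate sites found")
    return gc / total


# Toy sequence: ATG GCC CTA GGG TAA has three 4D sites (C, A, G).
toy_gc4 = gc_at_fourfold_sites("ATGGCCCTAGGGTAA")
print(round(toy_gc4, 3))
```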
Higher G+C in Monodelphis X Increased G+C