Orthology & Paralogy (etc. etc.)
* Orthologs: two genes, each from a different species, that descended from a single common ancestral gene (note: no regard to function!)
* Paralogs: two or more genes, within the same species, that originated by one or more gene duplication events
[Figure: SPECIES TREE (ancestral species; species A, B, C, D, E) beside GENE TREE (Ancestral Gene 1; genes A1, B1, C1, D1, E1)]
Clear case of orthology: gene 1 in each species is an ortholog of the others. All descended from a single common ancestral gene.
[Figure: SPECIES TREE (ancestral species; species A, B, C, D, E), with a gene duplication marked along one species branch, beside GENE TREE (Ancestral Gene 1; genes E1, A1, B1, C1, C2, D1, D2)]
* Duplication event along the branch leading to species C & D
* C1 and C2 are paralogs; D1 and D2 are paralogs
* What about A1 to C1? To C2?
Orthology & Paralogy (etc. etc.)
Also now many subtle variants:
* Outparalogs: cross-species paralogs (i.e. gene duplication BEFORE speciation)
* Inparalogs: lineage-specific duplicates (i.e. gene duplication AFTER speciation)
* Ohnologs: duplicates originating from a whole-genome duplication (WGD)
* Xenologs: genes related by horizontal gene transfer between species
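The definitions above can be applied mechanically once each internal node of a gene tree is labelled as a speciation or a duplication: two genes are orthologs if their last common ancestor is a speciation node, and paralogs if it is a duplication node. The sketch below is only an illustration. The tree topology, the event labels, and the names `TREE`, `lca_event` and `classify` are assumptions, not something taken from the slides.

```python
# Gene tree written as nested tuples: (event, left, right); leaves are gene names.
# Topology and event labels are assumed for illustration: one duplication on the
# branch leading to species C and D, everything else a speciation.
TREE = ("speciation", "E1",
        ("speciation", "A1",
         ("speciation", "B1",
          ("duplication",
           ("speciation", "C1", "D1"),
           ("speciation", "C2", "D2")))))


def leaves(node):
    """Return the set of gene names below a node."""
    if isinstance(node, str):
        return {node}
    _, left, right = node
    return leaves(left) | leaves(right)


def lca_event(node, a, b):
    """Return the event label at the last common ancestor of genes a and b."""
    _, left, right = node
    for child in (left, right):
        if not isinstance(child, str):
            names = leaves(child)
            if a in names and b in names:
                return lca_event(child, a, b)  # both genes sit deeper in this child
    return node[0]


def classify(a, b):
    """Orthologs if the genes diverged at a speciation, paralogs if at a duplication."""
    return "orthologs" if lca_event(TREE, a, b) == "speciation" else "paralogs"


for pair in [("A1", "C1"), ("A1", "C2"), ("C1", "C2"), ("C1", "D2")]:
    print(pair, classify(*pair))
```

Under these assumptions A1 comes out as an ortholog of both C1 and C2, because the duplication happened after the A lineage split off, while C1 and D2 are paralogs even though they sit in different species.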
Phenetics vs. Phylogeny
Phenetics: tree based on similarity of characteristics
* Align proteins & score the alignment (# of identical and ‘conserved’ amino acids)
* Build a tree based on sequence similarity
* A1 is more similar to C1 than to C2, so A1 & C1 are likely (* but not guaranteed!) more similar functionally
Phylogeny: tree based on evolutionary history
* Requires inferring history across the species
* But historically, A1 is equally distant to C1 and C2
[Figure: two trees, each with leaves A1, B1, C1, C2, one per approach]
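To make "score the alignment" concrete, here is a minimal sketch that counts identical and 'conserved' positions in a pairwise alignment. The grouping of amino acids into conserved classes is a rough assumption chosen for illustration (real tools use substitution matrices), and the function name and example sequences are hypothetical.

```python
# Hypothetical groups of chemically similar amino acids (illustration only).
CONSERVED_GROUPS = [set("ILVM"), set("FWY"), set("KRH"), set("DE"), set("STNQ")]


def score_alignment(seq_a, seq_b):
    """Return (identical, conserved, compared) for two aligned sequences.

    Both sequences must already be aligned to the same length; '-' marks a gap.
    Columns containing a gap are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    identical = conserved = compared = 0
    for x, y in zip(seq_a, seq_b):
        if x == "-" or y == "-":
            continue
        compared += 1
        if x == y:
            identical += 1
        elif any(x in group and y in group for group in CONSERVED_GROUPS):
            conserved += 1
    return identical, conserved, compared


print(score_alignment("MKV-LD", "MRVAIE"))  # (2, 3, 5)
```

A score like this measures similarity only. It says nothing about the order of speciation and duplication events, which is the point of the contrast above.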
Methods of orthology prediction
1. Reciprocal best-BLAST hits (RBH): simplest method
[Figure: gene lists for Species A (Gene A1, Gene A2, ... Gene An) and Species B (Gene B1, Gene B2, ... Gene Bn)]
* BLAST Gene A1 against the Species B genome
* Take the top BLAST hit in Species B (say, B4) and use it as the query against Species A
* If Gene A1 is the top BLAST hit in the Species A genome, then call A1 & B4 orthologs
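The RBH procedure can be sketched in a few lines once the BLAST results are available as score tables. The sketch below assumes the searches have already been run and summarised as nested dictionaries of bit scores; the dictionaries, scores and function names are made up for illustration.

```python
def best_hit(scores, query):
    """Return the subject with the highest score for this query, or None."""
    hits = scores.get(query, {})
    return max(hits, key=hits.get) if hits else None


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Return (gene_a, gene_b) pairs that are each other's top hit."""
    pairs = []
    for gene_a in a_vs_b:
        gene_b = best_hit(a_vs_b, gene_a)
        if gene_b is not None and best_hit(b_vs_a, gene_b) == gene_a:
            pairs.append((gene_a, gene_b))
    return pairs


# Hypothetical scores: query gene -> {subject gene: score}.
a_vs_b = {"A1": {"B4": 310.0, "B1": 120.0},
          "A2": {"B4": 200.0, "B2": 150.0}}
b_vs_a = {"B4": {"A1": 305.0, "A2": 190.0},
          "B2": {"A2": 140.0},
          "B1": {"A1": 110.0}}

print(reciprocal_best_hits(a_vs_b, b_vs_a))  # [('A1', 'B4')]
```

In this toy data A2 also hits B4 best, but B4's best hit is A1, so A2 gets no ortholog call at all.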
Problems with RBH
* Clear cases where the top BLAST hit is NOT the ortholog, e.g. top hits can be highly conserved common domains
* Gene duplications in one species can completely obscure orthologous hits
* Orthologs with very low sequence similarity can be missed altogether
Methods of orthology prediction
2. Reciprocal Smallest Distance (RSD): slightly more complicated
[Figure: gene lists for Species A (Gene A1, Gene A2, ... Gene An) and Species B (Gene B1, Gene B2, ... Gene Bn)]
* BLAST Gene A1 against the Species B genome
* Take X number of top BLAST hits (user determined)
* Do a global multiple alignment and throw out proteins with >Y% gapped positions
* Take the remaining proteins and find the single one with the smallest evolutionary distance to Gene A1
* Final reciprocal BLAST using that remaining gene in Species B as the query against Genome A
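The selection logic of RSD can be sketched as follows, assuming the BLAST searches, alignments and distance estimates have already been computed elsewhere. The sketch assumes the gap filter discards candidates whose alignments are too gappy, and it applies the same filter-and-minimise step in the reverse direction. All names, thresholds and numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    gene: str            # name of the hit in the other species
    gap_fraction: float  # fraction of gapped positions in its alignment to the query
    distance: float      # estimated evolutionary distance to the query


def smallest_distance_hit(candidates, max_gap_fraction):
    """Drop overly gapped candidates, then return the gene with the smallest distance."""
    kept = [c for c in candidates if c.gap_fraction <= max_gap_fraction]
    return min(kept, key=lambda c: c.distance).gene if kept else None


def rsd_pair(query, forward, reverse, max_gap_fraction=0.2):
    """Return (query, partner) if the smallest-distance relationship is reciprocal."""
    partner = smallest_distance_hit(forward.get(query, []), max_gap_fraction)
    if partner is None:
        return None
    back = smallest_distance_hit(reverse.get(partner, []), max_gap_fraction)
    return (query, partner) if back == query else None


# Hypothetical precomputed results for the top hits in each direction.
forward = {"A1": [Candidate("B4", 0.05, 0.42),
                  Candidate("B7", 0.50, 0.10),   # short distance, but too gappy
                  Candidate("B2", 0.10, 0.90)]}
reverse = {"B4": [Candidate("A1", 0.05, 0.42),
                  Candidate("A3", 0.08, 0.77)]}

print(rsd_pair("A1", forward, reverse))  # ('A1', 'B4')
```

The B7 candidate shows why the gap filter matters: a short, gap-ridden alignment (for example a shared domain) can give a deceptively small distance.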
Problems with RSD
* Clear cases where the top BLAST hit is NOT the ortholog, e.g. top hits can be highly conserved common domains
* Gene duplications in one species can completely obscure orthologous hits
* Orthologs with very low sequence similarity can be missed altogether
Methods of orthology prediction
3. Newest methods take synteny into account
Syntenic = conserved gene/sequence order
[Figure: genes A1, A2, A3, A4 in one genome lined up in the same order against genes B1, B2, B3, B4 in the other]
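One simple way to use conserved gene order is to ask how many neighbours of a gene have a counterpart near the candidate ortholog in the other genome. The sketch below is an assumption about how such a score could look, not the method of any particular tool. The gene orders, the extra copy `B2x`, the `homologs` map and the window size are all hypothetical.

```python
def neighbors(order, gene, window):
    """Return the genes within `window` positions on either side of `gene`."""
    i = order.index(gene)
    return set(order[max(0, i - window):i] + order[i + 1:i + 1 + window])


def synteny_support(gene_a, gene_b, order_a, order_b, homologs, window=2):
    """Count neighbours of gene_a whose homolog lies in the neighbourhood of gene_b."""
    near_b = neighbors(order_b, gene_b, window)
    return sum(1 for g in neighbors(order_a, gene_a, window)
               if homologs.get(g) in near_b)


# Hypothetical gene orders along a chromosome in each species.
order_a = ["A1", "A2", "A3", "A4"]
order_b = ["B1", "B2", "B3", "B4", "B7", "B2x", "B8"]  # B2x: a second copy elsewhere
homologs = {"A1": "B1", "A3": "B3", "A4": "B4"}        # assumed known from other pairs

print(synteny_support("A2", "B2", order_a, order_b, homologs))   # 3
print(synteny_support("A2", "B2x", order_a, order_b, homologs))  # 1
```

Here the copy sitting in the conserved neighbourhood (B2) gets more support than the copy elsewhere (B2x), even if both look equally similar to A2 by sequence.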
Problems with Synteny-based Methods
* Clear cases where the top BLAST hit is NOT the ortholog, e.g. top hits can be highly conserved common domains
* Gene duplications in one species are less likely to obscure things
* Orthologs with low sequence similarity that are not part of a larger duplication could still be missed
Methods of orthology prediction
4. Clusters of Orthologous Groups (COG) approach:
- Addresses the restriction of 1:1 orthologs
- Identifies inparalogs and then identifies orthologous relationships between groups
[Figure: gene groups across Species A, B, C, D]
Several approaches can assign COGs across many species at once (InParanoid, Fuzzy RB)
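The idea of moving from 1:1 pairs to groups can be illustrated by merging pairwise links into clusters. The sketch below uses plain single-linkage merging with a union-find structure, which is a deliberate simplification and an assumption of this example: published methods apply stricter criteria before joining genes. The link list and function name are hypothetical.

```python
def cluster_groups(links):
    """Merge pairwise links (ortholog or inparalog pairs) into groups of genes."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps lookups short
            x = parent[x]
        return x

    for a, b in links:
        parent[find(a)] = find(b)

    groups = {}
    for gene in list(parent):
        groups.setdefault(find(gene), set()).add(gene)
    return sorted(sorted(group) for group in groups.values())


# Hypothetical links: cross-species ortholog pairs plus one inparalog pair (C1, C2).
links = [("A1", "B1"), ("B1", "C1"), ("C1", "C2"), ("A2", "B2")]

print(cluster_groups(links))  # [['A1', 'B1', 'C1', 'C2'], ['A2', 'B2']]
```

Because C1 and C2 are linked as inparalogs, both land in the same group as A1 and B1, giving a one-to-many relationship that a strict 1:1 method could not report.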
Why is orthology-paralogy so important?
Allows us to study the history of protein evolution & infer constraints
[Figure: GENE TREE descending from Ancestral Gene 1 with leaves E1, A1, B1, C1, C2, D1, D2, A2; labels mark a gene duplication along one species branch and a separate gene duplication in Species A]
Receptor                           Ligand                                     Governs
Glucocorticoid Receptor (GR)       Cortisol                                   Stress Response
Mineralocorticoid Receptor (MR)    Aldosterone (tetrapods), DOC (teleosts)    Electrolyte Homeostasis
* Teleosts don’t make aldosterone
[Figure 1. Blue = Aldo binding; Red = Cortisol ONLY]
Two amino-acid changes in AncCR can alter specificity
[Figure legend: Blue = DOC; Red = Cortisol; Green = Aldo]
S106P likely occurred FIRST, then L111Q
Model for evolution of ligand binding & hormone response
* Ancestral protein could bind Aldo, even though no Aldo was present
* Duplication ~450 mya = redundant receptors
* Two successive changes in GR = switch to Cortisol specificity
* Emergence of the Aldosterone hormone