BLAST program selection guide • http://www.ncbi.nlm.nih.gov/blast/producttable.shtml#tab31
Homology: Orthology, Paralogy, Xenology • Fitch WM. Trends Genet. 2000 May;16(5):227-31.
Analogy vs Homology • Analogy: The relationship of any two characters that have descended convergently from unrelated ancestors. • Homology: The relationship of any two characters that have descended, usually with divergence, from a common ancestral character.
Orthology: The relationship of any two homologous characters whose common ancestor lies in the cenancestor of the taxa from which the two sequences were obtained. • Paralogy: The relationship of any two homologous characters arising from a duplication of the gene for that character. • Xenology: The relationship of any two homologous characters whose history, since their common ancestor, involves an interspecies (horizontal) transfer of the genetic material for at least one of those characters.
Test Yourself • A1 – B1 • A1 – B2 • A1 – C3 • B1 – C2 • C2 – C3 • B2 – C3 • C3 – AB1
Test Yourself (answers) • A1 – B1 = Orthologs • A1 – B2 = Orthologs • A1 – C3 = Orthologs • B1 – C2 = Paralogs (out-paralogs) • C2 – C3 = Paralogs (in-paralogs) • B2 – C3 = Orthologs • C3 – AB1 = Xenologs
Homology on a Genome-Scale • How many and which genes are common to two or more organisms? • Which genes differentiate one organism from another? • How is homology related to function?
Orthologs are the set of genes/proteins with gene trees identical to the species tree. • We can understand other types of homology relationships by comparison to the species tree. • But often we don’t know the species tree, and phylogenetic methods are complex
Consider two genomes • Use BLASTP to compare one set of proteins (proteome) to the other • Which set will you use as the query and which as the database? • What criteria will you use to define “a match”? • A1, A3, B2 and B3 are homologs (assuming the aligned regions overlap) [Figure: BLASTP matches between genes 1–3 of GenomeA and genes 1–3 of GenomeB]
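One possible way to make “a match” concrete is to filter BLASTP hits by E-value and by the fraction of the query covered by the alignment. The sketch below assumes the hits have already been parsed into simple records. The `Hit` type, its field names, the toy hits, and both thresholds (E-value of 1e-5, 50% query coverage) are illustrative assumptions, not values given in these slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """One BLASTP hit (hypothetical record layout for this sketch)."""
    query: str
    subject: str
    evalue: float
    align_len: int   # aligned length on the query, in residues
    query_len: int   # full length of the query protein, in residues


def is_match(hit, max_evalue=1e-5, min_coverage=0.5):
    """Return True if the hit passes both (assumed) thresholds."""
    coverage = hit.align_len / hit.query_len
    return hit.evalue <= max_evalue and coverage >= min_coverage


# Made-up toy hits, for illustration only.
hits = [
    Hit("A1", "B2", 1e-40, 280, 300),  # significant and covers most of the query
    Hit("A1", "B3", 1e-12, 90, 300),   # significant, but only a short region aligns
    Hit("A2", "B1", 0.5, 250, 260),    # long alignment, but not significant
]

matches = [(h.query, h.subject) for h in hits if is_match(h)]
print(matches)
```

Requiring coverage as well as significance is one way to guard against calling two proteins homologs when they only share a single short domain.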
Reciprocal Best Hits • Use BLASTP to compare the two sets of proteins (proteomes) to each other • First using GenomeA to query against GenomeB • Then using GenomeB to query against GenomeA • Save only the single best match for each query • Save only the reciprocal best matches as “orthologs” • Result: the A3-B2 and A1-B3 homology relationships are lost [Figure: best hits between genes 1–3 of GenomeA and genes 1–3 of GenomeB, shown in each direction and then as reciprocal pairs]
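The reciprocal best hit procedure can be sketched in a few lines. This is a minimal illustration, assuming each BLASTP run has been reduced to `(query, subject, bitscore)` tuples. The function names and the toy scores are hypothetical, chosen so that A1/A3 and B2/B3 all hit each other as in the slide.

```python
def best_hits(hits):
    """Map each query to its single highest-scoring subject.

    `hits` is an iterable of (query, subject, bitscore) tuples.
    On a tie, the first hit seen is kept.
    """
    best = {}
    for query, subject, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Return sorted (a, b) pairs that are each other's best hit."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)


# Toy scores (made up for illustration).
a_vs_b = [("A1", "B2", 500), ("A1", "B3", 200), ("A2", "B1", 450),
          ("A3", "B3", 480), ("A3", "B2", 210)]
b_vs_a = [("B1", "A2", 450), ("B2", "A1", 500), ("B2", "A3", 210),
          ("B3", "A3", 480), ("B3", "A1", 200)]

rbh = reciprocal_best_hits(a_vs_b, b_vs_a)
print(rbh)
```

With these toy scores the weaker A1-B3 and A3-B2 hits never appear in the output, which is the loss of homology information the slide points out.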
One case where RBH works [Figure: best hits between genes 1–3 of GenomeA and genes 1–3 of GenomeB; function labels as extracted: Glucose transport (GenomeA gene 1), Glucose transport (GenomeB gene 2), GenomeA gene 3, Fructose transport, Galactose transport, GenomeB gene 3]
One case where RBH fails • In-paralogs: duplication since speciation [Figure: best hits between genes 1–3 of GenomeA and genes 1–3 of GenomeB; function labels as extracted: Glucose transport (GenomeA gene 1), Glucose transport (GenomeA gene 3, GenomeB gene 2), Fructose transport, Galactose transport, GenomeB gene 3]
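The in-paralog failure can be reproduced with a toy example. The sketch below assumes A1 and A3 are recent duplicates in GenomeA that both hit B2 best, while B2 can name only one of them as its own best hit. All scores and the `missed` heuristic are made up for illustration; this is not the algorithm used by InParanoid or any other tool listed in these slides.

```python
def best_hit(hits):
    """Map each query to its highest-scoring subject (first hit wins ties)."""
    best = {}
    for query, subject, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


# Toy scores: A1 and A3 are assumed in-paralogs, both most similar to B2.
a_vs_b = [("A1", "B2", 500), ("A3", "B2", 470), ("A2", "B1", 450)]
b_vs_a = [("B2", "A1", 500), ("B2", "A3", 470), ("B1", "A2", 450)]

best_ab = best_hit(a_vs_b)
best_ba = best_hit(b_vs_a)

# Reciprocal best hits: each gene is the other's best hit.
rbh = {(a, b) for a, b in best_ab.items() if best_ba.get(b) == a}

# Pairs dropped by RBH whose subject already has an RBH partner:
# these are candidate in-paralog relationships that RBH cannot report.
rbh_subjects = {b for _, b in rbh}
missed = sorted((a, b) for a, b in best_ab.items()
                if (a, b) not in rbh and b in rbh_subjects)

print(sorted(rbh))
print(missed)
```

Because RBH keeps exactly one partner per gene, A3 is silently dropped even though it is as much an ortholog of B2 as A1 is.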
Software/Methods for Predicting Orthologs from Genome Sequences • RBH • RSD (Reciprocal Smallest Distance) • InParanoid • RIO • Orthostrapper • Ortholuge • TribeMCL • OrthoMCL
Li L, Stoeckert CJ Jr, Roos DS. OrthoMCL: identification of ortholog groups for eukaryotic genomes. Genome Res. 2003 Sep;13(9):2178-89.
Pre-computed OrthoMCL results http://www.orthomcl.org/
Evaluating performance • No “gold standard” set of true orthologs • Latent Class Analysis • Agreement between methods provides confidence • 27,562 proteins from 6 eukaryotes assigned to Pfams
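The idea that agreement between methods provides confidence can be illustrated with a simple vote count. This is only a sketch of that intuition and is much cruder than Latent Class Analysis. The method names, the predicted pairs, and the two-vote threshold are all hypothetical.

```python
from collections import Counter

# Hypothetical ortholog calls from three methods (made-up data).
predictions = {
    "method_1": {("A1", "B2"), ("A2", "B1"), ("A3", "B3")},
    "method_2": {("A1", "B2"), ("A2", "B1")},
    "method_3": {("A1", "B2"), ("A3", "B2")},
}

# Count how many methods predict each pair.
votes = Counter(pair for pairs in predictions.values() for pair in pairs)

# Keep pairs supported by at least two methods (assumed threshold).
consensus = sorted(pair for pair, n in votes.items() if n >= 2)

print(consensus)
```

Pairs predicted by only one method are not necessarily wrong; they simply have less independent support.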
Performance Metrics
actual \ predicted    negative    positive
negative              TN          FP
positive              FN          TP
• Accuracy – Proportion of all predictions that are correct • (TN+TP)/total • Precision – Proportion of predicted positives that are correct • TP/(FP+TP) • Sensitivity (TPR, Recall) – Proportion of actual positives correctly predicted • TP/(FN+TP) • Specificity – Proportion of actual negatives correctly predicted • TN/(TN+FP)
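The four ratios can be computed directly from the confusion-matrix counts. In the sketch below, TP/(TP+FP) is reported under its standard name, precision, and TP/(TP+FN) as sensitivity (also called recall or TPR). The function name and the counts are made up for illustration.

```python
def performance_metrics(tp, fp, tn, fn):
    """Compute basic metrics from confusion-matrix counts.

    Assumes every denominator is non-zero.
    """
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,   # proportion of all calls that are correct
        "precision": tp / (tp + fp),     # predicted positives that are correct
        "sensitivity": tp / (tp + fn),   # actual positives that were found
        "specificity": tn / (tn + fp),   # actual negatives that were rejected
    }


# Toy counts (not taken from any study).
m = performance_metrics(tp=80, fp=20, tn=890, fn=10)
for name, value in m.items():
    print(f"{name}: {value:.3f}")
```

Note the explicit parentheses: written without them, TN+TP/total would be read as TN + (TP/total).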
Chen F, Mackey AJ, Vermunt JK, Roos DS. Assessing performance of orthology detection strategies applied to eukaryotic genomes. PLoS ONE. 2007 Apr 18;2(4):e383.
Is context useful for assigning homology type? • Prokaryotes vs eukaryotes • Evolutionary origin • Paralogs that arise as tandem repeats of single genes • Paralogs that arise from duplication of larger regions • Xenologs that arise from acquisition of a similar gene from another lineage
Example: pectate lyases of soft-rot enterobacteria may be SymBets (symmetrical best hits), but genome context suggests they may not be orthologs