Orthology & Paralogy, Alignment & Assembly. Alastair Kerr Ph.D. [many slides borrowed from various sources]
Overview • Orthology & Paralogy • Definitions and examples • Ways to determine an ortholog • Pre-calculations: resources • Alignment & Assembly • Differences • Key programs for each • Jalview example
Homologs • Have common origins but may or may not have common activity. • Homologous or not? Often decided by an arbitrary threshold on the level of similarity measured from an alignment.
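The "arbitrary threshold" idea can be sketched in a few lines of Python. This is only an illustration: the function names and the 30% cut-off are assumptions made for the example, not values given in the slides, and real tools usually rely on alignment scores or E-values rather than raw identity.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" or b == "-":
            continue  # skip gapped columns
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


# Hypothetical cut-off, chosen only for illustration.
IDENTITY_THRESHOLD = 30.0


def looks_homologous(aligned_a: str, aligned_b: str,
                     threshold: float = IDENTITY_THRESHOLD) -> bool:
    """Call two aligned sequences 'homologous' if identity meets the threshold."""
    return percent_identity(aligned_a, aligned_b) >= threshold


if __name__ == "__main__":
    print(percent_identity("ACGT", "ACGA"))
    print(looks_homologous("AAAA", "TTTT"))
```

Moving the threshold changes which pairs are called homologous, which is exactly why the slide calls it arbitrary.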
Homologs …have common ancestry, but the way they are related can vary (i.e. the reasons they have diverged into different sequences can vary) • orthologs - Homologs produced by speciation. They tend to have similar function. • paralogs - Homologs produced by gene duplication. They tend to have differing functions.
Orthologous or paralogous homologs [Figure: an early globin gene undergoes gene duplication into an α-chain gene and a ß-chain gene; each branch then leads to mouse, human and cattle copies (mouse α, human α, cattle α; cattle ß, human ß, mouse ß). Labels: Orthologs (α), Orthologs (ß), Paralogs (cattle), Homologs (all).] Orthologs – diverged after speciation – tend to have similar function. Paralogs – diverged after gene duplication – some functional divergence occurs. Therefore, for linking similar genes between species, or performing “annotation transfer”, identify orthologs.
True or False? A1x is the ortholog in species x of A1y? A1x is a paralog of A2x? A1x is a paralog of A2y?
Identifying Gene/Protein Relationships from Phylogenetic trees • orthologs - Homologs produced by speciation. Gene phylogeny matches organismal phylogeny. • paralogs - Homologs produced by gene duplication. Indicated by multiple copies of a homolog in a given species, or by phylogenetic analysis showing that a gene duplication was involved and that the gene tree does not match the organismal phylogeny.
Gene Orthology: How to detect? • Most common: identify reciprocal best BLAST hits (EGO, COGs, …) Example problem: • When comparing human and bovine, for example, the bovine gene dataset is still quite incomplete • Therefore, the current best hit may be a paralog, with the true ortholog not yet sequenced [Figure: tree of mouse, human and two cattle genes]
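The reciprocal-best-hit idea can be sketched as follows. This is a minimal illustration that assumes the BLAST results have already been parsed into dictionaries of `(query, subject) -> score`; the function names and the toy gene identifiers are hypothetical, and ties are resolved by keeping the first hit seen.

```python
def best_hits(scores):
    """Map each query to its highest-scoring subject.

    scores: dict mapping (query, subject) pairs to a similarity score.
    """
    best = {}
    for (query, subject), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Return sorted (a, b) pairs where a's best hit is b and b's best hit is a."""
    forward = best_hits(a_vs_b)
    reverse = best_hits(b_vs_a)
    return sorted((a, b) for a, b in forward.items() if reverse.get(b) == a)


if __name__ == "__main__":
    # Toy data: the cattle alpha gene is "not yet sequenced", so human_alpha's
    # best cattle hit is the paralogous beta gene.
    human_vs_cattle = {
        ("human_alpha", "cattle_beta"): 120,
        ("human_beta", "cattle_beta"): 300,
    }
    cattle_vs_human = {
        ("cattle_beta", "human_alpha"): 120,
        ("cattle_beta", "human_beta"): 300,
    }
    print(reciprocal_best_hits(human_vs_cattle, cattle_vs_human))
```

In this toy case the reciprocity check rejects the one-way human_alpha to cattle_beta hit, but it cannot help when the true ortholog is missing from both directions of the search, which is the problem the slide describes.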
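2 Forms in 1 Species [Figure: species tree with presence marks (+); one species carries two forms (++)] Slides from Jonathan Eisen
2 Forms in 1 Species - Gene Loss [Figure: same tree; gene duplicated in common ancestor (++), two lineages marked "Loss", one species still carrying both forms (++)]
Unusual Distribution - Gene Loss [Figure: gene present in ancestor; present (+) in some lineages; one lineage marked "Gene lost here"]
Unusual Distribution - Evolutionary Rate Variation [Figure: gene present (+) in some lineages; one lineage marked "-?", gene too diverged to be found]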
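Ortholog guess via synteny [Figure: gene order A B C in one genome beside A ? C in another; the unknown gene flanked by A and C is the guessed ortholog of B]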
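A minimal sketch of the synteny guess, assuming gene order is available as simple lists and that orthologs of the flanking genes are already known. The function name and data layout are hypothetical, and real synteny tools work on much larger conserved blocks.

```python
def guess_by_synteny(ref_order, ref_to_target, target_order):
    """Propose orthologs for unmapped reference genes from conserved neighbours.

    ref_order:     gene names in order along the reference genome
    ref_to_target: known ortholog mapping, reference gene -> target gene
    target_order:  gene names in order along the target genome
    """
    position = {gene: i for i, gene in enumerate(target_order)}
    guesses = {}
    for i in range(1, len(ref_order) - 1):
        gene = ref_order[i]
        if gene in ref_to_target:
            continue  # ortholog already known
        left = ref_to_target.get(ref_order[i - 1])
        right = ref_to_target.get(ref_order[i + 1])
        if left not in position or right not in position:
            continue  # a flanking gene has no known ortholog in the target
        lo, hi = sorted((position[left], position[right]))
        if hi - lo == 2:
            # Exactly one target gene sits between the two flanking orthologs.
            guesses[gene] = target_order[lo + 1]
    return guesses


if __name__ == "__main__":
    print(guess_by_synteny(["A", "B", "C"],
                           {"A": "a", "C": "c"},
                           ["a", "x", "c"]))
```

The result is only a guess: conserved gene order suggests, but does not prove, that the flanked gene is the ortholog.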
Alignments and Assemblies
• Alignment
  • Assumes ALL sequences come from the SAME region
  • Therefore can be useless for
    • non-overlapping contigs
    • PCR probes/oligos
  • Good for
    • paralogs/orthologs
  • Basis for phylogeny
• Assembly
  • Good for near-identical sequences
  • Types:
    • De novo
    • Guided [reference sequence]
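To illustrate why assembly suits near-identical sequences, here is a toy sketch of the core de novo step: joining two reads through an exact suffix/prefix overlap. The function names and the `min_len` default are assumptions made for the example; real assemblers tolerate sequencing errors and handle millions of reads.

```python
def longest_overlap(left: str, right: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `left` that equals a prefix of `right`."""
    min_len = max(1, min_len)
    for n in range(min(len(left), len(right)), min_len - 1, -1):
        if left[-n:] == right[:n]:
            return n
    return 0


def merge_reads(left: str, right: str, min_len: int = 3):
    """Merge two reads through their overlap, or return None if they do not overlap."""
    n = longest_overlap(left, right, min_len)
    if n == 0:
        return None
    return left + right[n:]


if __name__ == "__main__":
    print(merge_reads("ACGTACG", "TACGGGA"))  # reads share the overlap "TACG"
    print(merge_reads("AAAA", "CCCC"))        # no overlap, nothing to merge
```

Reads with no shared region cannot be joined this way, and for the same reason non-overlapping contigs give a meaningless alignment.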
Alignment • Implicit statement • Each residue in an aligned column is assumed to derive from the same residue in the last common ancestor [LCA] • Therefore it is OK to look only at conserved regions, or to mask non-conserved regions • Especially for phylogeny
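Masking non-conserved regions can be sketched as a column filter. This is a minimal illustration: the conservation measure (fraction of sequences sharing the most common residue) and the 0.8 default are assumptions made for the example, not the method used by any particular tool.

```python
from collections import Counter


def column_conservation(column) -> float:
    """Fraction of all sequences that share the most common non-gap residue."""
    residues = [c for c in column if c != "-"]
    if not residues:
        return 0.0
    top_count = Counter(residues).most_common(1)[0][1]
    return top_count / len(column)


def mask_alignment(seqs, min_conservation: float = 0.8):
    """Keep only alignment columns whose conservation meets the threshold."""
    if len({len(s) for s in seqs}) > 1:
        raise ValueError("all aligned sequences must have the same length")
    keep = [i for i, column in enumerate(zip(*seqs))
            if column_conservation(column) >= min_conservation]
    return ["".join(s[i] for i in keep) for s in seqs]


if __name__ == "__main__":
    alignment = ["MKTAY", "MKSAY", "MRTA-"]
    print(mask_alignment(alignment))
```

Only the fully conserved columns survive at the default threshold; lowering `min_conservation` keeps more of the alignment for the downstream phylogeny.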
Alignment Tools
• Faster but less accurate (some better with gaps)
  • Muscle
  • ClustalW/X
  • MAFFT
• Slow but more accurate
  • *-Coffee
    • T: original
    • 3D: uses PDB structures as guide (structural)
    • M: uses multiple methods
  • Probcons
Alignment Edit Tools • NEVER use a word processor or Excel to edit alignments • Jalview (Java Alignment Viewer) • Good for editing • DAS capable
[Diagram of Jalview capabilities; labels only, grouped approximately] Data: Sequences, Alignments, Features, Annotation, Structures, Trees. Functions: Multiple Sequence Alignment, Consensus, Conservation & Clustering, Secondary Structure Prediction, Analysis, Visualization, Figure Generation. Inputs: ‘Standard’ Formats (FASTA, MSF, CLUSTAL, PILEUP, BLC, PFAM), Distributed Annotation System, GFF, Jalview Features, Jalview Annotation, Newick, PDB. Outputs: Clickable HTML, Images, Line Art.
Jalview DAS Client Functionality • Select specific sources • Filtered list • Add user defined sources • Group features by source • Type==colour • Highlight start-end • DAS annotation servers • Query matches ID to Authority • Map to local reference frame • Mouse over for feature name, links and scores
Assemblers
• Many free options
• STADEN - staden.sf.net
  • Original assembler, all platforms
  • No longer in development
  • Useless for next gen sequencing
• MAQ and MAQView
  • Installed on computers in COIL