Manually Adjusting Multiple Alignments Chris Wilton
Multiple Alignments • Reviewing multiple alignments • what is a multiple alignment? • Analyzing a multiple alignment • what makes a ‘good’ multiple alignment? • what can it tell us, why is it useful? • Adjusting a multiple alignment • Alignment editors and HowTo • Demonstration and practice
What is a Multiple Alignment? • A comparison of sequences • “multiple sequence alignment” • A comparison of equivalents: • Structurally equivalent positions • Functionally equivalent residues • Secondary structure elements • Hydrophobic regions, polar residues
A Good Multiple Alignment? • Difficult to define… • Good ones look pretty! • Aligned secondary structures • Strongly conserved residues / regions • Comparison with known structure helps • Bad ones look chaotic and random.
A Good Multiple Alignment? [Figure: example alignment shown with conservation, quality and consensus annotation rows]
Multiple Alignment Features • Barton (1993) • “The position of insertions and deletions suggests regions where surface loops exist… • Conserved glycine or proline suggests a β-turn… • Residues with hydrophobic properties conserved at i, i+2, i+4 (etc.) separated by unconserved or hydrophilic residues suggests a surface β-strand… • A short run of hydrophobic amino acids (4 or 5 residues) suggests a buried β-strand…
Multiple Alignment Features • Barton (1993) • Pairs of conserved hydrophobic amino acids separated by pairs of unconserved or hydrophilic residues suggests an α-helix with one face packed in the protein core. Similarly, an i, i+3, i+4, i+7 pattern of conserved residues.” • Cysteine is a rare amino acid, and is often used in disulphide bonds (pairs of conserved cysteines) • Charged residues (histidine, aspartate, glutamate, lysine, arginine) and other polar residues embedded in a conserved region indicate functional importance
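Barton's rules of thumb can be turned into a rough column scan. The sketch below is a minimal illustration, not Barton's own procedure: it assumes gap-padded, equal-length sequences, treats `AVILMFWC` as the hydrophobic set and calls a column "conserved hydrophobic" when at least 80% of its non-gap residues fall in that set (both choices are assumptions). It then reports start positions matching the i, i+2, i+4 surface-strand pattern and the i, i+3, i+4, i+7 helix pattern.

```python
from typing import List

# Assumed hydrophobic set and gap characters; adjust for your own data.
HYDROPHOBIC = set("AVILMFWC")
GAPS = "-."


def conserved_hydrophobic(alignment: List[str], threshold: float = 0.8) -> List[bool]:
    """Flag columns where at least `threshold` of the non-gap residues are hydrophobic."""
    flags = []
    for j in range(len(alignment[0])):
        residues = [s[j] for s in alignment if s[j] not in GAPS]
        frac = sum(r in HYDROPHOBIC for r in residues) / len(residues) if residues else 0.0
        flags.append(frac >= threshold)
    return flags


def strand_candidates(flags: List[bool]) -> List[int]:
    """Starts i with hydrophobic columns at i, i+2, i+4 and non-hydrophobic ones at i+1, i+3."""
    return [i for i in range(len(flags) - 4)
            if flags[i] and flags[i + 2] and flags[i + 4]
            and not flags[i + 1] and not flags[i + 3]]


def helix_candidates(flags: List[bool]) -> List[int]:
    """Starts i with conserved hydrophobic columns at i, i+3, i+4 and i+7."""
    return [i for i in range(len(flags) - 7)
            if flags[i] and flags[i + 3] and flags[i + 4] and flags[i + 7]]


alignment = ["LKVEI", "IRVDL", "VKLEV"]
print(strand_candidates(conserved_hydrophobic(alignment)))  # [0]
```

Hits are only hints: a matching pattern is a reason to look more closely, ideally against a known structure, not proof of a strand or helix.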
Quality Assessment • Bad residues • Large distance from the column consensus • Bad columns • High average distance from the consensus (high column “entropy”) • Bad regions • Detected with profile scores • Bad quality doesn’t always mean badly aligned! [Figure: example alignment columns illustrating residue and column quality]
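Column "entropy" can be made concrete with Shannon entropy over the residue frequencies in each column: 0 bits for an invariant column, rising as the column becomes more mixed. The sketch below ignores gaps and uses an arbitrary cut-off to list noisy columns; both are assumptions, and this is not necessarily the quality measure any particular alignment editor computes.

```python
import math
from collections import Counter
from typing import List


def column_entropy(alignment: List[str], j: int, gaps: str = "-.") -> float:
    """Shannon entropy (bits) of the residues in column j, ignoring gap characters."""
    residues = [s[j] for s in alignment if s[j] not in gaps]
    if not residues:
        return 0.0
    n = len(residues)
    # sum of p * log2(1/p) over residue types, with p = c / n
    return sum((c / n) * math.log2(n / c) for c in Counter(residues).values())


def noisy_columns(alignment: List[str], max_bits: float) -> List[int]:
    """0-based indices of columns whose entropy exceeds max_bits."""
    return [j for j in range(len(alignment[0]))
            if column_entropy(alignment, j) > max_bits]


alignment = ["AAAA", "AACA", "AGCT", "AGCT"]
print([round(column_entropy(alignment, j), 3) for j in range(4)])  # [0.0, 1.0, 0.811, 1.0]
print(noisy_columns(alignment, 0.9))                                # [1, 3]
```

As the slide warns, a high-entropy column is not automatically misaligned: variable surface positions are expected to look noisy.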
Quality Assessment • Profiles • A profile holds scores for each residue type (plus gaps) over every column of a multiple alignment • Concepts: • Consensus sequence • Amino acid similarity • Some multiple alignment programs use profiles to build or add to an alignment • Any alignment, or even one sequence, can be a profile (one sequence isn’t a very good one…)
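A minimal profile can be sketched as a table of symbol frequencies per column, with the consensus taken as the most frequent symbol. This is a simplification: real profile methods usually weight sequences and fold in amino acid similarity (a substitution matrix) rather than using raw frequencies, and `profile_score` here is a hypothetical, deliberately crude score, not a published one.

```python
from collections import Counter
from typing import Dict, List

ALPHABET = "ACDEFGHIKLMNPQRSTVWY-"  # 20 amino acids plus a gap symbol


def build_profile(alignment: List[str]) -> List[Dict[str, float]]:
    """Frequency of every symbol (gaps included) in each alignment column."""
    n = len(alignment)
    profile = []
    for j in range(len(alignment[0])):
        counts = Counter(seq[j] for seq in alignment)
        # Symbols outside ALPHABET (e.g. 'X') are simply not recorded.
        profile.append({a: counts.get(a, 0) / n for a in ALPHABET})
    return profile


def consensus(profile: List[Dict[str, float]]) -> str:
    """Most frequent symbol per column; ties go to the earlier letter in ALPHABET."""
    return "".join(max(ALPHABET, key=lambda a: col[a]) for col in profile)


def profile_score(profile: List[Dict[str, float]], seq: str) -> float:
    """Crude score for an already-aligned sequence: summed column frequency of its residues."""
    return sum(col.get(r, 0.0) for col, r in zip(profile, seq))


prof = build_profile(["ACD-", "ACE-", "GCDK"])
print(consensus(prof))                        # ACD-
print(round(profile_score(prof, "ACDK"), 3))  # 2.667
```

A single sequence gives a profile with one symbol at frequency 1.0 in every column, which is why one sequence makes a poor profile: it says nothing about which substitutions are tolerated.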
What can we do with an MA? • Identify subgroups (phylogeny) • Intra-group sequence conservation • Evolutionary relatedness (view tree) • Identify motifs (functionality) • Evolutionary signals • Highly conserved residues suggest functional or structural significance! • Widen the search for related proteins • An MA is a better search query than a single sequence • A consensus sequence / profile is useful
Example aligned fragment:
RPDDWHLHLR
GGIDTHVHFI
GFTLTHEHIC
PFVEPHIHLD
PKVELHVHLD
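Spotting fully conserved columns is easy to automate. Applied to the five aligned fragments on this slide, the sketch below finds two invariant histidines two positions apart (an H-x-H pattern). It treats "conserved" strictly as an identical non-gap residue in every sequence; that strict definition is an assumption, and alignment editors usually also offer softer, threshold-based measures.

```python
from typing import List

# The five aligned fragments shown on this slide.
EXAMPLE = ["RPDDWHLHLR", "GGIDTHVHFI", "GFTLTHEHIC", "PFVEPHIHLD", "PKVELHVHLD"]


def conserved_columns(alignment: List[str], gap: str = "-") -> List[int]:
    """0-based indices of columns where every sequence has the same non-gap residue."""
    return [j for j in range(len(alignment[0]))
            if alignment[0][j] != gap
            and all(s[j] == alignment[0][j] for s in alignment)]


cols = conserved_columns(EXAMPLE)
print(cols)                                  # [5, 7]
print("".join(EXAMPLE[0][j] for j in cols))  # HH
```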
What do we want to do? • Build a homology model? • Accuracy • Perform phylogenetic analysis? • Completeness • Functional analysis of a protein family? • Diversity
Building the initial alignment • Fetch related sequences and run an alignment program • Clustal, DIALIGN, T-Coffee, MUSCLE … • Fetch a multiple alignment from a database and add sequences of interest • Pfam, ProDom, ADDA … • Start from a motif-finding procedure • MEME, Pratt, Gibbs Sampler …
Adjusting the alignment • Filter alignment: • Remove any redundancy • Remove unrelated sequences • Remove unwanted domains • Recalculate alignment if necessary • Look for conserved motifs, adjust any misalignments. Try different colour schemes and thresholds. • One step at a time…
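Removing redundancy is usually done with a percent-identity filter. The sketch below keeps sequences greedily in input order and drops any sequence that is at least `max_identity` percent identical to one already kept; the 90% default, the greedy order, and computing identity only over columns where neither sequence has a gap are all assumptions. Jalview provides its own redundancy-removal option, so treat this as an illustration of the idea rather than a replacement for it.

```python
from typing import List, Tuple


def percent_identity(a: str, b: str, gap: str = "-") -> float:
    """Percent identity over columns where neither aligned sequence has a gap."""
    pairs = [(x, y) for x, y in zip(a, b) if x != gap and y != gap]
    if not pairs:
        return 0.0
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)


def remove_redundant(records: List[Tuple[str, str]],
                     max_identity: float = 90.0) -> List[Tuple[str, str]]:
    """Keep (name, aligned_sequence) records below max_identity to every record kept so far."""
    kept: List[Tuple[str, str]] = []
    for name, seq in records:
        if all(percent_identity(seq, other) < max_identity for _, other in kept):
            kept.append((name, seq))
    return kept


records = [("a", "ACDEFGHIKL"), ("b", "ACDEFGHIKL"),
           ("c", "ACDEFGHIKM"), ("d", "WWWWWGHIKL")]
print([name for name, _ in remove_redundant(records)])  # ['a', 'd']
```

Because the filter is greedy, the input order decides which member of a near-identical cluster survives; putting the sequences you care most about (e.g. those with known structures) first keeps them in.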
Jalview Alignment Editor Clamp, M., Cuff, J., Searle, S. M. and Barton, G. J. (2004), "The Jalview Java Alignment Editor", Bioinformatics, 20, 426-7.
Colouring your alignment [Figure: colour-scheme legends: Hydrophobic / Polar (hydrophobic to polar), Buried Index (buried to surface), β-Strand Likelihood (probable to unlikely), Helix Likelihood (probable to unlikely)]
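A hydrophobic/polar colouring can be sketched by averaging a hydropathy scale down each column. The values below are the Kyte-Doolittle hydropathy scale; whether a given editor uses exactly this scale, and the cut-off of 0 used here to split "hydrophobic" from "polar", are assumptions for illustration.

```python
from typing import List, Optional

# Kyte-Doolittle hydropathy values (positive = hydrophobic).
KD = {"I": 4.5, "V": 4.2, "L": 3.8, "F": 2.8, "C": 2.5, "M": 1.9, "A": 1.8,
      "G": -0.4, "T": -0.7, "S": -0.8, "W": -0.9, "Y": -1.3, "P": -1.6,
      "H": -3.2, "E": -3.5, "Q": -3.5, "D": -3.5, "N": -3.5, "K": -3.9, "R": -4.5}


def column_hydropathy(alignment: List[str]) -> List[Optional[float]]:
    """Mean Kyte-Doolittle value per column, skipping gaps; None for columns with no residues."""
    means: List[Optional[float]] = []
    for j in range(len(alignment[0])):
        vals = [KD[s[j]] for s in alignment if s[j] in KD]
        means.append(sum(vals) / len(vals) if vals else None)
    return means


def label(mean: Optional[float]) -> str:
    """Two-way label; the 0.0 cut-off is an illustrative assumption."""
    if mean is None:
        return "gap"
    return "hydrophobic" if mean > 0 else "polar"


print([label(m) for m in column_hydropathy(["IK-", "VR-"])])  # ['hydrophobic', 'polar', 'gap']
```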
Colouring your alignment • By conservation thresholds:
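Colouring by a conservation threshold amounts to hiding residues that are too rare in their column. In the sketch below a residue stays visible (would keep its colour) only if its type makes up at least `threshold_pct` percent of all sequences in the column; residues below the threshold become `.` and gaps are left as `-`. Counting percentages over all sequences, gaps included, is an assumption; an editor may compute its threshold differently, for example against the column consensus only.

```python
from collections import Counter
from typing import List


def threshold_mask(alignment: List[str], threshold_pct: float) -> List[str]:
    """Replace residues whose type is below threshold_pct of the column with '.'."""
    n = len(alignment)
    passing = []
    for j in range(len(alignment[0])):
        counts = Counter(s[j] for s in alignment if s[j] != "-")
        passing.append({r for r, c in counts.items() if 100.0 * c / n >= threshold_pct})
    return ["".join(ch if ch == "-" or ch in passing[j] else "."
                    for j, ch in enumerate(s))
            for s in alignment]


for row in threshold_mask(["ACDE", "ACDF", "AGDF", "TGD-"], 50):
    print(row)
# ACD.
# ACDF
# AGDF
# .GD-
```

Raising the threshold shrinks the coloured set to the most conserved positions, which is a quick way to make motifs stand out.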
Colouring your alignment • Conservation index • Based on an amino acid property classification scheme, e.g. Livingstone & Barton (1993)
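The idea behind a property-based conservation index can be sketched by counting, for each column, how many physico-chemical properties are shared by all its residues (either every residue has the property or none has it). The property sets below are a simplified, hypothetical stand-in chosen for illustration; they are not the published Livingstone & Barton (1993) table, which should be used for real work.

```python
from typing import Dict, Set

# Illustrative property sets (hypothetical, simplified).
PROPERTIES: Dict[str, Set[str]] = {
    "hydrophobic": set("AVILMFWCY"),
    "polar": set("STNQDEKRHYW"),
    "small": set("AGSCTPDNV"),
    "tiny": set("AGS"),
    "aliphatic": set("ILV"),
    "aromatic": set("FWYH"),
    "charged": set("DEKRH"),
    "positive": set("KRH"),
    "negative": set("DE"),
    "proline": set("P"),
}


def property_conservation(column: str) -> int:
    """Count properties on which all non-gap residues agree (all have it, or none do)."""
    residues = [r for r in column if r != "-"]
    if not residues:
        return 0
    score = 0
    for members in PROPERTIES.values():
        if len({r in members for r in residues}) == 1:
            score += 1
    return score


print(property_conservation("IV"))  # 9: I and V differ only on "small"
print(property_conservation("DK"))  # 7: differ on "small", "positive", "negative"
```

A column of I and V thus scores almost as high as an invariant column, while a D/K column, which swaps charge, scores lower; that is the behaviour a property-based index is meant to capture.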
Check PDB Structures • Load MA with sequence(s) for known PDB structure • View >> Feature Settings >> Fetch DAS Features (wait...) OR • Right-click >> Associate Structure with Sequence >> Discover PDB ids (quicker) • Right-click sequence name >> View PDB Entry • Structure opens in new window – residues acquire MA colours • Highlight residues by hovering mouse over alignment or structure • Label residues by clicking on structure
Compare Alignment to Structure • A crucial way of checking the alignment! • Where are gaps / insertions / deletions? • In secondary structures: bad • In surface loops: okay • Where are our key / functional residues? • Are they in the probable active site? • Check they are clustered • Check they are accessible, not buried
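Checking where gaps fall relative to secondary structure can be partly automated once you have a per-residue secondary-structure string for the sequence of known structure (for example from DSSP, where H marks helix and E strand). The sketch below assumes that string matches the ungapped reference sequence one-to-one and treats only H and E as core elements. It flags (a) columns where another sequence has a deletion opposite a helix or strand residue and (b) columns where the reference has a gap flanked on both sides by helix or strand residues, i.e. an insertion in the other sequences. The function name `gaps_in_core` and these rules are illustrative assumptions.

```python
from typing import List, Tuple


def gaps_in_core(alignment: List[str], ref_index: int, dssp: str,
                 core: str = "HE") -> List[Tuple[int, str]]:
    """Return (column, state) pairs for gap-containing columns inside helices/strands.

    `dssp` has one secondary-structure letter per residue of the ungapped
    reference sequence alignment[ref_index].
    """
    ref = alignment[ref_index]
    flagged: List[Tuple[int, str]] = []
    residue_no = -1  # index of the most recent reference residue seen
    for j, ch in enumerate(ref):
        if ch != "-":
            residue_no += 1
        gaps = sum(s[j] == "-" for s in alignment)
        if gaps == 0 or gaps == len(alignment):
            continue  # no gap in this column, or an all-gap column
        if ch != "-":
            # Another sequence has a deletion opposite this reference residue.
            if dssp[residue_no] in core:
                flagged.append((j, dssp[residue_no]))
        else:
            # Reference gap: other sequences have an insertion; check flanking residues.
            prev_i, next_i = residue_no, residue_no + 1
            if (prev_i >= 0 and next_i < len(dssp)
                    and dssp[prev_i] in core and dssp[next_i] in core):
                flagged.append((j, dssp[prev_i]))
    return flagged


print(gaps_in_core(["ACDEFG", "AC--FG"], 0, "CHHHHC"))  # [(2, 'H'), (3, 'H')]
```

Flagged columns are candidates for shifting the gap into the nearest loop, which is the manual check this slide describes.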
Demonstration and Practice • Start Jalview • Tools >> Preferences >> Visual: select Maximise Window, unselect Quality, set Font Size to 8 or 9, uncheck Open File; Colour >> Clustal; Editing >> check Pad Gaps When Editing • File >> Input Alignment >> from URL (use the URL provided) • Get used to the controls: selecting and deselecting sequences/groups (drag the mouse), dragging sequences/groups (use Shift/Ctrl), selecting sequence regions, hiding sequences/groups, removing columns and regions… Then explore the menus and tools. • Now load the second alignment provided: I’ve messed up a good alignment, and now I’d like you to correct it! There are two groups of sequences and one single sequence to adjust.
Demonstration and Practice • View >> Feature Settings >> DAS Settings • select Uniprot, dssp, cath, Pfam, PDBsum_ligands, PDBsum_DNAbinding, then click ‘Save as default’ • click Fetch DAS Features (then click yes at prompt) ... • Move mouse over alignment and read information about features • Move mouse over sequence names to check for PDB ids • Open a PDB structure (choose any) • View >> uncheck Show All Chains, then use up-arrow key to increase structure size. • Hover mouse over structure (see how residues are highlighted in the sequence), then do same for sequence. Select residues in the structure by clicking them – a label will appear. Click again to remove label. • Check position of insertions & deletions using this method.