Applied Bioinformatics Week 7 Jens Allmer
Homework Feedback • Review Rough Writing Guidelines • Word Template
Topic • Multiple Sequence Alignment Review • Building an MSA • Editing an MSA • Dendrograms • Phylogenetic Trees
Choosing Sequences • How many? • 10 – 15 (fewer than 50 would be good) • Sequences should be more than 30% and less than 90% identical • Prefer sequences of similar length • Prefer sequences without internal repeats, or extract the repeats
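The identity window above can be checked with a few lines of Python. This is only a minimal sketch: it assumes each candidate has already been pairwise-aligned to a query sequence (same length, "-" for gaps), and the function names `percent_identity` and `within_identity_window` are made up for this illustration.

```python
def percent_identity(a, b):
    """Percent of identical positions between two aligned sequences.

    Columns in which both sequences carry a gap are ignored.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for x, y in zip(a, b):
        if x == "-" and y == "-":
            continue
        compared += 1
        if x == y:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def within_identity_window(query, candidates, low=30.0, high=90.0):
    """Names of candidates that are more than `low` and less than `high` percent identical."""
    return [name for name, seq in candidates.items()
            if low < percent_identity(query, seq) < high]


# Toy data, invented for the example
query = "ACGTACGTAC"
candidates = {"close": "ACGTACGTAA", "mid": "ACGTACTTTT", "far": "TTTTTTTTTT"}
print(within_identity_window(query, candidates))
```

Identity computed this way depends on how the pairwise alignment was made, so the 30% and 90% limits are best treated as rough guides.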
Choosing Sequences • While choosing your sequences, give them good names • Some sequences should be well annotated
Create an MSA • This time use 20 – 50 sequences • From different species • Use ClustalW for alignment • Most ClustalW servers display a dendrogram • Confirm this by using a few of them
Gathering Sequences • Download the sequences as a FASTA file as well • Most programs will support this format
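FASTA is simple enough to read without any library. The sketch below assumes plain FASTA text in which every record starts with a ">" header line; `read_fasta` is a name chosen for this example and not part of any particular tool.

```python
def read_fasta(text):
    """Parse FASTA text into a dict mapping each header (without '>') to its sequence."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            name = line[1:]
            records[name] = []
        elif name is not None:
            records[name].append(line)  # sequence may span several lines
    return {n: "".join(parts) for n, parts in records.items()}


example = ">seq1 human\nACGT\nAC\n\n>seq2\nGG\n"
print(read_fasta(example))
```

Records with identical headers would overwrite each other here, which is one more reason to give sequences good, unique names.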
Output Formats • Many different formats • FASTA: widely supported • PDF: only for printing / storing / sharing • PIR: similar to FASTA • MSF: common MSA format • ALN: subset of MSF
Converting Formats • http://bioweb.pasteur.fr/seqanal/interfaces/fmtseq.html • Names (>…) no longer than 15 characters • Different formats maintain different data • Converting can therefore lose data • Make sure to have a master copy
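One way to respect the 15-character limit without losing track of the originals is to keep a mapping from full names to short names next to the master copy. The following is a sketch under that assumption; `shorten_names` and the suffix scheme (`_1`, `_2`, ...) are illustrative choices, not something a converter prescribes.

```python
def shorten_names(names, max_len=15):
    """Return {original_name: short_name}; short names are unique and at most max_len long."""
    mapping = {}
    used = set()
    for original in names:
        # Use the first whitespace-separated token of the header as the base name
        token = original.split()[0] if original.strip() else "seq"
        short = token[:max_len]
        counter = 1
        while short in used:
            # On a collision, replace the tail with a numeric suffix
            suffix = "_" + str(counter)
            short = token[:max_len - len(suffix)] + suffix
            counter += 1
        used.add(short)
        mapping[original] = short
    return mapping


names = ["Homo_sapiens_hemoglobin_alpha", "Homo_sapiens_hemoglobin_beta", "short"]
for original, short in shorten_names(names).items():
    print(short, "<-", original)
```

Saving the mapping alongside the master file makes it possible to restore the full names after tree building.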
Editing Alignments • http://www.jalview.org • Start the program • Choose File – Input Alignment – from Textbox • Copy and paste the ClustalW alignment
Dendrogram • Jalview also allows you to view different types of Dendrograms based on different similarity measures • Use Jalview and compare the trees that are constructed based on the different measures
End Practice I • 15 min break
Phylogeny • Sources • Sequences • Clades • Organisms • Why • Understand evolution • Strain diversity • Epidemiology • Gene prediction
Dendrogram http://en.wikipedia.org/wiki/Dendrogram
Tree Terminology • All circled elements (e.g. a) are called nodes • The connections between them are called edges or branches • The first node that forms the tree is called the root (here abcdef) • Terminal nodes that have only one connection are called leaves (e.g. a) • Unrooted trees (remove the red root)
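The terminology maps directly onto a small data structure. In the sketch below a leaf is a string and an internal node is a tuple of its children; the example tree over a to f is an assumed topology chosen to match the root label abcdef, and the helper names are invented for this illustration.

```python
def leaves(node):
    """List the leaf labels below a node, left to right."""
    if isinstance(node, str):
        return [node]
    result = []
    for child in node:
        result.extend(leaves(child))
    return result


def count_nodes(node):
    """Count all nodes (internal nodes plus leaves) in the subtree."""
    if isinstance(node, str):
        return 1
    return 1 + sum(count_nodes(child) for child in node)


def to_newick(node):
    """Write the topology in Newick notation (without the trailing semicolon)."""
    if isinstance(node, str):
        return node
    return "(" + ",".join(to_newick(child) for child in node) + ")"


# Assumed example: the root 'abcdef' splits into leaf a and the subtree bcdef
tree = ("a", (("b", "c"), (("d", "e"), "f")))

print(leaves(tree))             # the six leaves
print(count_nodes(tree) - 1)    # number of edges: every node except the root has one parent edge
print(to_newick(tree) + ";")
```

Newick strings like the one printed here are what most tree viewers accept as input.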
Branch Length • Arbitrary • Similarity • Evolutionary Time
Tree types • A dendrogram is a broad term for the diagrammatic representation of a phylogenetic tree. • A cladogram is a tree formed using cladistic methods. This type of tree only represents a branching pattern, i.e., its branch lengths do not represent time. • A phylogram is a phylogenetic tree that explicitly represents number of character changes through its branch lengths. • A chronogram is a phylogenetic tree that explicitly represents evolutionary time through its branch lengths.
Sequences • DNA • Sensitive but quite divergent at longer distances • Use for very closely related organisms • cDNA • Still sensitive but less divergent (e.g. no introns) • Use for closely related families • Protein • Least sensitive but most useful for more distant relationships • Use for distantly related species • 16S rRNA • Exists in all prokaryotes (eukaryotes have the homologous 18S rRNA) • Highly conserved
Overall Process • Get Sequences • Construct MSA • Compute pairwise distances (for some methods) • Build Tree • Topology • Branch Lengths • Estimate accuracy, reliability • Build several different trees for that • Visualize the tree
Computational Tree Formation • Distance Methods • Neighbor-Joining • Least-Squares • UPGMA • Parsimony • Least number of evolutionary steps • Maximum Likelihood • The tree with the highest likelihood under the chosen evolutionary model is constructed
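To make "least number of evolutionary steps" concrete, the sketch below counts the minimum number of changes a given tree needs for an alignment, using Fitch's small-parsimony algorithm. It assumes a rooted binary tree written as nested tuples with string leaves and an alignment given as a dict of equal-length sequences; it only scores trees and does not search for the best one.

```python
def fitch(node, states):
    """Return (possible_states, changes) for one alignment column.

    `states` maps each leaf name to its character in that column.
    """
    if isinstance(node, str):
        return {states[node]}, 0
    left, right = node
    left_states, left_changes = fitch(left, states)
    right_states, right_changes = fitch(right, states)
    common = left_states & right_states
    if common:
        # Children agree on at least one state: no extra change needed here
        return common, left_changes + right_changes
    # Children disagree: one change is required on one of the two branches
    return left_states | right_states, left_changes + right_changes + 1


def parsimony_score(tree, alignment):
    """Sum the Fitch change counts over all alignment columns."""
    length = len(next(iter(alignment.values())))
    return sum(fitch(tree, {name: seq[i] for name, seq in alignment.items()})[1]
               for i in range(length))


# Toy alignment, invented for the example
alignment = {"a": "AA", "b": "AC", "c": "GC", "d": "GA"}
print(parsimony_score((("a", "b"), ("c", "d")), alignment))
print(parsimony_score((("a", "c"), ("b", "d")), alignment))
```

A parsimony method would evaluate many candidate topologies in this way and keep the one with the lowest score.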
Neighbor Joining • Bottom-up clustering method • Compute the distance matrix • Join the pair of nodes with the smallest rate-corrected distance • Repeat both steps until fully joined http://en.wikipedia.org/wiki/Neighbor_joining
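The joining loop can be sketched in a few lines. Note that neighbor joining does not simply pick the smallest raw distance: it picks the pair minimizing Q(i, j) = (n - 2) d(i, j) - r(i) - r(j), where r is the sum of a node's distances to all other nodes. The sketch below returns only the topology as nested tuples and leaves out branch lengths; the input format (a dict keyed by `frozenset` pairs) and the toy distances are assumptions made for this example.

```python
from itertools import combinations


def neighbor_joining(names, dist):
    """Build an unrooted topology as nested tuples.

    `dist` maps frozenset((x, y)) to the distance between x and y.
    """
    nodes = list(names)
    d = dict(dist)
    while len(nodes) > 2:
        n = len(nodes)
        # r(i): sum of distances from i to every other current node
        total = {i: sum(d[frozenset((i, k))] for k in nodes if k != i) for i in nodes}
        # Pick the pair with the smallest Q value
        i, j = min(combinations(nodes, 2),
                   key=lambda p: (n - 2) * d[frozenset(p)] - total[p[0]] - total[p[1]])
        new = (i, j)
        # Distance from the new node to each remaining node
        for k in nodes:
            if k != i and k != j:
                d[frozenset((new, k))] = 0.5 * (d[frozenset((i, k))]
                                                + d[frozenset((j, k))]
                                                - d[frozenset((i, j))])
        nodes = [k for k in nodes if k != i and k != j] + [new]
    return tuple(nodes)


# Toy distances between five taxa, invented for the example
pairs = {("a", "b"): 5, ("a", "c"): 9, ("a", "d"): 9, ("a", "e"): 8,
         ("b", "c"): 10, ("b", "d"): 10, ("b", "e"): 9,
         ("c", "d"): 8, ("c", "e"): 7, ("d", "e"): 3}
dist = {frozenset(p): v for p, v in pairs.items()}
print(neighbor_joining(["a", "b", "c", "d", "e"], dist))
```

The result is unrooted: the outermost tuple only reflects the order in which the last two groups were joined, not an ancestor.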
Least Squares • Standard approximation approach • Minimizes the sum of the error (squares) • Example PGLS • Phylogenetic Generalized Least Squares • Needs additional data (traits) http://www.dynamicgeometry.com/General_Resources/Advanced_Sketch_Gallery/Other_Explorations/Statistics_Collection/Least_Squares.html
UPGMA • Unweighted Pair Group Method with Arithmetic Mean • Agglomerative hierarchical clustering method • Assumes a constant rate of evolution
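UPGMA is short enough to write out. The sketch below repeatedly merges the two closest clusters and averages their distances to all other clusters, weighted by cluster size. It returns the rooted topology as nested tuples without branch heights; the input format and the toy distances are assumptions made for this example.

```python
from itertools import combinations


def upgma(names, dist):
    """Cluster taxa with UPGMA and return the rooted topology as nested tuples.

    `dist` maps frozenset((x, y)) to the distance between x and y.
    """
    clusters = {name: 1 for name in names}  # cluster -> number of leaves
    d = dict(dist)
    while len(clusters) > 1:
        # Merge the pair of clusters with the smallest distance
        i, j = min(combinations(clusters, 2), key=lambda p: d[frozenset(p)])
        ni, nj = clusters[i], clusters[j]
        new = (i, j)
        for k in clusters:
            if k != i and k != j:
                # Size-weighted arithmetic mean of the two old distances
                d[frozenset((new, k))] = (ni * d[frozenset((i, k))]
                                          + nj * d[frozenset((j, k))]) / (ni + nj)
        del clusters[i], clusters[j]
        clusters[new] = ni + nj
    return next(iter(clusters))


# Toy distances between five taxa, invented for the example
pairs = {("a", "b"): 17, ("a", "c"): 21, ("a", "d"): 31, ("a", "e"): 23,
         ("b", "c"): 30, ("b", "d"): 34, ("b", "e"): 21,
         ("c", "d"): 28, ("c", "e"): 39, ("d", "e"): 43}
dist = {frozenset(p): v for p, v in pairs.items()}
print(upgma(["a", "b", "c", "d", "e"], dist))
```

Because every merge is placed at half the cluster distance, the resulting tree is ultrametric, which is where the constant-rate assumption enters.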
Similarity Measures • Sequence • Number of different positions • Weighted differences • Substitution Matrices • Pairwise alignments • NW, SW, .. • Additional measurements or knowledge • Traits • Parsimony • Number of changes for tree paths
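The first two sequence-based measures can be illustrated on aligned DNA. The sketch below counts differing positions and then computes a weighted variant in which transversions cost more than transitions; the weights 1 and 2 are arbitrary values picked for this example, not a recommended scoring scheme.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def count_differences(a, b):
    """Number of aligned positions that differ (columns containing a gap are skipped)."""
    return sum(1 for x, y in zip(a, b) if "-" not in (x, y) and x != y)


def is_transition(x, y):
    """True for purine<->purine or pyrimidine<->pyrimidine substitutions."""
    return x != y and ({x, y} <= PURINES or {x, y} <= PYRIMIDINES)


def weighted_distance(a, b, transition_weight=1.0, transversion_weight=2.0):
    """Weighted differences per compared position."""
    score = 0.0
    compared = 0
    for x, y in zip(a, b):
        if "-" in (x, y):
            continue
        compared += 1
        if x == y:
            continue
        score += transition_weight if is_transition(x, y) else transversion_weight
    if compared == 0:
        raise ValueError("no ungapped positions to compare")
    return score / compared


print(count_differences("ACGT", "GCGA"))
print(weighted_distance("ACGT", "GCGA"))
```

A substitution matrix generalizes the same idea by assigning a separate cost or score to every pair of residues.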
Tree Accuracy • Bootstrapping • Resample • Recompute • Do many times • Compare results http://www.sciencedirect.com/science/article/pii/S0191814107000156
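The resample and recompute steps can be sketched as follows. Bootstrapping draws alignment columns with replacement, rebuilds the tree on each pseudo-alignment, and reports how often a group of interest reappears. The `find_clades` argument is a hypothetical callable standing in for whatever tree-building method is used; it is assumed to return a set of clades, each a `frozenset` of leaf names.

```python
import random


def resample_columns(alignment, rng):
    """Draw as many columns as the alignment has, with replacement."""
    length = len(next(iter(alignment.values())))
    columns = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[c] for c in columns) for name, seq in alignment.items()}


def bootstrap_support(alignment, find_clades, clade, replicates=100, seed=0):
    """Fraction of bootstrap replicates whose tree contains `clade`."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(replicates):
        replicate = resample_columns(alignment, rng)
        if clade in find_clades(replicate):
            hits += 1
    return hits / replicates


# Toy alignment, invented for the example
alignment = {"a": "ACGT", "b": "ACGA", "c": "TCGA"}
print(resample_columns(alignment, random.Random(1)))
```

Every sequence is resampled with the same column indices, so each pseudo-alignment column is still a real column of the original alignment.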
http://goergen.deviantart.com/art/Magic-Forrest-Wallpaper-139108299
End Theory I • Relax • Mindmap • Break
Where to get Trees • Most servers that allow for MSA will also provide at least the guide tree which was used to construct the alignment • If that’s all you are interested in you don’t need to go any further
Edit your MSA • Remove blocks consisting of mostly gaps (using Jalview) • Remove N- and C-termini if they are not well conserved
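The same gap clean-up can also be scripted. The sketch below drops every alignment column whose gap fraction exceeds a threshold; the 50% cutoff and the function name `drop_gappy_columns` are assumptions made for this illustration, and the result should still be inspected in an alignment editor.

```python
def drop_gappy_columns(alignment, max_gap_fraction=0.5):
    """Remove columns in which more than max_gap_fraction of the sequences have a gap."""
    names = list(alignment)
    rows = [alignment[name] for name in names]
    keep = [i for i, column in enumerate(zip(*rows))
            if column.count("-") / len(column) <= max_gap_fraction]
    return {name: "".join(alignment[name][i] for i in keep) for name in names}


# Toy alignment, invented for the example
alignment = {"s1": "A--CG", "s2": "A-TCG", "s3": "AG-CG"}
print(drop_gappy_columns(alignment))
```

Trimming poorly conserved termini works the same way, by keeping only a chosen range of column indices.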
Easy Tree • www.ebi.ac.uk/clustalw/ • Paste your alignment • Select a tree type • Other options need to be set (see right) • Press run • Make a screen shot • You can paste it where needed
Phylip (More elaborate tree) • http://bioweb.pasteur.fr/seqanal/phylogeny/phylip-uk.html • Choose protdist from the page • Paste the MSA • Bootstrapping e.g.:
Phylip • Run the query • Click further analysis
Click Run • Select full screen view • There is your tree
Ugly Tree • Let’s face it, the tree is quite ugly • http://iubio.bio.indiana.edu/treeapp/treeprint-form.html • Select the consense.outtree from the previous website and paste it into the box • Select submit to create the tree • Play around with the formats and settings
Other Resources • http://en.wikipedia.org/wiki/List_of_phylogenetics_software • http://itol.embl.de/