Bioinformatics

Bioinformatics Ayesha M.Khan Spring 2013

Phylogenetic Basics • One central task in biology is to infer the relationships between species. Do they share a common ancestor? When did they separate from each other? • Phylogenetics is the study of evolutionary relationships among and within species. • It is the field of systematics that focuses on evolutionary relationships (phylogeny) between organisms or genes/proteins. • Systematics: an attempt to understand the interrelationships of living things.

Phylogenetic Basics (contd.) • The actual pattern of evolutionary history is the phylogeny or evolutionary tree, which we try to estimate. • A tree is a mathematical structure used to model the actual evolutionary history of a group of sequences or organisms.

Phylogenetic Basics (contd.) • Homologues are similar sequences that have been derived from a common ancestor sequence. • Orthologues are similar sequences in two different organisms that have arisen due to a speciation event. Orthologues typically retain identical or similar functionality throughout evolution. • Paralogues are similar sequences within a single organism that have arisen due to a gene duplication event. They tend to have differing functions. • Xenologues are similar sequences that have arisen not through vertical descent but through horizontal transfer events (symbiosis, viruses, etc.).

[Figure: globin gene tree. An early globin gene undergoes gene duplication into an α-chain gene and a β-chain gene; each lineage then splits into mouse, human, and cattle copies (mouse α, human α, cattle α; cattle β, human β, mouse β). Labels: orthologs (α), orthologs (β), paralogs (cattle), homologs.]
• Orthologs: diverged after speciation; tend to have similar function. • Paralogs: diverged after gene duplication; some functional divergence occurs. • For linking similar genes between species, or performing “annotation transfer”, identify orthologs.

Molecular phylogenetics • Why focus on molecular phylogenies rather than phylogenies based on morphological characters such as wings or feathers? • With molecular phylogenetics, the differences between organisms are measured on the proteins and RNA coded in the DNA, i.e. on amino acid and nucleotide sequences.

Molecular phylogenetics (contd.) • Molecular phylogenetics is more precise than its counterpart based on external features and behavior, and it can also distinguish small organisms such as bacteria or even viruses. • DNA is inherited and connects all species. • Molecular phylogenetics can be based on mathematical and statistical methods and is even model-based: mutations can be modeled and remote homologies can be detected. • The distance is based not on one feature but on many genes.

Molecular Phylogeny Analysis • Given a set of aligned sequences, molecular phylogeny methods suggest phylogenetic trees (inferred trees) that aim to reconstruct the history of successive divergences that took place during evolution between the considered sequences and their common ancestor. These trees may not be the same as the true tree. • Reconstruction of phylogenetic trees is a statistical problem, and a reconstructed tree is an estimate of a true tree with a given topology and given branch lengths. • In practice, phylogenetic analyses usually generate phylogenetic trees with accurate parts and imprecise parts.

Key features of molecular phylogenetic trees

Molecular Phylogeny Analysis (contd.) • Sequences reflect relationships • After working with sequences for a while, one develops an intuitive understanding that for a given gene, closely related organisms have similar sequences and more distantly related organisms have more dissimilar sequences. These differences can be quantified. • Given a set of gene sequences, it should be possible to reconstruct the evolutionary relationships among genes and among organisms.
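One simple way to quantify such differences is the p-distance: the proportion of aligned sites at which two sequences differ. The sketch below is only an illustration; the function name `p_distance`, the gap handling, and the toy sequences are assumptions, not part of the slides.

```python
def p_distance(seq_a: str, seq_b: str) -> float:
    """Proportion of aligned, gap-free sites at which two sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    # Assumption: columns with a gap ('-') in either sequence are skipped.
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(a != b for a, b in pairs) / len(pairs)


# Hypothetical toy sequences: a close relative and a more distant one.
reference = "ACGTACGT"
print(p_distance(reference, "ACGTACGA"))  # 1 difference in 8 sites
print(p_distance(reference, "TCGAACCA"))  # 4 differences in 8 sites
```

The p-distance underestimates the true number of substitutions for distant sequences, because repeated changes at the same site are counted once at most; model-based corrections exist for that case.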

Example: Pseudomonas aeruginosa, one of the top three causes of opportunistic infections, noted for its antimicrobial resistance and resistance to detergents.

Phylogenetic tree construction

[Flowchart: Choose set of related sequences → Obtain multiple alignment → Is there a strong similarity? Yes: maximum parsimony (strong similarity). No: distance methods (weak similarity) or maximum likelihood (very weak similarity).]

Phylogenetic tree construction methods • Three categories of methods exist: distance-based, maximum parsimony, and maximum likelihood. • Distance methods: compute evolutionary distances between all pairs of sequences, then build a tree in which the distances between sequences “match” these computed distances. • Maximum parsimony (MP): choose the tree that minimizes the number of changes required to explain the data. • Maximum likelihood (ML): consider all possible trees containing the set of organisms, then find the tree that gives the highest likelihood of the observed data.
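As an illustration of the distance-based category, here is a minimal sketch of UPGMA, one common clustering method that repeatedly joins the two closest clusters. The slides do not name a specific distance algorithm, so the choice of UPGMA, the function name `upgma`, and the toy distances are all assumptions. Branch lengths are omitted; only the topology is returned, as nested tuples.

```python
from itertools import combinations


def upgma(names, dist):
    """Cluster leaves by average linkage.

    names: list of leaf labels.
    dist:  dict mapping frozenset({a, b}) to the distance between leaves a and b.
    Returns the tree topology as nested tuples.
    """
    clusters = {name: 1 for name in names}  # cluster label -> number of leaves
    d = dict(dist)                          # working copy, extended as clusters merge
    while len(clusters) > 1:
        # Pick the pair of current clusters with the smallest distance.
        a, b = min(combinations(clusters, 2), key=lambda pair: d[frozenset(pair)])
        merged = (a, b)
        size_a, size_b = clusters.pop(a), clusters.pop(b)
        # Distance from the merged cluster to every other cluster is the
        # size-weighted average of the two old distances.
        for c in clusters:
            d[frozenset((merged, c))] = (
                size_a * d[frozenset((a, c))] + size_b * d[frozenset((b, c))]
            ) / (size_a + size_b)
        clusters[merged] = size_a + size_b
    return next(iter(clusters))


# Hypothetical pairwise distances for four taxa.
toy = {
    frozenset(("A", "B")): 2, frozenset(("A", "C")): 6, frozenset(("A", "D")): 10,
    frozenset(("B", "C")): 6, frozenset(("B", "D")): 10, frozenset(("C", "D")): 10,
}
print(upgma(["A", "B", "C", "D"], toy))  # ('D', ('C', ('A', 'B')))
```

UPGMA assumes a constant rate of change along all lineages; other distance methods such as neighbor joining relax that assumption.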

Comparison of different tree-construction methods

Case Study I: Phylogenetic Trees • Get a multiple sequence alignment:

     C1  C2  C3
S1   A   A   G
S2   A   A   A
S3   G   G   A
S4   A   G   A

• Construct a tree using any suitable method (parsimony, ML, etc.)
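To make the parsimony option concrete for this alignment, the sketch below scores the three possible groupings of four sequences with Fitch's algorithm, which counts the minimum number of changes a given tree needs. It reads row S2 as A A A; the function names and the nested-tuple tree encoding are assumptions made for illustration.

```python
def fitch_column(tree, states):
    """Return (possible ancestral states, minimum changes) for one column."""
    if isinstance(tree, str):          # leaf: its observed state, no changes
        return {states[tree]}, 0
    left, right = tree
    left_set, left_cost = fitch_column(left, states)
    right_set, right_cost = fitch_column(right, states)
    common = left_set & right_set
    if common:                         # children agree: no extra change
        return common, left_cost + right_cost
    return left_set | right_set, left_cost + right_cost + 1


def parsimony_score(tree, alignment):
    """Sum the Fitch change counts over all alignment columns."""
    n_cols = len(next(iter(alignment.values())))
    total = 0
    for i in range(n_cols):
        column = {name: seq[i] for name, seq in alignment.items()}
        total += fitch_column(tree, column)[1]
    return total


alignment = {"S1": "AAG", "S2": "AAA", "S3": "GGA", "S4": "AGA"}
candidates = [
    (("S1", "S2"), ("S3", "S4")),
    (("S1", "S3"), ("S2", "S4")),
    (("S1", "S4"), ("S2", "S3")),
]
for tree in candidates:
    print(tree, parsimony_score(tree, alignment))
# (('S1', 'S2'), ('S3', 'S4')) 3
# (('S1', 'S3'), ('S2', 'S4')) 4
# (('S1', 'S4'), ('S2', 'S3')) 4
```

Only column C2 separates the candidates: C1 and C3 each have a single deviating sequence and cost one change on any tree, so the grouping of S1 with S2 and S3 with S4 is the most parsimonious under these assumptions.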

Evaluation • For example, how confident are we that two sequences are in the same clade? • What is the probability distribution of our confidence in the branches? • The bootstrap provides a way of determining this (applied to phylogenies by Felsenstein, 1985).

Bootstrap: basic idea • Originally, from some list of data, one computes an object. • Create an artificial list by randomly drawing elements from that list; some elements will be picked more than once. • Compute a new object. • Repeat 100-1000 times and look at the distribution of these objects.

Bootstrap: basic idea (contd.) • The original object O (a tree) is computed from a “list of data” (sequences). • Construct a new list, with the same number of elements, by randomly picking elements from the original list. Any one element can be picked any number of times. • Compute a new object, call it On. • Repeat the process many times (typically 100-1000). • The elements {O1, O2, ...} are assumed to be taken from a statistical distribution, so one can compute averages, variances, etc.
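In phylogenetic bootstrapping the elements that get resampled are the columns (sites) of the alignment. The sketch below draws one such artificial alignment; the function name `bootstrap_alignment`, the toy alignment, and the fixed seed are assumptions for illustration.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Resample alignment columns with replacement, keeping the same width."""
    n_cols = len(next(iter(alignment.values())))
    # The same column indices are used for every sequence, so each
    # resampled column is an intact column of the original alignment.
    picks = [rng.randrange(n_cols) for _ in range(n_cols)]
    return {name: "".join(seq[i] for i in picks) for name, seq in alignment.items()}


# Hypothetical four-sequence alignment.
alignment = {"S1": "AAG", "S2": "AAA", "S3": "GGA", "S4": "AGA"}
rng = random.Random(0)  # seeded only to make the run repeatable
replicates = [bootstrap_alignment(alignment, rng) for _ in range(100)]
print(len(replicates), replicates[0])
```

Each replicate would then be passed to the same tree-building method used for the original alignment, giving the objects O1, O2, and so on.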

A model for the bootstrap • Basically, we are calculating the proportion of bootstrap trees agreeing with the original tree. • ‘Agreeing’ refers to the topology of the trees. • The numbers at the branches are confidence values based on Felsenstein’s bootstrap method. • B = 200 bootstrap replications.
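The proportion described above can be computed per clade: for each group of leaves in the original tree, count the fraction of bootstrap trees that contain the same group. This is a minimal sketch assuming rooted trees encoded as nested tuples; the function names and the toy trees are hypothetical.

```python
def leaf_sets(tree, found=None):
    """Return (leaves under this node, set of all clades seen so far)."""
    if found is None:
        found = set()
    if isinstance(tree, str):              # a leaf is not recorded as a clade
        return frozenset([tree]), found
    leaves = frozenset()
    for child in tree:
        child_leaves, _ = leaf_sets(child, found)
        leaves |= child_leaves
    found.add(leaves)                      # record the clade at this internal node
    return leaves, found


def bootstrap_support(original, replicates):
    """Fraction of replicate trees containing each clade of the original tree."""
    _, original_clades = leaf_sets(original)
    replicate_clades = [leaf_sets(tree)[1] for tree in replicates]
    return {
        clade: sum(clade in rc for rc in replicate_clades) / len(replicates)
        for clade in original_clades
    }


original = (("A", "B"), ("C", "D"))
replicates = [                             # hypothetical bootstrap trees
    (("A", "B"), ("C", "D")),
    (("A", "B"), ("C", "D")),
    (("A", "C"), ("B", "D")),
    ((("A", "B"), "C"), "D"),
]
support = bootstrap_support(original, replicates)
print(support[frozenset("AB")])  # 0.75
print(support[frozenset("CD")])  # 0.5
```

With B = 200 replications the same calculation applies; only the length of the replicate list changes.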
