GENE TREES Abhita Chugh
Phylogenetic tree Evolutionary tree showing the relationship among various entities that are believed to have a common ancestor
Species tree • A phylogenetic tree showing the relationship among various species that are believed to have a common ancestor
Species tree • Shows the evolutionary history of a set of species • Internal nodes are speciation nodes
Gene tree • A phylogenetic tree that depicts how a single gene has evolved in a group of related species • For this talk, evolve = duplication or loss • Can be constructed over the topology of a species tree
Gene tree • Shows the evolutionary history of a single gene • Internal nodes are either speciation nodes or duplication nodes
Some definitions: Homologs • Homolog: A gene related to a second gene by descent from a common ancestral DNA sequence • Two types: (i) Orthologs (ii) Paralogs
Orthologs • Genes in different species that evolved from a common ancestral gene by speciation • Typically retain the same function • Figure labels: Speciation; Primates; Human, Chimp
Paralogs • Genes related by duplication within a genome • May evolve new functions • Figure labels: Primates (Chimp, Human); Rodents (Rat, Mouse)
Why are Gene Trees interesting? • Determine the evolutionary history of a gene family • Infer gene duplications and losses • Estimate bounds on times these events occurred • Determine whether a given pair of homologs is orthologous or paralogous
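The last point can be sketched in code. A common rule is that two homologs are orthologs if their lowest common ancestor in the gene tree is a speciation node, and paralogs if it is a duplication node. The `GeneNode` class, the `relation` helper, and the four-gene example tree below are all hypothetical and chosen only for illustration; they are not from the talk.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class GeneNode:
    """Hypothetical gene-tree node; event is 'speciation', 'duplication', or 'leaf'."""
    name: str
    event: str = "leaf"
    children: List["GeneNode"] = field(default_factory=list)


def leaf_names(node: GeneNode) -> Set[str]:
    """Names of all leaves (genes) below this node."""
    if not node.children:
        return {node.name}
    names: Set[str] = set()
    for child in node.children:
        names |= leaf_names(child)
    return names


def lca(node: GeneNode, a: str, b: str) -> GeneNode:
    """Lowest node whose subtree contains both genes a and b."""
    for child in node.children:
        names = leaf_names(child)
        if a in names and b in names:
            return lca(child, a, b)
    return node


def relation(root: GeneNode, a: str, b: str) -> str:
    """Classify a pair of homologs by the event at their lowest common ancestor."""
    return "paralogs" if lca(root, a, b).event == "duplication" else "orthologs"


# Assumed example: one ancient duplication (copies A and B), then a
# human/mouse speciation in each copy.
tree = GeneNode("dup", "duplication", [
    GeneNode("spec_A", "speciation", [GeneNode("human_A"), GeneNode("mouse_A")]),
    GeneNode("spec_B", "speciation", [GeneNode("human_B"), GeneNode("mouse_B")]),
])

print(relation(tree, "human_A", "mouse_A"))  # orthologs
print(relation(tree, "human_A", "human_B"))  # paralogs
print(relation(tree, "human_A", "mouse_B"))  # paralogs
```

Note that `human_A` and `mouse_B` come out as paralogs even though they sit in different species, because the event that separated them was the duplication.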
A gene tree can be constructed over a species tree topology (figure labels: Primates, Intelligence)
Gene Tree Reconstruction • Problem: Given a set of sequences from a gene family, find the tree that best explains the data • Two models: • Micro-evolutionary: considers sequence evolution only • Macro-evolutionary: considers duplications and losses only; useful but rarely used
Reconstruction algorithm • Only macro-evolutionary events (duplications and losses) are considered • i = number of gene copies a node inherits from its parent • j = number of gene copies a node sends to each of its children • Both i and j range from 1 to m, where m is the maximum multiplicity of the gene in any species
Reconstruction algorithm • The root inherits exactly one gene copy (i = 1) • For each node v, the dynamic program calculates the minimum duplication/loss (D/L) score of the subtree rooted at v, for all possible values of i and j
Step 1: Annotate every node with its minimum cost table • cost[i, j] = cost at a node if it inherits i genes and sends j genes • cost[i] = minimum cost at a node if it inherits i genes = minimum { cost[i, j] }, over all j
Cost of an internal node = cost of duplication/loss at the node + optimal cost of left subtree + optimal cost of right subtree, if they inherit j copies
Worked example (values from the figure; grouping by node inferred from the arithmetic):
• Root: cost[1, 1] = 0 + 1 + 1 = 2; cost[1, 2] = 1 + 0 + 1 = 2; so cost[1] = 2
• Internal node: cost[1, 1] = 0 + 0 + 1 = 1; cost[1, 2] = 1 + 1 + 0 = 2; so cost[1] = 1
• Internal node: cost[2, 1] = 1 + 0 + 1 = 2; cost[2, 2] = 0 + 1 + 0 = 1; so cost[2] = 1
• Leaves: two leaves with cost[1] = 1, cost[2] = 0; one leaf with cost[1] = 0, cost[2] = 1
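A minimal sketch of Step 1, under assumptions that are consistent with the numbers above but not stated explicitly in the talk: the D/L cost at an internal node is |i - j|, and the cost at a leaf that holds n observed copies and inherits i is |n - i|. The tree shape, the copy counts, and the placeholder name `species_x` are guesses that happen to reproduce the figure's values; only human and frog are named in the slides.

```python
from typing import Dict, Tuple, Union

# Species tree as nested pairs; a leaf is a species name.
Tree = Union[str, Tuple["Tree", "Tree"]]


def cost_table(tree: Tree, copies: Dict[str, int], m: int) -> Dict[int, int]:
    """Return {i: minimum D/L cost of this subtree if its root inherits i copies}."""
    if isinstance(tree, str):
        # Leaf: every missing copy is a duplication, every extra one a loss.
        return {i: abs(copies[tree] - i) for i in range(1, m + 1)}
    left, right = tree
    left_cost = cost_table(left, copies, m)
    right_cost = cost_table(right, copies, m)
    table = {}
    for i in range(1, m + 1):
        # cost[i, j] = D/L cost at this node + optimal cost of each child
        # subtree when it inherits j copies; cost[i] is the minimum over j.
        table[i] = min(abs(i - j) + left_cost[j] + right_cost[j]
                       for j in range(1, m + 1))
    return table


# Assumed example: 'species_x' is a placeholder for the unnamed third species.
species_tree: Tree = ("frog", ("species_x", "human"))
copies = {"frog": 2, "species_x": 1, "human": 2}
m = max(copies.values())

inner = cost_table(species_tree[1], copies, m)
root = cost_table(species_tree, copies, m)
print(inner)    # {1: 1, 2: 1}
print(root[1])  # 2, since the root must inherit exactly one copy
```

Each table has m entries and each entry scans m values of j, so the work per node is O(m²) once the child tables are known.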
Step 2: Enumerate all histories from the cost tables • Maintain 3 variables for each node • dups = optimal number of duplicated genes • losses = optimal number of lost genes • out = optimal number of genes to pass to its children
Example annotations (from the figure) • Internal nodes (two): out = 1, dups = 0, losses = 0 • Leaves, in figure order: dups = 1, losses = 0; dups = 0, losses = 0; dups = 1, losses = 0
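Step 2 can be sketched as a top-down traceback that branches whenever several values of j tie for the minimum. The cost model (|i - j| at internal nodes, |n - i| at leaves), the tree shape, the placeholder `species_x`, and the node labels such as `root.R` are assumptions made for illustration. The sketch recomputes child tables on every call to stay short; a real implementation would cache them.

```python
from itertools import product
from typing import Dict, Iterator, Tuple, Union

Tree = Union[str, Tuple["Tree", "Tree"]]


def cost_table(tree: Tree, copies: Dict[str, int], m: int) -> Dict[int, int]:
    """Minimum D/L cost of the subtree for each inherited copy count i."""
    if isinstance(tree, str):
        return {i: abs(copies[tree] - i) for i in range(1, m + 1)}
    lc, rc = (cost_table(child, copies, m) for child in tree)
    return {i: min(abs(i - j) + lc[j] + rc[j] for j in range(1, m + 1))
            for i in range(1, m + 1)}


def histories(tree: Tree, copies: Dict[str, int], m: int,
              i: int = 1, label: str = "root") -> Iterator[dict]:
    """Yield every optimal history as {node label: {out, dups, losses}}."""
    if isinstance(tree, str):
        n = copies[tree]
        yield {tree: {"dups": max(n - i, 0), "losses": max(i - n, 0)}}
        return
    left, right = tree
    lc, rc = cost_table(left, copies, m), cost_table(right, copies, m)
    scores = {j: abs(i - j) + lc[j] + rc[j] for j in range(1, m + 1)}
    best = min(scores.values())
    for j, score in scores.items():
        if score != best:
            continue  # only optimal choices of j are kept
        here = {label: {"out": j, "dups": max(j - i, 0), "losses": max(i - j, 0)}}
        left_hist = histories(left, copies, m, j, label + ".L")
        right_hist = histories(right, copies, m, j, label + ".R")
        for lh, rh in product(left_hist, right_hist):
            yield {**here, **lh, **rh}


# Assumed example; 'species_x' is a placeholder species name.
species_tree: Tree = ("frog", ("species_x", "human"))
copies = {"frog": 2, "species_x": 1, "human": 2}
all_histories = list(histories(species_tree, copies, m=2))
for h in all_histories:
    print(h)
```

Under these assumptions the example has two optimal histories of cost 2. The first matches the annotations above (one duplication each in human and frog); the second places one duplication at the root and one loss in `species_x`.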
Step 3: Build a gene tree to represent the history • From step 2: 1 duplication in humans & 1 duplication in frogs • Build the gene tree with this information & the topology of the species tree
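For the history named above, where duplications occur only at the leaves and there are no losses, Step 3 reduces to copying the species tree and expanding each duplicated leaf into a small subtree of gene copies. The sketch below handles only that special case and writes the result in Newick notation. The function name `gene_tree`, the gene naming scheme, the tree shape, and the placeholder `species_x` are assumptions; histories with internal duplications or losses need a more general construction.

```python
from typing import Dict, Tuple, Union

Tree = Union[str, Tuple["Tree", "Tree"]]


def gene_tree(tree: Tree, leaf_dups: Dict[str, int]) -> str:
    """Newick gene tree for a history with leaf-only duplications and no losses."""
    if isinstance(tree, str):
        # A leaf with d duplications holds d + 1 copies of the gene.
        genes = [f"{tree}_{k}" for k in range(1, leaf_dups.get(tree, 0) + 2)]
        newick = genes[0]
        for gene in genes[1:]:
            newick = f"({newick},{gene})"  # each join is a duplication node
        return newick
    left, right = tree
    # An internal species-tree node becomes a speciation node in the gene tree.
    return f"({gene_tree(left, leaf_dups)},{gene_tree(right, leaf_dups)})"


# Assumed example; 'species_x' is a placeholder species name.
species_tree: Tree = ("frog", ("species_x", "human"))
newick = gene_tree(species_tree, {"human": 1, "frog": 1}) + ";"
print(newick)  # ((frog_1,frog_2),(species_x_1,(human_1,human_2)));
```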