Consensus tree. [Figure: three trees on taxa a, b, c, d, e.] A consensus tree summarizes information common to two or more trees.
Strict consensus. [Figure: three trees on taxa a, b, c, d, e and their strict consensus.] Strict consensus includes only those groups that occur in all the trees being considered.
Problem: the split {ab} is found in 2 of the 3 trees, but is not shown in the strict consensus.
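As a rough sketch of the idea, a strict consensus can be computed by listing the groups in each tree and keeping only those shared by all trees. The three trees below are hypothetical, chosen so that {ab} occurs in 2 of the 3 trees. They are treated as rooted, so each split is represented by the group of taxa on one side. `clades` is an illustrative helper, not a library function.

```python
def clades(tree):
    """Return the non-trivial groups (frozensets of leaf names) of a nested-tuple tree."""
    found = set()

    def leaves(node):
        if isinstance(node, str):          # a leaf is just its name
            return frozenset([node])
        group = frozenset().union(*(leaves(child) for child in node))
        found.add(group)                   # record the group below this internal node
        return group

    all_leaves = leaves(tree)
    found.discard(all_leaves)              # the root group holds every taxon: uninformative
    return found

# Hypothetical input trees (assumption, not the trees from the slide).
trees = [
    (("a", "b"), ("c", ("d", "e"))),
    (("a", "b"), ("c", ("d", "e"))),
    (("a", ("b", "c")), ("d", "e")),
]

# Strict consensus: keep only the groups present in every tree.
strict = set.intersection(*(clades(t) for t in trees))
print(sorted("".join(sorted(c)) for c in strict))   # ['de']
```

With these inputs only {de} survives: {ab} is present in two trees but missing from the third, so the strict consensus drops it.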
Majority-rule consensus. [Figure: three trees on taxa a, b, c, d, e and their majority-rule consensus.] Splits that are found in the majority of the trees are shown.
The percentage of the trees supporting each split is indicated (in the figure: 67, 100, 67).
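A minimal sketch of the counting behind a majority-rule consensus: tally how many trees contain each group, keep the groups found in more than half of them, and report the percentage. The trees are hypothetical and treated as rooted; `clades` and `majority_rule` are illustrative names, not a library API.

```python
from collections import Counter

def clades(tree):
    """Return the non-trivial groups (frozensets of leaf names) of a nested-tuple tree."""
    found = set()

    def leaves(node):
        if isinstance(node, str):
            return frozenset([node])
        group = frozenset().union(*(leaves(child) for child in node))
        found.add(group)
        return group

    found.discard(leaves(tree))            # drop the root group (all taxa)
    return found

def majority_rule(trees):
    """Map each group found in more than half of the trees to its support in percent."""
    counts = Counter(c for t in trees for c in clades(t))
    return {c: 100 * n / len(trees) for c, n in counts.items() if n > len(trees) / 2}

# Hypothetical input trees (assumption, not the trees from the slide).
trees = [
    (("a", "b"), ("c", ("d", "e"))),
    (("a", "b"), ("c", ("d", "e"))),
    (("a", ("b", "c")), ("d", "e")),
]

support = majority_rule(trees)
for group in sorted(support, key=lambda c: "".join(sorted(c))):
    print("".join(sorted(group)), round(support[group]))
```

For these inputs the groups {ab} and {cde} get 67% and {de} gets 100%, while {bc} and {abc}, each found in only one tree, are left out.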
Problem with majority-rule consensus. [Figure: two trees with leaf orders a b c d e and e b c d a.] Majority-rule consensus = strict consensus = [figure: consensus tree on a b c d e]. However, if we consider only {b,c,d}, then in both input trees b is closer to c than b is to d or c is to d, and the consensus does not show this.
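The point can be checked by pruning each tree down to {b,c,d} and comparing what is left. The sketch below uses two hypothetical rooted trees in which a and e swap places. They share no group on the full taxon set ({ab}, {abc}, {abcd} versus {be}, {bce}, {bcde}), so a consensus of groups is unresolved. `restrict` is an illustrative helper, not a library function.

```python
def restrict(tree, keep):
    """Prune a nested-tuple tree to the leaves in `keep`, collapsing single-child nodes."""
    if isinstance(tree, str):
        return tree if tree in keep else None
    kids = [k for k in (restrict(child, keep) for child in tree) if k is not None]
    if not kids:
        return None                        # nothing below this node is kept
    return kids[0] if len(kids) == 1 else tuple(kids)

# Hypothetical trees (assumption): identical except that a and e are swapped.
t1 = (((("a", "b"), "c"), "d"), "e")
t2 = (((("e", "b"), "c"), "d"), "a")

keep = {"b", "c", "d"}
print(restrict(t1, keep))   # (('b', 'c'), 'd')
print(restrict(t2, keep))   # (('b', 'c'), 'd')
```

Both pruned trees place b with c and d outside, which is the shared information that strict and majority-rule consensus miss here.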
Adams consensus. [Figure: two trees with leaf orders a b c d e and e b c d a, and their Adams consensus with leaf order b c d a e.] Adams consensus gives the subtrees that are common to all trees. It is useful when one or more sequences have unclear positions but a subset of the sequences has the same relationships in all trees.
Problem with consensus of all the MP (maximum parsimony) trees. Our goal is to evaluate the reliability of different clades. In other words, we do not want to rely on just one best tree, but rather to estimate the support for each split based on many equally likely or highly likely trees.
Bootstrap (and jackknife)
[Figure: tree with leaves Dugong, African-ref, African-1, African-2, African-3, Mam-3, Mam-4, Mam-6, Mam-5, Asian-2, Asian-3, Asian-1.] Now we have a tree, but how robust is this tree?
Jackknife A. We create new data sets by sampling half of the characters (random sampling without replacement). We generate 100 pseudo-data sets. Note: we do not change the number of sequences, just the number of positions!
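The resampling step can be sketched as follows, assuming the alignment is stored as a dict of equal-length strings. The alignment contents, the seed, and the name `jackknife_sample` are made up for illustration.

```python
import random

def jackknife_sample(alignment, rng):
    """Keep a random half of the alignment columns, drawn without replacement."""
    n_sites = len(next(iter(alignment.values())))
    cols = rng.sample(range(n_sites), n_sites // 2)     # distinct column indices
    sampled = {name: "".join(seq[i] for i in cols) for name, seq in alignment.items()}
    return cols, sampled

# Hypothetical alignment (assumption): 4 sequences, 10 positions.
alignment = {
    "Sp1": "ATCTGACGTA",
    "Sp2": "ATCTGACGTC",
    "Sp3": "ACTTAACGTC",
    "Sp4": "ACCTAGCGTT",
}

rng = random.Random(0)                                   # fixed seed for repeatability
replicates = [jackknife_sample(alignment, rng) for _ in range(100)]

cols, sampled = replicates[0]
print(cols)
print(sampled)
```

Every pseudo-data set keeps all four sequences but only 5 of the 10 positions, and the same columns are taken from every sequence so the alignment stays consistent.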
Jackknife B. We reconstruct a tree from each data set.
Pseudo-data set 1 (POS: 52316): 1: TATTT, 2: CATTT, 3: CACTT, N: AACTT
Pseudo-data set 2 (POS: 18745): 1: TTTAT, 2: TAACC, 3: TAACC, N: TGGGA
Pseudo-data set 3 (POS: 18394): 1: TTGTA, 2: TAGAC, 3: TAAAC, N: TGAGG
[Figure: one tree on Sp1, Sp2, Sp3, Sp4 for each pseudo-data set.]
Jackknife C. We compute the majority-rule consensus. [Figure: three replicate trees on Sp1, Sp2, Sp3, Sp4 and their consensus, labeled with 67% and 100%.] In 67% of the data sets, the split between Sp1+Sp2 and the rest of the tree was found.
Bootstrap. The same as jackknife, but instead of sampling N/2 positions, we sample N positions with replacement.
Bootstrap A. Resample (100-1000 times).
Original data (positions 12345 N): 1: ATCTG…A, 2: ATCTG…C, 3: ACTTA…C, N: ACCTA…T
Resampled data set 1 (sampled positions 11244 x): 1: AATTT…T, 2: AATTT…G, 3: AACTT…T, N: AACTT…T
Resampled data set 2 (sampled positions 47789…x): 1: TTTAT…T, 2: TAACC…G, 3: TAACC…T, N: TGGGA…T
Resampled data set 3 (sampled positions 15578… N): 1: AGGTA…T, 2: AGGAC…G, 3: AAAAC…A, N: AAAGG…C
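A minimal sketch of bootstrap resampling, assuming the alignment is a dict of equal-length strings. It differs from the jackknife only in drawing N column indices with replacement, so some columns appear several times and others not at all. The alignment, the seed, and the name `bootstrap_sample` are hypothetical.

```python
import random

def bootstrap_sample(alignment, rng):
    """Draw as many columns as the alignment has, with replacement."""
    n_sites = len(next(iter(alignment.values())))
    cols = rng.choices(range(n_sites), k=n_sites)       # indices may repeat
    sampled = {name: "".join(seq[i] for i in cols) for name, seq in alignment.items()}
    return cols, sampled

# Hypothetical alignment (assumption): 4 sequences, 10 positions.
alignment = {
    "Sp1": "ATCTGACGTA",
    "Sp2": "ATCTGACGTC",
    "Sp3": "ACTTAACGTC",
    "Sp4": "ACCTAGCGTT",
}

rng = random.Random(0)                                   # fixed seed for repeatability
replicates = [bootstrap_sample(alignment, rng) for _ in range(100)]

cols, sampled = replicates[0]
print(cols)
print(sampled)
```

Each replicate has the same number of sequences and the same length as the original alignment. A tree would then be reconstructed from every replicate with whatever tree-building method is in use.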
Bootstrap B. Reconstruct a tree from each data set. [Figure: the three resampled data sets from step A, each with the tree on Sp1, Sp2, Sp3, Sp4 reconstructed from it.]
Bootstrap C. Compute the majority-rule consensus. [Figure: three replicate trees on Sp1, Sp2, Sp3, Sp4 and their consensus, labeled with 67% and 100%.] Remark: in a bootstrap tree, branch lengths have no meaning.