
Neighbor-Joining (NJ) Algorithm


zeal

Presentation Transcript


  1. Neighbor-Joining (NJ) Algorithm

  2. NJ Algorithm
     • Similar to FM (also removes the molecular clock assumption), but more sophisticated in how it selects clusters to join
     • Produces unrooted trees
     • Algorithm (similar to FM):
       • Add a leaf to the tree for each taxon
       • Initially make each taxon its own cluster
       • Find the closest clusters (using a more sophisticated criterion) and join them (place the new node at the distance given by a variant of the 3-point formula)
       • Repeat the previous step until all clusters are connected

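  3. NJ “closeness” Criterion
     • Suppose that you are given n taxa x1, x2, x3, …, xn, and suppose that you have some tree that fits the distance data
     [Figure: tree with leaves x1, …, x6 and internal nodes y and z]
     • Observation: if x1 and x2 are neighbors, then for any two other taxa xi and xj (i ≠ j)
         d(x1,x2) + d(xi,xj) < d(x1,xi) + d(x2,xj)
       (the right side includes the edge yz twice, the left side does not)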
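The observation can be checked numerically. The sketch below uses a small illustrative 5-taxon distance matrix that fits a tree in which `a` and `b` are neighbors; the matrix and the names `D` and `observation_holds` are assumptions made for this example and do not come from the slides.

```python
from itertools import permutations

# Illustrative tree-fitting distances (an assumption, not data from the slides).
# In the underlying tree, "a" and "b" are neighbors, and so are "d" and "e".
D = {
    "a": {"a": 0, "b": 5, "c": 9, "d": 9, "e": 8},
    "b": {"a": 5, "b": 0, "c": 10, "d": 10, "e": 9},
    "c": {"a": 9, "b": 10, "c": 0, "d": 8, "e": 7},
    "d": {"a": 9, "b": 10, "c": 8, "d": 0, "e": 3},
    "e": {"a": 8, "b": 9, "c": 7, "d": 3, "e": 0},
}

def observation_holds(d, x1, x2):
    """Check d(x1,x2) + d(xi,xj) < d(x1,xi) + d(x2,xj) for all other xi != xj."""
    others = [t for t in d if t not in (x1, x2)]
    return all(
        d[x1][x2] + d[i][j] < d[x1][i] + d[x2][j]
        for i, j in permutations(others, 2)
    )

print(observation_holds(D, "a", "b"))  # pair of neighbors
print(observation_holds(D, "a", "c"))  # pair that is not a neighbor pair
```

For the neighbor pair the inequality holds for every choice of xi and xj, while for the non-neighbor pair at least one choice violates it.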

  4. NJ “closeness” Criterion
     • From previous slide: d(x1,x2) + d(xi,xj) < d(x1,xi) + d(x2,xj)
     • For a fixed i, say i = 3:
         d(x1,x2) + d(x3,x4) < d(x1,x3) + d(x2,x4)
         d(x1,x2) + d(x3,x5) < d(x1,x3) + d(x2,x5)
         d(x1,x2) + d(x3,x6) < d(x1,x3) + d(x2,x6)
         … … …
         d(x1,x2) + d(x3,xn) < d(x1,x3) + d(x2,xn)
     • Sum these n-3 inequalities, then add d(x3,x1), d(x3,x2), d(x3,x3), d(x2,x1), d(x2,x2) to both sides
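The summation step can be written out explicitly. The following is a sketch reconstructed from the inequalities above, using d(x,x) = 0 and symmetry of d; the symbol S_i for the sum of row i of the distance matrix is introduced here as an assumption about notation.

```latex
\begin{align*}
% Sum of the n-3 inequalities for i = 3, j = 4, ..., n:
(n-3)\,d(x_1,x_2) + \sum_{j=4}^{n} d(x_3,x_j)
  &< (n-3)\,d(x_1,x_3) + \sum_{j=4}^{n} d(x_2,x_j) \\
% Add d(x_3,x_1) + d(x_3,x_2) + d(x_3,x_3) + d(x_2,x_1) + d(x_2,x_2) to both sides:
(n-2)\,d(x_1,x_2) + \sum_{j=1}^{n} d(x_3,x_j)
  &< (n-2)\,d(x_1,x_3) + \sum_{j=1}^{n} d(x_2,x_j) \\
% With S_i = \sum_j d(x_i,x_j), subtract S_1 + S_2 + S_3 from both sides:
(n-2)\,d(x_1,x_2) - S_1 - S_2
  &< (n-2)\,d(x_1,x_3) - S_1 - S_3
\end{align*}
```

The same argument works for any fixed i ≥ 3, so the quantity (n-2) d(x1,x2) - S1 - S2 for the neighbor pair is smaller than the corresponding quantity for x1 paired with any other taxon.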

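  5. NJ “closeness” Criterion
     • From previous slide, if x1 and x2 are neighbors:
         (n-2) d(x1,x2) + Σj d(x3,xj) < (n-2) d(x1,x3) + Σj d(x2,xj)
     • Let Si = Σj d(xi,xj) (the sum of row i of the distance matrix) and M(k, l) = (n-2) d(xk,xl) - Sk - Sl
     • Then in general, if xk and xl are neighbors: M(k, l) < M(k, i) for every other taxon xi
     • NJ uses this observation to determine “closeness”: it selects the pair with the smallest value M(k, l) to form a cluster
     • Unlike UPGMA and FM, NJ has a more global view of “closeness” when selecting neighbors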

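  6. NJ new node Placement
     • If x1 and x2 are neighbors, where should the new node y be placed by the 3-point formula?
     [Figure: x1 and x2 joined at the new node y, with the remaining taxa x3, x4, x5, … on the other side of y]
     • Apply the 3-point formula to x1, x2 and each remaining taxon x3, x4, x5, …, and sum the results
     • Add on the right side d(x1,x1) + d(x1,x2) - d(x2,x1) - d(x2,x2), which equals 0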
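The placement of y can be derived step by step. This is a sketch under the assumption that the 3-point formula is applied to x1, x2 and each remaining taxon xi in turn; S_i denotes the sum of row i of the distance matrix, a symbol introduced here for illustration.

```latex
\begin{align*}
% 3-point formula for x_1, x_2 and each remaining taxon x_i, i = 3, ..., n:
d(x_1,y) &= \tfrac{1}{2}\bigl(d(x_1,x_2) + d(x_1,x_i) - d(x_2,x_i)\bigr) \\
% Sum the n-2 equations:
(n-2)\,d(x_1,y) &= \tfrac{1}{2}\Bigl((n-2)\,d(x_1,x_2)
   + \sum_{i=3}^{n}\bigl(d(x_1,x_i) - d(x_2,x_i)\bigr)\Bigr) \\
% Add 0 = d(x_1,x_1) + d(x_1,x_2) - d(x_2,x_1) - d(x_2,x_2) inside the sum:
(n-2)\,d(x_1,y) &= \tfrac{1}{2}\bigl((n-2)\,d(x_1,x_2) + S_1 - S_2\bigr),
   \qquad S_i = \sum_{j=1}^{n} d(x_i,x_j) \\
% Divide by n-2:
d(x_1,y) &= \tfrac{1}{2}\Bigl(d(x_1,x_2) + \frac{S_1 - S_2}{n-2}\Bigr)
\end{align*}
```

The distance from x2 to y then follows as d(x2,y) = d(x1,x2) - d(x1,y).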

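  7. NJ mini summary
     • For each pair of nodes xk and xl compute the quantity M(k, l) = (n-2) d(xk,xl) - Sk - Sl, where Si is the sum of row i of the distance matrix
     • Actually, could compute M(k, l) = d(xk,xl) - Sk/(n-2) - Sl/(n-2), which ranks the pairs in the same order
     • When xk and xl are replaced by new node y, place y at d(xk,y) = ½(d(xk,xl) + (Sk - Sl)/(n-2))
     • From now on Si will always be divided implicitly by (n-2)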

  8. NJ Algorithm
     • From the distance matrix compute the criterion matrix M(i, j) = d(xi,xj) - Si - Sj, with Si = sum-of-row / (n-2)
     • Find the smallest value in M(i, j) and cluster the corresponding pair
     • Connect taxa xi and xj with a new node y placed at distance d(xi,y) = ½(d(xi,xj) + Si - Sj) from xi
     • Remove xi and xj and replace them with y; update the distance matrix using the 3-point formula
     • Repeat from the beginning
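The steps above can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the function name `neighbor_joining`, the dict-of-dicts input, the internal node labels `y0`, `y1`, … and the 5-taxon example matrix are all assumptions made for this example. Ties in the criterion are broken by taking the first pair found.

```python
from itertools import combinations

def neighbor_joining(names, d):
    """Return a list of (node, node, branch_length) edges of an unrooted tree.

    names: list of taxon labels; d: symmetric dict-of-dicts of distances.
    """
    dist = {a: dict(d[a]) for a in names}   # working copy of the matrix
    active = list(names)                    # clusters not yet joined
    edges = []
    next_id = 0
    while len(active) > 2:
        n = len(active)
        # S_i = sum-of-row / (n - 2), over the currently active clusters
        s = {a: sum(dist[a][b] for b in active if b != a) / (n - 2)
             for a in active}
        # pair minimising M(k, l) = d(k, l) - S_k - S_l
        k, l = min(combinations(active, 2),
                   key=lambda p: dist[p[0]][p[1]] - s[p[0]] - s[p[1]])
        y = f"y{next_id}"
        next_id += 1
        # place the new node y between k and l
        len_k = 0.5 * (dist[k][l] + s[k] - s[l])
        len_l = dist[k][l] - len_k
        edges += [(k, y, len_k), (l, y, len_l)]
        # 3-point formula: distance from y to every remaining cluster m
        dist[y] = {}
        for m in active:
            if m not in (k, l):
                dym = 0.5 * (dist[k][m] + dist[l][m] - dist[k][l])
                dist[y][m] = dym
                dist[m][y] = dym
        active = [m for m in active if m not in (k, l)] + [y]
    a, b = active                           # connect the last two clusters
    edges.append((a, b, dist[a][b]))
    return edges

# Illustrative tree-fitting distances (an assumption, not data from the slides).
example = {
    "a": {"a": 0, "b": 5, "c": 9, "d": 9, "e": 8},
    "b": {"a": 5, "b": 0, "c": 10, "d": 10, "e": 9},
    "c": {"a": 9, "b": 10, "c": 0, "d": 8, "e": 7},
    "d": {"a": 9, "b": 10, "c": 8, "d": 0, "e": 3},
    "e": {"a": 8, "b": 9, "c": 7, "d": 3, "e": 0},
}
tree = neighbor_joining(list("abcde"), example)
for u, v, length in tree:
    print(u, v, length)
```

Recomputing the row sums on every pass keeps the sketch close to the slide's description; a production implementation would typically update them incrementally.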

  9. Apply the NJ algorithm to the given distance matrix:
     [Table: distance matrix for taxa x1, …, x6]
     • First compute Si = sum-of-row / (n-2):
         S1 = 11.75   S2 = 10.25   S3 = 12.75   S4 = 14.25   S5 = 11.25   S6 = 12.25
     • Compute
         M(1,2) = d(1,2) – S1 – S2 = 8 – 22 = -14
         M(1,3) = d(1,3) – S1 – S3 = 3 – 24.5 = -21.5
         M(1,4) = d(1,4) – S1 – S4 = 14 – 26 = -12
         M(1,5) = d(1,5) – S1 – S5 = 10 – 23 = -13
         M(1,6) = d(1,6) – S1 – S6 = 12 – 24 = -12
         and so on …
     • Find min value, i.e. the pair to cluster
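The row-1 computations on this slide can be reproduced with a few lines of Python. The sketch uses only numbers that appear on the slide; the variable names are hypothetical, and d(1,6) = 12 is read from the last M line, whose subtracted total 24 equals S1 + S6.

```python
# S_i as given on the slide (already divided by n - 2, with n = 6)
S = {1: 11.75, 2: 10.25, 3: 12.75, 4: 14.25, 5: 11.25, 6: 12.25}
# Row 1 of the distance matrix, read off the M(1, j) lines
d1 = {2: 8, 3: 3, 4: 14, 5: 10, 6: 12}

# Consistency check: the row-1 sum divided by (n - 2) should reproduce S1
row1_check = sum(d1.values()) / (6 - 2)

# Criterion values M(1, j) = d(1, j) - S1 - Sj
M1 = {j: d1[j] - S[1] - S[j] for j in d1}
for j, value in M1.items():
    print(f"M(1,{j}) = {value}")

# Smallest entry in row 1 (the full algorithm would scan every row)
best_j = min(M1, key=M1.get)
print("smallest in row 1: pair (1,", best_j, ")")
```

Only row 1 is covered here because the other rows of the matrix are not spelled out in the transcript.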

  10. From previous slide we need to cluster x1 and x3
     • Add a new taxon x7 and place it at distance d(x1,x7) = ½(d(1,3) + S1 – S3) = 1 from x1 and d(x3,x7) = d(1,3) – d(x1,x7) = 2 from x3
     [Figure: x1 and x3 joined at x7 with branch lengths 1 and 2]
     • Recompute distances from x7 to all others using the 3-point formula:
         d(7,2) = ½(d(1,2) + d(3,2) – d(1,3)) = 7
         d(7,4) = ½(d(1,4) + d(3,4) – d(1,3)) = 13
         d(7,5) = ½(d(1,5) + d(3,5) – d(1,3)) = 9
         d(7,6) = ½(d(1,6) + d(3,6) – d(1,3)) = 11
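The placement of x7 and the updated distances can be checked with a short script. This is a sketch under a stated assumption: the row-3 distances d(3,j) are not shown in the transcript, so the values below are inferred from the results on this slide (they also agree with S3 = 12.75) and should be treated as an assumption. Variable names are hypothetical.

```python
S1, S3 = 11.75, 12.75              # from the previous slide, already divided by (n - 2)
d13 = 3                            # d(1,3), from the previous slide
d1 = {2: 8, 4: 14, 5: 10, 6: 12}   # row 1, from the previous slide
d3 = {2: 9, 4: 15, 5: 11, 6: 13}   # ASSUMPTION: inferred, not shown in the transcript

# Placement of the new node x7 between x1 and x3
len_x1 = 0.5 * (d13 + S1 - S3)     # branch x1 - x7
len_x3 = d13 - len_x1              # branch x3 - x7

# 3-point formula: distance from x7 to each remaining taxon m
d7 = {m: 0.5 * (d1[m] + d3[m] - d13) for m in d1}

print("x1-x7:", len_x1, " x3-x7:", len_x3)
for m, value in d7.items():
    print(f"d(7,{m}) = {value}")
```

The branch lengths match the figure (1 for x1, 2 for x3), and the updated distances match the four values listed on the slide.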

  11. Apply the NJ algorithm to the new distance matrix:
     • First compute Si = sum-of-row / (n-2):
         S2 =    S4 =    S5 =    S6 =    S7 =
     • Compute
         M(2,4) = d(2,4) – S2 – S4 =
         M(2,5) = d(2,5) – S2 – S5 =
         M(2,6) = d(2,6) – S2 – S6 =
         M(2,7) = d(2,7) – S2 – S7 =
         and so on …
     • Find min value, i.e. the pair to cluster

  12. From previous slide we need to cluster ? and ??
     • Add a new taxon x8 and place it at the distance given by the placement formula
     • Recompute distances from x8 to all others using the 3-point formula
     [Figure: x? and x?? joined at x8 with branch lengths ? and ?]

  13. NJ Summary
     • Distance-based algorithm that produces unrooted trees
     • Removes the assumption of a molecular clock, but does not give information about the root (common ancestor)
     • Typically performs better than UPGMA and FM because it uses a more global criterion to select pairs to cluster
     • To detect the root, one could introduce an extra taxon (outgroup) that is more distantly related to the given taxa
