
Biostatistics-Lecture 16 Sequence alignment based on Burrows-Wheeler Transformation

Biostatistics-Lecture 16 Sequence alignment based on Burrows-Wheeler Transformation. Ruibin Xi Peking University School of Mathematical Sciences. Burrows-Wheeler Transformation. BWT: ACGGTACA$ ($<A<C<G<T).


Presentation Transcript


  1. Biostatistics-Lecture 16 Sequence alignment based on Burrows-Wheeler Transformation Ruibin Xi Peking University School of Mathematical Sciences

  2. Burrows-Wheeler Transformation • BWT: ACGGTACA$ ($<A<C<G<T)

  3. Burrows-Wheeler Transformation • BWT: ACGGTACA$ ($<A<C<G<T)

  4. Burrows-Wheeler Transformation • BWT: T=ACGGTACA$ ($<A<C<G<T) • [Figure: T and BWT(T)]
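
The matrix-rotation construction named on this slide can be sketched in a few lines. This is a minimal illustration, not the lecture's own code: the function name `bwt_by_rotation` is hypothetical, and it relies on the assumption that Python's default string ordering matches $<A<C<G<T (true here because `$` has a smaller ASCII code than the four letters).

```python
def bwt_by_rotation(text):
    """Return (sorted rotation matrix, last column) for a '$'-terminated text."""
    if not text.endswith("$") or text.count("$") != 1:
        raise ValueError("text must end with a single '$' sentinel")
    # All cyclic rotations of the text, one per starting position.
    rotations = [text[i:] + text[:i] for i in range(len(text))]
    # Sorting the rotations gives the Burrows-Wheeler matrix (BWM).
    bwm = sorted(rotations)
    # BWT(T) is the last column of the BWM, read top to bottom.
    return bwm, "".join(row[-1] for row in bwm)


bwm, last_column = bwt_by_rotation("ACGGTACA$")
for row in bwm:
    print(row)
print(last_column)  # ACT$AACGG
```

The first column of the printed matrix is `$AAACCGGT`, the sorted characters of T.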

  5. Burrows-Wheeler Transformation • Last-First Mapping (LF) • [Figure: T and BWT(T)]

  6. Burrows-Wheeler Transformation • Last-First Mapping (LF) • [Figure: T and BWT(T)]

  7. Burrows-Wheeler Transformation • Last-First Mapping (LF)

  8. Burrows-Wheeler Transformation • Last-First Mapping (LF)

  9. Burrows-Wheeler Transformation • Last-First Mapping (LF) • We may recover the original sequence using the LF mapping

  10. Burrows-Wheeler Transformation • Last-First Mapping (LF) • We may recover the original sequence using the LF mapping
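
Recovering T from BWT(T) with the LF mapping can be sketched as follows. The helper names `lf_mapping` and `inverse_bwt` are hypothetical, and the LF table is built here by a stable sort of the last column rather than by any index structure; the underlying fact is that the i-th occurrence of a character in the last column L is the i-th occurrence of that character in the first column F.

```python
def lf_mapping(last):
    """lf[k] = row of F holding the same text character as last[k]."""
    # Python's sort is stable, so equal characters keep their relative order:
    # the i-th 'A' of the last column lands on the i-th 'A' of the first column.
    order = sorted(range(len(last)), key=lambda k: last[k])
    lf = [0] * len(last)
    for f_row, l_row in enumerate(order):
        lf[l_row] = f_row
    return lf


def inverse_bwt(last):
    """Rebuild the '$'-terminated text by walking the LF mapping backwards."""
    lf = lf_mapping(last)
    row = 0  # row 0 of the BWM starts with '$'; its last character precedes '$' in T
    out = []
    for _ in range(len(last) - 1):
        out.append(last[row])
        row = lf[row]
    # Characters were collected from the end of T to its start.
    return "".join(reversed(out)) + "$"


print(inverse_bwt("ACT$AACGG"))  # ACGGTACA$
```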

  11. BWT via the suffix array • Suffix Array (SA)

  12. BWT via the suffix array • Relationship of BWT and suffix array (0-based index)

  13. BWT via the suffix array • Construction of the BWT by matrix rotation is slow (it sorts n rotations, each of length n) • There are O(n) algorithms for constructing the suffix array • We may construct the BWT via the suffix array
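
A small sketch of the suffix-array route, using the 0-based relationship BWT[i] = T[SA[i]-1] (and `$` when SA[i] = 0). Note the assumption: `suffix_array` below is a naive sort of all suffixes that only stands in for the O(n) constructions the slide mentions, and both function names are hypothetical.

```python
def suffix_array(text):
    """Naive suffix array: start positions of the suffixes in sorted order.

    This sorts whole suffixes and is NOT one of the linear-time algorithms;
    it is only a placeholder that produces the same array on small inputs.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def bwt_from_sa(text, sa):
    """BWT[i] is the character just before suffix SA[i], or '$' if SA[i] == 0."""
    return "".join(text[i - 1] if i > 0 else "$" for i in sa)


text = "ACGGTACA$"
sa = suffix_array(text)
print(sa)                     # [8, 7, 5, 0, 6, 1, 2, 3, 4]
print(bwt_from_sa(text, sa))  # ACT$AACGG
```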

  14. FM-index • C(c): # of occurrences in T of the characters {$,1,…,c-1}, i.e. of all characters smaller than c • 1=A, 2=C, 3=G, 4=T • C(c) is the (0-based) position of the first occurrence of c in F (the 1st column in BWM) • Occ(c,0,k): # of occurrences of c in BWT(T)[0..k] (0-based, inclusive)
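
The two FM-index tables can be illustrated directly on BWT(T) = ACT$AACGG. This is a sketch under stated assumptions: `build_c` and `occ` are hypothetical names, `occ` counts over the 0-based inclusive range [0, k] (the convention used by the LF formula and the search slides), and it rescans the string on every call, whereas a real FM-index would store sampled counts.

```python
from collections import Counter


def build_c(last):
    """C[c] = number of characters in the text that are smaller than c."""
    counts = Counter(last)
    c_table, total = {}, 0
    for ch in sorted(counts):  # '$' < 'A' < 'C' < 'G' < 'T' in ASCII order
        c_table[ch] = total
        total += counts[ch]
    return c_table


def occ(last, ch, k):
    """Occurrences of ch in last[0..k] inclusive; k = -1 gives 0."""
    return last[:k + 1].count(ch)


last = "ACT$AACGG"
print(build_c(last))      # {'$': 0, 'A': 1, 'C': 4, 'G': 6, 'T': 8}
print(occ(last, "A", 4))  # 2
```

With F = $AAACCGGT, each C value is indeed the row where that character's block begins.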

  15. FM-index

  16. FM-index • LF(k) = C(L[k]) + Occ(L[k],0,k)-1
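
The LF formula can be checked numerically on BWT(T) = ACT$AACGG. The sketch below assumes L denotes BWT(T), 0-based row indices, and Occ counted over the inclusive range [0, k]; the function name `lf` is hypothetical and the counts are recomputed naively on each call.

```python
def lf(last, k):
    """LF(k) = C(L[k]) + Occ(L[k], 0, k) - 1, with 0-based rows."""
    ch = last[k]
    c_of_ch = sum(1 for x in last if x < ch)  # C(L[k]): characters smaller than ch
    occ = last[:k + 1].count(ch)              # Occ(L[k], 0, k): inclusive count
    return c_of_ch + occ - 1


last = "ACT$AACGG"
print([lf(last, k) for k in range(len(last))])  # [1, 4, 8, 0, 2, 3, 5, 6, 7]
```

For example, row 0 ends in the first A of L, and LF(0) = 1 is the first row of the A block in F.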

  17. Searching a pattern P using the FM-index • Note that the rows of the BWM that start with any pattern P are always contiguous (e.g. AC)

  18. Searching a pattern P using the FM-index • Note that the rows of the BWM that start with any pattern P are always contiguous (e.g. ba)

  19. Searching a pattern P using the FM-index • Note that the rows of the BWM that start with any pattern P are always contiguous (e.g. ab)
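
The contiguity claim can be checked on the running example T = ACGGTACA$ with pattern AC. This is a brute-force illustration only (it builds the full matrix, which the FM-index avoids), and the variable names are chosen here for the example.

```python
text = "ACGGTACA$"
# Burrows-Wheeler matrix: all cyclic rotations in sorted order.
bwm = sorted(text[i:] + text[:i] for i in range(len(text)))

# Rows whose prefix is the pattern; sorted order forces them to be adjacent.
rows = [k for k, row in enumerate(bwm) if row.startswith("AC")]
print(rows)  # [2, 3]
```

Because the rows are sorted, every row starting with AC lies between the first and the last such row, so the matches form a single interval [sp, ep].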

  20. Searching a pattern P using the FM-index • P = ACA • Suffixes starting with A

  21. Searching a pattern P using the FM-index • P = ACA • Suffixes starting with A are in rows [sp,ep] = [C(A),C(C)-1]

  22. Searching a pattern P using the FM-index • P = ACA • Suffixes starting with CA

  23. Searching a pattern P using the FM-index • From the last step, within rows [sp,ep] of the A section, the first A preceded by C corresponds to the C of rank Occ(C,0,sp-1) in L (0-based), and the last to the C of rank Occ(C,0,ep)-1

  24. Searching a pattern P using the FM-index • Suffixes starting with CA must be in the C section

  25. Searching a pattern P using the FM-index • Suffixes starting with CA are in [sp,ep]=[Occ(C,0,sp-1)+C(C),Occ(C,0,ep)+C(C)-1]

  26. Searching a pattern P using the FM-index • Suffixes starting with ACA must be in the A section

  27. Searching a pattern P using the FM-index • Suffixes starting with ACA are in [sp,ep]=[Occ(A,0,sp-1)+C(A), Occ(A,0,ep)+C(A)-1]
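
As a worked check of the three steps for P = ACA, the intervals can be computed by hand. The numbers below are derived here from T = ACGGTACA$, not read from the slides' figures: BWT(T) = ACT$AACGG, C(A) = 1, C(C) = 4, rows are 0-based and Occ counts over the inclusive range [0, k].

```latex
\begin{align*}
\text{suffixes starting with A:}\quad
  & [sp, ep] = [C(\mathtt{A}),\; C(\mathtt{C}) - 1] = [1, 3] \\
\text{suffixes starting with CA:}\quad
  & sp = \mathrm{Occ}(\mathtt{C}, 0, 0) + C(\mathtt{C}) = 0 + 4 = 4, \\
  & ep = \mathrm{Occ}(\mathtt{C}, 0, 3) + C(\mathtt{C}) - 1 = 1 + 4 - 1 = 4 \\
\text{suffixes starting with ACA:}\quad
  & sp = \mathrm{Occ}(\mathtt{A}, 0, 3) + C(\mathtt{A}) = 1 + 1 = 2, \\
  & ep = \mathrm{Occ}(\mathtt{A}, 0, 4) + C(\mathtt{A}) - 1 = 2 + 1 - 1 = 2
\end{align*}
```

The final interval [2, 2] contains one row, so ACA occurs once; the suffix array entry for row 2 is 5, and T[5..7] = ACA.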

  28. Searching a pattern P using the FM-index • Algorithm BW_Search(P[0,p-1])
      c = P[p-1], i = p-1;
      sp = C[c], ep = C[c+1]-1;
      while (sp ≤ ep and i ≥ 1) do
          c = P[i-1];
          sp = C[c] + Occ(c,0,sp-1);
          ep = C[c] + Occ(c,0,ep)-1;
          i = i-1;
      if (ep < sp) then return “not found” else return “found (ep-sp+1) occurrences”
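
BW_Search translates almost line for line into Python. The version below is a sketch under assumptions: the name `bw_search` and its signature are hypothetical, C and Occ are recomputed naively from the BWT string instead of being read from precomputed tables, C[c+1]-1 is written as C[c] plus the count of c minus 1 (the same row), and the function returns the number of occurrences, with 0 standing for "not found".

```python
def bw_search(pattern, last):
    """Count occurrences of pattern in the text whose BWT is `last`."""
    if not pattern:
        raise ValueError("pattern must be non-empty")
    # C[c]: number of characters in the text smaller than c.
    c_table = {ch: sum(1 for x in last if x < ch) for ch in set(last)}

    def occ(ch, k):
        # Occurrences of ch in last[0..k] inclusive; k = -1 gives 0.
        return last[:k + 1].count(ch)

    ch = pattern[-1]
    if ch not in c_table:
        return 0
    sp = c_table[ch]
    ep = sp + last.count(ch) - 1  # last row of the block of ch in F
    i = len(pattern) - 1
    while sp <= ep and i >= 1:
        ch = pattern[i - 1]
        if ch not in c_table:
            return 0
        # Both new bounds are computed from the old sp and ep.
        sp, ep = c_table[ch] + occ(ch, sp - 1), c_table[ch] + occ(ch, ep) - 1
        i -= 1
    return max(0, ep - sp + 1)


last = "ACT$AACGG"  # BWT of ACGGTACA$
print(bw_search("ACA", last))  # 1
print(bw_search("AC", last))   # 2
print(bw_search("TT", last))   # 0
```

The loop extends the match one character to the left per iteration, so the number of iterations depends on the pattern length rather than on the text length.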

  29. Aligners Based on BWT • Bowtie • BWA

  30. Bowtie Performance

  31. BWA performance
