
Sequence alignment 1



Presentation Transcript


  1. Sequence alignment 1. Chitta Baral

  2. Sequences and sequence alignment
     • Two main kinds of sequences (written as regular expressions, where + denotes a choice and * denotes zero or more repetitions)
        • Sequences of bases in DNA molecules: (A+T+C+G)*
        • Sequences of amino acids in protein molecules: (A+C+D+E+F+G+H+I+K+L+M+N+P+Q+R+S+T+V+W+X+Y+Z)*
     • Two main kinds of sequence alignment
        • Global alignment
              LGPSSKQTGKGS-SRIWDN
              |    |  |||   |  |
              LN-ITKSAGKGAIMRLGDA
        • Local alignment
              ----------TGKG------------------
                         |||
              ----------AGKG------------------
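The DNA expression (A+T+C+G)* is a regular expression in which + means "one of these letters" and * means "repeated zero or more times". The minimal Python sketch below checks membership in such languages. It assumes Python's `re` syntax, where a choice between single letters is written as a character class and + would instead mean "one or more", so (A+T+C+G)* becomes [ATCG]*. The protein class simply contains the protein letters listed on the slide, and the helper names `is_dna` and `is_protein` are illustrative, not from the slides.

```python
import re

# (A+T+C+G)* in the slide's notation: any string, possibly empty, over {A, T, C, G}.
DNA_RE = re.compile(r"[ATCG]*")
# Protein letters listed on the slide; X and Z are codes beyond the 20 standard amino acids.
PROTEIN_RE = re.compile(r"[ACDEFGHIKLMNPQRSTVWXYZ]*")


def is_dna(seq: str) -> bool:
    """Return True if the whole string uses only the DNA letters A, T, C, G."""
    return DNA_RE.fullmatch(seq) is not None


def is_protein(seq: str) -> bool:
    """Return True if the whole string uses only the listed protein letters."""
    return PROTEIN_RE.fullmatch(seq) is not None


print(is_dna("GATTACA"))                 # True
print(is_dna("GATTAXA"))                 # False: X is not a DNA base
print(is_protein("LGPSSKQTGKGSSRIWDN"))  # True
```

`fullmatch` is used rather than `match` so that the entire string, not just a prefix, has to belong to the language.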

  3. Importance of sequence alignment
     • Useful for discovering functional, structural and evolutionary information.
     • Functional
        • DNA molecules that are very much alike ('similar' in sequence-analysis parlance) probably have the same regulatory role.
        • Protein molecules that are very much alike probably have the same biochemical function.
     • Structural
        • Protein molecules that are very much alike probably have the same 3-D structure.
     • Evolutionary
        • If two sequences from different organisms are similar, they may have descended from a common ancestor sequence; such sequences are called homologous.
        • The alignment indicates the changes that could have occurred between the two homologous sequences and their common ancestor during evolution.

  4. Some terminology
     • Homologous: genes that descended from a common ancestor are called homologs.
     • Sequence homology is different from sequence similarity.
        • The latter is a measure of the matching characters in an alignment.
        • Homology is a yes-or-no relationship (two sequences either share an ancestor or they do not), so statements such as 'the sequences show 50% homology' or 'the sequences are highly homologous' are meaningless.
     • Orthologous: homologs in different species that arose when an ancestral lineage split into two species.
     • Paralogous: homologs that arose when a gene was duplicated within a genome.
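Unlike homology, similarity can be computed directly from an alignment. The minimal Python sketch below computes one common measure, percent identity, assuming the two aligned rows have equal length, '-' marks a gap, and gap columns count toward the alignment length. Other conventions divide by a different length (for example, only ungapped columns), so values differ between tools. `percent_identity` is a hypothetical helper name, not something defined in the slides. Applied to the global alignment from slide 2, 7 of the 19 columns hold identical residues.

```python
def percent_identity(a: str, b: str) -> float:
    """Percentage of alignment columns where both rows carry the same residue.

    Assumes a and b are the two rows of a pairwise alignment ('-' = gap);
    the denominator is the full alignment length, gap columns included.
    """
    if len(a) != len(b):
        raise ValueError("aligned rows must have equal length")
    identical = sum(1 for p, q in zip(a, b) if p == q and p != "-")
    return 100.0 * identical / len(a)


# Global alignment from slide 2
row1 = "LGPSSKQTGKGS-SRIWDN"
row2 = "LN-ITKSAGKGAIMRLGDA"
print(round(percent_identity(row1, row2), 1))  # 36.8
```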

  5. Global alignment: Needleman-Wunsch algorithm
     • A dynamic programming algorithm
     • Input
        • Two strings x and y of lengths n and m, respectively
        • A scoring table s(a,b) between the letters of the sequence alphabet, and a gap penalty d
     • Output: the alignment with the best score
     • Algorithm terminology
        • F(i,j): the score of the best alignment between the initial segments x1…xi and y1…yj
        • Boundary values: F(0,0) = 0; F(i,0) = -i·d; F(0,j) = -j·d, where d is the gap penalty
        • F(i,j) is the maximum of
           • F(i-1,j-1) + s(xi,yj)   (xi aligned with yj)
           • F(i-1,j) - d   (xi aligned with a gap)
           • F(i,j-1) - d   (yj aligned with a gap)
     • Algorithm steps
        • Fill the table in an order that computes F(i-1,j-1), F(i-1,j) and F(i,j-1) before F(i,j), e.g. row by row.
        • While filling F(i,j), keep an arrow to the cell used in deriving F(i,j).
        • After F(n,m) is determined, follow the arrows back from F(n,m) to F(0,0) to construct the alignment.
     • Complexity of the algorithm: O(nm), or O(n^2) when n = m.
     • Note: with biological sequences and standard computers, O(n^2) algorithms are feasible but a little slow, while O(n^3) algorithms are only feasible for very short sequences.
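The recurrence, fill order and traceback of the Needleman-Wunsch algorithm can be sketched in Python. This is a minimal sketch under stated assumptions: a hypothetical `simple_score` (+1 for identical letters, -1 otherwise) stands in for a substitution matrix such as BLOSUM50, the gap penalty is linear with a default of d = 2, and ties are broken in favour of the diagonal move. The arrows are kept in a separate `ptr` table, mirroring the algorithm steps above.

```python
from typing import Callable, List, Tuple


def simple_score(a: str, b: str) -> int:
    """Hypothetical scoring: +1 for identical letters, -1 otherwise."""
    return 1 if a == b else -1


def needleman_wunsch(x: str, y: str,
                     s: Callable[[str, str], int] = simple_score,
                     d: int = 2) -> Tuple[int, str, str]:
    """Global alignment of x and y with a linear gap penalty d."""
    n, m = len(x), len(y)
    # F[i][j] = best score for aligning x[:i] with y[:j]
    F: List[List[int]] = [[0] * (m + 1) for _ in range(n + 1)]
    # ptr[i][j] = the "arrow": which neighbour F[i][j] was derived from
    ptr: List[List[str]] = [[""] * (m + 1) for _ in range(n + 1)]

    # Boundary values: F(i,0) = -i*d and F(0,j) = -j*d
    for i in range(1, n + 1):
        F[i][0], ptr[i][0] = -i * d, "up"
    for j in range(1, m + 1):
        F[0][j], ptr[0][j] = -j * d, "left"

    # Fill row by row, so the three predecessors of each cell already exist
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j], ptr[i][j] = max(
                (F[i - 1][j - 1] + s(x[i - 1], y[j - 1]), "diag"),  # x_i with y_j
                (F[i - 1][j] - d, "up"),       # x_i with a gap
                (F[i][j - 1] - d, "left"),     # y_j with a gap
                key=lambda option: option[0],  # first maximum wins on ties
            )

    # Trace back from F(n,m) to F(0,0), building both rows in reverse
    row_x: List[str] = []
    row_y: List[str] = []
    i, j = n, m
    while i > 0 or j > 0:
        move = ptr[i][j]
        if move == "diag":
            row_x.append(x[i - 1])
            row_y.append(y[j - 1])
            i, j = i - 1, j - 1
        elif move == "up":
            row_x.append(x[i - 1])
            row_y.append("-")
            i -= 1
        else:  # "left"
            row_x.append("-")
            row_y.append(y[j - 1])
            j -= 1
    return F[n][m], "".join(reversed(row_x)), "".join(reversed(row_y))


print(needleman_wunsch("GATTACA", "GATACA"))  # score 4: one T of GATTACA faces a gap
```

Both tables have (n+1)(m+1) cells and each cell is filled in constant time, which is where the O(nm) time and space bound comes from.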

  6. Part of BLOSUM50 scoring matrix

  7. Illustration of Needleman-Wunsch

  8. Local alignment: Smith-Waterman algorithm
     • Closely related to the global alignment algorithm, with a few differences:
        • The top row and left column are now filled with 0s.
        • F(i,j) is the maximum of
           • 0   (start a new alignment)
           • F(i-1,j-1) + s(xi,yj)
           • F(i-1,j) - d
           • F(i,j-1) - d
        • Instead of taking the value in the bottom-right corner, F(n,m), as the best score, we look for the highest value of F(i,j) over the whole matrix and start the traceback from there.
        • The traceback ends when it reaches a cell with value 0, which corresponds to the start of the alignment.
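The Smith-Waterman differences listed above (the extra 0 option, zero borders, starting from the highest cell and stopping at a 0) can be sketched in Python. This minimal sketch assumes a hypothetical `simple_score` (+1 for identical letters, -1 otherwise) in place of a substitution matrix, a linear gap penalty defaulting to d = 2, and takes the first highest cell in row-by-row order when several cells tie. Rather than storing arrows, it recomputes during traceback which of the three non-zero cases produced each cell, which avoids keeping a second table.

```python
from typing import Callable, List, Tuple


def simple_score(a: str, b: str) -> int:
    """Hypothetical scoring: +1 for identical letters, -1 otherwise."""
    return 1 if a == b else -1


def smith_waterman(x: str, y: str,
                   s: Callable[[str, str], int] = simple_score,
                   d: int = 2) -> Tuple[int, str, str]:
    """Best local alignment of x and y with a linear gap penalty d."""
    n, m = len(x), len(y)
    # Top row and left column stay 0
    F: List[List[int]] = [[0] * (m + 1) for _ in range(n + 1)]
    best, bi, bj = 0, 0, 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(
                0,                                        # start a new alignment
                F[i - 1][j - 1] + s(x[i - 1], y[j - 1]),  # x_i with y_j
                F[i - 1][j] - d,                          # x_i with a gap
                F[i][j - 1] - d,                          # y_j with a gap
            )
            if F[i][j] > best:  # remember the highest cell (first one on ties)
                best, bi, bj = F[i][j], i, j

    # Trace back from the highest cell until a cell with value 0 is reached
    row_x: List[str] = []
    row_y: List[str] = []
    i, j = bi, bj
    while i > 0 and j > 0 and F[i][j] > 0:
        if F[i][j] == F[i - 1][j - 1] + s(x[i - 1], y[j - 1]):
            row_x.append(x[i - 1])
            row_y.append(y[j - 1])
            i, j = i - 1, j - 1
        elif F[i][j] == F[i - 1][j] - d:
            row_x.append(x[i - 1])
            row_y.append("-")
            i -= 1
        else:  # must have come from F(i, j-1) - d
            row_x.append("-")
            row_y.append(y[j - 1])
            j -= 1
    return best, "".join(reversed(row_x)), "".join(reversed(row_y))


print(smith_waterman("PPGKGW", "AGKGA"))  # (3, 'GKG', 'GKG')
```

Only the shared segment GKG is reported; the unrelated flanks on both sides are left out, as in the local alignment example on slide 2.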

  9. Illustration of Smith-Waterman
