Sequence Alignment

Team members: B97570133 沈冠宇, B97570142 林哲宇, B97570145 翁以諾, B97570154 歐柏宏

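Outline
• In bioinformatics, sequence alignment is a method for finding the similarities between different sequences and for determining whether those similarities reflect relationships in function, structure, evolution, and so on.
• DNA sequence alignment is one of the specialties of Professor 白敦文 (nicknamed 葉問) in our department, and Professor 白's project topic this academic year is also on it. Professor 白's lab has bought a so-called next-generation sequencer, of which there are only about 20 in all of Taiwan. Human DNA is written with only four bases (A, C, T, G), yet Professor 白 notes that one genome saved as a plain-text file takes about 3 GB, so even with a next-generation sequencer, alignment still takes a great deal of time. (The biology itself is beyond what the CS department goes into.) (Professor 白: "My biology is still pretty bad…")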

Scoring Rule: Match and Mismatch

Scoring Rule (Cont’d)
• $a_i = b_j$ (match): score $= +2$
• $a_i$ or $b_j$ aligned with a blank (gap): score $= -1$
• $a_i \neq b_j$ (mismatch): score $= -1$

Scoring Rule (Cont’d)
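The rule above can be checked mechanically on any finished alignment. The following is a minimal Python sketch, assuming the two aligned sequences are given as equal-length strings with "-" marking a blank; the names `column_score` and `score_alignment` and the example pair are hypothetical, chosen only for illustration.

```python
# Scoring scheme from the slide: match +2, mismatch -1, base vs. blank (gap) -1.
MATCH, MISMATCH, GAP = 2, -1, -1


def column_score(x: str, y: str) -> int:
    """Score one alignment column; '-' marks a blank (gap)."""
    if x == "-" and y == "-":
        raise ValueError("a column cannot contain two gaps")
    if x == "-" or y == "-":
        return GAP
    return MATCH if x == y else MISMATCH


def score_alignment(top: str, bottom: str) -> int:
    """Sum the column scores of two equal-length gapped strings."""
    if len(top) != len(bottom):
        raise ValueError("aligned strings must have the same length")
    return sum(column_score(x, y) for x, y in zip(top, bottom))


if __name__ == "__main__":
    # Hypothetical example pair (not from the slides): +2 +2 -1 +2 -1
    print(score_alignment("GA-CT", "GATCA"))  # prints 4
```

The same function reproduces per-column sums like the ones written out on the example slides, since it simply adds one score per column.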

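Find an alignment which has the highest score
• Basically similar to LCS (longest common subsequence) in Dynamic Programming
1) $A(i,j)$ = the score of the optimal alignment of the prefixes $a_1 \dots a_i$ and $b_1 \dots b_j$
2) $A(0,0) = 0$
3) $A(i,0) = -i$ ($i$ characters aligned with blanks, $-1$ each)
4) $A(0,j) = -j$
5) If $a_i = b_j$ then $A(i,j) = A(i-1,j-1) + 2$; else $A(i,j) = \max\{A(i-1,j) - 1,\ A(i,j-1) - 1,\ A(i-1,j-1) - 1\}$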

Find an alignment which has the highest score (Cont’d)

Find an alignment which has the highest score (Cont’d)
Score = -1+2-1+2-1-1-1 = -1

Find an alignment which has the highest score (Cont’d)
Score = -1+2-1-1+2-1-1 = -1
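Rules 1) to 5) translate directly into a table-filling loop. Below is a minimal Python sketch under the slides' +2/−1/−1 scheme; the function name `nw_table` is our own. It takes the maximum over all three moves in every cell, the usual Needleman-Wunsch form; with this particular scoring, a match on the diagonal always scores at least as high as either gap move, so the table is the same one rule 5 produces.

```python
from typing import List

MATCH, MISMATCH, GAP = 2, -1, -1


def nw_table(a: str, b: str) -> List[List[int]]:
    """Fill the table A(i, j) following rules 1)-5) on the slide."""
    n, m = len(a), len(b)
    A = [[0] * (m + 1) for _ in range(n + 1)]  # rule 2: A(0,0) = 0
    for i in range(1, n + 1):
        A[i][0] = i * GAP  # rule 3: A(i,0) = -i
    for j in range(1, m + 1):
        A[0][j] = j * GAP  # rule 4: A(0,j) = -j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # rule 5: diagonal (match/mismatch), up (a_i vs blank), left (blank vs b_j)
            diag = A[i - 1][j - 1] + (MATCH if a[i - 1] == b[j - 1] else MISMATCH)
            A[i][j] = max(diag, A[i - 1][j] + GAP, A[i][j - 1] + GAP)
    return A
```

For example, `nw_table("GATC", "GTC")[4][3]` is 5, the score of aligning GATC with G-TC (three matches and one blank).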

Algorithm Description
DP algorithms are closely related to recursion: define a base case and prove that you can extend it. If you already have optimal alignments for the shorter prefix pairs (for example, X…Y against A…B), then the next column of the alignment will be one of three choices: Z over C (X…YZ / A…BC), a gap over C (X…Y- / A…BC), or Z over a gap (X…YZ / A…B-), where “-” indicates a gap. So you can extend the alignment by choosing whichever of these has the highest score. Sequence Alignment -- Gary Jackoway
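The extension argument can be written literally as a recursion on prefix lengths, with memoization turning it into dynamic programming. This is a minimal Python sketch, assuming the same +2/−1/−1 scoring; `align` and `best` are hypothetical names. Each call tries the three possible last columns and keeps the highest-scoring one, breaking ties in favor of pairing the two characters.

```python
from functools import lru_cache
from typing import Tuple

MATCH, MISMATCH, GAP = 2, -1, -1


def align(a: str, b: str) -> Tuple[int, str, str]:
    """Global alignment by recursion on prefixes, memoized (top-down DP)."""

    @lru_cache(maxsize=None)
    def best(i: int, j: int) -> Tuple[int, str, str]:
        # Base cases: one prefix is empty, so the other is aligned with blanks.
        if i == 0:
            return (j * GAP, "-" * j, b[:j])
        if j == 0:
            return (i * GAP, a[:i], "-" * i)
        # Z over C: extend the optimal alignment of a[:i-1] and b[:j-1].
        s, t, u = best(i - 1, j - 1)
        both = (s + (MATCH if a[i - 1] == b[j - 1] else MISMATCH),
                t + a[i - 1], u + b[j - 1])
        # Z over a gap: extend the optimal alignment of a[:i-1] and b[:j].
        s, t, u = best(i - 1, j)
        gap_bottom = (s + GAP, t + a[i - 1], u + "-")
        # A gap over C: extend the optimal alignment of a[:i] and b[:j-1].
        s, t, u = best(i, j - 1)
        gap_top = (s + GAP, t + "-", u + b[j - 1])
        # max() returns the first of equally scored options, so ties favor `both`.
        return max(both, gap_bottom, gap_top, key=lambda r: r[0])

    return best(len(a), len(b))
```

For instance, `align("GATC", "GTC")` returns `(5, "GATC", "G-TC")`. Because it recurses once per prefix pair and copies strings, it suits only short sequences; the bottom-up table is the practical form.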

Global versus Local Alignment
Goal: find locally matching regions, even when they are far removed from each other in the sequences:
ACTTAGCAGACTAACGTAAC
CCATGACTAACGGGACCTAC
Smith-Waterman: use Needleman-Wunsch, but add: if a value is < 0, replace it with 0 (and set its backtrack pointer to none). When the matrix is complete, backtrack from all local maxima to create the local matching alignments.
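The only change from the global table is the clamp at 0. This is a minimal Python sketch, assuming the same +2/−1/−1 scoring (the slide does not specify one for this example); `smith_waterman` is a hypothetical name. For simplicity it backtracks only from the single highest cell rather than from every local maximum as the slide describes.

```python
from typing import List, Tuple

MATCH, MISMATCH, GAP = 2, -1, -1


def smith_waterman(a: str, b: str) -> Tuple[int, str, str]:
    """Best-scoring local alignment (backtracks from the top cell only)."""
    n, m = len(a), len(b)
    H = [[0] * (m + 1) for _ in range(n + 1)]  # row 0 and column 0 stay 0
    best, best_i, best_j = 0, 0, 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = H[i - 1][j - 1] + (MATCH if a[i - 1] == b[j - 1] else MISMATCH)
            # Needleman-Wunsch step, but negative values are replaced with 0.
            H[i][j] = max(0, diag, H[i - 1][j] + GAP, H[i][j - 1] + GAP)
            if H[i][j] > best:
                best, best_i, best_j = H[i][j], i, j
    # Backtrack until a 0 cell (the "none" pointer) marks the start of the match.
    top: List[str] = []
    bottom: List[str] = []
    i, j = best_i, best_j
    while i > 0 and j > 0 and H[i][j] > 0:
        if H[i][j] == H[i - 1][j - 1] + (MATCH if a[i - 1] == b[j - 1] else MISMATCH):
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + GAP:
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(top)), "".join(reversed(bottom))
```

On the two example sequences above, the returned score is at least 16, the value of the exactly shared run GACTAACG.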

PAM: Percent Accepted Mutation Substitution Matrix (Dayhoff)
• Substitution matrices based on sound evolutionary principles.
• Find PAM1 by comparing groups of proteins known to be closely related evolutionarily.
• Find PAM-n by multiplying PAM1 by itself n times (that is, raising PAM1 to the n-th power).
• PAM60: ~60% similar; PAM250: ~20% similar.
• The more distant the expected relationship, the larger the n of the PAM-n matrix that should be used.
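"Multiplying PAM1 by itself n times" is a matrix power. The sketch below shows it on a made-up 2×2 matrix; the numbers in `TOY_PAM1` and the function names are hypothetical, not Dayhoff's data (the real PAM1 is 20×20 over amino acids), and the later step of turning PAM-n probabilities into log-odds scores is not shown. In this sketch rows are the original residue and each row sums to 1.

```python
from typing import List

Matrix = List[List[float]]


def matmul(x: Matrix, y: Matrix) -> Matrix:
    """Plain matrix product x @ y."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y))) for j in range(len(y[0]))]
            for i in range(len(x))]


def pam_n(pam1: Matrix, n: int) -> Matrix:
    """PAM-n as the n-th power of a PAM1 mutation-probability matrix."""
    if n < 1:
        raise ValueError("n must be at least 1")
    result = pam1
    for _ in range(n - 1):
        result = matmul(result, pam1)
    return result


# Hypothetical 2-letter PAM1 (row = original residue, column = replacement).
# Values are invented for illustration only.
TOY_PAM1 = [[0.99, 0.01],
            [0.02, 0.98]]

if __name__ == "__main__":
    for n in (1, 60, 250):
        print(n, [round(v, 3) for v in pam_n(TOY_PAM1, n)[0]])
```

As n grows, the diagonal (probability a residue is unchanged) shrinks, which is why larger n suits more distant relationships. Repeated multiplication keeps the sketch dependency-free; with NumPy, `numpy.linalg.matrix_power` computes the same power.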

BLOSUM: BLOcks SUbstitution Matrix
• Start with highly conserved patterns (blocks) in a large set of closely related proteins.
• Use the likelihood of the substitutions found in those blocks to create a substitution probability matrix.
• BLOSUM-n means that sequences more than n% identical were clustered together when the matrix was built.
• BLOSUM62 is the “standard” matrix.
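The probabilities become BLOSUM entries as log-odds scores: how much more often a pair is observed than chance would predict, rounded in half-bit units (the scale BLOSUM62 uses). Below is a minimal sketch on one toy ungapped block; the block and the name `blosum_scores` are hypothetical. It deliberately skips the clustering of sequences above n% identity and the pooling over many blocks that real BLOSUM matrices rely on, so pairs never observed simply get no score.

```python
import math
from collections import Counter
from itertools import combinations
from typing import Dict, List, Tuple


def blosum_scores(block: List[str]) -> Dict[Tuple[str, str], int]:
    """Half-bit log-odds scores from one ungapped block (no clustering)."""
    pairs: Counter = Counter()
    for col in zip(*block):  # each aligned column
        for x, y in combinations(col, 2):  # every pair of sequences in it
            pairs[tuple(sorted((x, y)))] += 1
    total = sum(pairs.values())
    # Background frequency p_a: share of pair slots occupied by residue a.
    p: Counter = Counter()
    for (x, y), c in pairs.items():
        p[x] += c
        p[y] += c
    for a in p:
        p[a] /= 2 * total
    scores: Dict[Tuple[str, str], int] = {}
    for (x, y), c in pairs.items():
        q = c / total  # observed pair frequency
        e = p[x] * p[y] if x == y else 2 * p[x] * p[y]  # expected by chance
        s = round(2 * math.log2(q / e))  # half-bit units
        scores[(x, y)] = scores[(y, x)] = s
    return scores
```

With the toy block `["AAC", "AAC", "AGC", "AGC"]`, the fully conserved C column gives C/C a score of 3, while the A/G substitution, seen only somewhat more often than chance, scores 1.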
