Sequence Alignment (序列組合). Group members: B97570133 沈冠宇, B97570142 林哲宇, B97570145 翁以諾, B97570154 歐柏宏
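Outline
• Bioinformatics: sequence alignment is a method for finding the similarities between different sequences and assessing how those similarities may relate to function, structure, and evolutionary relationships.
• DNA sequence alignment is one of the specialties of Professor 白敦文 (nicknamed 葉問) of our department, and it is also the topic of his project course this academic year.
• Professor 白's lab has bought a so-called next-generation sequencer; there are only about 20 of them in all of Taiwan.
• Human DNA uses only four letters: A, C, T, and G. Even so, Professor 白 says that stored as a plain text file it takes 3 gigabytes, and that even with a next-generation sequencer the comparison takes a great deal of time.
• (The biology side is not examined in depth in the computer science department.)
• (Professor 白 says: "My biology is still terrible...")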
Scoring Rule: Match, Mismatch
Scoring Rule (Cont'd)
• ai = bj (match): score = +2
• ai or bj aligned with a blank (gap): score = -1
• ai ≠ bj (mismatch): score = -1
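The scoring rule can be checked by hand on any finished alignment. The sketch below is a minimal illustration, not code from the slides: the function name `score_alignment` and the use of "-" as the blank character are assumptions made here, while the three score values are the ones listed above.

```python
MATCH, MISMATCH, GAP = 2, -1, -1  # values taken from the scoring rule above

def score_alignment(row_a: str, row_b: str, gap_char: str = "-") -> int:
    """Sum the column scores of two rows that are already aligned.

    Both rows must have the same length; gap_char marks a blank.
    """
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    total = 0
    for x, y in zip(row_a, row_b):
        if x == gap_char and y == gap_char:
            raise ValueError("a column cannot hold two blanks")
        if x == gap_char or y == gap_char:
            total += GAP        # a character aligned with a blank
        elif x == y:
            total += MATCH      # ai = bj
        else:
            total += MISMATCH   # ai != bj
    return total
```

For example, `score_alignment("AC-GT", "ACTG-")` has three matching columns and two blank columns, so it returns 2 + 2 - 1 + 2 - 1 = 4.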
Find an alignment which has the highest score
• This is broadly similar to the LCS (longest common subsequence) problem in dynamic programming.
1) A(i,j) = the score of the optimal alignment of a1…ai with b1…bj
2) A(0,0) = 0
3) A(i,0) = -i
4) A(0,j) = -j
5) If ai = bj then A(i,j) = A(i-1,j-1) + 2; else A(i,j) = max{ A(i-1,j) - 1, A(i,j-1) - 1, A(i-1,j-1) - 1 }
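The recurrence can be sketched as a table-filling routine. This is a minimal illustration under stated assumptions: the function name `alignment_score` and its keyword parameters are made up here, and the code takes the maximum over all three neighbours in every cell, which gives the same values as the match/else form above for the +2/-1/-1 scores.

```python
def alignment_score(a: str, b: str, match: int = 2,
                    mismatch: int = -1, gap: int = -1) -> int:
    """Return A(len(a), len(b)), the best global alignment score."""
    n, m = len(a), len(b)
    # A[i][j] = best score for aligning a[:i] with b[:j]
    A = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        A[i][0] = i * gap          # A(i,0) = -i: a[:i] against blanks only
    for j in range(1, m + 1):
        A[0][j] = j * gap          # A(0,j) = -j: b[:j] against blanks only
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            A[i][j] = max(
                A[i - 1][j - 1] + pair,  # align ai with bj
                A[i - 1][j] + gap,       # align ai with a blank
                A[i][j - 1] + gap,       # align bj with a blank
            )
    return A[n][m]
```

The table has (n+1) x (m+1) cells and each cell takes constant work, so the running time is proportional to the product of the two sequence lengths.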
Find an alignment which has the Highest score(Cont’d) Score = -1+2-1+2-1-1-1 = -1
Find an alignment which has the Highest score(Cont’d) Score= -1+2-1-1+2-1-1= -1
Find an alignment which has the Highest score(Cont’d) Score= -1+2-1-1+2-1-1= -1
Algorithm Description
DP algorithms have a strong relationship to recursion: define a base case and prove that you can extend it. If you already have the optimal solution to:
X…Y
A…B
then you know the next pair of characters will be one of:
X…YZ    X…Y-    X…YZ
A…BC    A…BC    A…B-
(where "-" indicates a gap). So you can extend the match by determining which of these has the highest score.
Sequence Alignment -- Gary Jackoway
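Running the three-way extension forward fills the score table; running it backward from the last cell recovers one optimal alignment. The sketch below assumes the +2/-1/-1 scores used earlier in these slides. The name `global_align` and the tie-breaking order (diagonal first, then a gap in the second row, then a gap in the first row) are choices made for this illustration, so when several alignments tie it returns just one of them.

```python
def global_align(a: str, b: str, match: int = 2,
                 mismatch: int = -1, gap: int = -1):
    """Return (score, aligned_a, aligned_b) for one optimal global alignment."""
    n, m = len(a), len(b)
    A = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        A[i][0] = i * gap
    for j in range(1, m + 1):
        A[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            A[i][j] = max(A[i - 1][j - 1] + pair,
                          A[i - 1][j] + gap,
                          A[i][j - 1] + gap)

    # Walk back from the bottom-right cell, undoing one extension per step.
    row_a, row_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            pair = match if a[i - 1] == b[j - 1] else mismatch
            if A[i][j] == A[i - 1][j - 1] + pair:
                row_a.append(a[i - 1])
                row_b.append(b[j - 1])
                i, j = i - 1, j - 1
                continue
        if i > 0 and A[i][j] == A[i - 1][j] + gap:
            row_a.append(a[i - 1])   # character of a over a gap
            row_b.append("-")
            i -= 1
        else:
            row_a.append("-")        # gap over a character of b
            row_b.append(b[j - 1])
            j -= 1
    return A[n][m], "".join(reversed(row_a)), "".join(reversed(row_b))
```

For instance, `global_align("ACGT", "AGT")` returns a score of 5 with the rows `ACGT` and `A-GT`.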
Global versus Local Alignment
Goal: find locally matching regions, even when they are far removed from each other in the sequences:
ACTTAGCAGACTAACGTAAC
CCATGACTAACGGGACCTAC
Smith-Waterman: use Needleman-Wunsch but add: if a value is < 0, replace it with 0 (and set its backtrack pointer to none). When the matrix is complete, backtrack from all local maxima, creating local matching alignments.
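The Smith-Waterman modification can be sketched as follows. This is a simplified illustration, not the slide author's code: it reuses the +2/-1/-1 scores from the earlier slides as an assumption, the name `local_align` is made up here, and it backtracks only from the single highest-scoring cell rather than from every local maximum.

```python
def local_align(a: str, b: str, match: int = 2,
                mismatch: int = -1, gap: int = -1):
    """Return (score, aligned_a, aligned_b) for one best local alignment."""
    n, m = len(a), len(b)
    # Row 0 and column 0 stay at 0: a local alignment may start anywhere.
    H = [[0] * (m + 1) for _ in range(n + 1)]
    best, best_pos = 0, (0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(0,                      # negative values become 0
                          H[i - 1][j - 1] + pair,
                          H[i - 1][j] + gap,
                          H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Backtrack from the best cell until a 0 cell ends the local match.
    row_a, row_b = [], []
    i, j = best_pos
    while i > 0 and j > 0 and H[i][j] > 0:
        pair = match if a[i - 1] == b[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + pair:
            row_a.append(a[i - 1])
            row_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            row_a.append(a[i - 1])
            row_b.append("-")
            i -= 1
        else:
            row_a.append("-")
            row_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(row_a)), "".join(reversed(row_b))
```

On the two sequences shown on the slide, the shared stretch GACTAACG alone is worth 8 x 2 = 16, so the reported local score is at least 16.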
PAM: Percent Accepted Mutation Substitution Matrix (Dayhoff)
• Substitution matrices based on sound evolutionary principles.
• Find PAM1 by comparing groups of proteins known to be evolutionarily closely related.
• Find PAM-n by multiplying PAM1 by itself n times.
• PAM60: ~60% similar, PAM250: ~20% similar.
• The more distant the expected relationship, the higher the n that should be used.
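The "multiply PAM1 by itself n times" step is a matrix power. The sketch below shows the mechanics only: `TOY_PAM1` is a made-up 3 x 3 probability matrix invented for this example (a real PAM1 covers the 20 amino acids and is estimated from protein data), and the helper names `mat_mul` and `pam_n` are hypothetical.

```python
def mat_mul(x, y):
    """Multiply two square matrices given as lists of rows."""
    size = len(x)
    return [[sum(x[i][k] * y[k][j] for k in range(size))
             for j in range(size)]
            for i in range(size)]

def pam_n(pam1, n):
    """Return pam1 raised to the n-th power (n = 0 gives the identity)."""
    size = len(pam1)
    result = [[1.0 if i == j else 0.0 for j in range(size)]
              for i in range(size)]
    for _ in range(n):
        result = mat_mul(result, pam1)
    return result

# Hypothetical toy matrix: each row sums to 1, and each symbol keeps its
# identity with probability 0.98 over one evolutionary step.
TOY_PAM1 = [
    [0.98, 0.01, 0.01],
    [0.01, 0.98, 0.01],
    [0.01, 0.01, 0.98],
]
```

With this toy matrix, the diagonal entries of `pam_n(TOY_PAM1, n)` shrink as n grows, which mirrors the slide's point that a higher n models more distantly related sequences.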
BLOSUM: BLOcks SUbstitution Matrix
• Start with highly conserved patterns (blocks) in a large set of related proteins.
• Use the likelihood of substitutions found in those sequences to create a substitution probability matrix.
• BLOSUM-n means the matrix was built after clustering sequences that are more than n% identical, so a lower n suits more distant relationships.
• BLOSUM62 is "standard".