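Algorithmic Techniques and Parallel Methods for DNA Multiple Sequence Alignment
Quan Zou (Ph.D. & Professor), School of Computer Science and Technology, Tianjin University, December 2015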
Multiple Sequence Alignment (MSA): What & Where
• Different from mapping, assembly, and BLAST
• BLAST: Basic Local Alignment Search Tool
[Figure: BLAST, query → database → output]
Multiple Sequence Alignment (MSA): What & Where
[Figure: MSA input and output]
Multiple Sequence Alignment (MSA): What & Where
[Figure: MSA variants and their applications]
• Multiple sequence alignment: phylogenetic trees
• Multiple DNA sequence alignment: virus sequences
• Multiple similar DNA sequence alignment (our focus): population SNV calling
• …
Techniques for similar DNA MSA
1. k-band dynamic programming
[Figure: k-band dynamic programming matrix]
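A minimal sketch of k-band dynamic programming, assuming unit-cost edit distance (the slides do not give the scoring scheme): only DP cells with |i − j| ≤ k are filled, so the cost drops from O(m·n) to O(k·n) for similar sequences that stay near the diagonal. The function name `banded_edit_distance` is hypothetical.

```python
import math


def banded_edit_distance(a: str, b: str, k: int) -> float:
    """Unit-cost global edit distance, filling only DP cells with |i - j| <= k.

    Returns math.inf when len(a) and len(b) differ by more than k, because no
    alignment path can then stay inside the band.
    """
    m, n = len(a), len(b)
    if abs(m - n) > k:
        return math.inf
    # prev[j] holds D[i-1][j]; cells outside the band stay at infinity.
    prev = [math.inf] * (n + 1)
    for j in range(min(n, k) + 1):
        prev[j] = j  # first row: j insertions
    for i in range(1, m + 1):
        cur = [math.inf] * (n + 1)
        lo, hi = max(0, i - k), min(n, i + k)
        if lo == 0:
            cur[0] = i  # first column: i deletions
        for j in range(max(1, lo), hi + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            cur[j] = min(
                prev[j - 1] + cost,  # match or mismatch (diagonal)
                prev[j] + 1,         # deletion from a (up)
                cur[j - 1] + 1,      # insertion into a (left)
            )
        prev = cur
    return prev[n]


print(banded_edit_distance("AGACGTAGCC", "AGACCTAGCC", 1))  # 1
```

If the true distance is at most k, the banded result is exact, since leaving the band requires at least k + 1 insertions or deletions; with a k that is too small the result can only overestimate.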
Techniques for similar DNA MSA
2. Center star strategy
[Figure: tree alignment vs. center star strategy on sequences S1 to S5]
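In the center star strategy, one sequence serves as the center and every other sequence is aligned to it pairwise. The textbook rule, used here as an assumption because the slides do not state how the center is picked, chooses the sequence with the smallest sum of distances to all others. The sketch below uses plain Levenshtein distance; `edit_distance` and `choose_center` are hypothetical names.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance with a rolling one-row DP table."""
    prev = list(range(len(b) + 1))
    for i in range(1, len(a) + 1):
        cur = [i] + [0] * len(b)
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            cur[j] = min(prev[j - 1] + cost, prev[j] + 1, cur[j - 1] + 1)
        prev = cur
    return prev[-1]


def choose_center(seqs: list[str]) -> int:
    """Index of the sequence minimising the sum of edit distances to all others."""
    best_idx, best_sum = 0, None
    for i, s in enumerate(seqs):
        total = sum(edit_distance(s, t) for j, t in enumerate(seqs) if j != i)
        if best_sum is None or total < best_sum:
            best_idx, best_sum = i, total
    return best_idx


print(choose_center(["AAAA", "AAAT", "AATT"]))  # 1
```

This selection needs all pairwise distances, which is quadratic in the number of sequences; for very similar inputs a cheaper choice of center may be good enough.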
Center Star for Multiple Sequence Alignment
[Figure: workflow. Input sequences; step 1: search …; step 2: update and sum up; final result]
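The "update and sum up" step can be sketched as the standard center star merge: each pairwise alignment inserts gaps into the center, and the MSA places before every center residue the largest gap block any pairwise alignment needs there, padding the other rows. This assumes the pairwise alignments are already given as gapped string pairs (aligned center, aligned other) with '-' as the gap symbol; `gap_profile` and `merge_center_star` are hypothetical names, and this is the textbook merge, not necessarily the exact procedure behind the slides.

```python
def gap_profile(aligned_center: str) -> list[int]:
    """gaps[p] = number of '-' placed in the center just before its p-th residue;
    the last entry counts trailing gaps."""
    gaps = [0]
    for ch in aligned_center:
        if ch == "-":
            gaps[-1] += 1
        else:
            gaps.append(0)
    return gaps


def merge_center_star(center: str, pairs: list[tuple[str, str]]) -> list[str]:
    """Merge pairwise alignments (aligned_center, aligned_other) into one MSA.

    Row 0 is the center; row i + 1 comes from pairs[i].
    """
    n = len(center)
    profiles = [gap_profile(c) for c, _ in pairs]
    # Every center position gets the largest gap block any pair needs there.
    max_gaps = [max(col) for col in zip([0] * (n + 1), *profiles)]

    def expand(other_row: str, gaps: list[int]) -> str:
        out, k = [], 0  # k walks the columns of the pairwise alignment
        for p in range(n + 1):
            out.append(other_row[k:k + gaps[p]])       # this pair's own insertion columns
            out.append("-" * (max_gaps[p] - gaps[p]))  # pad to the shared gap block
            k += gaps[p]
            if p < n:
                out.append(other_row[k])  # column aligned to center residue p
                k += 1
        return "".join(out)

    rows = [expand(center, [0] * (n + 1))]
    rows += [expand(other, g) for (_, other), g in zip(pairs, profiles)]
    return rows


msa = merge_center_star("ACGT", [("AC-GT", "ACAGT"), ("ACGT-", "ACGTT"), ("ACGT", "A-GT")])
print("\n".join(msa))
```

For this input the four rows are `AC-GT-`, `ACAGT-`, `AC-GTT` and `A--GT-`, all of equal length.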
Detecting the matching region with a trie
S = AGACGTAGCCTAGCAGCCCGTACT
S1 = AGACGT, S2 = AGCCTA, S3 = GCAGCC, S4 = CGTACT (consecutive segments of S)
T = AGACCTAGCTAGCAGCCCGTACACT
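The slide's example can be reproduced with a small trie, as a sketch: S is cut into consecutive length-6 segments S1 to S4, the segments go into a trie, and T is scanned from every start position. The nested-dictionary trie and the names `build_trie` and `find_segments` are hypothetical choices for illustration.

```python
def build_trie(segments: list[str]) -> dict:
    """Insert each segment into a nested-dict trie; '$' stores the segment index."""
    root: dict = {}
    for idx, seg in enumerate(segments):
        node = root
        for ch in seg:
            node = node.setdefault(ch, {})
        node["$"] = idx
    return root


def find_segments(trie: dict, text: str) -> list[tuple[int, int]]:
    """Return (segment_index, 0-based position in text) for every exact occurrence."""
    hits = []
    for start in range(len(text)):
        node = trie
        for ch in text[start:]:
            if ch not in node:
                break
            node = node[ch]
            if "$" in node:
                hits.append((node["$"], start))
    return hits


S = "AGACGTAGCCTAGCAGCCCGTACT"
T = "AGACCTAGCTAGCAGCCCGTACACT"
segments = [S[i:i + 6] for i in range(0, len(S), 6)]
print(segments)                                   # ['AGACGT', 'AGCCTA', 'GCAGCC', 'CGTACT']
print(find_segments(build_trie(segments), T))     # [(2, 11)]
```

On this example only S3 = GCAGCC occurs in T (0-based offset 11); the other segments each contain a difference from T.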
Center Star for Multiple Sequence Alignment
[Figure: workflow with trie trees. Input sequences → trie trees; step 1: search …; step 2: update and sum up; final result]
From Trie to Suffix Tree
Trie, built on segments of S: S1 = AGACGT, S2 = AGCCTA, S3 = GCAGCC, S4 = CGTACT, …
Suffix tree, built on suffixes of S:
S1 = AGACGTAGCCTAGCAGCCCGTACT
S2 = GACGTAGCCTAGCAGCCCGTACT
S3 = ACGTAGCCTAGCAGCCCGTACT
S4 = CGTAGCCTAGCAGCCCGTACT
S5 = GTAGCCTAGCAGCCCGTACT
S6 = TAGCCTAGCAGCCCGTACT
S7 = AGCCTAGCAGCCCGTACT
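A trie over fixed segments only finds matches aligned to segment boundaries; indexing every suffix of S lets any substring of S be found. The sketch below builds a naive, uncompressed suffix trie with O(n²) nodes, a simple stand-in for a true suffix tree (which compresses unary paths and can be built in linear time, for example with Ukkonen's algorithm). The names `build_suffix_trie` and `occurs_in` are hypothetical.

```python
def build_suffix_trie(s: str) -> dict:
    """Insert every suffix s[i:] into a nested-dict trie (uncompressed)."""
    root: dict = {}
    for i in range(len(s)):
        node = root
        for ch in s[i:]:
            node = node.setdefault(ch, {})
    return root


def occurs_in(trie: dict, pattern: str) -> bool:
    """A pattern is a substring of s iff it spells a path from the root."""
    node = trie
    for ch in pattern:
        if ch not in node:
            return False
        node = node[ch]
    return True


S = "AGACGTAGCCTAGCAGCCCGTACT"
st = build_suffix_trie(S)
print(occurs_in(st, "TAGCAGCCCGTAC"))  # True: spans the S2/S3/S4 segment boundaries
print(occurs_in(st, "GCAGCC"))         # True: equals segment S3
print(occurs_in(st, "AGACCT"))         # False
```

The 13-base match TAGCAGCCCGTAC also occurs in the slide's T, while the segment trie only reported the 6-base GCAGCC inside it.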
Greedy search with suffix tree
S = GTCCGAAGCTCCGG
T = GTCCTGAAGCTCCGT
Matches as (start in S, start in T, length), 1-based: (1, 1, 4), (5, 6, 9)
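Reading the triples as (start in S, start in T, length), S[1..4] = T[1..4] = GTCC and S[5..13] = T[6..14] = GAAGCTCCG. A minimal greedy sketch under that reading: from the current position in T, walk a suffix trie of S as far as possible, record the match if it reaches a minimum length and jump past it, otherwise advance one base. The minimum length (`min_len = 2`), reporting the leftmost start in S, and leaving unmatched bases for a separate alignment step are assumptions; `greedy_matches` is a hypothetical name.

```python
def build_suffix_trie(s: str) -> dict:
    """Uncompressed suffix trie; '#' at each node stores the leftmost start in s of its path."""
    root: dict = {}
    for i in range(len(s)):
        node = root
        for ch in s[i:]:
            node = node.setdefault(ch, {})
            node.setdefault("#", i)  # suffixes are inserted left to right, so the first i wins
    return root


def greedy_matches(s: str, t: str, min_len: int = 2) -> list[tuple[int, int, int]]:
    """Greedily cover t with longest exact matches against s.

    Returns 1-based (start_in_s, start_in_t, length) triples.
    """
    trie = build_suffix_trie(s)
    matches, j = [], 0
    while j < len(t):
        node, length, s_start = trie, 0, -1
        # Extend the match along t while the path exists in the trie.
        while j + length < len(t) and t[j + length] in node:
            node = node[t[j + length]]
            length += 1
            s_start = node["#"]
        if length >= min_len:
            matches.append((s_start + 1, j + 1, length))
            j += length  # jump past the matched block
        else:
            j += 1       # too short to anchor; leave this base for a later alignment step
    return matches


print(greedy_matches("GTCCGAAGCTCCGG", "GTCCTGAAGCTCCGT"))  # [(1, 1, 4), (5, 6, 9)]
```

With these settings the sketch reproduces the slide's two triples; the single bases T[5] and T[15] fall below `min_len` and are skipped.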
Extreme MSA for Very Similar DNA Sequences
[Figure: workflow. Input sequences; step 1: search …; step 2: update and sum up; final result]
Experiments
• 100 human mitochondrial genome sequences
• About 16 k bases each (1555 KB in total)
• Our output: 1558 KB
• Clustal Omega (ClustalΩ) output: 1627 KB
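Assuming both output files carry the same sequence data and headers as the 1555 KB input, so that any growth consists of inserted gap characters (an assumption; the slide reports only file sizes), the relative size overhead works out as:

```latex
\begin{aligned}
\text{Ours: } &\frac{1558 - 1555}{1555} = \frac{3}{1555} \approx 0.19\%,\\
\text{Clustal}\,\Omega\text{: } &\frac{1627 - 1555}{1555} = \frac{72}{1555} \approx 4.6\%.
\end{aligned}
```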
Discussion: How should similarity be measured?
• Globally alignable: pairwise, multiple
• Proof
• Optimization
[Figure: extreme center star; globally alignable]
Software
http://datamining.xmu.edu.cn/software/halign/
http://lab.malab.cn/soft/halign/
Quan Zou, Qinghua Hu, Maozu Guo, Guohua Wang. HAlign: Fast Multiple Similar DNA/RNA Sequence Alignment Based on the Centre Star Strategy. Bioinformatics, 2015, 31(15): 2475-2481.