This presentation discusses the use of Hadoop for reconstructing phylogenetic trees from ultra-large sets of unaligned DNA sequences. It covers the challenge of handling a very large number of sequences and the resulting difficulty of multiple sequence alignment (MSA). The experiments use human mtGenome and 16S rRNA data and compare running time and average SP score between aligned and unaligned data. The software used is HAlign, a fast multiple sequence alignment tool. The presentation closes with a discussion of limitations, including several complex issues in evolution that this study ignores.
Reconstructing phylogenetic trees for ultra-large unaligned DNA sequences with Hadoop • Quan Zou (Ph.D. & Prof.) • Tianjin Univ, School of Computer • zouquan@nclab.net • http://cs.tju.edu.cn/faculty/zouquan/
Phylogenetic Tree • Genome-Genome • Gene-Gene • Population Model Computation
Background: challenge • Too many sequences • Multiple sequence alignment (MSA) is difficult
Flow • Clustering • Sampling
More tricks in MSA • Input sequences • Trie trees • Step 1: search • Step 2: update • Sum up • Final result
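The slide names only the ingredients (input sequences, trie trees, a search step, an update step) and does not spell out the algorithm. As a rough illustration of the kind of trie lookup involved, here is a minimal sketch that assumes a reference sequence is cut into fixed-size segments, the segments are stored in a trie, and another sequence is scanned for exact segment hits. The function names, the segment size, and the exact-match strategy are all assumptions made for illustration, not HAlign's actual implementation.

```python
def split_segments(sequence, size):
    """Cut a sequence into consecutive, non-overlapping segments of `size`."""
    return [sequence[i:i + size]
            for i in range(0, len(sequence) - size + 1, size)]


def build_trie(segments):
    """Build a nested-dict trie; the "$" key stores the segment index.

    If the same segment occurs twice, the later index overwrites the earlier
    one (a simplification of this sketch).
    """
    root = {}
    for index, segment in enumerate(segments):
        node = root
        for ch in segment:
            node = node.setdefault(ch, {})
        node["$"] = index
    return root


def find_matches(trie, sequence):
    """Return (segment_index, start_position) for every exact segment hit."""
    matches = []
    for start in range(len(sequence)):
        node = trie
        pos = start
        while pos < len(sequence) and sequence[pos] in node:
            node = node[sequence[pos]]
            pos += 1
            if "$" in node:
                matches.append((node["$"], start))
    return matches


# Hypothetical toy data, not taken from the slides.
center = "ACGTACGGTTAA"
trie = build_trie(split_segments(center, 4))
hits = find_matches(trie, "GGACGTTTAAC")
```

Exact hits like these could serve as anchors, so that only the regions between them need a full dynamic-programming alignment; that use is likewise an assumption here.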
Experiments • Data: Human mtGenome, 16S rRNA • Measurement: Running time, Average SP score (for MSA)
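The slides do not define the average SP (sum-of-pairs) score. As a minimal sketch, assume it is a penalty summed over every pair of aligned sequences and every column, then divided by the number of sequence pairs. The penalty values used below (0 for identical characters, 1 for a mismatch, 2 for a character against a gap) are illustrative assumptions, as is the function name.

```python
from itertools import combinations


def pair_penalty(a, b, mismatch=1, gap=2):
    """Penalty for one column of one sequence pair (assumed scoring scheme)."""
    if a == b:
        return 0          # identical characters, including gap vs gap
    if a == "-" or b == "-":
        return gap        # character aligned against a gap
    return mismatch       # two different characters


def average_sp(alignment, mismatch=1, gap=2):
    """Sum-of-pairs penalty divided by the number of sequence pairs."""
    if len(alignment) < 2:
        raise ValueError("need at least two aligned sequences")
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("all aligned sequences must have the same length")

    total = 0
    pairs = 0
    for s, t in combinations(alignment, 2):
        pairs += 1
        total += sum(pair_penalty(a, b, mismatch, gap) for a, b in zip(s, t))
    return total / pairs


# Hypothetical toy alignment, not data from the experiments.
score = average_sp(["ACGT", "A-GT", "ACGA"])
```

Under this penalty convention a lower value indicates a better alignment; with a reward-style scoring scheme the interpretation would flip.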
Experiments • Running time comparison between aligned and unaligned data
Software • HAlign: http://datamining.xmu.edu.cn/software/halign/ • Quan Zou, et al. HAlign: Fast Multiple Similar DNA/RNA Sequence Alignment based on Center Star Strategy. Bioinformatics. doi:10.1093/bioinformatics/btv177 • Phylogenetic tree: http://datamining.xmu.edu.cn/software/Phylogenetic_tree/
Discussion • Summary • MSA with Hadoop • NJ phylogenetic tree with Hadoop • From DNA to Protein • RNA secondary structure is ignored • Several complex issues in evolution are ignored
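For readers unfamiliar with the neighbor-joining (NJ) method named in the summary, the sketch below shows one standard NJ iteration on a small in-memory distance matrix: compute the Q criterion, pick the pair with the smallest value, and derive the two branch lengths to the new internal node. It is a single-machine illustration only; the function name is hypothetical, and nothing here reflects how the work is distributed on Hadoop.

```python
def nj_pick_pair(dist):
    """One neighbor-joining step on a symmetric distance matrix.

    Returns (i, j, limb_i, limb_j): the pair of taxa to join and their
    branch lengths to the new internal node.
    """
    n = len(dist)
    if n < 3:
        raise ValueError("neighbor joining needs at least three taxa")

    row_sums = [sum(row) for row in dist]

    # Q(i, j) = (n - 2) * d(i, j) - sum_k d(i, k) - sum_k d(j, k)
    best = None
    for i in range(n):
        for j in range(i + 1, n):
            q = (n - 2) * dist[i][j] - row_sums[i] - row_sums[j]
            if best is None or q < best[0]:
                best = (q, i, j)

    _, i, j = best
    limb_i = dist[i][j] / 2 + (row_sums[i] - row_sums[j]) / (2 * (n - 2))
    limb_j = dist[i][j] - limb_i
    return i, j, limb_i, limb_j


# Small illustrative matrix for five taxa (not data from the experiments).
example = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
i, j, limb_i, limb_j = nj_pick_pair(example)
```

A full tree is obtained by replacing the joined pair with the new node, recomputing distances to it, and repeating until three nodes remain.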
Thanks! • Quan Zou (Ph.D. & Prof.) • Tianjin Univ, School of Computer • zouquan@nclab.net • http://cs.tju.edu.cn/faculty/zouquan/