NBLAST: cluster variant of BLAST for NxN comparisons
Michel Dumontier, Ph.D. Biochemistry (candidate), micheld@mshri.on.ca
Samuel Lunenfeld Research Institute, Mt. Sinai Hospital
Department of Biochemistry, University of Toronto
Toronto, Ontario, Canada
NBLAST Motivation
• 1M+ non-redundant protein sequences (NCBI) -> longer sequence comparison

Objective
• Eliminate on-the-fly similar sequence searching for nr database sequences
• Provide a freely available, open-source, cluster-computer implementation that computes the all-against-all (NxN) BLAST sequence comparison
• Make the pre-computed databases freely available

NBLAST
• NBLAST is written in C using the NCBI C Toolkit
• Separate function and database layers
• NBLAST performs the minimum number of sequence comparisons
• NBLAST stores the sequence alignments and the list of similar sequences (neighbours) as binary ASN.1
Task Partitioning Scheme
• For each node:
  • Provide total # nodes and its node #
  • (BLAST parameters)
• NBLAST assigns ~equal number of comparisons
• Assume C(x,y) = C(y,x) -> N*(N-1)/2 comparisons
node[x] = N*N/(2*nodes)
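The partitioning idea above can be sketched in C as follows. This is an illustrative sketch only, not NBLAST's actual code: the function names, the 0-based node numbering, and the choice to number the N*(N-1)/2 unordered pairs linearly and hand each node a contiguous slice are all assumptions made for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Total unordered pairs among n sequences, assuming C(x,y) = C(y,x). */
uint64_t total_pairs(uint64_t n)
{
    return n * (n - 1) / 2;
}

/*
 * Half-open range [*begin, *end) of linear pair indices for one node.
 * node is 0-based (a hypothetical convention). The products fit in
 * 64 bits for n around 10^6 and a few hundred nodes; much larger
 * inputs would need overflow handling.
 */
void node_range(uint64_t n, uint64_t nodes, uint64_t node,
                uint64_t *begin, uint64_t *end)
{
    assert(nodes > 0 && node < nodes);
    uint64_t t = total_pairs(n);
    *begin = t * node / nodes;
    *end = t * (node + 1) / nodes;
}

/* Map a linear pair index p (0 <= p < total_pairs(n)) to (i, j), i < j. */
void pair_from_index(uint64_t n, uint64_t p, uint64_t *i, uint64_t *j)
{
    assert(n >= 2 && p < total_pairs(n));
    uint64_t row = 0;
    uint64_t row_len = n - 1;   /* row i pairs sequence i with i+1 .. n-1 */
    while (p >= row_len) {
        p -= row_len;
        row++;
        row_len--;
    }
    *i = row;
    *j = row + 1 + p;
}
```

Each node would call node_range with the total node count and its own node number, then walk its slice with pair_from_index. The slices differ in size by at most one pair, and for large N each holds roughly N*N/(2*nodes) comparisons.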
Indexing & Storage Scheme
• Generate a 64-bit unique identifier (UID) to store alignments, from two 32-bit ordinal numbers (ORD1, ORD2) mapped to GIs
• Generate neighbour lists with e-values
• SeqHound sequence and structure database management system
• Local/remote C/C++/Perl API to access sequence alignments and neighbour lists
• GI/Accession/Redundant Groups
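One way to build a 64-bit UID from two 32-bit ordinals is to pack them into the high and low halves of the key. The sketch below assumes that layout, and also assumes the smaller ordinal goes in the high half so that (x,y) and (y,x) yield the same key. The slides do not state NBLAST's actual bit layout or ordering, and the function names are hypothetical.

```c
#include <stdint.h>

/*
 * Pack two 32-bit ordinals into one 64-bit key.
 * Assumption: the smaller ordinal occupies the high 32 bits, so the
 * key is the same regardless of argument order.
 */
uint64_t make_uid(uint32_t ord1, uint32_t ord2)
{
    uint32_t lo = ord1 < ord2 ? ord1 : ord2;
    uint32_t hi = ord1 < ord2 ? ord2 : ord1;
    return ((uint64_t)lo << 32) | (uint64_t)hi;
}

/* Recover the two ordinals from a key built by make_uid. */
void split_uid(uint64_t uid, uint32_t *ord1, uint32_t *ord2)
{
    *ord1 = (uint32_t)(uid >> 32);
    *ord2 = (uint32_t)(uid & 0xFFFFFFFFu);
}
```

With this layout, all alignments involving a given smaller ordinal sort next to each other by key, which may suit range lookups. Mapping ordinals back to GIs would be a separate table lookup not shown here.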
OOKPIK CLUSTER
• Inuit: sealskin stuffed snowy owl toy
• 108 dual-PIII 450MHz, 512MB RAM
• Fiber Optic
• Head: 8-Processor HP N-Class Server, 8GB RAM
NBLAST Summary
• 1,000,000+ non-redundant sequences
• 325 days for a single-CPU computer
• 35±5 hours for a 216-CPU cluster computer
• 80GB database for 70M pairwise alignments
• 8GB neighbour list
• Daily updates
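The summary figures imply a speedup and an average storage cost per alignment, which the two small helpers below compute. The helper names are hypothetical, and the results are simple ratios of the numbers quoted on the slide, not measurements reported by the author.

```c
/* Speedup of a parallel run relative to a serial run (same time units). */
double speedup(double serial_hours, double parallel_hours)
{
    return serial_hours / parallel_hours;
}

/* Average stored bytes per alignment, given total bytes and alignment count. */
double bytes_per_alignment(double total_bytes, double alignments)
{
    return total_bytes / alignments;
}
```

Calling speedup(325.0 * 24.0, 35.0) gives about 223 on 216 CPUs, and the ±5 hour spread puts the ratio between 195 and 260, so the run is consistent with near-linear scaling. Taking 80GB as 80e9 bytes (an assumption about the unit), bytes_per_alignment(80e9, 70e6) comes to roughly 1.1 KB per stored alignment.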
NBLAST Availability
• GNU GPL
• SourceForge source code & binaries: http://sourceforge.net/projects/slritools
• BMC Bioinformatics: http://www.biomedcentral.com/1471-2105/3/13/

Acknowledgements
• Dr. Christopher W.V. Hogue
• Gary Bader
• Doron Betel
• Howard Feldman
• Katerina Michalickova
Michel Dumontier's research activities are funded in part by NSERC