CS 6293 AT: Current Bioinformatics
HW2 Papers
1. BLAT--The BLAST-Like Alignment Tool
2. Classification of DNA sequences using Bloom Filters
Course Instructor: Dr. Jianhua Ruan
Presenters: Husnu Narman, Nihat Altiparmak
BLAT--The BLAST-Like Alignment Tool
W. James Kent (2002), UCSC
Cited by 2229 (Google Scholar)
Brief Information About BLAST
• BLAST: Basic Local Alignment Search Tool
• Finds a gene in different kinds of databases
• Divide the query into short words and compare them to find High-scoring Segment Pairs (HSPs)
• Scan for exact word matches
• Extend exact matches to HSPs
• List all of the HSPs in the database
• Evaluate, handle exceptions, and report
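The word-match-then-extend steps listed above can be illustrated with a minimal sketch. It is a simplification under stated assumptions, not BLAST itself: it uses exact words only, a +1/-1 match/mismatch score, ungapped extension, and a simple drop-off cutoff. The function names and the parameters `w`, `match`, `mismatch`, and `xdrop` are hypothetical.

```python
def query_words(query, w):
    """Map each length-w word of the query to its start positions."""
    index = {}
    for i in range(len(query) - w + 1):
        index.setdefault(query[i:i + w], []).append(i)
    return index


def extend(query, db, qi, di, w, match=1, mismatch=-1, xdrop=2):
    """Ungapped extension of an exact word hit at query[qi], db[di].

    Returns (score, query_start, db_start, length) of the best segment pair.
    Scoring values and the drop-off rule are illustrative assumptions.
    """
    best = score = w * match
    best_right = best_left = 0

    # Extend to the right until the score falls too far below the best seen.
    k = 0
    while qi + w + k < len(query) and di + w + k < len(db):
        score += match if query[qi + w + k] == db[di + w + k] else mismatch
        k += 1
        if score > best:
            best, best_right = score, k
        elif best - score > xdrop:
            break

    # Extend to the left, starting again from the best score so far.
    score = best
    k = 0
    while qi - k - 1 >= 0 and di - k - 1 >= 0:
        score += match if query[qi - k - 1] == db[di - k - 1] else mismatch
        k += 1
        if score > best:
            best, best_left = score, k
        elif best - score > xdrop:
            break

    return best, qi - best_left, di - best_left, w + best_left + best_right


def find_hsps(query, db, w=4):
    """Scan the database for exact word hits and extend each one."""
    index = query_words(query, w)
    hsps = set()
    for di in range(len(db) - w + 1):
        for qi in index.get(db[di:di + w], ()):
            hsps.add(extend(query, db, qi, di, w))
    return sorted(hsps, reverse=True)  # highest-scoring pairs first


if __name__ == "__main__":
    for hsp in find_hsps("ACGTACGT", "TTTACGTACGTTTT", w=4):
        print(hsp)
```

Several word hits on the same diagonal extend to the same segment pair, which is why the results are collected in a set before sorting.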
BLAT
• BLAT: The BLAST-Like Alignment Tool
• Finds a gene in different kinds of databases
• Why a new search tool?
Differences between BLAST and BLAT
BLAST
• Builds an index of the query
• Triggers an extension when one or two hits occur
• Returns a list of exons sorted by size
BLAT
• Builds an index of the database
• Triggers extensions on any number of perfect or near-perfect hits
• Looks up the location of a sequence in the genome or determines the exon structure of an mRNA
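To make the "index of database" idea concrete, here is a minimal sketch of a k-mer index built over the database and scanned with the query. It assumes non-overlapping database k-mers and overlapping query k-mers, and it stops at hit finding; extension and stitching of hits are not shown. The names `build_db_index` and `find_hits` are hypothetical.

```python
from collections import defaultdict


def build_db_index(db, k):
    """Index the database once: non-overlapping k-mers -> start positions."""
    index = defaultdict(list)
    for pos in range(0, len(db) - k + 1, k):
        index[db[pos:pos + k]].append(pos)
    return index


def find_hits(query, index, k):
    """Slide over every (overlapping) k-mer of the query and look it up."""
    hits = []
    for qpos in range(len(query) - k + 1):
        # .get avoids inserting empty entries into the defaultdict.
        for dpos in index.get(query[qpos:qpos + k], ()):
            hits.append((qpos, dpos))
    return hits


if __name__ == "__main__":
    index = build_db_index("ACGTTGCAAACC", k=4)
    print(find_hits("GGTGCAAA", index, k=4))
```

The design point is that the index is built once over the large database and reused for every query, whereas indexing the query means scanning the whole database on each search.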
Classification of DNA sequences using Bloom Filters
Stranneheim et al. (2010)
Stockholm, Sweden
Classification of DNA sequences using Bloom Filters
• New generation sequencing technologies
• Complex datasets
• New efficient, specialized sequence analysis algorithms are needed
• Often only novel sequences are required; unnecessary sequences (belonging to a known genome) need to be removed
• A new algorithm (FACS) classifies sequences as belonging or not belonging to a reference sequence
• Source code available at: http://facs.biotech.kth.se
Bloom Filter
• A memory-efficient data structure for testing whether an element is part of a reference set
• An m-bit vector with k hash functions
• Never returns a false negative; may, however, return a false positive
• Optimal number of hash functions: k = (m/n) ln 2, where n is the number of elements in the reference set
Example Bloom Filter
[Figure: example Bloom filter with elements x, y, z and queries x and w]
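A Bloom filter of this kind can be sketched in a few lines. This is an illustrative version, not the FACS implementation: the class name `BloomFilter` is hypothetical, the bit-vector size uses the standard formula m = -n ln p / (ln 2)^2 for a target false positive rate p, and the k hash functions are derived from one SHA-256 digest by double hashing, which is an assumption made here for brevity.

```python
import hashlib
import math


class BloomFilter:
    """m-bit vector with k hash functions (illustrative sketch)."""

    def __init__(self, n_items, fp_rate):
        # Standard sizing formulas for n items and false positive rate p.
        self.m = max(1, math.ceil(-n_items * math.log(fp_rate) / math.log(2) ** 2))
        self.k = max(1, round(self.m / n_items * math.log(2)))
        self.bits = bytearray((self.m + 7) // 8)

    def _positions(self, item):
        # Assumption: derive k positions from two 64-bit hashes (double hashing).
        digest = hashlib.sha256(item.encode()).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, item):
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, item):
        # True if every bit is set: may be a false positive, never a false negative.
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(item))


if __name__ == "__main__":
    bf = BloomFilter(n_items=1000, fp_rate=0.01)
    for element in ("x", "y", "z"):
        bf.add(element)
    print(bf.m, bf.k)
    print("x" in bf)  # inserted elements are always found
    print("w" in bf)  # usually False; True would be a false positive
```

An inserted element sets all of its k bits, so a lookup for it can never fail; an element that was never inserted is reported as present only if all of its bits happen to have been set by other elements.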
Method
• A Bloom filter is created from the reference sequence with the desired k-mer size and false positive rate
• The query sequences are then classified by using the Bloom filter
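The two steps above might look roughly like the following sketch. It is not the FACS code: the class `KmerBloom`, the function `classify`, and in particular the `threshold` parameter (the fraction of a read's k-mers that must be found) are hypothetical choices for illustration, and the hashing scheme is the same double-hashing assumption used for brevity.

```python
import hashlib
import math


def kmers(seq, k):
    """All overlapping k-mers of a sequence."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


class KmerBloom:
    """Bloom filter built from the k-mers of a reference (illustrative)."""

    def __init__(self, reference, kmer_size, fp_rate):
        self.kmer_size = kmer_size
        ref_kmers = set(kmers(reference, kmer_size))
        n = max(1, len(ref_kmers))
        # Size the filter from the k-mer count and the desired false positive rate.
        self.m = math.ceil(-n * math.log(fp_rate) / math.log(2) ** 2)
        self.k = max(1, round(self.m / n * math.log(2)))
        self.bits = bytearray((self.m + 7) // 8)
        for km in ref_kmers:
            for p in self._positions(km):
                self.bits[p // 8] |= 1 << (p % 8)

    def _positions(self, item):
        digest = hashlib.sha256(item.encode()).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def __contains__(self, item):
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(item))


def classify(read, bloom, threshold=0.8):
    """Return True if the read is classified as belonging to the reference.

    `threshold` is a hypothetical cutoff on the fraction of matching k-mers.
    """
    read_kmers = kmers(read, bloom.kmer_size)
    if not read_kmers:
        return False  # read shorter than one k-mer
    hits = sum(1 for km in read_kmers if km in bloom)
    return hits / len(read_kmers) >= threshold


if __name__ == "__main__":
    reference = "ACGTACGTTAGCTAGCTAGGATCCGATCGATTACG"
    bloom = KmerBloom(reference, kmer_size=5, fp_rate=0.01)
    print(classify(reference[5:20], bloom))  # read taken from the reference
    print(classify("GGGGGGGGGGGGGGG", bloom))  # unrelated read, expected False barring false positives
```

Because the filter never returns a false negative, a read copied from the reference always has all of its k-mers found; false positives can only push an unrelated read toward being classified as belonging.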
Evaluation
• Experimental metagenome dataset (Allander et al. 2005) containing 177184 reads
• Analysis using the human genome as a reference
• FACS, BLAT and SSAHA2 compared
[Figure: comparison chart with labels 31x and 21x]