SRA Transcript BLAST
Tom Madden
May 15, 2009
BLAST
• Basic Local Alignment Search Tool
• Calculates similarity for biological sequences.
• Produces local alignments: only a portion of each sequence must be aligned.
• Uses statistical theory to determine if a match might have occurred by chance.
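The statistical step in the last bullet can be illustrated with the Karlin-Altschul expect value, E = K·m·n·e^(−λS), which estimates how many alignments with raw score S or better would occur by chance for a query of length m against a database of length n. The sketch below is a simplified illustration, not the BLAST implementation: the λ and K values are assumed example parameters (real values depend on the scoring system and are reported in BLAST output), the function names are hypothetical, and BLAST itself uses edge-corrected effective lengths rather than raw lengths.

```python
import math

# Assumed example Karlin-Altschul parameters; real values depend on the
# scoring system in use and are reported by BLAST alongside its results.
LAMBDA = 1.374
K = 0.711


def bit_score(raw_score, lam=LAMBDA, k=K):
    """Convert a raw alignment score into a normalized bit score."""
    return (lam * raw_score - math.log(k)) / math.log(2)


def expect_value(raw_score, query_len, db_len, lam=LAMBDA, k=K):
    """Expected number of chance alignments scoring at least raw_score.

    Simplification: uses raw lengths, not BLAST's effective lengths.
    """
    return k * query_len * db_len * math.exp(-lam * raw_score)


if __name__ == "__main__":
    # Higher scores give smaller expect values for the same search space.
    for score in (20, 30, 40):
        e = expect_value(score, query_len=100, db_len=5e12)
        print(score, round(bit_score(score), 1), e)
```

A small expect value means the match is unlikely to be a chance hit; the same raw score becomes less significant as the searched database grows.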
Requirements for searching SRA sequences as a BLAST DB
• Extract new or updated sequences.
• Format into a BLAST database.
• Provide disks for eight copies of the BLAST databases, each with 5 terabases (as of January).
• Distribute databases to storage in Bethesda and Virginia.
• Know how to quickly re-dump for policy changes or data corruption (e.g., a decision that unclipped or differently clipped reads should be searched).
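To give a feel for the disk requirement, the following back-of-the-envelope sketch estimates the packed sequence data alone. It assumes the usual 2-bit nucleotide packing (four bases per byte); headers, indices, and ambiguity data are not counted, so the result is a lower bound rather than a measured figure. The constant and function names are hypothetical.

```python
BASES_PER_DATABASE = 5 * 10**12  # 5 terabases per database (from the slide)
COPIES = 8                       # eight copies (from the slide)
BASES_PER_BYTE = 4               # assumption: 2 bits per nucleotide


def sequence_bytes(bases, bases_per_byte=BASES_PER_BYTE):
    """Bytes needed for packed sequence data, rounded up to a whole byte."""
    return -(-bases // bases_per_byte)


per_copy = sequence_bytes(BASES_PER_DATABASE)
total = per_copy * COPIES

print(f"per copy: {per_copy / 1e12:.2f} TB, all copies: {total / 1e12:.2f} TB")
```

Under these assumptions the sequence data comes to about 1.25 TB per copy and 10 TB across all eight copies, before any metadata or redundancy overhead.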
Direct BLAST searches against the SRA archive.
• Uses the SRA toolkit and the C++ BLAST API.
• Smallest search unit is a “run”.
• Multiple runs may be searched together.
• Offers searches of 454 SRA transcripts (grouped by organism) on the NCBI web page.
• Clipped application reads are searched.
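The idea that a run is the smallest search unit, and that several runs can be searched together as an organism-level set, might be modeled as below. This is only a sketch of the grouping logic: the `Run` record, the `build_search_sets` function, and the placeholder accessions are all hypothetical and do not reflect the SRA toolkit or the C++ BLAST API.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Run:
    accession: str  # placeholder identifier, not a real accession
    organism: str
    platform: str


def build_search_sets(runs, platform="454"):
    """Group runs from one platform by organism.

    Each resulting list of accessions is a set of runs searched together.
    """
    sets = defaultdict(list)
    for run in runs:
        if run.platform == platform:
            sets[run.organism].append(run.accession)
    return dict(sets)


runs = [
    Run("RUN_A", "Homo sapiens", "454"),
    Run("RUN_B", "Homo sapiens", "454"),
    Run("RUN_C", "Sus scrofa", "454"),
    Run("RUN_D", "Sus scrofa", "other"),
]

print(build_search_sets(runs))
```

Because the set is assembled from runs at search time, adding or removing a run changes what is searched without rebuilding a database.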
Advantages
• The search set offered no longer depends upon how fast a BLAST database can be produced and distributed.
• Changes to the SRA archive are seen immediately (e.g., a change in the clipping algorithm).
Three most popular organisms.
• Human
• Sus scrofa
• Tachyglossus aculeatus
Counts searches after April 29, 2009, and only includes those with an average of two or more searches per session.
Future development
• Allow users to build custom search sets.
• Take mate-pair information into account.
• Combine SRA searches with traditional BLAST database searches.
Acknowledgements
• Kurt Rodarmer
• Eugene Yaschenko
• Ty Roach
• Martin Shumway
• Christopher O’Sullivan
• Vahram Avagyan
• Christiam Camacho
• Yan Raytselis
• Irena Zaretskaya