BLAST Sequence Searching in Registry® Soichi Tokizane November 2002
You will learn… • How sequences are represented in the Registry file today • How to use BLAST for similarity searching • Techniques for finding references to BLAST results
Sequence Information from CAS
CAS creates the Registry database [Figure: CAS Registry growth since 1965; y-axis: Substances Registered (millions)]
CA has covered biochemistry journals and patents since 1907 Oxidizing enzymes - (III) Specific nature of tyrosinase and its action on products of disintegration of protein compds. Arch. Sci. phys. nat. gen. 24 1907
Today, the CA database contains a very complete bioscience collection. It covers: • Journals and patents from • More than 3,000 bioscience titles • Patents from 33 countries plus EP and WO • Over 500 books and over 300 book series • Conference proceedings • Dissertations
Over 40% of the 21.5 million bibliographic records in CA cover bioscience information
Biomolecules (sequences) are a major substance class in REGISTRY 36 million substances
Virtually all types of sequences are covered in Registry • Sequences from earlier literature • Novel nucleic acid primers and probes • Protein sequences deduced from gene translation and ESTs • Sequences with uncommon or non-natural residues • Chemically modified sequences • Fusion proteins • Genetically engineered sequences • Peptide nucleic acids (PNAs)
BLAST Sequence Similarity Searching
Registry offers several sequence search techniques • BLAST similarity (“homology”) searching • Similarity searching retrieves sequence matches based on identity, conservation, and gaps • Sequence code match: exact, family, motif, pattern • Sequence name search
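The three criteria named above (identity, conservation, gaps) can be illustrated on a pair of sequences that are already aligned. This is a minimal Python sketch, not BLAST's scoring: the function name, the "-" gap character, and the small groups of similar residues are assumptions made for illustration. BLAST itself scores substitutions with a matrix such as BLOSUM62.

```python
# Illustrative groups of chemically similar amino acids (an assumption for
# this sketch; BLAST uses a substitution matrix instead of fixed groups).
CONSERVATIVE_GROUPS = [
    set("ILVM"), set("FWY"), set("KRH"), set("DE"), set("ST"), set("NQ"),
]


def alignment_stats(a, b, gap="-"):
    """Count identical, conserved, and gapped columns of two aligned sequences."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    identical = conserved = gaps = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == gap or y == gap:
            gaps += 1                      # column contains a gap
        elif x == y:
            identical += 1                 # same residue in both sequences
        elif any(x in g and y in g for g in CONSERVATIVE_GROUPS):
            conserved += 1                 # different but similar residues
    return {"identities": identical, "conserved": conserved,
            "gaps": gaps, "length": len(a)}


stats = alignment_stats("MKV-LD", "MRVALE")
print(stats)
```

Note that this sketch counts conserved columns separately from identical ones; BLAST reports usually give a "positives" figure that includes the identities.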
BLAST is a similarity matching algorithm • BLAST stands for Basic Local Alignment Search Tool • Produced and offered by the U.S. National Center for Biotechnology Information (NCBI) • Designed to quickly compare nucleotide and amino acid sequences against desired databases
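Part of the reason BLAST is quick is that it first looks for short word matches ("seeds") between query and database sequence and only then extends them into alignments. The Python sketch below shows the seeding idea alone, under simplifying assumptions: it finds exact word matches only, with no neighborhood words, no scoring, and no extension step. The function name and the default word size of 3 are choices made for this illustration.

```python
from collections import defaultdict


def find_seeds(query, subject, word_size=3):
    """Return (query_pos, subject_pos) pairs where a word of word_size matches exactly."""
    # Index every word of the query by its start position.
    index = defaultdict(list)
    for i in range(len(query) - word_size + 1):
        index[query[i:i + word_size]].append(i)

    # Scan the subject once and look each word up in the index.
    seeds = []
    for j in range(len(subject) - word_size + 1):
        for i in index.get(subject[j:j + word_size], ()):
            seeds.append((i, j))
    return seeds


print(find_seeds("MRAWIF", "AAWIFF"))
```

Indexing the query once means each database sequence is scanned in a single pass, which is the design point the sketch is meant to convey.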
Search Application Find patent references for sequences similar to the following recombinant human collagen. Conduct a comprehensive search in Registry on STN. MRAWIFFLLCLAGRALAAPLADYKDDDDKP GYLGGFLLVLHSQTDQEPTCPLGMPRLWTG YSLLYLEGQEKAHNQDLGLAGSCLPVFSTL HQVCHYAQRNDRSYWLASAAPLPRAWIFF MMPLSEEAIRPYVSRCAVCEAPAQAVAVHS QDQSIPPCPQTWRSLWIGYSFLMHTGAGDQ GGGQALMSPRAAPFLECQGRQGTLADY CHFFANKYSFWLTTVKADLQFSSAPAPDTL KESQAISRCQVCVKYS
CAS Registry BLAST via STN on the Web is easy to use 1. Install sequence plug-in 2. Conduct Registry BLAST similarity search 3. Search selected BLAST answers in STN to get the literature references
BLAST is available via STN on the Web • A plug-in must be downloaded and installed before using the BLAST module • This is a one-time requirement • The plug-in is free • Clicking on “Get Sequence Plug-in” takes you to easy-to-use instructions
Conduct Registry BLAST Similarity Search
Follow these steps for Registry BLAST searching • Launch CAS Registry BLAST • Submit sequence query • Examine results and return to STN • Continue searching in STN on the Web
Log on to STN on the Web and select the Sequence Assistant
Select one of three STN online options before launch, then click the Launch button
Submit sequence query • In a new session, the only available option is Similar Sequences • Fast BLAST is available after the first search • Click on the Similar Sequences button to open the Search by Sequence query page
Type in a result name • Type the desired name for the sequence search • Alpha or numeric • Spaces and punctuation allowed • STN will assign a sequential number if you do not name the search • The name can also be changed later in the Main Menu
Recall Sequence is useful for re-submitting the same query with different settings • The most recently searched sequence is stored in a buffer that can be retrieved using this function • This function is grayed out when you first begin
Read from File allows you to upload directly from a file • The file can be: • A text file (e.g. .txt) • In GCG or FASTA format • An STN record (SQIDE display)
The sequence query must be in 1-letter code • The sequence query can be • Copied and pasted • Read from File • Typed directly • A recalled sequence • The sequence length limit is 50,000 characters
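A query can be checked against these rules before it is pasted or uploaded. The following Python sketch is only an illustration: the function name is hypothetical, it assumes FASTA header lines start with ">", and it assumes the 50,000-character limit applies to the residues after whitespace is removed, which the slide does not specify.

```python
MAX_QUERY_LENGTH = 50_000  # limit stated on the slide


def prepare_query(text, max_length=MAX_QUERY_LENGTH):
    """Strip FASTA headers and whitespace, then validate a 1-letter-code query."""
    # Drop FASTA header lines (assumed to start with ">").
    lines = [ln for ln in text.splitlines() if not ln.startswith(">")]
    # Remove all whitespace and normalize to upper case.
    seq = "".join("".join(lines).split()).upper()
    if not seq:
        raise ValueError("empty sequence")
    if not (seq.isascii() and seq.isalpha()):
        raise ValueError("query must contain only 1-letter codes")
    if len(seq) > max_length:
        raise ValueError(f"query has {len(seq)} characters; limit is {max_length}")
    return seq


print(prepare_query(">q1\nMRAW IFF\nllc\n"))
```

The check only confirms that the query consists of letters; it does not verify that each letter is a valid residue code for the chosen sequence type.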
Searches can be run on a subset of the Registry File • For proteins, the three options are: • The default is all CA sequences • Other options are available for nucleic acids, such as including or excluding GenBank records
BLAST default settings are optimized • Parameters can be modified • Search sensitivity • Low complexity filtering • Maximum number of answers • Show advanced options
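To give a feel for what low complexity filtering does, the sketch below masks regions whose residue composition is very repetitive, so that they do not produce spurious matches. It is a simplified stand-in written in Python: NCBI BLAST uses the SEG (protein) and DUST (nucleotide) programs for this, and the window size, entropy threshold, and function names here are assumptions chosen for illustration.

```python
import math
from collections import Counter


def shannon_entropy(window):
    """Shannon entropy (bits) of the residue composition of a window."""
    counts = Counter(window)
    n = len(window)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def mask_low_complexity(seq, window=12, threshold=2.2, mask_char="X"):
    """Replace residues in every low-entropy window with mask_char."""
    masked = list(seq)
    for start in range(len(seq) - window + 1):
        if shannon_entropy(seq[start:start + window]) < threshold:
            for k in range(start, start + window):
                masked[k] = mask_char
    return "".join(masked)


print(mask_low_complexity("AAAAAAAAAAAA"))   # fully repetitive window
print(mask_low_complexity("ACDEFGHIKLMN"))   # twelve distinct residues
```

Sequences shorter than the window are returned unchanged, since no full window can be evaluated.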
Advanced functions should only be modified with a thorough understanding of BLAST principles • Users are encouraged to contact bioinformatics departments for details, advice, and recommendations • Additional information is also available at the NCBI Web page http://www.ncbi.nlm.nih.gov/
The Main Window is for managing results • The Main Window has columns for • Assigned name • Type of search • Time created • Status • Results • Reviewed status
Results can be viewed once the search is complete • The results are stored on STN until deleted by the user • Old results can be reviewed when desired • Up to 50 result sets can be stored • Highlight a result, then view it
Select desired alignments for transfer to STN • Check boxes • Select by score category • Select all
Transfer RNs to STN • Select Transfer RNs to STN • A message indicates when the transfer is complete • Log off the BLAST system: select Exit from the File menu or close the browser
Retrieve RNs from BLAST • The Sequence Assistant page appears after you exit BLAST • Select the Retrieve RNs from BLAST option
Return to STN on the Web • STN will indicate if session is logged off • If so, log on to STN on the Web • Select Sequence Assistant • Retrieve RNs from BLAST [Screen text: “To obtain a transcript of your session, you must log in again. Back to the STN on the Web login page”]
Continue STN Searching
L-Numbers are created from the automatic transfer • The Sequence Assistant transferred several “packets” of Registry Numbers, which are all OR’ed together in L6
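OR'ing the packets together amounts to taking the union of the Registry Numbers they contain. A minimal Python sketch of that idea follows; the function name and the placeholder identifiers ("RN-1" and so on) are hypothetical and are not real Registry Numbers or STN commands.

```python
def combine_packets(packets):
    """Union of several RN packets, keeping first-seen order and dropping duplicates."""
    seen = set()
    combined = []
    for packet in packets:
        for rn in packet:
            if rn not in seen:
                seen.add(rn)
                combined.append(rn)
    return combined


# Placeholder identifiers, not real Registry Numbers.
packets = [["RN-1", "RN-2"], ["RN-2", "RN-3"]]
print(combine_packets(packets))
```

An RN that appears in more than one packet ends up in the combined answer set only once, which is the effect of the OR.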
L-Numbers are used for reference searches • These search results can optionally be combined with DGENE results using STN’s routine multifile searching
STN Express with Discover! 6.01 is now available for Sequence Searching http://www.cas.org/ONLINE/STN/interact/express.html
Transferring BLAST data into an STN session is seamlessly integrated into the software