
Practice retrieving data and running standalone BLAST.

This guide gives step-by-step instructions for identifying the genes of the ABA biosynthesis pathway in the AraCyc (Arabidopsis) database and retrieving their sequences to use in a standalone BLAST analysis.

rjarrell

Presentation Transcript


  1. Practice retrieving data and running standalone BLAST.
     Step 1. Identify genes in the ABA biosynthesis pathway from the AraCyc database: http://www.arabidopsis.org/biocyc/index.jsp
     Step 2. Identify the subject databases: Vitis vinifera (nucleotide) and Solanum pennellii (EST).

  2. Query: select "Pathway by name", enter "Abscisic Acid", then click Submit.

  3. Now what?

  4. Filter for unique sequences (Excel: Data, Filter, Advanced Filter…)

  5. Notepad++: Edit, Line Operations, Join Lines. Then Search, Replace: replace each " " (a single space) with " OR " (space, OR, space). Paste the result into the Entrez Nucleotide search…
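The join-and-replace step can also be done on the command line. The following is a minimal sketch, not part of the original slides: it assumes the unique IDs sit one per line on standard input, and the function name `join_with_or` and the IDs shown are placeholders chosen for illustration.

```shell
#!/bin/sh
# join_with_or (hypothetical helper): read one ID per line on stdin and
# print all of them on a single line separated by " OR ".
# Blank lines are skipped.
join_with_or() {
    awk 'NF { printf "%s%s", sep, $0; sep = " OR " }
         END { if (sep != "") print "" }'
}

# Example with placeholder IDs (not real AraCyc output):
printf 'ID_A\nID_B\nID_C\n' | join_with_or
# prints: ID_A OR ID_B OR ID_C
```

The resulting line can be pasted into the search box in the same way as the Notepad++ output. Files saved with Windows line endings would need the carriage returns removed first.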

  6. PERL (these statements run inside the loop that reads the tab-delimited AraCyc data line by line):
       chomp;
       next if /^\s/;            # skip lines that start with whitespace
       next if /^Gene/;          # skip the header line, which starts with "Gene"
       my @temp = split /\t/;    # the data set is tab-delimited
       $hash{$temp[0]} = 1;      # key the hash on element 0, the sequence ID, so each ID is kept once
     Then invoke BioPerl to query NCBI with the search string: TAIR:AT### AND "complete cds", where AT### are the unique accession numbers from AraCyc and "complete cds" eliminates genomic sequence (e.g. the complete Ath chromosome 4). See complete script on class site….
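For comparison, the same filtering logic can be sketched with awk instead of Perl. This is an illustration under stated assumptions, not the class script: the function name `unique_ids` is hypothetical, the input is assumed to be a tab-delimited file with the sequence ID in the first column, and empty lines are skipped as well, which the Perl fragment does not do explicitly.

```shell
#!/bin/sh
# unique_ids (hypothetical helper): print each distinct first-column ID
# from a tab-delimited file, in order of first appearance.
#   $1: path to the tab-delimited file
unique_ids() {
    awk -F '\t' '
        /^[[:space:]]/ { next }      # skip lines that start with whitespace
        /^Gene/        { next }      # skip the header line
        $1 == ""       { next }      # skip empty lines (added assumption)
        !seen[$1]++    { print $1 }  # print an ID only the first time it is seen
    ' "$1"
}

# Usage (file name is a placeholder):
#   unique_ids aracyc_genes.txt > unique_ids.txt
```

The `seen` array plays the role of `%hash` in the Perl version: an ID is printed only when its counter is still zero.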

  7. Do we want this much sequence?

  8. Use the push pin to highlight all boxes for mRNA (22 sequences) so we don’t get chromosome 4 genomic sequences

  9. Try: Use Unix to verify that the file contains all the sequences… Q: What command would you use? A: $ grep -c ">" filename
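The count can be compared against the number of sequences you expected to download. The sketch below is an illustration only: the function names and the file name in the usage comment are placeholders, and the pattern is anchored as `^>` so that only FASTA header lines are counted, a small variation on the command above.

```shell
#!/bin/sh
# count_fasta_records (hypothetical helper): count FASTA header lines,
# i.e. lines that begin with ">".
count_fasta_records() {
    grep -c '^>' "$1"
}

# check_fasta_count (hypothetical helper): compare the number of records
# in a file with an expected number.
#   $1: FASTA file   $2: expected number of sequences
check_fasta_count() {
    found=$(count_fasta_records "$1")
    if [ "$found" -eq "$2" ]; then
        echo "OK: $found sequences"
    else
        echo "MISMATCH: expected $2, found $found"
        return 1
    fi
}

# Usage (file name is a placeholder; 22 is the mRNA count from the slides):
#   check_fasta_count aba_genes.fasta 22
```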

  10. (lycopersicum [ORGN] AND EST) AND "Solanum pennellii"[porgn:__txid28526]

  11. Try: Use Unix to verify that the file contains all the sequences…

  12. Nucleotide Vitis [ORGN] AND EST

  13. Note syntax of ENTREZ search invoked by organism tree link

  14. For class, I recommend downloading the smaller Nucleotide data set…

  15. Try: Use Unix to verify that the file contains all the sequences…

  16. Now what? Which file needs to be formatted for BLAST (formatdb)? Which file will be the query file? What is the syntax for the BLAST (including PATH)?

  17. Formatdb (-p F marks the input as nucleotide, not protein):
        $ /path/formatdb -i /path/filename -p F
      Run nucleotide BLAST (blastn):
        $ /path/blastall -p blastn -d /path/filename -i /path/filename -o filename -e 0.01
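To make the roles of the files concrete, the sketch below only prints the two commands rather than running them. It assumes the subject data set (for example the Vitis download) is the file that gets formatted and passed to -d, and that the Arabidopsis ABA gene sequences are the query passed to -i. `BLAST_BIN`, the function name, and every file name here are placeholders, not paths from the slides.

```shell
#!/bin/sh
# Directory holding the legacy BLAST binaries; "/path" is a placeholder,
# as in the commands above.
BLAST_BIN="${BLAST_BIN:-/path}"

# print_blast_commands (hypothetical helper): echo, without executing,
# the formatdb and blastall commands for one subject/query pair.
#   $1: subject FASTA file (formatted as the database, used with -d)
#   $2: query FASTA file (used with -i)
#   $3: output file for the BLAST report
print_blast_commands() {
    subject=$1
    query=$2
    out=$3
    echo "$BLAST_BIN/formatdb -i $subject -p F"
    echo "$BLAST_BIN/blastall -p blastn -d $subject -i $query -o $out -e 0.01"
}

# Example with placeholder file names:
print_blast_commands vitis_nt.fasta aba_genes.fasta aba_vs_vitis.blastn
```

Printing the commands first makes it easy to check the paths before committing to a long formatdb run on a large data set.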
