Vertebrate natural history in the 21st century: genetics, ecology, and evolution
Andrew DeWoody
Purdue University
BLASTed computers! !@#$%&*
Some research from the DeWoody lab
Scale
• we are looking for genes that underlie traits of evolutionary interest in non-model organisms
  • e.g., osmoregulatory genes in kangaroo rats
  • genes involved in salamander metamorphosis
  • MHC genes
• RNA-seq/transcriptomics
  • so far, mostly 454 data…
• small in terms of genome projects, but large enough to be computationally problematic (for us)
• this is NOT a how-to talk!
Nick: BLAST annotation of a de novo transcriptome assembly
• kangaroo rat transcriptome sequences were assembled from 454 runs (i.e., RNA-seq)
• yielded 20,484 contigs for kidney tissue and 23,376 contigs for spleen tissue
• ran a BLASTx search from within Blast2GO® (Götz et al. 2008) to compare the sequences against NCBI's nr database
• goal: to find known proteins that match our cDNA sequences
Time required
• Blast2GO® sends out queries in batches of 5 sequences
• search settings: an e-value cutoff of 1e-6 (only matches with an e-value below 1e-6 are kept) and only the top 5 hits returned per sequence
• i.e., we get up to 25 hits from each batch query
• used the Genomics server
• we set up the searches 1 week ago (2 Dec @ 7pm), one for kidney contigs AND one for spleen contigs
• each was 85% finished as of 11am today
• a separate search is necessary for each additional database (e.g., Swiss-Prot)
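The figures on this slide lend themselves to a quick back-of-the-envelope estimate. The sketch below is only illustrative: it assumes that "today" is 9 Dec (one week after the 2 Dec start), that progress is linear, and that the year is a placeholder; all function and variable names are hypothetical.

```python
import math
from datetime import datetime

BATCH_SIZE = 5   # sequences per Blast2GO query batch (from the slide)
TOP_HITS = 5     # hits kept per sequence (from the slide)

def n_batches(n_contigs, batch_size=BATCH_SIZE):
    """Number of batch queries needed to submit every contig once."""
    return math.ceil(n_contigs / batch_size)

def projected_hours(start, now, fraction_done):
    """Return (elapsed, projected total, projected remaining) in hours,
    assuming the search progresses at a constant rate (an assumption)."""
    elapsed = (now - start).total_seconds() / 3600
    total = elapsed / fraction_done
    return elapsed, total, total - elapsed

kidney_batches = n_batches(20484)
spleen_batches = n_batches(23376)
max_hits_per_batch = BATCH_SIZE * TOP_HITS

# The year is a placeholder; only the day and time come from the slide.
start = datetime(2010, 12, 2, 19, 0)   # 2 Dec @ 7pm
now = datetime(2010, 12, 9, 11, 0)     # assumed "today" @ 11am
elapsed, total, remaining = projected_hours(start, now, 0.85)

print(f"kidney: {kidney_batches} batches, spleen: {spleen_batches} batches")
print(f"at most {max_hits_per_batch} hits per batch")
print(f"elapsed {elapsed:.0f} h, projected remaining {remaining:.1f} h")
```

Under these assumptions each tissue needs more than 4,000 batch queries, and a search that is 85% done after 160 hours would need roughly 28 more hours to finish.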
Nick’s wish list
• increase the number of queries possible at any given time (i.e., >>5) while retaining flexibility
• allow the user to specify more options
  • e.g., limiting the BLAST database to specific taxonomic groups
• allow the user to specify multiple databases in the same query (e.g., return the top BLAST hit for a gene from both the Swiss-Prot and NCBI nr databases during the same BLAST search)
  • i.e., search in parallel
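One way to picture the "search in parallel" wish is a small dispatcher that pairs every query file with every database and runs the jobs concurrently. This is a minimal sketch, not the lab's workflow or anything Blast2GO provides: it assumes a local NCBI BLAST+ installation with locally formatted databases, and the file names, database names, and the `run` callable are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

def blastx_command(query_fasta, db, out_path, evalue=1e-6, max_hits=5):
    """Build a BLAST+ blastx command line (tabular output, format 6)."""
    return ["blastx", "-query", query_fasta, "-db", db,
            "-evalue", str(evalue), "-max_target_seqs", str(max_hits),
            "-outfmt", "6", "-out", out_path]

def search_all(query_files, databases, run, workers=8):
    """Run every (query file, database) pair concurrently.

    `run` is any callable that takes a command list; in practice it
    could be subprocess.run, here it is injected so the sketch can be
    tried without BLAST installed.
    """
    jobs = [(q, db) for q in query_files for db in databases]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {
            (q, db): pool.submit(run, blastx_command(q, db, f"{q}.{db}.tsv"))
            for q, db in jobs
        }
        return {job: fut.result() for job, fut in futures.items()}

# Dry run: the stand-in runner just reports which database was targeted.
results = search_all(["kidney.fasta", "spleen.fasta"],
                     ["nr", "swissprot"],
                     run=lambda cmd: cmd[4])
```

Threads are enough here because each job would spend its time waiting on an external process; the thresholds mirror the 1e-6 cutoff and top-5 setting mentioned earlier in the talk.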
Kendra: Kangaroo rat singletons
• Goal: to isolate MHC genes in kangaroo rats
• we considered ~80,000 sequences that were not assembled into contigs
• BLASTn (cutoff e-15)
• analysis took 10 days on a PC using a Java applet over the internet
  • we know, not the best approach!
• yielded 300 hits
Kendra’s wish list
• would like to harvest hits by best e-value and max length
  • rank correlation?
• would also like to know the top hit categories/descriptions and the number of times each occurs
• i.e., would like a more precise tool than the blunt offerings of Blast2GO…
• probably beyond Carol’s scope
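The first two wishes amount to a small post-processing step over a hit table. The sketch below shows one possible shape for it, assuming the hits have already been parsed into (query, description, e-value, alignment length) tuples; the example records are made up for illustration and the function names are hypothetical.

```python
from collections import Counter

def best_hits(hits):
    """Keep one hit per query: lowest e-value, ties broken by longest alignment."""
    best = {}
    for query, description, evalue, length in hits:
        key = (evalue, -length)   # smaller key means a better hit
        if query not in best or key < best[query][0]:
            best[query] = (key, description, evalue, length)
    return {q: (d, e, l) for q, (_, d, e, l) in best.items()}

def top_descriptions(best, n=10):
    """Count how often each description appears among the best hits."""
    return Counter(d for d, _, _ in best.values()).most_common(n)

# Made-up records, for illustration only.
hits = [
    ("read1", "MHC class II beta chain", 1e-30, 210),
    ("read1", "MHC class I antigen", 1e-30, 180),
    ("read1", "hypothetical protein", 1e-16, 250),
    ("read2", "MHC class I antigen", 1e-20, 150),
    ("read3", "MHC class I antigen", 1e-18, 160),
]

best = best_hits(hits)
summary = top_descriptions(best)
for description, count in summary:
    print(f"{count}\t{description}")
```

Ranking by e-value first and length second is only one reasonable choice; a different ordering would need nothing more than a different `key` tuple.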
Bamboo’s BLAST Search Summary
Kangaroo rat kidney; the query sequences in the 1st and 3rd rows are randomly chosen transposable elements from Repbase.