Tools and Datasets


  1. Tools and Datasets: Exploring the tools of the trade

  2. Sequence Databases
     • Understanding EMBL Entries
     • Understanding SWISS-PROT Entries

  3. Understanding EMBL Entries

  4. Understanding SWISS-PROT Entries

  5. General Concepts and Methods
     • Predictions and Validation

  6. Maxim 17.1: Recognise the difference between the validation of a model and the testing of it for self-consistency.

  7. True/False/Negative/Positive

  8. Maxim 17.2: Generally, False Negative predictions are considered more acceptable than False Positives.

  9. [Figure: Assessment/Validation Procedure and Possible Outcomes]

  10. Balancing the errors

  11. Maxim 17.3: With False Negatives, we can come back next year and find the ones we missed. These are preferred to False Positives, where we can waste time studying them this year, only to find out that the time was wasted. It all depends on the circumstances.
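
The four possible outcomes of a prediction (True Positive, False Positive, True Negative, False Negative) can be tallied against a set of known answers. The following is a minimal Python sketch, not taken from the slides: the function names and the toy `predicted`/`actual` lists are hypothetical, and sensitivity and specificity are included only as the usual summary figures for weighing False Negatives against False Positives.

```python
from collections import Counter


def tally_outcomes(predicted, actual):
    """Count TP, FP, TN and FN for paired boolean predictions and known answers."""
    counts = Counter()
    for p, a in zip(predicted, actual):
        if p and a:
            counts["TP"] += 1   # predicted positive, really positive
        elif p and not a:
            counts["FP"] += 1   # predicted positive, really negative
        elif not p and a:
            counts["FN"] += 1   # predicted negative, really positive
        else:
            counts["TN"] += 1   # predicted negative, really negative
    return counts


def sensitivity(counts):
    """Fraction of real positives that were found: TP / (TP + FN)."""
    return counts["TP"] / (counts["TP"] + counts["FN"])


def specificity(counts):
    """Fraction of real negatives that were rejected: TN / (TN + FP)."""
    return counts["TN"] / (counts["TN"] + counts["FP"])


# Hypothetical toy data: six predictions and the corresponding known answers.
predicted = [True, True, False, False, True, False]
actual = [True, False, True, False, True, False]

counts = tally_outcomes(predicted, actual)
print(dict(counts))
print("sensitivity:", sensitivity(counts))
print("specificity:", specificity(counts))
```

A method tuned to avoid False Positives will typically show higher specificity at the cost of lower sensitivity; which trade-off is acceptable depends on the circumstances, as the maxim says.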

  12. Maxim 17.4: Sometimes all those false positives are maybe, just maybe, trying to tell you something. So, if you aspire to a Nobel prize ...

  13. Using multiple algorithms to improve performance

  14. Maxim 17.5: Use a fast if inaccurate algorithm to protect your slow, accurate second-stage algorithm.
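
One way to picture the two-stage idea is a cheap filter that discards obviously unrelated sequences before an expensive comparison runs. The Python sketch below is an illustration under stated assumptions, not the method used by any particular tool: the shared k-mer filter, the longest-common-subsequence score standing in for the "slow, accurate" stage, the thresholds, and the toy sequences are all hypothetical.

```python
def kmers(seq, k=3):
    """Return the set of all overlapping substrings of length k."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def fast_filter(query, candidate, k=3, min_shared=2):
    """Cheap, inaccurate first stage: do the sequences share enough k-mers?"""
    return len(kmers(query, k) & kmers(candidate, k)) >= min_shared


def slow_score(a, b):
    """Slower second stage: longest common subsequence length (O(len(a)*len(b)))."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, 1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def two_stage_search(query, database):
    """Run the slow scorer only on candidates that pass the fast filter."""
    hits = []
    for name, seq in database.items():
        if fast_filter(query, seq):
            hits.append((name, slow_score(query, seq)))
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Hypothetical toy database of short protein-like strings.
database = {
    "seqA": "MKTAYIAKQR",
    "seqB": "GGGGGGGGGG",
    "seqC": "MKTAYLAKQR",
}
results = two_stage_search("MKTAYIAKQR", database)
print(results)
```

The filter can wrongly discard a true match (a False Negative), which is the price paid for not running the slow stage on every database entry.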

  15. [Figure: An overview of tRNA: 2D, 3D and Gene Structure]

  16. Introducing Bioinformatics Tools http://www.ncbi.nlm.nih.gov/Education/

  17. ClustalW http://www-igbmc.u-strasbg.fr/BioInfo/ ftp://ftp.ebi.ac.uk/pub/software

  18. [Figure: ClustalX operating under Windows XP]

  19. Algorithms and Methods
      $ gzip -d clustalw1.83.UNIX.tar.gz
      $ tar -xvf clustalw1.83.UNIX.tar
      $ cd clustalw1.83
      $ make
      $ ./clustalw
      $ ./clustalw -h
      $ ./clustalw -INFILE=../MerAHMAs_MerP.swp -OUTFILE=../Mer.aln

  20. Substitution/scoring matrices

  21. BLAST

  22. Maxim 17.6: Exactly which BLAST is best depends on the circumstances.

  23. Installing NCBI-BLAST
      $ cd
      $ mkdir blast
      $ cp blast-2.2.6-ia32-linux.tar.gz blast
      $ cd blast
      $ gzip -d blast-2.2.6-ia32-linux.tar.gz
      $ tar -xvf blast-2.2.6-ia32-linux.tar

      [NCBI]
      Data="/home/michael/blast/data"

  24. Preparation of database files for faster searching
      $ mkdir databases
      $ cd databases
      $ mv ../All_Mer_Proteins.fsa .
      $ ../formatdb -i All_Mer_Proteins.fsa -p T -o T -n Merproteins
      $ blastall -p blastp -d databases/Merproteins -i test_seq.fsa
      $ sed 's/sw|/sp|/' All_Mer_Proteins.fsa > Mer_db.prot
      $ ../formatdb -i Mer_db.prot -p T -o T -n Merproteins
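
The `sed` step above rewrites the database tag `sw|` as `sp|` before the file is reformatted. As a rough Python equivalent, the sketch below performs the same first-occurrence substitution but, unlike the `sed` command (which is applied to every line), restricts it to FASTA header lines. The function name, the accession `P00000`, the header layout, and the sequence fragment are hypothetical placeholders, not values from All_Mer_Proteins.fsa.

```python
def rewrite_header_tag(lines, old="sw|", new="sp|"):
    """Replace the first occurrence of `old` with `new` on FASTA header lines only."""
    for line in lines:
        if line.startswith(">"):
            # Header line: swap the database tag once, like sed 's/sw|/sp|/'.
            yield line.replace(old, new, 1)
        else:
            # Sequence line: pass through unchanged.
            yield line


# Hypothetical two-line FASTA record (placeholder accession and sequence).
records = [
    ">sw|P00000|MERA_SHIFL hypothetical header",
    "MSTLKITGMTCDSCAV",
]
rewritten = list(rewrite_header_tag(records))
for line in rewritten:
    print(line)
```

To process a real file, the same generator could be fed an open file handle and its output written to a new file, mirroring the redirection to Mer_db.prot.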

  25. The different types of BLAST search
      $ fastacmd -d databases/Merproteins -I
      $ fastacmd -d databases/Merproteins -s MERA_SHIFL
      $ blastclust -d databases/Merproteins | head

  26. Where To From Here
