
Bioinformatics

lida


  1. Bioinformatics – a definition? The design, construction and use of software tools to generate, store, annotate, access and analyse data and information relating to molecular biology. OR: biologists doing “stuff” with computers? Here we consider the use of bioinformatics tools rather than their design and construction, and the access and analysis of data and information rather than their generation, storage or annotation.

  2. Software Tools for Sequence Analysis. General packages offer a comprehensive range of bioinformatics tools for sequence analysis; most researchers would expect to use such packages at some time. Specialised packages offer tools for a particular type of analysis; they are used intensively by researchers in the relevant area and not at all by everyone else. WWW resources are tools whose nature inclines them to be accessed primarily over the network. These categorisations are very general: many specialist programs are incorporated into the general packages, and most things can be done at a web site somewhere.

  3. Sequence Analysis – an Overview. Tasks covered for nucleic acid sequences and protein sequences: sequencing project management; database retrieval; restriction mapping; primer design; DNA/RNA folding; nucleic acid sequence analysis; seeking coding regions; database similarity searching; translation to amino acids; pairwise sequence comparison; multiple sequence alignment; protein sequence analysis; prediction of function; structure prediction; motifs and patterns; phylogeny; structure analysis.
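
Translation to amino acids is the simplest of the tasks listed above. A minimal sketch, assuming the standard genetic code (NCBI translation table 1) and forward reading frames only; the general packages also handle reverse-complement frames and alternative genetic codes:

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (first base varies slowest).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str, frame: int = 0) -> str:
    """Translate one forward reading frame (0, 1 or 2); '*' is a stop, 'X' an unknown codon."""
    seq = dna.upper().replace("U", "T")  # accept RNA as well as DNA
    protein = []
    for i in range(frame, len(seq) - 2, 3):  # step through complete codons only
        protein.append(CODON_TABLE.get(seq[i:i + 3], "X"))
    return "".join(protein)


print(translate("ATGGCCATTGTAATGGGCCGCTGA"))  # MAIVMGR*
```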

  4. Software Tools for Sequence Analysis: General Packages. GCG Wisconsin Package: commercial; UNIX only; WWW and X GUIs; comprehensive; widely available. A second package: open source; UNIX only; several GUIs (Java, WWW, X); comprehensive; similar structure to the GCG package. A third package: open source; Windows, Mac OS X and UNIX; excellent GUI including interactive graphical output; not comprehensive, but allows access to EMBOSS.

  5. Software Tools for Sequence Analysis: General Packages. Other options for Windows PCs or Macintoshes: commercial packages are expensive but have good GUIs. A public-domain package runs on Windows, Macintosh and UNIX, has a modern, intuitive GUI and can access remote databases.

  7. Software Tools for Sequence Analysis: Specialised Packages for Sequencing Project Management. “The Phred-Phrap Package” by Phil Green et al.: free academic licence; excellent base-call confidence estimation (phred); excellent large-scale contig assembler (phrap). Another package: available by anonymous ftp; excellent GUI, contig editor and finishing tools; simple confidence estimation; its contig assembler is not good for big projects, BUT phred and phrap can be accessed from the Staden GUI.
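
phred reports the confidence of each base call as a quality value Q = -10·log10(P), where P is the estimated probability that the call is wrong; Q20 therefore means a 1 in 100 chance of error and Q30 a 1 in 1000 chance. A minimal sketch of the conversion in both directions, plus a hypothetical helper (`expected_errors`, not part of phred) that sums error probabilities to estimate how many errors a read contains:

```python
import math


def phred_to_error_prob(q: float) -> float:
    """Probability that a base call is wrong, given its phred quality Q."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p: float) -> float:
    """Phred quality for an error probability 0 < p <= 1."""
    return -10 * math.log10(p)


def expected_errors(qualities: list[int]) -> float:
    """Hypothetical helper: expected number of wrong calls in a read."""
    return sum(phred_to_error_prob(q) for q in qualities)


print(expected_errors([20, 20, 30, 10]))
```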

  8. Software Tools for Sequence Analysis: Specialised Packages. DNA/RNA folding: Michael Zuker's programs are free for academic use, can be installed locally or run via a WWW page, and are incorporated into the GCG general package. Protein structure analysis: WHAT IF by Gert Vriend is available for a nominal fee for academic use and runs on Linux, IRIX and Windows.
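
Zuker's programs predict folding by minimising free energy with an experimentally derived energy model, which is too large to sketch here. The block below swaps in the much simpler Nussinov algorithm, which only maximises the number of nested base pairs. It shows the same dynamic-programming idea but is not Zuker's method; the minimum hairpin loop of 3 unpaired bases is an assumed default:

```python
def nussinov_max_pairs(rna: str, min_loop: int = 3) -> int:
    """Maximum number of nested base pairs (Watson-Crick plus G-U wobble)."""
    pairs = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}
    seq = rna.upper().replace("T", "U")
    n = len(seq)
    if n == 0:
        return 0
    # best[i][j] holds the maximum number of pairs within seq[i..j].
    best = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):  # shorter spans cannot close a hairpin
        for i in range(n - span):
            j = i + span
            score = max(best[i + 1][j], best[i][j - 1])  # i or j left unpaired
            if (seq[i], seq[j]) in pairs:
                score = max(score, best[i + 1][j - 1] + 1)  # i pairs with j
            for k in range(i + 1, j):  # split into two independent substructures
                score = max(score, best[i][k] + best[k + 1][j])
            best[i][j] = score
    return best[0][n - 1]


print(nussinov_max_pairs("GGGAAAUCC"))
```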

  9. Software Tools for Sequence Analysis: Specialised Packages for Protein Structure Analysis (for very rich people). SYBYL: IRIX, HP-UX, Linux. Insight II: IRIX, AIX, Linux. Both systems are very impressive and very expensive.

  10. Software Tools for Sequence Analysis: Specialised Packages for Phylogeny. PHYLIP: available by anonymous ftp; Windows, Macintosh and UNIX; incorporated into the EMBOSS general package. Another package: commercial, but reasonably priced; UNIX, VMS, DOS and Windows; incorporated into the GCG general package.
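
Distance-based tree building, one of the approaches available in PHYLIP, starts from pairwise distances corrected for multiple substitutions at the same site. A minimal sketch of the Jukes-Cantor correction, d = -(3/4)·ln(1 - 4p/3), where p is the observed proportion of differing sites; this illustrates the model and is not PHYLIP's code. Ignoring gapped columns is an assumption of this sketch:

```python
import math


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites in two aligned sequences, ignoring gapped columns."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    sites = [(x, y) for x, y in zip(a.upper(), b.upper()) if x != "-" and y != "-"]
    if not sites:
        raise ValueError("no ungapped sites to compare")
    return sum(1 for x, y in sites if x != y) / len(sites)


def jukes_cantor(a: str, b: str) -> float:
    """Jukes-Cantor corrected distance (substitutions per site)."""
    p = p_distance(a, b)
    if p >= 0.75:
        return math.inf  # saturated: the correction is undefined
    return -0.75 * math.log(1 - 4 * p / 3)


print(jukes_cantor("ACGTACGTAC", "ACGTACGTAA"))
```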

  12. Software Tools for Sequence Analysis: WWW Resources for Database Retrieval. The Sequence Retrieval System (SRS), from Bioscience AG, retrieves MUCH more than sequences. Its core elements are free to academic sites, it is implemented in many places, and analysis tools can be integrated with it. Elements of SRS are incorporated into EMBOSS.

  13. Software Tools for Sequence Analysis: WWW Resources for Database Retrieval. Entrez also retrieves MUCH more than sequences, but gives access to NCBI databases only; Entrez client software is available by anonymous ftp. Most general packages include tools to access local sequence databases, and EMBOSS programs can access sequences from remote SRS servers.
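
Entrez databases can also be queried programmatically through NCBI's E-utilities web interface, which is separate from the Entrez client software mentioned on the slide. A minimal sketch that only builds an `efetch` request URL; the helper name `efetch_url` and the example accession are illustrative, and nothing is downloaded:

```python
from urllib.parse import urlencode

EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def efetch_url(db: str, ids: list[str], rettype: str = "fasta", retmode: str = "text") -> str:
    """Hypothetical helper: build an E-utilities efetch URL for one or more record IDs."""
    params = {"db": db, "id": ",".join(ids), "rettype": rettype, "retmode": retmode}
    return EUTILS_BASE + "efetch.fcgi?" + urlencode(params)


print(efetch_url("nuccore", ["NM_000546"]))
```

To actually fetch the record, the URL can be opened with `urllib.request.urlopen`; NCBI asks heavy users to identify themselves and limit their request rate, so check its current usage guidelines first.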

  14. Software Tools for Sequence Analysis: WWW Resources for Database Similarity Searching. BLAST: very popular and very widely available; not fully sensitive, but extremely fast. FASTA: popular and widely available; not fully sensitive, and much slower than BLAST. BOTH BLAST and FASTA can be installed locally or run via a WWW page, are available by anonymous ftp, search a DNA or protein query against a DNA or protein database, and are incorporated into the GCG general package.
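
Both programs gain their speed from heuristics: they only examine alignments around short exact word matches (k-tuples in FASTA, words in BLAST) instead of scoring every possible alignment, which is also why they can miss weak similarities. A much-simplified sketch of the word-matching step, with FASTA-style grouping of hits by diagonal; BLAST's use of near-matching protein words scored with a substitution matrix is omitted, and the function names are illustrative:

```python
from collections import Counter, defaultdict


def word_index(seq: str, k: int) -> dict[str, list[int]]:
    """Map every k-letter word in seq to the positions where it starts."""
    index = defaultdict(list)
    for i in range(len(seq) - k + 1):
        index[seq[i:i + k]].append(i)
    return index


def find_seeds(query: str, subject: str, k: int = 3) -> list[tuple[int, int]]:
    """(query_pos, subject_pos) for every exact k-letter word the sequences share."""
    index = word_index(subject, k)
    seeds = []
    for i in range(len(query) - k + 1):
        for j in index.get(query[i:i + k], []):
            seeds.append((i, j))
    return seeds


def best_diagonal(seeds: list[tuple[int, int]]):
    """Diagonal (subject_pos - query_pos) with the most seeds, and its seed count."""
    counts = Counter(j - i for i, j in seeds)
    return counts.most_common(1)[0] if counts else None


seeds = find_seeds("ACGTTGCA", "GGACGTTGCAGG")
print(best_diagonal(seeds))
```

A dense diagonal marks a region worth extending into a full alignment; sequences sharing no words are never aligned at all.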

  15. Software Tools for Sequence Analysis: WWW Resources for Database Similarity Searching. MPsrch: fully sensitive; a slow algorithm run on fast computers; protein v protein only; used mainly when BLAST/FASTA fail; exclusively a WWW resource.
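
"Fully sensitive" means scoring every possible local alignment rather than only those near word hits. The standard exhaustive method is the Smith-Waterman algorithm, and MPsrch is usually described as a parallel implementation of it; the slide itself does not name the algorithm, so treat that link as an assumption. A minimal sketch with a toy match/mismatch scheme and a linear gap penalty (real protein searches use a substitution matrix and affine gap costs):

```python
def smith_waterman_score(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score, using O(len(a) * len(b)) time and O(len(b)) memory."""
    prev = [0] * (len(b) + 1)  # previous row of the dynamic-programming matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Clamping at zero lets a local alignment start afresh anywhere.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


print(smith_waterman_score("ACGT", "TTACGTTT"))
```

The work grows with the product of query and database lengths, which is why an exhaustive search needs the fast computers mentioned on the slide.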

  16. Software Tools for Sequence Analysis: WWW Resources for Structure Prediction. JPred was a consensus service and now uses JNet only; JNet is available by anonymous ftp. The service from Burkhard Rost is older and takes a similar approach to JNet; its main element is called PHD. Both JPred and PHD work best from aligned protein families. Simpler methods that predict from single sequences are available in most general packages.

  17. Software Tools for Sequence Analysis: Other WWW Services. General services: EBI, Pasteur Institute and many more. Protein sequence analysis: ExPASy. Gene finding: genscan at MIT (free academic licence); simple gene finding is available in most general packages. Primer design: primer3 at MIT (available by anonymous ftp); primer design is available in most general packages, and the primer design program in EMBOSS is primer3.
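
Primer design programs screen candidate primers on properties such as melting temperature (Tm) and GC content; primer3 computes Tm with nearest-neighbour thermodynamics. As a much cruder stand-in, and not primer3's method, the Wallace rule Tm ≈ 2(A+T) + 4(G+C) °C is often used for short oligonucleotides. A minimal sketch of both measures:

```python
def gc_content(primer: str) -> float:
    """Fraction of G and C bases in the primer."""
    s = primer.upper()
    if not s:
        raise ValueError("empty primer")
    return (s.count("G") + s.count("C")) / len(s)


def wallace_tm(primer: str) -> int:
    """Rough melting temperature in degrees C; only meaningful for short oligos."""
    s = primer.upper()
    at = s.count("A") + s.count("T")
    gc = s.count("G") + s.count("C")
    return 2 * at + 4 * gc


primer = "ATGCATGCATGCATGCATGC"
print(wallace_tm(primer), gc_content(primer))
```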

  18. Databases. Databases are available from WWW sites and are highly interlinked. Clinical and mutation: OMIM, MGMD. Bibliographic: PubMed. Raw sequence: as accessed for “sequence retrieval”.

  19. Databases: Sequence Databases. These contain both raw sequence data and annotation. DNA sequences: EMBL (European Molecular Biology Laboratory), GenBank (NCBI), RefSeq (NCBI), DNA Data Bank of Japan. Protein sequences: RefSeq (NCBI), PIR, TrEMBL (GenPept).
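
Database entries carry their full annotation in each database's own flat-file format, but sequences are commonly exchanged as FASTA, where the annotation shrinks to a single header line beginning with '>'. A minimal parser sketch; the record names in the example are made up:

```python
def parse_fasta(text: str) -> list[tuple[str, str]]:
    """Split FASTA text into (header, sequence) pairs; the header omits the leading '>'."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:  # sequence lines before any header are ignored
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


example = ">seq1 first example\nACGT\nTTGA\n>seq2 second example\nMKV\n"
print(parse_fasta(example))
```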

  20. Databases. Databases are available from WWW sites and are highly interlinked. Clinical and mutation: OMIM, MGMD. Bibliographic: PubMed. Raw sequence: as accessed for “sequence retrieval”. Alignments and patterns: as generated by analysis software.

  21. Databases: Alignments and Patterns. Alignments: aligned protein families, composed of a number of sections; aligned protein domains, automatically generated from protein sequence databases; conserved “blocks” of protein alignments, used to compute scoring schemes for protein comparisons.
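
Conserved blocks become a scoring scheme through log-odds scores, as in BLOSUM-style matrices. A simplified sketch, assuming ungapped blocks given as equal-length strings: count residue pairs within each column, compare each pair's observed frequency with the frequency expected from the residues' background frequencies, and report 2·log2(observed/expected) rounded to an integer (half-bit units). The real BLOSUM procedure also clusters similar sequences before counting, which this sketch omits:

```python
import math
from collections import Counter
from itertools import combinations


def log_odds_from_blocks(blocks: list[list[str]]) -> dict[tuple[str, str], int]:
    """Simplified BLOSUM-style scores from ungapped blocks (no sequence clustering)."""
    pair_counts = Counter()
    for block in blocks:
        for column in zip(*block):  # residues aligned at one position
            for x, y in combinations(column, 2):
                pair_counts[tuple(sorted((x, y)))] += 1
    total = sum(pair_counts.values())
    q = {pair: n / total for pair, n in pair_counts.items()}  # observed pair frequencies
    # Background frequency: p_i = q_ii + (sum over j != i of q_ij) / 2
    p = Counter()
    for (x, y), f in q.items():
        if x == y:
            p[x] += f
        else:
            p[x] += f / 2
            p[y] += f / 2
    scores = {}
    for (x, y), f in q.items():
        expected = p[x] * p[y] if x == y else 2 * p[x] * p[y]
        scores[(x, y)] = round(2 * math.log2(f / expected))
    return scores


print(log_odds_from_blocks([["AS", "AS", "AA"]]))
```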

  22. Databases: Alignments and Patterns. Patterns are largely derived from the conserved portions of aligned protein families. Representations of single motifs: now comprising both simple patterns and HMM profiles. Representations of patterns of motifs (fingerPRINTS).
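
Simple single-motif patterns of this kind are often written in PROSITE-style syntax; this is an assumption, since the slide does not name a database or notation. In that syntax, elements are separated by '-', 'x' matches any residue, [..] lists allowed residues, {..} lists forbidden ones, (n) or (n,m) gives a repeat count, and '<' or '>' anchors to the N- or C-terminus. A minimal sketch that translates such patterns into Python regular expressions; it ignores rarer features such as '>' inside brackets, and the example pattern is made up:

```python
import re

# One pattern element: a residue, 'x', [allowed] or {forbidden}, plus optional (n) or (n,m).
ELEMENT = re.compile(r"(\[[A-Z]+\]|\{[A-Z]+\}|[A-Zx])(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern: str) -> str:
    """Translate a simple PROSITE-style pattern into a Python regular expression."""
    elements = pattern.strip().rstrip(".").split("-")
    out = []
    for n, element in enumerate(elements):
        if n == 0 and element.startswith("<"):
            out.append("^")  # anchored at the N-terminus
            element = element[1:]
        at_c_terminus = n == len(elements) - 1 and element.endswith(">")
        if at_c_terminus:
            element = element[:-1]
        m = ELEMENT.fullmatch(element)
        if m is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        core, low, high = m.groups()
        if core == "x":
            out.append(".")
        elif core.startswith("{"):
            out.append("[^" + core[1:-1] + "]")  # any residue except those listed
        else:
            out.append(core)  # a residue or [ABC] class is already valid regex
        if low is not None:
            out.append("{" + low + ("," + high if high is not None else "") + "}")
        if at_c_terminus:
            out.append("$")
    return "".join(out)


regex = prosite_to_regex("[AC]-x-V-x(4)-{ED}.")
print(regex, re.search(regex, "MKAGVQRSTLW"))
```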

  23. Databases. Databases are available from WWW sites and are highly interlinked. Clinical and mutation: OMIM, MGMD. Bibliographic: PubMed. Raw sequence: as accessed for “sequence retrieval”. Alignments and patterns: as generated by analysis software. Structural: PDB. Integrated: Ensembl.

  24. The End.

  25. dpj10@mole.bio.cam.ac.uk
