The Poor Beginners’ Guide to Bioinformatics

maren

Presentation Transcript


  1. The Poor Beginners’ Guide to Bioinformatics

  2. What we have – and don’t have... What we have: • a computer connected to the Internet (incl. Web browser) • a text editor (Notepad or better) • public databases of genomic sequences • public databases of cDNA + EST • public databases of protein sequences, structures and motifs • public servers capable of (almost) anything we wish to do What we don’t have: • money for specialised software packages

  3. Dealing with a sequence: model tasks • basic (DNA) sequence manipulation: restriction analysis, translation… • sequence similarity and pattern/motif searches • gene building: modelling exon-intron structures • protein domain searches, structure analysis • construction and interpretation of sequence alignments
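As a small illustration of the first model task, here is a minimal Python sketch of restriction-site searching on a DNA string. The function names (`reverse_complement`, `find_sites`) and the example sequence are made up for this note; EcoRI's recognition site GAATTC is used only as a familiar example, and the public servers mentioned in these slides do far more (cut positions, enzyme databases, ambiguity codes).

```python
# Translation table for complementing unambiguous DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of an unambiguous DNA string."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def find_sites(dna, site):
    """Return the 1-based start positions of `site` on the given strand.

    Overlapping occurrences are reported as well.
    """
    dna, site = dna.upper(), site.upper()
    positions = []
    i = dna.find(site)
    while i != -1:
        positions.append(i + 1)      # convert 0-based index to 1-based position
        i = dna.find(site, i + 1)    # continue one base further on
    return positions


# Made-up example sequence containing two EcoRI sites (GAATTC).
example = "AAGAATTCTTGAATTC"
print(find_sites(example, "GAATTC"))
```

For a non-palindromic site, the opposite strand could be searched by calling `find_sites(reverse_complement(dna), site)` and converting the positions back.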

  4. Notes on basic sequence handling • Make sure you have the correct format. • FASTA format is (almost) always correct: a header line starting with “>” (>sequencename), followed by the sequence itself on the next line(s) (thisisasequenceinfastaformat). • If not, you can always use raw data. • If things don’t work, check for gaps in sequence, empty lines, and file extension. • BEWARE OF MICROSOFT!
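The checks on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `read_fasta` is hypothetical, the record name is taken to be the first word after ">", and everything that is not a letter (spaces, digits, gap characters) is dropped from the sequence. Established parsers such as the one in Biopython handle many more cases.

```python
def read_fasta(text):
    """Parse FASTA-formatted text into a {name: sequence} dictionary.

    Tolerates the usual troublemakers: empty lines, Windows line endings,
    and spaces or position numbers inside the sequence.
    """
    records = {}
    name = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:                      # skip empty lines
            continue
        if line.startswith(">"):          # header line: start a new record
            words = line[1:].split()
            name = words[0] if words else "unnamed"
            records[name] = []
        elif name is None:
            raise ValueError("sequence data found before any '>' header line")
        else:
            # keep letters only: drops spaces, digits and gap characters
            records[name].append("".join(ch for ch in line if ch.isalpha()))
    return {n: "".join(parts).upper() for n, parts in records.items()}


# Made-up input with a blank line, a space and a position counter.
messy = ">sequencename demo\r\nACGT acgt\r\n\r\n10 TTGA\r\n"
print(read_fasta(messy))
```

Saving the sequence as plain text (not as a word-processor document) avoids most of the problems this function has to clean up.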

  5. Model tasks continued … • basic (DNA) sequence manipulation: restriction analysis, translation… • sequence similarity and pattern/motif searches • gene building: modelling exon-intron structures • protein domain searches, structure analysis • construction and interpretation of sequence alignments

  6. Defining a gene family… (domain diagram: FH3?, FH1, FH2) • By overall domain structure • By domain sequence • Based on a peptide motif: L-X-X-G-N-X-[ML]-N
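A peptide motif written this way maps directly onto a regular expression, which is roughly what pattern-search servers do behind the scenes. The sketch below is an illustration, not any server's actual implementation: the function names are hypothetical, the test protein is made up, and only the three element types used on this slide are handled (a single residue, X for any residue, and a bracketed set of alternatives).

```python
import re


def motif_to_regex(motif):
    """Convert a simple motif such as 'L-X-X-G-N-X-[ML]-N' to a regex string."""
    parts = []
    for element in motif.split("-"):
        element = element.upper()
        if element == "X":
            parts.append(".")        # X matches any residue
        else:
            parts.append(element)    # single residue or [..] set of alternatives
    return "".join(parts)


def find_motif(motif, protein):
    """Return (1-based position, matched peptide) for every occurrence.

    A lookahead is used so that overlapping occurrences are also found.
    """
    regex = re.compile("(?=(" + motif_to_regex(motif) + "))")
    return [(m.start() + 1, m.group(1)) for m in regex.finditer(protein.upper())]


# Made-up protein fragment that contains the motif once.
print(motif_to_regex("L-X-X-G-N-X-[ML]-N"))
print(find_motif("L-X-X-G-N-X-[ML]-N", "AAALAAGNALNAA"))
```

Such a search reports every exact match and nothing else, which is why it gives no statistics and is best combined with a similarity search.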

  7. Sequence comparison-based searches • Entrez “related sequences”: easy identification of “false starts”; no organism selection • BLAST/FASTA: all DNA/protein combinations; taxonomy selection possible; statistical data provided; domain structure comparison available; divergent motifs may be missed • Two methods are better than one.

  8. Notes on all sequence comparisons, searches, alignments… • Start with defaults (the authors know what they are doing)… • … BUT don’t be afraid to vary the parameters • Choose a reasonable scoring matrix: for distant sequences, low BLOSUM or high PAM; for closely related sequences, low PAM or high BLOSUM

  9. Motif-based searches • sensitive • no statistics • only protein databases can be searched • TAIR PatMatch: Arabidopsis-specific; problematic user interface • ISREC - INSECTS: admirable technology; access to SwissProt and TrEMBL; no organism selection

  10. Model tasks continued … • basic (DNA) sequence manipulation: restriction analysis, translation… • sequence similarity and pattern/motif searches • gene building: modelling exon-intron structures • protein domain searches, structure analysis • construction and interpretation of sequence alignments

  11. Some genes are more alike than others… • A number of splicing prediction servers are available • Agreement of different methods is a good sign but no absolute measure • Always align ESTs if possible • Beware of non-conventional intron boundaries (GC-AG instead of GT-AG) • Plant data for predicting transcription start sites and transcription factor binding sites are limited
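Checking the intron boundaries of a gene model is easy to automate once the exon-intron coordinates are known. The following Python sketch assumes 1-based, inclusive intron coordinates on the sense strand; the function name `classify_intron` and the toy sequence are invented for this example.

```python
def classify_intron(genomic, start, end):
    """Classify an intron by its first and last two bases.

    `start` and `end` are 1-based, inclusive positions of the intron
    within `genomic` (sense strand).
    """
    intron = genomic[start - 1:end].upper()
    if len(intron) < 4:
        raise ValueError("intron too short to have two distinct boundaries")
    donor, acceptor = intron[:2], intron[-2:]
    if (donor, acceptor) == ("GT", "AG"):
        return "GT-AG (conventional)"
    if (donor, acceptor) == ("GC", "AG"):
        return "GC-AG (non-conventional)"
    return donor + "-" + acceptor + " (check the gene model)"


# Toy gene: 6 bases of exon, a 12-base intron (positions 7-18), 6 bases of exon.
toy = "ATGGCC" + "GTAAGTTTTCAG" + "GGATAA"
print(classify_intron(toy, 7, 18))
```

A boundary that is neither GT-AG nor GC-AG is not necessarily wrong, but it is a good reason to look again at the EST alignment.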

  12. Model tasks continued … • basic (DNA) sequence manipulation: restriction analysis, translation… • sequence similarity and pattern/motif searches • gene building: modelling exon-intron structures • protein domain searches, structure analysis • construction and interpretation of sequence alignments

  13. Searching for known domains/motifs • Searching for PROSITE patterns (allowing ambiguities) • PROSITE and Pfam profile searches • SMART, CDsearch (domains and more)

  14. Predicting protein localisation • transmembrane segment prediction • predicting signal peptides/anchors • 2 methods available • possibility to predict organelle localisation

  15. Model tasks continued … • basic (DNA) sequence manipulation: restriction analysis, translation… • sequence similarity and pattern/motif searches • gene building: modelling exon-intron structures • protein domain searches, structure analysis • construction and interpretation of sequence alignments

  16. Alignment: “manual” or automated? • “Manual”: locally installed, free, for Mac and PC; interactive domain definition; statistical data provided; may produce false-positive blocks (read the on-line manual!) • Automated: “objective” results; a number of servers available; recommended for well-conserved proteins; empiric parameters (e.g. gap penalties); bad for divergent sequences

  17. Phylogenetic analyses • Two methods are better than one. • Your phylogeny cannot be better than your alignment. • Gaps are not data. • Always do bootstrapping (100-500 cycles). • Certain questions cannot be answered from an unrooted tree.
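For readers wondering what a bootstrapping cycle actually does: the alignment columns are resampled with replacement, a tree is built from each resampled alignment, and the support for a branch is the fraction of those trees that contain it. The Python sketch below covers only the resampling step; the function names and the toy alignment are assumptions made for this note, and the tree building itself is left to whatever phylogeny program is used.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Return one pseudo-replicate: alignment columns sampled with replacement.

    `alignment` maps sequence names to aligned sequences of equal length.
    """
    names = list(alignment)
    length = len(alignment[names[0]])
    if any(len(seq) != length for seq in alignment.values()):
        raise ValueError("all aligned sequences must have the same length")
    # Draw as many column indices as the alignment has columns.
    columns = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(alignment[name][i] for i in columns) for name in names}


def bootstrap_replicates(alignment, cycles=100, seed=0):
    """Generate `cycles` pseudo-replicates (the slide suggests 100-500)."""
    rng = random.Random(seed)   # fixed seed so that a run can be repeated
    return [bootstrap_alignment(alignment, rng) for _ in range(cycles)]


# Toy alignment, invented for the example.
toy = {"seqA": "ACGT-A", "seqB": "ACGTTA", "seqC": "ACCTTA"}
replicates = bootstrap_replicates(toy, cycles=100)
print(len(replicates), replicates[0])
```

Because whole columns are resampled, every replicate inherits any errors in the original alignment, which is one more reason the phylogeny cannot be better than the alignment.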

  18. Points to take off... • go to the Bioinformatics page http://www2.rhul.ac.uk/~ujba110/Bioinfo.htm • select your exercise (A,B,C,D,E) • … and enjoy it! If you mean it seriously: • create your own bookmarks (seed provided on the course web page)
