
It & Health 2009 Summary



Presentation Transcript


  1. It & Health 2009 Summary. Thomas Nordahl Petersen

  2. Teachers: Bent Petersen, Thomas Nordahl Petersen, Ramneek Gupta, Rasmus Wernersson, Lisbeth Nielsen Fink, Thomas Blicher, Anders Gorm Pedersen

  3. Outline of the course • Topics will cover a general introduction to bioinformatics • Evolution • DNA / Protein • Alignment and scoring matrices • How does it work & what are the numbers • Visualization of multiple alignments • Phylogenetic trees and logo plots • Commonly used databases • UniProt/GenBank & genome browsers • Protein 3D-structure • Artificial neural networks & case stories • Practical use of bioinformatics tools • Preparation for exam

  4. Topics covered - (some of them)

  5. Information flow in biological systems

  6. Amino Acids. Amine and carboxyl groups. The side chain ‘R’ is attached to the C-alpha carbon. The amino acids found in living organisms are L-amino acids.

  7. Amino Acids - peptide bond (figure labels: N-terminal, C-terminal)

  8. 1 and 3-letter codes. There are 20 naturally occurring amino acids. Normally the one- or three-letter codes are used: Ala - A, Cys - C, Asp - D, Glu - E, Phe - F, Gly - G, His - H, Ile - I, Lys - K, Leu - L, Met - M, Asn - N, Pro - P, Gln - Q, Arg - R, Ser - S, Thr - T, Val - V, Trp - W, Tyr - Y
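As a small illustration of the code table on this slide, the mapping can be held in a dictionary and used to convert between the two notations. This is a minimal sketch; the names `THREE_TO_ONE` and `to_one_letter` are made up for the example and do not come from the course material.

```python
# Three-letter to one-letter codes for the 20 amino acids listed on the slide.
THREE_TO_ONE = {
    "Ala": "A", "Cys": "C", "Asp": "D", "Glu": "E", "Phe": "F",
    "Gly": "G", "His": "H", "Ile": "I", "Lys": "K", "Leu": "L",
    "Met": "M", "Asn": "N", "Pro": "P", "Gln": "Q", "Arg": "R",
    "Ser": "S", "Thr": "T", "Val": "V", "Trp": "W", "Tyr": "Y",
}


def to_one_letter(three_letter_codes):
    """Convert a list of three-letter codes into a one-letter string."""
    # capitalize() lets the lookup accept "MET", "met" and "Met" alike.
    return "".join(THREE_TO_ONE[code.capitalize()] for code in three_letter_codes)


print(to_one_letter(["Met", "Ala", "Gly"]))  # MAG
```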

  9. Theory of evolution. Charles Darwin, 1809-1882. (Center for Biological Sequence Analysis)

  10. Phylogenetic tree

  11. Global versus local alignments. Global alignment: align the full length of both sequences (the “Needleman-Wunsch” algorithm). Local alignment: find the best partial alignment of two sequences (the “Smith-Waterman” algorithm). (Figure: global and local alignment of Seq 1 and Seq 2.)
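The local case can be sketched as follows. This is a minimal Smith-Waterman scoring routine, not the course's own implementation; the scoring values (match +2, mismatch -1, linear gap -2) are assumptions chosen for illustration, where a real analysis would use a substitution matrix such as BLOSUM62. The key difference from global alignment is that a cell may never drop below zero, so an alignment can start and end anywhere.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Return the best local alignment score of sequences a and b."""
    rows, cols = len(a) + 1, len(b) + 1
    # H[i][j] = best score of a local alignment ending at a[i-1], b[j-1].
    H = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # The 0 option lets a new local alignment start at this cell.
            H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            best = max(best, H[i][j])
    return best


# The shared core GATTACA (7 matches) dominates the unrelated flanks.
print(smith_waterman_score("AAAGATTACAGGG", "TTTGATTACACCC"))  # 14
```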

  12. Pairwise alignment: the solution. “Dynamic programming” (the Needleman-Wunsch algorithm)
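A minimal sketch of the dynamic programming idea behind Needleman-Wunsch is given here. The scoring scheme (match +1, mismatch -1, linear gap -1) is an assumption for illustration only and is not taken from the slides. The table `F` is filled row by row, and a traceback from the bottom-right corner recovers one optimal global alignment.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Return (score, aligned_a, aligned_b) for a global alignment."""
    n, m = len(a), len(b)
    # F[i][j] = best score for aligning the prefixes a[:i] and b[:j].
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = i * gap
    for j in range(1, m + 1):
        F[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            F[i][j] = max(F[i - 1][j - 1] + s, F[i - 1][j] + gap, F[i][j - 1] + gap)

    # Traceback: walk from the bottom-right corner back to the origin.
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        s = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + s:
            top.append(a[i - 1]); bottom.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and F[i][j] == F[i - 1][j] + gap:
            top.append(a[i - 1]); bottom.append("-"); i -= 1
        else:
            top.append("-"); bottom.append(b[j - 1]); j -= 1
    return F[n][m], "".join(reversed(top)), "".join(reversed(bottom))


score, top, bottom = needleman_wunsch("ACGT", "AGT")
print(score)   # 2
print(top)     # ACGT
print(bottom)  # A-GT
```

The table has (n+1) x (m+1) cells and each cell is computed once from three neighbours, which is what makes the exhaustive search over all alignments tractable.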

  13. Sequence alignment - Blast

  14. Sequence alignment - Blast

  15. BLOSUM & PAM matrices • BLOSUM matrices are the most commonly used substitution matrices. • BLOSUM50, BLOSUM62, BLOSUM80 • PAM - Percent Accepted Mutations • PAM-0 is the identity matrix. • In PAM-1 the diagonal entries deviate slightly from 1 and the off-diagonal entries deviate slightly from 0. • PAM-250 is PAM-1 raised to the 250th power, i.e. the product of 250 copies of PAM-1 under matrix multiplication.
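The relation between PAM-0, PAM-1 and PAM-250 can be sketched with a toy matrix. The 3-letter matrix `TOY_PAM1` below is entirely hypothetical (it is not Dayhoff data); it only mimics the shape described above, with a diagonal close to 1 and off-diagonal entries close to 0. Plain Python lists are used to keep the example dependency-free.

```python
def matmul(A, B):
    """Multiply two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def pam(pam1, n):
    """Return PAM-n as the n-th matrix power of PAM-1 (PAM-0 is the identity)."""
    result = identity(len(pam1))
    for _ in range(n):
        result = matmul(result, pam1)
    return result


# Hypothetical mutation probability matrix for a 3-letter alphabet.
# Each row sums to 1: a letter either stays the same or mutates.
TOY_PAM1 = [
    [0.990, 0.005, 0.005],
    [0.005, 0.990, 0.005],
    [0.005, 0.005, 0.990],
]

pam250 = pam(TOY_PAM1, 250)
for row in pam250:
    print([round(x, 3) for x in row])
```

After 250 steps the rows still sum to 1, but the diagonal has dropped from 0.99 towards the uniform value 1/3, reflecting the larger evolutionary distance.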

  16. Sequence profiles (1J2J.B)
      >1J2J.B mol:aa PROTEIN TRANSPORT
      NVIFEDEEKSKMLARLLKSSHPEDLRAANKLIKEMVQEDQKRMEK

  17. Log-odds scores • BLOSUM is a log-odds (log-likelihood ratio) matrix. • The likelihood of observing j given that you have i is $P(j|i) = P_{ij}/P_i$. • The prior likelihood of observing j is $Q_j$, which is simply the frequency of j (and likewise $P_i = Q_i$ for i). • The log-odds score is $S_{ij} = 2\log_2\big(P(j|i)/Q_j\big) = 2\log_2\big(P_{ij}/(Q_i Q_j)\big)$. • Here $\log_2(x) = \ln(x)/\ln(2)$. • S is expressed in half-bits, hence the factor 2.
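A worked instance of the log-odds score may help to read the numbers in a substitution matrix. The frequencies below are hypothetical values picked so that the arithmetic is easy to follow; they are not taken from the BLOSUM source data, and the function name is made up for the example.

```python
import math


def log_odds_half_bits(p_ij, q_i, q_j):
    """S_ij = 2 * log2(P_ij / (Q_i * Q_j)), in half-bits."""
    return 2 * math.log2(p_ij / (q_i * q_j))


# Hypothetical case 1: the pair is seen twice as often as expected by chance.
# Expected by chance: 0.1 * 0.1 = 0.01; observed: 0.02; ratio 2.
print(round(log_odds_half_bits(0.02, 0.1, 0.1), 6))    # 2.0

# Hypothetical case 2: the pair is seen four times less often than expected.
# Observed 0.0025 against expected 0.01; ratio 0.25.
print(round(log_odds_half_bits(0.0025, 0.1, 0.1), 6))  # -4.0
```

Positive scores therefore mark substitutions observed more often than chance predicts, and negative scores mark substitutions that are under-represented.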

  18. BLAST Exercise

  19. Genome browsers - UCSC. Intron-exon structure. Single nucleotide polymorphism (SNP)

  20. SNPs

  21. Protein 3D-structure

  22. Protein structure. Primary structure: amino acid sequence. Secondary structure: helix/beta sheet. Tertiary structure: fold, 3D coordinates

  23. Protein structure-helix helix 3 residues/turn - few, but not uncommon -helix 3.6 residues/turn - by far the most common helix Pi-helix 4.1 residues/turn - very rare

  24. Protein structurestrand/sheet

  25. Protein folds. Class: the 4th class is ‘few secondary structures’. Architecture: overall shape of a domain. Topology: shared secondary structure connectivity

  26. Protein 3D-structure

  27. Neural Networks - From knowledge to information (figure labels: protein sequence, biological feature)

  28. Use of artificial neural networks • A data-driven method to predict a feature, given a set of training data • In biology, input features could be amino acid sequences or nucleotides • Secondary structure prediction • Signal peptide prediction • Surface accessibility • Propeptide prediction (Figure, from N- to C-terminus: signal peptide, propeptide, mature/active protein.)
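The slides do not say how a sequence is presented to a network. One common approach, shown here purely as an assumption, is to slide a window along the sequence and encode each residue as a 20-element one-hot vector, so that the network predicts the feature of the central residue from its neighbours. The names `ALPHABET`, `one_hot` and `encode_windows` are hypothetical.

```python
ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # the 20 one-letter amino acid codes


def one_hot(residue):
    """Return a 20-element vector with a single 1 (all zeros for padding)."""
    return [1 if residue == aa else 0 for aa in ALPHABET]


def encode_windows(sequence, window=3):
    """Encode each position as the one-hot vectors of its window.

    window is assumed to be odd, so that every window has a central residue.
    """
    half = window // 2
    # 'X' pads the ends; it is not in ALPHABET, so it encodes to all zeros.
    padded = "X" * half + sequence + "X" * half
    features = []
    for center in range(len(sequence)):
        vector = []
        for residue in padded[center:center + window]:
            vector.extend(one_hot(residue))
        features.append(vector)
    return features


features = encode_windows("NVIFEDE", window=3)
print(len(features), len(features[0]))  # 7 60
```

Each row of `features` would be paired with a known label (for example helix, strand or coil) to form one training example.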

  29. Prediction of biological features - Surface accessibility. Predict surface accessibility from the amino acid sequence only.

  30. Logo plots. Information content: how is it calculated, and what does it mean?

  31. Logo plots - Information Content. Sequence logo: calculate the information content $I = \sum_a p_a \log_2 p_a + \log_2(4)$; the maximal value is 2 bits. (Figure annotations: “Completely conserved”, “~0.5 each”.) • Total height at a position is the ‘Information Content’ measured in bits. • The height of a letter is proportional to the frequency of that letter. • A logo plot is a visualization of a multiple alignment.
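The information content formula can be checked on single alignment columns. This is a minimal sketch for nucleotide columns (alphabet size 4, hence the log2(4) term); it applies no small-sample correction, which is an assumption, and the function names are made up for the example.

```python
import math
from collections import Counter


def information_content(column, alphabet_size=4):
    """I = sum_a p_a * log2(p_a) + log2(alphabet_size), in bits."""
    counts = Counter(column)
    total = len(column)
    # Letters that do not occur contribute nothing (p * log2 p -> 0 as p -> 0).
    entropy_term = sum((c / total) * math.log2(c / total) for c in counts.values())
    return entropy_term + math.log2(alphabet_size)


def letter_heights(column, alphabet_size=4):
    """Height of each letter = its frequency times the column's information."""
    info = information_content(column, alphabet_size)
    total = len(column)
    return {letter: (c / total) * info for letter, c in Counter(column).items()}


print(information_content("AAAA"))  # 2.0  (completely conserved)
print(information_content("AATT"))  # 1.0  (two letters at 0.5 each)
print(information_content("ACGT"))  # 0.0  (all four equally frequent)
print(letter_heights("AATT"))       # {'A': 0.5, 'T': 0.5}
```

A fully conserved column reaches the 2-bit maximum, while a column with all four nucleotides equally frequent carries no information and disappears from the logo.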
