It & Health 2009 Summary Thomas Nordahl Petersen
Teachers Bent Petersen Thomas Nordahl Petersen Ramneek Gupta Rasmus Wernersson Lisbeth Nielsen Fink Thomas Blicher Anders Gorm Pedersen
Outline of the course • Topics will cover a general introduction to bioinformatics • Evolution • DNA / Protein • Alignment and scoring matrices • How does it work & what are the numbers • Visualization of multiple alignments • Phylogenetic trees and logo plots • Commonly used databases • UniProt/GenBank & Genome browsers • Protein 3D-structure • Artificial neural networks & case stories • Practical use of bioinformatics tools • Preparation for exam
Amino Acids Amine and carboxyl groups. The side chain ‘R’ is attached to the C-alpha carbon. The amino acids found in the proteins of living organisms are L-amino acids.
Amino Acids - peptide bond N-terminal C-terminal
1- and 3-letter codes There are 20 standard amino acids. Normally the one- or three-letter codes are used: Ala - A Cys - C Asp - D Glu - E Phe - F Gly - G His - H Ile - I Lys - K Leu - L Met - M Asn - N Pro - P Gln - Q Arg - R Ser - S Thr - T Val - V Trp - W Tyr - Y
Theory of evolution CENTER FOR BIOLOGICAL SEQUENCE ANALYSIS Charles Darwin 1809-1882
Global versus local alignments Global alignment: align the full length of both sequences (the “Needleman-Wunsch” algorithm). Local alignment: find the best partial alignment of two sequences (the “Smith-Waterman” algorithm). (Figure: global and local alignment of Seq 1 and Seq 2)
Pairwise alignment: the solution “Dynamic programming” (the Needleman-Wunsch algorithm)
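The dynamic programming idea can be illustrated with a minimal sketch of Needleman-Wunsch global alignment. The scoring values (match = 1, mismatch = -1, linear gap penalty = -2) and the function name are assumptions chosen for illustration; a real protein alignment would use a substitution matrix such as Blosum62 and usually affine gap penalties.

```python
def needleman_wunsch(seq1, seq2, match=1, mismatch=-1, gap=-2):
    """Global alignment with a simple match/mismatch score and linear gaps.

    Scoring values are illustrative assumptions, not taken from the slides.
    Returns (score, aligned_seq1, aligned_seq2).
    """
    def sub(a, b):
        return match if a == b else mismatch

    n, m = len(seq1), len(seq2)
    # score[i][j] = best score for aligning seq1[:i] with seq2[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    # Fill the table: each cell is the best of diagonal, up, and left moves.
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(seq1[i - 1], seq2[j - 1]),
                score[i - 1][j] + gap,   # gap in seq2
                score[i][j - 1] + gap,   # gap in seq1
            )

    # Trace back from the bottom-right corner to recover one optimal alignment.
    out1, out2 = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if (i > 0 and j > 0
                and score[i][j] == score[i - 1][j - 1] + sub(seq1[i - 1], seq2[j - 1])):
            out1.append(seq1[i - 1])
            out2.append(seq2[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out1.append(seq1[i - 1])
            out2.append("-")
            i -= 1
        else:
            out1.append("-")
            out2.append(seq2[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out1)), "".join(reversed(out2))


if __name__ == "__main__":
    print(needleman_wunsch("ACGT", "AGT"))
```

Smith-Waterman local alignment uses the same table but clamps every cell at 0 and starts the traceback from the highest-scoring cell instead of the corner.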
Blosum & PAM matrices • Blosum matrices are the most commonly used substitution matrices. • Blosum50, Blosum62, Blosum80 • PAM - Point Accepted Mutation (also read as Percent Accepted Mutations) • PAM-0 is the identity matrix. • PAM-1: diagonal entries deviate slightly from 1, off-diagonal entries deviate slightly from 0 • PAM-250 is PAM-1 raised to the 250th power.
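The relation between PAM-0, PAM-1 and PAM-250 can be sketched with a toy mutation probability matrix. The 4-letter alphabet and the 1% mutation probability below are assumptions for illustration only; the real PAM-1 matrix is a 20 x 20 matrix estimated from closely related protein sequences.

```python
def mat_mul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_pow(m, power):
    """Raise a square matrix to a non-negative integer power."""
    n = len(m)
    # Power 0 gives the identity matrix, which corresponds to PAM-0.
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(power):
        result = mat_mul(result, m)
    return result


# Toy "PAM-1" for a hypothetical 4-letter alphabet: each letter stays
# unchanged with probability 0.99 and mutates to each other letter with
# probability 0.01 / 3. These numbers are assumptions, not real PAM data.
toy_pam1 = [[0.99 if i == j else 0.01 / 3 for j in range(4)] for i in range(4)]
toy_pam250 = mat_pow(toy_pam1, 250)

if __name__ == "__main__":
    for row in toy_pam250:
        print(["%.3f" % x for x in row])
```

Each row still sums to 1 after exponentiation, and the diagonal shrinks toward the background frequency as the evolutionary distance grows.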
Sequence profiles (1J2J.B) >1J2J.B mol:aa PROTEIN TRANSPORT NVIFEDEEKSKMLARLLKSSHPEDLRAANKLIKEMVQEDQKRMEK
Log-odds scores • BLOSUM is a log-likelihood (log-odds) matrix: • Likelihood of observing j given you have i is • $P(j|i) = P_{ij}/P_i$ • The prior likelihood of observing j is • $Q_j$, which is simply the frequency of j • The log-likelihood score is • $S_{ij} = 2\log_2\left(P(j|i)/Q_j\right) = 2\log_2\left(P_{ij}/(Q_i Q_j)\right)$, since $P_i = Q_i$ • Where $\log_2(x) = \ln(x)/\ln(2)$ • S has been normalized to half bits, hence the factor 2
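As a worked example of the log-odds formula, the sketch below computes a score in half bits from a pair frequency and two background frequencies. The function name and the numbers are made up for illustration; they are not values from a real BLOSUM matrix.

```python
import math


def log_odds_half_bits(p_ij, q_i, q_j):
    """Log-odds score S_ij = 2 * log2(P_ij / (Q_i * Q_j)) in half bits.

    p_ij: observed frequency of the aligned pair (i, j)
    q_i, q_j: background frequencies of i and j
    """
    return 2 * math.log2(p_ij / (q_i * q_j))


if __name__ == "__main__":
    # Hypothetical numbers: the pair is seen 4 times more often than
    # expected by chance (0.01 vs 0.05 * 0.05 = 0.0025), i.e. 2 bits
    # or 4 half bits.
    print(log_odds_half_bits(0.01, 0.05, 0.05))
    # A pair seen exactly as often as expected by chance scores 0.
    print(log_odds_half_bits(0.0025, 0.05, 0.05))
```

A positive score means the pair occurs more often in alignments than chance predicts, a negative score means it occurs less often.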
Genome browsers - UCSC Intron - Exon structure Single Nucleotide polymorphism - SNP
Protein structure Primary structure: Amino acid sequence Secondary structure: Helix/Beta sheet Tertiary structure: Fold, 3D coordinates
Protein structure - helix 3-10 helix: 3 residues/turn - few, but not uncommon α-helix: 3.6 residues/turn - by far the most common helix Pi-helix: 4.1 residues/turn - very rare
Protein folds Class: the 4th class is ‘few secondary structures’ Architecture: overall shape of a domain Topology: shared secondary structure connectivity
Neural Networks From knowledge to information Biological feature Protein sequence
Use of artificial neural networks • A data-driven method to predict a feature, given a set of training data • In biology input features could be amino acid sequence or nucleotides • Secondary structure prediction • Signal peptide prediction • Surface accessibility • Propeptide prediction (Figure: N-terminus, signal peptide, propeptide, mature/active protein, C-terminus)
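One common way to turn an amino acid sequence into numeric input for a neural network is a sliding window with one-hot (sparse) encoding. The sketch below is an assumption for illustration: the slides do not specify the encoding, and the window size, function names and example sequence are hypothetical.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard one-letter codes


def one_hot(residue):
    """Return a 20-element vector with a 1 at the residue's position.

    Unknown characters (including the '-' padding) give an all-zero vector.
    """
    vec = [0] * len(AMINO_ACIDS)
    idx = AMINO_ACIDS.find(residue)
    if idx >= 0:
        vec[idx] = 1
    return vec


def encode_windows(sequence, window=3):
    """Encode each position as the one-hot vectors of its surrounding window.

    The window is centered on the residue to predict; sequence ends are
    padded with '-' so every residue gets a vector of length window * 20.
    """
    if window % 2 == 0:
        raise ValueError("window must be odd so it has a central residue")
    half = window // 2
    padded = "-" * half + sequence + "-" * half
    encoded = []
    for center in range(len(sequence)):
        vec = []
        for residue in padded[center:center + window]:
            vec.extend(one_hot(residue))
        encoded.append(vec)
    return encoded


if __name__ == "__main__":
    for vec in encode_windows("MKV", window=3):
        print(len(vec), sum(vec))
```

Each encoded vector would be paired with a known label for the central residue (for example helix, strand or coil) to form the training data.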
Prediction of biological features: Surface accessibility Predict surface accessibility from the amino acid sequence only.
Logo plots Information content, how is it calculated - what does it mean.
Logo plots - Information Content Sequence-logo Calculate Information Content: $I = \sum_a p_a \log_2 p_a + \log_2(4)$. Maximal value is 2 bits (for the 4 nucleotides). (Figure labels: a completely conserved position; a position with two letters at ~0.5 each) • Total height at a position is the ‘Information Content’ measured in bits. • Height of a letter is proportional to the frequency of that letter. • A Logo plot is a visualization of a multiple alignment.
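The information content formula can be checked with a small sketch that takes one column of a multiple alignment and returns its height in bits. The function names are assumptions, and no small-sample correction is applied, which real logo tools often add.

```python
import math
from collections import Counter


def information_content(column, alphabet_size=4):
    """I = sum_a p_a * log2(p_a) + log2(alphabet_size), in bits.

    column: the letters observed at one alignment position, e.g. "AACC".
    alphabet_size: 4 for DNA (maximum 2 bits), 20 for proteins.
    """
    counts = Counter(column)
    total = len(column)
    entropy_term = sum((n / total) * math.log2(n / total) for n in counts.values())
    return entropy_term + math.log2(alphabet_size)


def letter_heights(column, alphabet_size=4):
    """Height of each letter = its frequency times the column's information."""
    info = information_content(column, alphabet_size)
    total = len(column)
    return {a: (n / total) * info for a, n in Counter(column).items()}


if __name__ == "__main__":
    print(information_content("AAAA"))  # completely conserved position
    print(information_content("AACC"))  # two letters at 0.5 each
    print(information_content("ACGT"))  # all four letters equally frequent
    print(letter_heights("AACC"))
```

A completely conserved DNA position reaches the 2-bit maximum, two letters at 0.5 each give 1 bit in total, and a uniform column carries no information.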