Applied Bioinformatics

Applied Bioinformatics Week 10

Bioinformatics & Functional Proteomics • How to classify proteins into functional classes? • How to compare one proteome with another? • How to include functional/activity/pathway information in databases? • How to extract functional motifs from sequence data? • How to predict phenotype from proteotype?

Bioinformatics & Expressional Proteomics • How to correlate changes in protein expression with disease? • How to distinguish important from unimportant changes in expression? • How to compare, archive, retrieve gel data? • How to rapidly, accurately identify proteins from MS and 2D gel data? • How to include expression info in databases?

Bioinformatics & Structural Proteomics • How to predict 3D structure from 1D sequence? • How to determine function from structure? • How to classify proteins on the basis of structure? • How to recognize 3D motifs and patterns? • How to use bioinformatics databases to help in 3D structure determination? • How to predict which proteins will express well or produce stable, folded molecules?

Protein Folding Problem • “Predict the three-dimensional structure of a protein from its amino acid sequence.” • “How does a protein fold into this structure?” • This question has not been solved for more than half a century.

Proteins Can Fold into 3D Structures Spontaneously • The three-dimensional structure of a protein is self-organized in solution. • The structure corresponds to the state with the lowest free energy of the protein-solvent system (Anfinsen’s dogma). • If we can calculate the energy of the system precisely, it is possible to predict the structure of the protein!

Levinthal Paradox Assume that each amino acid can adopt three conformations (e.g. α-helix, β-sheet and random coil). If a protein is made up of 100 amino acid residues, the total number of conformations is 3^100 = 515377520732011331036461129765621272702107522001 ≈ 5 × 10^47. If 100 psec (10^-10 sec) were required to convert from one conformation to another, a random search of all conformations would require 5 × 10^47 × 10^-10 sec = 5 × 10^37 sec ≈ 1.6 × 10^30 years. However, proteins fold on a timescale of msec to sec. Therefore, proteins do not fold via a random search but via a more sophisticated search process. We want to watch the folding process of a protein using molecular simulation techniques.
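The arithmetic behind the Levinthal estimate can be checked with a few lines of Python. This is only a sketch of the calculation on the slide; the constant names are illustrative, and the length of a year (365.25 days) is an assumption not stated in the lecture.

```python
# Levinthal estimate, using the numbers from the slide.
STATES_PER_RESIDUE = 3                   # e.g. helix, sheet, coil
N_RESIDUES = 100
SECONDS_PER_STEP = 1e-10                 # 100 psec per conformational change
SECONDS_PER_YEAR = 365.25 * 24 * 3600    # assumption: 365.25-day year

# Python integers are exact, so 3^100 is computed without rounding.
n_conformations = STATES_PER_RESIDUE ** N_RESIDUES

# Time for an exhaustive random search, in seconds and in years.
search_seconds = n_conformations * SECONDS_PER_STEP
search_years = search_seconds / SECONDS_PER_YEAR

print(n_conformations)
print(f"about {search_years:.1e} years")
```

Changing N_RESIDUES shows how quickly the search time grows: every additional residue multiplies it by three.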

Why is “Protein Folding” so Important? • Proteins play important roles in living organisms. • Some proteins are closely linked to diseases. Structural information about a protein is necessary to explain and predict the function of its gene, and to design molecules that bind to the protein in drug design. • Today, the whole genome sequences (the complete set of genes) of various organisms have been deciphered, and we realize that the functions of many genes are unknown and that some are related to diseases. • Therefore, understanding protein folding helps us to investigate the functions of these genes and to design useful drugs against these diseases efficiently. • In addition, this understanding opens the door to designing proteins with novel functions as new nanomachines.

Forces Involved in the Protein Folding • Electrostatic interactions • van der Waals interactions • Hydrogen bonds • Hydrophobic interactions

The Energy Function • Calculate the energy of each particle • Since long-range interactions are important, the pair-wise interactions must be calculated for every pair of particles
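A minimal sketch of such a pair-wise energy sum is shown below. It assumes a Coulomb term plus a Lennard-Jones 12-6 term as stand-ins for the electrostatic and van der Waals interactions; the function names, the single shared epsilon and sigma, and the approximate Coulomb conversion factor are illustrative assumptions. Real force fields use per-atom-type parameters and exclude bonded pairs.

```python
import math
from itertools import combinations

# Approximate conversion factor so that charges in units of e and
# distances in Angstrom give energies in kcal/mol (assumed value).
COULOMB_K = 332.06

def pair_energy(r, qi, qj, epsilon, sigma):
    """Coulomb + Lennard-Jones 12-6 energy of one pair at distance r."""
    coulomb = COULOMB_K * qi * qj / r
    sr6 = (sigma / r) ** 6
    lennard_jones = 4.0 * epsilon * (sr6 * sr6 - sr6)
    return coulomb + lennard_jones

def total_nonbonded_energy(coords, charges, epsilon=0.1, sigma=3.4):
    """Sum the pair energy over every unordered pair of particles."""
    total = 0.0
    for i, j in combinations(range(len(coords)), 2):
        r = math.dist(coords[i], coords[j])
        total += pair_energy(r, charges[i], charges[j], epsilon, sigma)
    return total

# Hypothetical three-particle system (coordinates in Angstrom).
coords = [(0.0, 0.0, 0.0), (4.0, 0.0, 0.0), (0.0, 4.0, 0.0)]
charges = [0.4, -0.4, 0.0]
print(total_nonbonded_energy(coords, charges))
```

The double loop over all pairs is what makes the cost grow roughly with the square of the number of particles.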

System for Folding Simulations • Without water molecules: # of atoms: 304 • With water molecules: # of atoms: 304 + 7,377 = 7,681
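To see why adding water is so expensive, one can count the pair-wise interactions in the two systems. The sketch below assumes that every unordered pair of atoms is evaluated, with no cutoff; the helper name n_pairs is illustrative.

```python
def n_pairs(n_atoms):
    """Number of unordered atom pairs: N * (N - 1) / 2."""
    return n_atoms * (n_atoms - 1) // 2

protein_only = 304
with_water = 304 + 7377   # 7,681 atoms in total

pairs_protein = n_pairs(protein_only)
pairs_solvated = n_pairs(with_water)

print(pairs_protein, pairs_solvated)
print(f"ratio: {pairs_solvated / pairs_protein:.0f}x")
```

The atom count grows about 25-fold, but the number of pairs grows several hundred-fold, which is the motivation for special-purpose hardware and parallelization.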

Much Faster, Much Larger! • Special-purpose computers • Non-bonded interactions are calculated on a special chip developed only for this purpose. • For example: • MDM (Molecular Dynamics Machine) or MD-Grape: RIKEN • MD Engine: Taisho Pharmaceutical Co. and Fuji Xerox Co. • Parallelization • A single job is divided into several smaller ones, which are calculated on multiple CPUs simultaneously. • Today, almost all MD programs for biomolecular simulations (e.g. AMBER, CHARMm, GROMOS, NAMD, MARBLE, etc.) can run on parallel computers. • Folding@home

Homology Modeling • Template Selection and Fold Assignment • Target – Template Alignment • Model Building • Loop Modeling • Sidechain Modeling • Model Evaluation

Fold Assignment and Template Selection • Identify all protein structures with sequences related to the target, then select templates • Three main classes of comparison methods: • Pair-wise sequence-sequence comparison, in which the target sequence is compared with each database sequence independently (BLAST and FASTA) • Multiple sequence comparisons to improve sensitivity (PSI-BLAST) • Threading or 3-D template matching methods

Target – Template Alignment • Most important step in Homology Modeling • A specialized method should be used for the alignment • Above 40% sequence identity, the alignment is likely to be correct. • Regions of low local sequence similarity become common when overall sequence identity is under 40%. (Saqi et al., Protein Eng. 1999) • The alignment becomes difficult below 30% sequence identity. (Rost, Protein Eng. 1999)
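The identity thresholds above can be applied to an existing alignment with a short script. This is a sketch under stated assumptions: percent identity is defined here as identical residues divided by aligned (non-gap) positions, which is only one of several definitions in use, and the function names and example sequences are hypothetical.

```python
def percent_identity(aligned_target, aligned_template, gap="-"):
    """Percent identity over positions where neither sequence has a gap."""
    if len(aligned_target) != len(aligned_template):
        raise ValueError("aligned sequences must have the same length")
    aligned = 0
    identical = 0
    for a, b in zip(aligned_target, aligned_template):
        if a == gap or b == gap:
            continue          # skip gapped columns
        aligned += 1
        if a == b:
            identical += 1
    if aligned == 0:
        return 0.0
    return 100.0 * identical / aligned

def alignment_zone(identity):
    """Map percent identity onto the rough zones quoted on the slide."""
    if identity > 40:
        return "alignment likely correct"
    if identity >= 30:
        return "expect regions of low local similarity"
    return "alignment difficult"

# Hypothetical pre-aligned target and template fragments.
target = "MKT-AYIAKQR"
template = "MKTLAYLAK-R"
identity = percent_identity(target, template)
print(f"{identity:.1f}% identity: {alignment_zone(identity)}")
```

The script only scores an alignment that has already been made; producing the alignment itself is the job of tools such as BLAST, FASTA, or PSI-BLAST.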

Model Building • Construct a 3-D model of the target sequence based on its alignment to the template structures • Three different model building approaches • Modeling by rigid body assembly • Modeling by segment matching • Modeling by satisfaction of spatial restraints • Accuracies of these models are similar • Template selection and alignment have a larger impact on the model

Homology Modeling Server SWISS-MODEL (screenshots of the SWISS-MODEL web server) • Construct a framework using known protein structures • Generate the location of the target amino acids on the framework • If loop regions are not determined, use an additional database search or short simulations

Procedure of the MODELLER program • After obtaining the restraints, run a geometry optimization or real-space optimization to satisfy them
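The idea of optimizing coordinates until a set of restraints is satisfied can be illustrated with a toy example. The sketch below is not the MODELLER procedure or its objective function: it assumes simple distance restraints with a squared-deviation penalty and plain gradient descent, and all names, distances, and starting coordinates are hypothetical.

```python
import math

def restraint_violation(coords, restraints):
    """Sum of squared deviations from the target distances."""
    return sum((math.dist(coords[i], coords[j]) - d0) ** 2
               for i, j, d0 in restraints)

def satisfy_restraints(coords, restraints, step=0.05, n_steps=2000):
    """Move the points by gradient descent to reduce the violation."""
    coords = [list(p) for p in coords]   # work on a mutable copy
    for _ in range(n_steps):
        grads = [[0.0, 0.0, 0.0] for _ in coords]
        for i, j, d0 in restraints:
            d = math.dist(coords[i], coords[j])
            if d == 0.0:
                continue                 # direction undefined, skip pair
            factor = 2.0 * (d - d0) / d  # derivative of (d - d0)^2
            for k in range(3):
                g = factor * (coords[i][k] - coords[j][k])
                grads[i][k] += g
                grads[j][k] -= g
        for p, g in zip(coords, grads):
            for k in range(3):
                p[k] -= step * g[k]
    return coords

# Hypothetical restraints (i, j, target distance in Angstrom) for 4 atoms.
restraints = [(0, 1, 3.8), (1, 2, 3.8), (2, 3, 3.8),
              (0, 2, 5.5), (1, 3, 5.5), (0, 3, 6.0)]
start = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (6.0, 1.0, 0.0), (9.0, 0.0, 1.0)]

initial = restraint_violation(start, restraints)
model = satisfy_restraints(start, restraints)
final = restraint_violation(model, restraints)
print(f"violation before: {initial:.3f}, after: {final:.6f}")
```

In a real modeling program the restraints are derived from the target-template alignment and from stereochemistry, and the optimizer is considerably more sophisticated than fixed-step gradient descent.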

Errors in Homology Models • Errors in sidechain packing • Distortions and shifts in correctly aligned regions • Errors in regions without a template • Errors due to misalignment • Incorrect templates

Model Building Programs

Applications

3D Structure Prediction? • Get a protein sequence • Go to: http://bioinf.cs.ucl.ac.uk/psipred • Use threading • Go to: http://www.rcsb.org/pdb • Find known structure • Folding@home • Ab initio prediction • FoldIt (http://fold.it/portal/) • “Crystal structure of a monomeric retroviral protease solved by protein folding game players.” • “Increased Diels-Alderase activity through backbone remodeling guided by Foldit players.”

Presentation • English • 15 min max (I will stop you), including: • Preparations before the talk • The talk • Questions • The Internet cannot be used, so provide screenshots in your presentation • All presentations must be submitted by: 27.12.2010
