Computation in Biology

Nagasuma Chandra, Bioinformatics Centre & SERC, IISc

Next-generation biologists must straddle computation and biology

Hierarchical structures in living systems • Macromolecule • Supramolecular assembly • Organelle • Cell • Tissue • Organ • Organism

Genome sequence: a book of life (DOE-Genomes.org)

Examples from English text: genomicbiologytakesaholisticapproachtomolecularbiologyandevolutionbystudyingthecompletegenomeitsgenesanditsproteinexpressionpatternsncbiprovidesseveralgenomicbiologytoolsandresourcesincludingorganismspecificpagesthatincludelinkstomanywebsitesanddatabasesrelevanttothatspeciesweinviteyoutoexplorethelinksprovidedonthispage. With spaces and punctuation restored: Genomic biology takes a holistic approach to molecular biology and evolution by studying the complete genome, its genes, and its protein expression patterns. NCBI provides several genomic biology tools and resources, including organism-specific pages that include links to many web sites and databases relevant to that species. We invite you to explore the links provided on this page.
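
The analogy above can be made concrete with a small sketch: given unspaced text and a word list, recover the words by dynamic programming. This is only an illustration of the "find the words in an unbroken string" idea; the function name `segment` and the tiny vocabulary `VOCAB` are assumptions made for this example, and real gene finding uses statistical models rather than a dictionary.

```python
def segment(text, vocabulary):
    """Split unspaced text into vocabulary words; return None if impossible."""
    # best[i] holds one valid segmentation of text[:i], or None if none exists.
    best = [None] * (len(text) + 1)
    best[0] = []
    max_len = max(len(word) for word in vocabulary)
    for end in range(1, len(text) + 1):
        # Only look back as far as the longest vocabulary word.
        for start in range(max(0, end - max_len), end):
            word = text[start:end]
            if best[start] is not None and word in vocabulary:
                best[end] = best[start] + [word]
                break
    return best[len(text)]


# Hypothetical vocabulary covering the first words of the slide text.
VOCAB = {"genomic", "biology", "takes", "a", "holistic", "approach"}

words = segment("genomicbiologytakesaholisticapproach", VOCAB)
print(" ".join(words))
```

A string that cannot be covered by the vocabulary yields `None`, which mirrors the practical problem: without a good "dictionary" of genes, the raw sequence stays unreadable.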

Molecular circuitry in the cell

Biochemical networks www.expasy.ch

Cellular networks. Characteristics of the yeast proteome: map of protein-protein interactions. H. Jeong, S.P. Mason, A.-L. Barabási, Z.N. Oltvai, Nature, 411, 41-42 (2001).
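
One simple quantity that can be read off an interaction map like this is the number of partners each protein has (its degree). The sketch below counts degrees from an edge list; the protein names and interactions are made up for illustration and are not taken from the yeast data set.

```python
def degree_table(interactions):
    """Return {protein: number of distinct partners} for an undirected edge list."""
    partners = {}
    for a, b in interactions:
        if a == b:
            continue  # ignore self-interactions in this sketch
        partners.setdefault(a, set()).add(b)
        partners.setdefault(b, set()).add(a)
    return {protein: len(others) for protein, others in partners.items()}


# Hypothetical protein-protein interaction pairs.
EDGES = [("P1", "P2"), ("P1", "P3"), ("P1", "P4"), ("P2", "P3"), ("P4", "P5")]

degrees = degree_table(EDGES)
# List proteins from most to least connected.
for protein, degree in sorted(degrees.items(), key=lambda item: -item[1]):
    print(protein, degree)
```

Highly connected proteins ("hubs") appear at the top of the printed list.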

Role of computation • Data management • Data Analysis & Interpretation • Prediction • Application

What you need… • A model • A computational tool

Models • Levels of modelling • Abstraction level • Hierarchy in living organisms

Abstraction level of the model

Molecular models • Sequences • Structures • Genome Sequences • The ‘omics’ era

Software tools • Accelrys • Tripos • MOE • BioSuite • Schrödinger • Plus hundreds of academic software packages

What you can do… • Sequence space • Determine identity of the molecule • Predict physicochemical properties • Predict three-dimensional structure • Predict function • Apply in pharmaceutical and other industries

Examples • Accelrys GCG • MOE • BioSuite

Example usage

Examples of GCG capabilities • Sequence Comparison • Database Searching and Retrieval • DNA/RNA Secondary Structure Prediction • Editing and Publication • Evolution • Fragment Assembly • Gene Finding and Pattern Recognition • Sequence Importing and Exporting • Mapping • Primer Selection • Protein Analysis

Single gene/protein sequence analysis: MOE. The colored bars over the sequences reflect the secondary structure of those sequences that have associated atomic coordinates. Chains with sequence-only data have no such bars. In this instance, seven of the chains in the family have structural data and can therefore be used as structural templates. This image illustrates the Residue Identity matrix in MOE, which shows that chains 13 and 14 have the highest percent identity to the query sequence.
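
A residue identity matrix of this kind can be sketched in a few lines. The version below assumes the sequences are already aligned (equal length, `-` for gaps) and defines percent identity over columns where neither sequence has a gap; MOE's exact definition may differ, and the sequences and chain names are invented for illustration.

```python
def percent_identity(seq_a, seq_b, gap="-"):
    """Percent of aligned, gap-free columns in which both residues are identical."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = matches = 0
    for x, y in zip(seq_a, seq_b):
        if x == gap or y == gap:
            continue  # skip columns with a gap in either sequence
        compared += 1
        if x == y:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def identity_matrix(alignment):
    """Return a nested dict of pairwise percent identities."""
    names = list(alignment)
    return {a: {b: percent_identity(alignment[a], alignment[b]) for b in names}
            for a in names}


# Hypothetical aligned sequences.
ALIGNMENT = {
    "query":   "MKTAYIAK",
    "chain13": "MKTAYIAR",
    "chain14": "MK-AYLAK",
}

matrix = identity_matrix(ALIGNMENT)
for name, row in matrix.items():
    print(name, {other: round(value, 1) for other, value in row.items()})
```

Reading along the `query` row then shows which chains are the closest candidates for structural templates.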

Whole genome sequence analysis: BioSuite

Structures • Advantages of structural-level studies • The protein folding problem • Sequence-Structure Gap • Need to predict structure using computational methods • Applications

Four levels of protein structure

What you can do… • Structure space • Visualize structures • Build molecular models • Manipulate • Analyse • Simulate molecular behaviour • Apply in drug discovery

Visualization: Viewer module of InsightII (screenshot labels: pulldowns, module icon, icon palette, command prompt, information area)

Visualizations

Ligand-Protein Interaction

Aiding NMR structure determination

Aiding crystal structure determination: X-ray crystallography

Building molecular models • Small molecules • Proteins, nucleic acids, carbohydrates • Predicting protein structure • Homology modelling • Threading • Modifications: site-directed mutants • Protein-ligand complexes

BIOPOLYMER The Biopolymer module provides tools for building and modifying a wide range of biological macromolecules, including proteins, peptides, nucleic acids, and carbohydrates. • It is useful in: • Building proteins and peptides • Structural domain analysis • Building carbohydrates • Building nucleic acids • Structural database searching. Its output can in turn be used by other programs for structure refinement and analysis of small and large molecules. Figure: backbone structure of the C-terminal fragment of E. coli 50S ribosomal protein (in yellow), predicted from the carbon trace using the Protein/Backbone command of the Biopolymer module. The crystallographic backbone structure is shown superimposed in blue. The RMS deviation between corresponding backbone atoms of the two structures is 0.52 Å.
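
The RMS deviation quoted above is the root-mean-square distance between corresponding atoms. A minimal sketch, assuming the two structures are already superimposed and the atoms are listed in matching order; the coordinates below are made up and the function name `rmsd` is chosen for this example.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two matched lists of (x, y, z) points."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


# Hypothetical backbone atom positions in angstroms (already superimposed).
predicted = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.2, 0.0)]
crystal = [(0.1, 0.0, 0.0), (1.5, 0.3, 0.0), (3.0, 0.0, 0.4)]

print(round(rmsd(predicted, crystal), 3))
```

This sketch does not perform the superposition itself; finding the optimal rotation and translation is a separate step that modelling packages carry out before reporting the value.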

Manipulations • E.g., conformation tweaking (figure labels: HIS_229, ASP_187). The following images are examples of this method of predicting conformations of a few long side chains of PDB protein 1IC6.A. In each of the following figures, the native conformation is shown colored by element. In the left image, the predicted rotamer (the rotamer with the lowest ΔG) is shown in white. In the right image, all other rotamers generated by the conformational search are shown.

MODELER MODELER uses a comparative modeling methodology to rapidly build structural models for protein sequences without a known structure. It derives 3D protein models without the time-consuming separate stages of core region identification and loop region building or searching that are inherent to manual homology modeling schemes. MODELER can create a model even with only one source protein. In this case, the structure of dihydrofolate reductase from Lactobacillus casei is used to generate a model for the E. coli protein. The model has an RMS deviation of 2.2 Å from the crystal structure of the E. coli protein.

PROFILES-3D Profiles-3D approaches structure prediction by measuring the compatibility between protein sequences and known protein structures, and then using this information to address the inverse protein folding problem. Profiles-3D enables you to investigate which particular fold an amino acid sequence is likely to adopt. • Benefits: • Profiles-3D can test the validity of a model or of preliminary structures derived from experimental data or modeling studies. • Profiles-3D can suggest which 3D structure an amino acid sequence is likely to adopt by relating structural properties to amino acid sequence information. • Reference template proteins identified by Profiles-3D can be used as input to the InsightII Homology and MODELER modules. This image shows the result of a “Profiles-3D Verify” run: a ribbon drawing of a model of myoglobin in which a single alpha-helix has been purposely misfolded. Profiles-3D has detected the misfolded region, and Insight II has automatically created the subset that was used to color the structure and ribbon.
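
The idea of locating a misfolded region from per-residue compatibility can be sketched as a sliding-window average that flags stretches scoring below a cutoff. This is an illustration only, not the Profiles-3D implementation: the per-residue scores, the window size, and the threshold are all assumptions made for the example.

```python
def windowed_average(scores, window=5):
    """Average each score with its neighbours inside a centred window."""
    half = window // 2
    averages = []
    for i in range(len(scores)):
        lo = max(0, i - half)
        hi = min(len(scores), i + half + 1)
        averages.append(sum(scores[lo:hi]) / (hi - lo))
    return averages


def flag_low_regions(scores, window=5, threshold=0.0):
    """Return residue indices whose windowed compatibility falls below threshold."""
    averages = windowed_average(scores, window)
    return [i for i, value in enumerate(averages) if value < threshold]


# Hypothetical per-residue sequence-structure compatibility scores.
SCORES = [0.5, 0.6, 0.4, -0.3, -0.5, -0.4, 0.5, 0.6]

print(flag_low_regions(SCORES, window=3, threshold=0.0))
```

Averaging over a window smooths out single-residue noise, so that only a sustained run of poorly fitting residues is reported as suspect.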

MATCHMAKER MatchMaker uses an inverse-folding method to predict the 3D structure of a protein from its amino acid sequence. By comparing a new protein sequence to its topology fingerprint database, MatchMaker assesses the ability of a sequence to adopt characteristic topologies. Even in the absence of strong sequence similarity, MatchMaker can generate high-quality structural models. Figure: examples of MatchMaker output, including a histogram of sequence-structure compatibility (upper right), a sub-optimal alignment plot (upper left), an energy profile (middle left), and a prediction of structural elements (helix/beta strand, buried/exposed) for the input sequence.

Simulations: ‘Discover’

Analysis • Protein characterization • Protein Comparison • Sequence-Structure-Function relationships • Active site detection • Ligand Binding mode analysis • Electrostatic analysis

Structure Analysis • Quality Check

PROTABLE ProTable is used to analyze and evaluate protein structures. ProTable creates Ramachandran plots, assesses deviation of local geometries and side chain rotameric states from standard protein values, and determines the energetics of each residue. These images show the results of a ProTable evaluation of a theoretical model of prostate-specific antigen (2PSA). MatchMaker energies reveal a loop (highlighted in green) that may require further refinement. Structures are colored by probability (purple and blue are low probability; orange and red are high probability). An automated Ramachandran analysis (right) identifies backbone torsions in borderline or disallowed regions.
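
A Ramachandran plot is built from the backbone torsion angles phi (C of the previous residue, N, CA, C) and psi (N, CA, C, N of the next residue). The sketch below computes a torsion angle from four atom positions; it is a generic geometry routine, not ProTable code, and the coordinates are invented for illustration.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Signed torsion angle in degrees defined by four points, in (-180, 180]."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)  # normals of the two planes
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


# Hypothetical positions of four consecutive backbone atoms.
angle = dihedral((1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 1.0), (0.0, 1.0, 1.0))
print(round(angle, 1))
```

Computing phi and psi for every residue and plotting one against the other gives the Ramachandran plot; residues falling outside the commonly populated regions are the ones flagged as borderline or disallowed.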

DELPHI DelPhi is a versatile Poisson-Boltzmann electrostatics simulation engine. DelPhi helps determine the specificity of ligand-receptor interactions, which aids drug discovery. • DelPhi calculates: • Electrostatic properties, including the effects of bulk solvent and ionic strength, for nucleic acids, polysaccharides, and complexes such as glycoproteins and protein/DNA. Figure: HIV protease, rendered with an electrostatic contour surface, with a stick rendering of the drug inside the surface. Blue is positive charge, red is negative, and gray is neutral.

Applications: Drug Discovery

SITEID SiteID provides analysis and visualization tools leading to the identification of potential binding sites within or at the surface of biological targets. • Applications: • Locate ligand binding pockets on a macromolecule. • Identify protein-protein interaction surfaces. • Identify constraints in a novel protein structure for 3D database searching to find or optimize lead compounds. Figure: the binding pocket of dihydrofolate reductase located by SiteID and shown as a MOLCAD surface. The red areas of the surface indicate contact atoms in the pocket, while the yellow areas show the residues in which those atoms are contained. The inhibitor (methotrexate) is shown in green.

STRUCTURE-BASED DESIGN TOOLS Active site detection: MOE uses a fast geometric algorithm, based on Edelsbrunner’s alpha shapes, to detect candidate protein-ligand and protein-protein binding sites. Individual sites can be visualized or populated with “dummy atoms” for docking calculations or as starting points for de novo ligand design efforts. Left: PDB 1AAQ (HIV-1 protease) and the first site located by the MOE Site Finder. Middle: 1AAQ with the complexed ligand (hydroxyethylene isostere). Right: hydroxyethylene isostere overlaid with calculated alpha spheres of the first site.

FLEXX FlexX rapidly docks a conformationally flexible ligand into a binding site, using an incremental construction algorithm that builds the ligand in the active site. • FlexX is composed of four basic components: • Conformational flexibility. • Set of possible protein-ligand interactions. • Scoring function for the interactions. • Algorithm for placement and incremental growth of the ligand from a defined core. A set of inhibitors docked into the active site of Carboxypeptidase A by FlexX. The protein backbone and the active site surface were rendered using MOLCAD. The active site surface is color-coded by electrostatic potential.

RACHEL RACHEL performs automated combinatorial optimization of lead compounds by systematically derivatizing user-defined sites on the ligand. • Applications: • Combinatorially enumerate user-defined sites on a lead scaffold to optimize binding within a receptor • Bridge high-affinity ligand fragments positioned within the active site. Figure: the X-ray structure of N9 influenza virus neuraminidase (2QWK) shown with five ligands generated using RACHEL that are predicted to be active. Hydrogen bonds between the ligands and residues are indicated by dashed yellow lines. The surface was rendered using MOLCAD. Dark purple regions contain a greater acceptor/donor density and light purple regions indicate areas where hydrogen bonding is less likely to occur.
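
Combinatorial enumeration over user-defined sites can be sketched as a Cartesian product of the substituent choices at each site. This is not RACHEL's algorithm, which also scores and optimizes candidates within the receptor; the scaffold name, site labels, and substituent lists below are hypothetical.

```python
from itertools import product


def enumerate_derivatives(scaffold, site_options):
    """Yield (scaffold, {site: substituent}) for every combination of choices."""
    sites = sorted(site_options)
    for combo in product(*(site_options[site] for site in sites)):
        yield scaffold, dict(zip(sites, combo))


# Hypothetical derivatization sites and candidate substituents.
SITE_OPTIONS = {
    "R1": ["H", "CH3"],
    "R2": ["OH", "NH2", "F"],
}

derivatives = list(enumerate_derivatives("lead_scaffold", SITE_OPTIONS))
print(len(derivatives))
for name, substituents in derivatives:
    print(name, substituents)
```

The number of candidates is the product of the option counts per site, which is why enumeration is normally paired with scoring or filtering to keep the library focused.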

HIGH-THROUGHPUT DISCOVERY TOOLS HTS-QSAR: CCG’s Binary QSAR methodology is designed for building pass/fail models from high-error-content data and standard molecular descriptors. The resulting probabilistic models (based on Bayesian statistical inference) are used as a biasing agent in the design of focused combinatorial libraries.
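
To show what a probabilistic pass/fail model based on Bayesian inference looks like in the simplest case, the sketch below uses a naive Bayes classifier over binary descriptors. This is a stand-in chosen for brevity, not CCG's Binary QSAR method; the descriptors, training data, and function names are assumptions made for this example.

```python
import math


def train(samples, labels):
    """Count, per class, how often each binary descriptor is set."""
    n_features = len(samples[0])
    counts = {True: [0] * n_features, False: [0] * n_features}
    totals = {True: 0, False: 0}
    for descriptors, active in zip(samples, labels):
        totals[active] += 1
        for j, bit in enumerate(descriptors):
            counts[active][j] += bit
    return counts, totals


def prob_active(model, descriptors):
    """Posterior probability of 'active' under naive Bayes with add-one smoothing."""
    counts, totals = model
    n = totals[True] + totals[False]
    log_post = {}
    for label in (True, False):
        lp = math.log((totals[label] + 1) / (n + 2))  # smoothed class prior
        for j, bit in enumerate(descriptors):
            p_set = (counts[label][j] + 1) / (totals[label] + 2)
            lp += math.log(p_set if bit else 1.0 - p_set)
        log_post[label] = lp
    top = max(log_post.values())
    weight_active = math.exp(log_post[True] - top)
    weight_inactive = math.exp(log_post[False] - top)
    return weight_active / (weight_active + weight_inactive)


# Hypothetical screening data: two binary descriptors per compound.
SAMPLES = [(1, 0), (1, 1), (0, 0), (0, 1)]
LABELS = [True, True, False, False]

model = train(SAMPLES, LABELS)
print(prob_active(model, (1, 0)))
```

Because the output is a probability rather than a hard call, it can be used to rank or weight candidate compounds when assembling a focused library.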

CHEMINFORMATICS TOOLS Molecular databases: The MOE Molecular Database is a disk-based spreadsheet central to the manipulation and visualization of large collections of compounds. Data can be imported and exported in various standard file formats and merged with structural or biological activity data. (Figure panels: Molecular Database Viewer; Molecular Database Calculator.)
