Graphical Models for Protein Kinetics

Nina Singhal, CS374 Presentation, Nov. 1, 2005

Outline • Background material on proteins • Why study protein kinetics • Graphical models for kinetics • Motion planning view (Apaydin et al., 2003) • Molecular dynamics view (Singhal et al., 2004) • Conclusions

Background on Proteins • Alpha helix • Beta strand and sheet • Beta barrel

Structure Prediction • Given an amino acid sequence, what 3D structure will the protein form? • Example sequence: MTYKLILNGKTLKGETTTEAVDAATAEKVFKQYANDNGVDGEWTYDDATKTFTVTE

Pathways and Kinetics • How does a protein actually get from an unfolded configuration to a folded configuration?

Folding Kinetics • Rate of folding • Uniqueness of pathway • Order of secondary structure formation • Secondary or tertiary structure

Applications • Misfolded proteins and diseases (Alzheimer's, cystic fibrosis, mad cow disease) • Intermediates may be important as drug targets • Protein design

Representation of a Protein • A protein with n amino acids can be represented using 2n phi-psi angles, each in the range [0, 2π) [Figure: protein backbone with the phi, psi, and omega angles labeled]

Graphical Models for Protein Kinetics • Protein conformations have different energies • Graphical models discretize the conformation space and connect nearby regions with edges

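Robotics Motion Planning • Robot with 2 degrees of freedom • 2D configuration space, each axis running from 0 to 2π • Moving the robot arm from c1 to c2 amounts to finding a path in the configuration space from c1 to c2 [Figure: robot arm at configurations c1 and c2, and the corresponding points in configuration space]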

Roadmap Method • Randomly sample points in configuration space. Keep feasible ones. • Connect these points to form a graph. • Process path queries using standard graph search techniques. [Figure: roadmap in the 2D configuration space connecting c1 and c2]
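The three roadmap steps can be sketched in a few lines of Python. This is a minimal illustration, not the planner used in the cited work: the feasibility test `is_feasible`, the connection `radius`, and the function names are all hypothetical, angle wrap-around is ignored, and edges are not themselves checked for feasibility.

```python
import math
import random
from collections import deque

TWO_PI = 2 * math.pi

def build_roadmap(n_samples, is_feasible, radius, rng):
    """Sample feasible points in the 2D configuration space and link close pairs."""
    nodes = []
    while len(nodes) < n_samples:
        q = (rng.uniform(0, TWO_PI), rng.uniform(0, TWO_PI))
        if is_feasible(q):                      # keep only feasible samples
            nodes.append(q)
    edges = {i: [] for i in range(len(nodes))}
    for i in range(len(nodes)):
        for j in range(i + 1, len(nodes)):
            # Assumption: plain Euclidean distance, no wrap-around at 2*pi.
            if math.dist(nodes[i], nodes[j]) < radius:
                edges[i].append(j)
                edges[j].append(i)
    return nodes, edges

def bfs_path(edges, start, goal):
    """Breadth-first search; returns a list of node indices, or None if no path exists."""
    parent = {start: None}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        if u == goal:
            path = []
            while u is not None:
                path.append(u)
                u = parent[u]
            return path[::-1]
        for v in edges[u]:
            if v not in parent:
                parent[v] = u
                queue.append(v)
    return None

# Usage: a circular obstacle in the middle of the space (hypothetical).
rng = random.Random(0)
free = lambda q: math.dist(q, (math.pi, math.pi)) > 1.0
nodes, edges = build_roadmap(200, free, 0.8, rng)
path = bfs_path(edges, 0, 1)   # None if nodes 0 and 1 are not connected
```

Breadth-first search returns a path with the fewest edges; a weighted search such as Dijkstra's algorithm would be the natural swap if edge lengths matter.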

Protein Folding as a Search Problem • Protein folding can be represented as a search through the protein’s configuration space • Replace collision free constraint with a preference for low energy configurations • Instead of finding any path, want to find all the energetically favorable paths

Stochastic Roadmap Simulation (Apaydin et al., 2003) • Sample protein configurations at random • Add edges between nearby nodes • Take advantage of the many folding pathways contained within a roadmap • Efficiently calculate many properties of the entire landscape

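Roadmap Construction • Nodes in the graph are sampled uniformly at random • Edges are added between nearest neighbors, with transition probability $P_{ij} = \frac{1}{N_i}\exp(-\Delta E_{ij}/k_B T)$ if $\Delta E_{ij} > 0$, and $P_{ij} = \frac{1}{N_i}$ otherwise, where $\Delta E_{ij} = E_j - E_i$ and $N_i$ is the number of neighbors of node $i$ (the Metropolis form used in Apaydin et al., 2003)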
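A minimal sketch of assigning edge probabilities, assuming the Metropolis-style rule of stochastic roadmap simulation: an uphill move to neighbor j is taken with probability exp(-ΔE/kT)/N_i, a downhill move with probability 1/N_i, and the leftover probability becomes a self-transition. The function name and the dictionary layout are illustrative choices, not taken from the slides.

```python
import math

def transition_probabilities(energy, neighbors, kT):
    """Return P[i][j] for every node i, including the self-transition P[i][i].

    energy:    dict node -> energy
    neighbors: dict node -> list of neighbor nodes (assumed not to contain the node itself)
    kT:        thermal energy, in the same units as `energy`
    """
    P = {}
    for i, nbrs in neighbors.items():
        row = {}
        for j in nbrs:
            dE = energy[j] - energy[i]
            accept = math.exp(-dE / kT) if dE > 0 else 1.0
            row[j] = accept / len(nbrs)
        row[i] = 1.0 - sum(row.values())   # remaining probability: stay in place
        P[i] = row
    return P

# Usage on a two-node toy roadmap (energies are made up).
P = transition_probabilities({0: 0.0, 1: 1.0}, {0: [1], 1: [0]}, kT=1.0)
```

Each row sums to one by construction, which is what lets the roadmap be treated as a Markov chain.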

Roadmap as a Markov Chain • We can view the molecular motion as a random walk over the roadmap • The roadmap can be regarded as a discretely sampled version of a Monte Carlo simulation • In fact, in the limit, the probability distributions of the Monte Carlo simulation and the roadmap converge

Transmission Coefficients • Measures “kinetic distance” • Probability that a conformation will fold before unfolding • Can calculate by starting many Monte Carlo simulations from the conformation • Very computationally expensive [Figure: a conformation between the unfolded state and the folded state]
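The brute-force estimate described above can be sketched as repeated random walks on a discrete chain. This is an illustrative toy, assuming the dynamics are already given as a transition table `P[state][next_state]`; the function name and the five-state example are hypothetical.

```python
import random

def estimate_pfold(P, start, folded, unfolded, n_walks, rng):
    """Fraction of random walks from `start` that reach `folded` before `unfolded`."""
    hits = 0
    for _ in range(n_walks):
        state = start
        while state not in folded and state not in unfolded:
            targets = list(P[state])
            weights = [P[state][t] for t in targets]
            state = rng.choices(targets, weights=weights)[0]
        hits += state in folded
    return hits / n_walks

# Usage: unbiased walk on states 0..4, with 0 unfolded and 4 folded.
P = {1: {0: 0.5, 2: 0.5}, 2: {1: 0.5, 3: 0.5}, 3: {2: 0.5, 4: 0.5}}
estimate = estimate_pfold(P, 2, {4}, {0}, 20000, random.Random(1))
```

The statistical error shrinks only as one over the square root of the number of walks, and every conformation of interest needs its own batch of walks, which is why this approach is expensive.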

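Algebraic Method for Calculating Transmission Coefficients [Figure: node v_i with transition probability P_ij to neighbor v_j, shown between the folded set F and the unfolded set U]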

Transmission Coefficients (cont) • System of linear equations • One equation and one unknown for each node • Can be solved iteratively • Low connectivity of the graph results in a sparse matrix
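The linear system can be sketched as follows, assuming the transmission coefficient is fixed at 1 on folded nodes and 0 on unfolded nodes, and that every other node satisfies tau_i = sum_j P_ij * tau_j. The Gauss-Seidel sweep below is one possible iterative solver; the slides do not say which one was used, and the names are illustrative.

```python
def solve_pfold(P, folded, unfolded, tol=1e-12, max_iter=100000):
    """Iteratively solve tau_i = sum_j P[i][j] * tau_j with fixed boundary values.

    P must contain a row for every node that is neither folded nor unfolded.
    Only nonzero transitions are stored, so each update touches few entries.
    """
    tau = {i: 0.0 for i in P}
    for s in folded:
        tau[s] = 1.0
    for s in unfolded:
        tau[s] = 0.0
    free = [i for i in P if i not in folded and i not in unfolded]
    for _ in range(max_iter):
        change = 0.0
        for i in free:
            new = sum(p * tau[j] for j, p in P[i].items())
            change = max(change, abs(new - tau[i]))
            tau[i] = new          # Gauss-Seidel: reuse updated values immediately
        if change < tol:
            break
    return tau

# Usage: unbiased walk on states 0..4, with 0 unfolded and 4 folded.
P = {1: {0: 0.5, 2: 0.5}, 2: {1: 0.5, 3: 0.5}, 3: {2: 0.5, 4: 0.5}}
tau = solve_pfold(P, folded={4}, unfolded={0})
```

For this chain the values converge to 0.25, 0.5, and 0.75 for states 1, 2, and 3, and a single solve yields the coefficient for every node at once.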

Results • Studied a synthetic landscape and a real protein, ROP • Protein was represented with 6 degrees of freedom: two vectors connected by a loop • Correlation of transmission coefficients calculated by roadmaps and Monte Carlo simulations

Benefits and Drawbacks • Extremely efficient at calculating kinetic properties like transmission coefficients • Unclear whether a low-dimensional representation of the protein is adequate • Monte Carlo simulations may not be accurate enough for protein kinetics

Molecular Dynamics • Simulate protein movement using Newton’s laws of motion • Timescale axis: 10^-15 s (femto), 10^-12 s (pico), 10^-9 s (nano), 10^-6 s (micro), 10^-3 s (milli), 10^0 s (seconds) • Events along the axis, from fastest to slowest: bond vibration, isomerization, water dynamics, helix formation, fastest folders, typical folders, slow folders • Markers along the axis: MD step, long MD run, where we need to be, where we’d love to be
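As a toy illustration of integrating Newton's laws in small time steps, here is a velocity Verlet integrator for a single particle in one dimension. Velocity Verlet is a commonly used MD integrator, but the slides do not name one, so its use here is an assumption; the harmonic force and all parameter values are made up.

```python
def velocity_verlet(x, v, force, mass, dt, n_steps):
    """Integrate m * x'' = force(x) and return the list of (x, v) states."""
    a = force(x) / mass
    trajectory = [(x, v)]
    for _ in range(n_steps):
        x = x + v * dt + 0.5 * a * dt * dt     # position update
        a_new = force(x) / mass                # force at the new position
        v = v + 0.5 * (a + a_new) * dt         # velocity update with averaged acceleration
        a = a_new
        trajectory.append((x, v))
    return trajectory

# Usage: harmonic spring with unit mass and unit spring constant.
trajectory = velocity_verlet(x=1.0, v=0.0, force=lambda x: -x, mass=1.0, dt=0.01, n_steps=1000)
```

The time step has to resolve the fastest motion in the system, which is why a femtosecond-scale step makes millisecond folding events so costly to reach.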

Folding@Home: Worldwide desktop grid computing • ~150,000 CPUs around the world (CPU locations from IP address)

Markovian Model Method (Singhal et al., JCP 2004) • Generate molecular dynamics trajectories from transition path sampling or independently • Cluster nearby points into macrostates to build a roadmap whose edges also include transition times • Calculate the mean first passage time and Pfold using linear algebra

Step 1: sampling of paths • Pick a random point from current path • Shoot a path from this point • If path reaches initial or final state by some cutoff time, stop simulation and accept it • Define new current path

Step 2: Generation of roadmap • Nodes are accepted points, edges connect successive nodes • Cluster nearby points to make roadmap more connected • Calculate edge weights by counting number of transitions between nodes and normalize
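The counting-and-normalizing step can be sketched as below, assuming each trajectory has already been reduced to a sequence of cluster (macrostate) labels. The clustering itself is not shown, and the function name is illustrative.

```python
from collections import defaultdict

def roadmap_from_trajectories(trajectories):
    """Count transitions between successive labels and row-normalize the counts."""
    counts = defaultdict(lambda: defaultdict(int))
    for traj in trajectories:
        for a, b in zip(traj, traj[1:]):
            counts[a][b] += 1
    P = {}
    for a, row in counts.items():
        total = sum(row.values())
        P[a] = {b: c / total for b, c in row.items()}
    return P

# Usage: two short label sequences (made up).
P = roadmap_from_trajectories([[0, 1, 1, 2], [0, 1, 2]])
```

Here state 1 is left three times, once to itself and twice to state 2, so its outgoing probabilities are 1/3 and 2/3. States that are never left, such as state 2, get no row.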

Step 3 (opt): Re-weighting of edges • Can analyze roadmap at parameter values other than the simulated ones without need for additional simulations • For temperature, can re-weight edges by the relative probabilities at the two temperatures according to the dynamics • Renormalize edges so outgoing probability sums to one
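A generic sketch of the re-weighting step: each edge probability is multiplied by a relative probability supplied by the caller and the rows are then renormalized. How that ratio is derived from the dynamics is not specified in the slides, so `ratio` is a hypothetical placeholder and the numbers in the usage line are made up.

```python
def reweight_edges(P, ratio):
    """Scale each P[i][j] by ratio(i, j), then renormalize every row to sum to one.

    ratio(i, j) is assumed to return the probability of the i -> j transition
    under the new parameters relative to the simulated ones.
    Assumes every row keeps a nonzero total weight after scaling.
    """
    new_P = {}
    for i, row in P.items():
        weights = {j: p * ratio(i, j) for j, p in row.items()}
        total = sum(weights.values())
        new_P[i] = {j: w / total for j, w in weights.items()}
    return new_P

# Usage: halve the relative weight of the 0 -> 1 edge (illustrative values).
P = {0: {0: 0.5, 1: 0.5}}
P_new = reweight_edges(P, lambda i, j: 0.5 if (i, j) == (0, 1) else 1.0)
```

Keeping the ratio as a callback separates the bookkeeping from the physics, so the same routine works for whichever dynamics the ratio is computed from.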

Calculating Pfolds and MFPT • Equation for each node is conditioned on which neighbor it transitions to • One equation and one unknown for each node • Can be solved iteratively
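For the mean first passage time, conditioning on the next node gives one equation per node: m_i = sum_j P_ij * (t_ij + m_j), with m = 0 on the final state. A minimal iterative solver under that assumption is sketched below; the `times` table of per-edge transition times and the function name are illustrative.

```python
def solve_mfpt(P, times, final, tol=1e-12, max_iter=1000000):
    """Iteratively solve m_i = sum_j P[i][j] * (times[i][j] + m_j), with m = 0 on `final`.

    P must contain a row for every node outside `final`.
    """
    mfpt = {i: 0.0 for i in P}
    for s in final:
        mfpt[s] = 0.0
    free = [i for i in P if i not in final]
    for _ in range(max_iter):
        change = 0.0
        for i in free:
            new = sum(p * (times[i][j] + mfpt[j]) for j, p in P[i].items())
            change = max(change, abs(new - mfpt[i]))
            mfpt[i] = new
        if change < tol:
            break
    return mfpt

# Usage: three-state chain 0 - 1 - 2 with unit transition times; state 2 is final.
P = {0: {1: 1.0}, 1: {0: 0.5, 2: 0.5}}
times = {0: {1: 1.0}, 1: {0: 1.0, 2: 1.0}}
mfpt = solve_mfpt(P, times, final={2})
```

For this chain the equations are m_0 = 1 + m_1 and m_1 = 1 + 0.5 * m_0, so the iteration converges to m_0 = 4 and m_1 = 3.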

Energy landscape and initial pathway • 2-D energy landscape • Initial and final regions defined by circles around the two minima • Initial paths generated by Monte Carlo or Langevin dynamics [Figure: energy landscape with initial region I and final region F]

Results - Pfold • Compare Pfold values to those from many direct simulations • Correlation coefficients are 0.99 for both

Results - MFPT • Compare MFPT at different temperatures to those from 10,000 direct simulations

Results - Trp zipper β-hairpin • Analyzed existing simulation data of a small, 12-residue protein • 1750 trajectories, each 10-450 ns, with a resolution of 10 ns for non-folding and 250 ps for folding trajectories • Combine into roadmap • Depending on clustering cutoffs, MFPT = 2-9 μs • Agrees with the experimental result of 2.47 ± 0.05 μs and a previous analysis of the simulation data giving 4.5 μs

Conclusions • Graphical methods produce a network of possible protein pathways • These networks can be efficiently analyzed to compute kinetic properties • Very fast method for looking at simple protein models or analyzing existing molecular dynamics data
