Graphical Models for Protein Kinetics. Nina Singhal CS374 Presentation Nov. 1, 2005. Outline. Background material on proteins Why study protein kinetics Graphical models for kinetics Motion planning view (Apaydin et al, 2003) Molecular dynamics view (Singhal et al, 2004) Conclusions.
Graphical Models for Protein Kinetics Nina Singhal CS374 Presentation Nov. 1, 2005
Outline • Background material on proteins • Why study protein kinetics • Graphical models for kinetics • Motion planning view (Apaydin et al, 2003) • Molecular dynamics view (Singhal et al, 2004) • Conclusions
Background on Proteins [Figure panels: Alpha Helix; Beta Strand and Sheet; Beta Barrel]
Structure Prediction MTYKLILNGKTLKGETTTEAVDAATAEKVFKQYANDNGVDGEWTYDDATKTFTVTE Given an amino acid sequence, what 3D structure will the protein form?
Pathways and Kinetics How does a protein actually get from an unfolded configuration to a folded configuration?
Folding Kinetics • Rate of folding • Uniqueness of pathway • Order of secondary structure formation • Secondary or tertiary structure
Applications • Misfolded proteins and diseases • Alzheimer's • Cystic fibrosis • Mad cow disease • Intermediates may be important as drug targets • Protein design
Representation of a Protein [Figure: backbone segment showing atoms N1, Cα, C, N2 and side chain R, with the dihedral angles phi, psi, and omega] A protein with n amino acids can be represented using 2n phi-psi angles, each in the range [0, 2π)
Graphical Models for Protein Kinetics • Protein conformations have different energies • Graphical models discretize the conformation space and connect nearby regions with edges
Robotics Motion Planning [Figure: a robot arm with 2 degrees of freedom, and its 2D configuration space with both axes running from 0 to 2π and the configurations c1 and c2 marked] Moving the robot arm from c1 to c2 is just finding a path in the configuration space from c1 to c2.
Roadmap Method • Randomly sample points in configuration space. Keep feasible ones. • Connect these points to form a graph. • Process path queries using standard graph search techniques. [Figure: roadmap in the 2D configuration space (axes 0 to 2π) connecting c1 and c2]
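The three roadmap steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the obstacle in `is_feasible`, the connection `radius`, and all function names are hypothetical, edges are not checked for collisions along their length, and the wrap-around of angles at 2π is ignored.

```python
import math
import random
from collections import deque

def is_feasible(q):
    # Hypothetical obstacle: a disc of radius 1 in the middle of the square.
    return math.dist(q, (math.pi, math.pi)) > 1.0

def build_roadmap(n_samples, radius, seed=0):
    rng = random.Random(seed)
    nodes = []
    # Step 1: sample uniformly in [0, 2*pi)^2 and keep feasible points.
    while len(nodes) < n_samples:
        q = (rng.uniform(0, 2 * math.pi), rng.uniform(0, 2 * math.pi))
        if is_feasible(q):
            nodes.append(q)
    # Step 2: connect every pair of points closer than `radius`.
    edges = {i: [] for i in range(n_samples)}
    for i in range(n_samples):
        for j in range(i + 1, n_samples):
            if math.dist(nodes[i], nodes[j]) <= radius:
                edges[i].append(j)
                edges[j].append(i)
    return nodes, edges

def find_path(edges, start, goal):
    # Step 3: answer a path query with breadth-first search.
    parent = {start: None}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        if u == goal:
            path = []
            while u is not None:
                path.append(u)
                u = parent[u]
            return path[::-1]
        for v in edges[u]:
            if v not in parent:
                parent[v] = u
                queue.append(v)
    return None  # start and goal lie in different connected components
```

A query such as `find_path(edges, 0, 1)` returns a list of node indices, or `None` when the sampled roadmap does not connect the two configurations.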
Protein Folding as a Search Problem • Protein folding can be represented as a search through the protein’s configuration space • Replace the collision-free constraint with a preference for low-energy configurations • Instead of finding any single path, the goal is to find all the energetically favorable paths
Stochastic Roadmap Simulation (Apaydin et al., 2003) • Sample protein configurations at random • Add edges between nearby nodes • Take advantage of the many folding pathways contained within a roadmap • Efficiently calculate many properties of the entire landscape
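Roadmap Construction • Nodes in the graph are sampled uniformly at random • Edges are added between nearest neighbors with probability: $P_{ij} = \frac{1}{N_i} \exp(-\Delta E_{ij} / k_B T)$ if $\Delta E_{ij} > 0$, and $P_{ij} = \frac{1}{N_i}$ otherwise, where $\Delta E_{ij} = E_j - E_i$ is the energy difference between nodes i and j and $N_i$ is the number of neighbors of node i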
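As a rough sketch of how such edge probabilities could be assigned, the Python below assumes a Metropolis-style rule: a move that raises the energy by ΔE is accepted with probability exp(-ΔE / kT), a move that lowers it is always accepted, and each neighbor is proposed with probability 1/N_i. The rule, the self-transition that absorbs the leftover probability, and the function name are assumptions made for illustration.

```python
import math

def transition_probabilities(energies, neighbors, kT=1.0):
    """Return P[i][j] for a roadmap given node energies and neighbor lists."""
    P = {}
    for i, nbrs in neighbors.items():
        row = {}
        for j in nbrs:
            dE = energies[j] - energies[i]
            # Assumed Metropolis-style acceptance for uphill moves.
            accept = math.exp(-dE / kT) if dE > 0 else 1.0
            row[j] = accept / len(nbrs)
        # Leftover probability stays on node i so each row sums to one.
        row[i] = 1.0 - sum(row.values())
        P[i] = row
    return P
```

For three nodes in a line with energies 0, 1, 0, the uphill step from node 0 to node 1 gets probability exp(-1) and the remainder stays on node 0, while node 1 moves downhill to either side with probability 1/2 each.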
Roadmap as a Markov Chain • We can view the molecular motion as a random walk over the roadmap • The roadmap can be regarded as a discretely sampled version of a Monte Carlo simulation • In fact, in the limit, the probability distributions of the Monte Carlo simulation and the roadmap converge
Transmission Coefficients • Measures “kinetic distance” • Probability that a conformation will fold before unfolding • Can calculate by starting many Monte Carlo simulations from the conformation • Very computationally expensive [Figure: a conformation between the unfolded state and the folded state]
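The brute-force estimate described above can be illustrated on a discrete model. The sketch below assumes the dynamics are already given as a table of transition probabilities `P[i][j]` (a simplification; the slide refers to Monte Carlo simulations of the conformation itself), and the function name and arguments are hypothetical.

```python
import random

def estimate_pfold(P, start, folded, unfolded, n_runs=1000, seed=0):
    """Fraction of random walks from `start` that hit `folded` before `unfolded`."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_runs):
        state = start
        # Walk until either end state is reached. This assumes one of them
        # is reachable from `start`; otherwise the loop never terminates.
        while state not in folded and state not in unfolded:
            targets = list(P[state])
            weights = [P[state][t] for t in targets]
            state = rng.choices(targets, weights=weights)[0]
        if state in folded:
            hits += 1
    return hits / n_runs
```

The statistical error shrinks only as the inverse square root of `n_runs`, and every node needs its own batch of walks, which is why this approach becomes expensive.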
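Algebraic Method for Calculating Transmission Coefficients [Figure: node v_i connected to a neighbor v_j with transition probability P_ij, between the folded set F and the unfolded set U]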
Transmission Coefficients (cont) • System of linear equations • One equation and one unknown for each node • Can be solved iteratively • Low connectivity of the graph results in a sparse matrix
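A minimal sketch of the iterative solve, assuming the standard first-step relation: the transmission coefficient is 1 on folded nodes, 0 on unfolded nodes, and for every other node i it equals the sum over neighbors j of P_ij times the coefficient of j. The Gauss-Seidel-style sweep and all names below are illustrative choices, not necessarily the solver used in the original work.

```python
def transmission_coefficients(P, folded, unfolded, tol=1e-12, max_iter=100000):
    """Solve tau_i = sum_j P[i][j] * tau_j with tau = 1 on folded, 0 on unfolded."""
    tau = {i: (1.0 if i in folded else 0.0) for i in P}
    interior = [i for i in P if i not in folded and i not in unfolded]
    for _ in range(max_iter):
        delta = 0.0
        for i in interior:
            # Only the neighbors stored in the sparse row P[i] are visited.
            new = sum(p * tau[j] for j, p in P[i].items())
            delta = max(delta, abs(new - tau[i]))
            tau[i] = new
        if delta < tol:
            break
    return tau
```

Because each row holds only a node's few neighbors, one sweep costs time proportional to the number of edges, which is where the sparsity noted above pays off.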
Results • Studied a synthetic landscape and a real protein, ROP • Protein was represented with 6 degrees of freedom: two vectors connected by a loop • Correlation of transmission coefficients calculated by roadmaps and Monte Carlo simulations
Benefits and Drawbacks • Extremely efficient at calculating kinetic properties like transmission coefficients • Unclear whether low-dimension representation of protein is adequate • Monte Carlo simulations may not be accurate enough for protein kinetics
Molecular dynamics • Simulate protein movement using Newton’s laws of motion [Figure: timescale axis in seconds, from 10^-15 (femto) through 10^-12 (pico), 10^-9 (nano), 10^-6 (micro), and 10^-3 (milli) to 10^0, marking bond vibration, isomerization, water dynamics, helix formation, fastest folders, typical folders, and slow folders, with annotations "MD step", "long MD run", "where we need to be", and "where we’d love to be"]
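To make "simulate using Newton's laws" concrete, here is a minimal one-particle sketch using the velocity Verlet scheme, a common integrator in molecular dynamics. The slides do not say which integrator or force field was used, so both the scheme and the toy harmonic force in the usage note are assumptions.

```python
def velocity_verlet(x, v, force, mass, dt, n_steps):
    """Integrate m * d2x/dt2 = force(x) for n_steps steps of size dt."""
    a = force(x) / mass
    for _ in range(n_steps):
        x = x + v * dt + 0.5 * a * dt * dt   # position update
        a_new = force(x) / mass              # force at the new position
        v = v + 0.5 * (a + a_new) * dt       # velocity from averaged acceleration
        a = a_new
    return x, v
```

For example, `velocity_verlet(1.0, 0.0, lambda x: -x, 1.0, 0.001, 6283)` follows a unit harmonic oscillator for roughly one period. Each step must be short compared with the fastest motion in the system, which is why reaching folding timescales takes so many steps.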
Folding@Home: Worldwide desktop grid computing • ~150,000 CPUs over the world [Figure: world map of CPU locations, derived from IP addresses]
Markovian Model Method (Singhal et al., JCP 2004) • Generate molecular dynamics trajectories from transition path sampling or independently • Cluster nearby points into macrostates to build a roadmap whose edges also include transition times • Calculate the mean first passage time and Pfold using linear algebra
Step 1: sampling of paths • Pick a random point from current path • Shoot a path from this point • If path reaches initial or final state by some cutoff time, stop simulation and accept it • Define new current path
Step 2: Generation of roadmap • Nodes are accepted points, edges connect successive nodes • Cluster nearby points to make roadmap more connected • Calculate edge weights by counting number of transitions between nodes and normalize
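The edge-weight calculation in Step 2 can be sketched as follows, assuming the clustering has already been done so that each trajectory is a list of macrostate labels recorded at a fixed time interval. The function name and input layout are hypothetical.

```python
from collections import defaultdict

def estimate_transition_probabilities(trajectories):
    """Count transitions between successive states and normalize each row."""
    counts = defaultdict(lambda: defaultdict(int))
    for traj in trajectories:
        for a, b in zip(traj, traj[1:]):
            counts[a][b] += 1
    P = {}
    for a, row in counts.items():
        total = sum(row.values())
        P[a] = {b: c / total for b, c in row.items()}
    # States that only appear at the end of a trajectory have no outgoing
    # counts and therefore no row in P.
    return P
```

With the trajectories `[0, 1, 1, 2]` and `[0, 0, 1]`, state 0 is left three times, twice toward state 1, so the estimated probability of the edge from 0 to 1 is 2/3.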
Step 3 (optional): Re-weighting of edges • Can analyze the roadmap at parameter values other than the simulated ones without need for additional simulations • For temperature, can re-weight edges by the relative probabilities at the two temperatures according to the dynamics • Renormalize edges so outgoing probability sums to one
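The re-weight-then-renormalize pattern can be sketched as below. The actual probability ratio depends on the dynamics used and is not given in these slides, so it enters here as a caller-supplied function `ratio(i, j)`, a hypothetical placeholder, and is assumed to be positive.

```python
def reweight_edges(P, ratio):
    """Scale each edge i -> j by ratio(i, j), then renormalize every row."""
    new_P = {}
    for i, row in P.items():
        scaled = {j: p * ratio(i, j) for j, p in row.items()}
        total = sum(scaled.values())  # assumed > 0
        new_P[i] = {j: w / total for j, w in scaled.items()}
    return new_P
```

For instance, if a node has two outgoing edges of probability 0.5 each and one of them becomes three times as likely at the new temperature, the renormalized probabilities are 0.75 and 0.25.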
Calculating Pfolds and MFPT • Equation for each node is conditioned on which neighbor it transitions to • One equation and one unknown for each node • Can be solved iteratively
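For the mean first passage time, conditioning on the first transition gives one equation per node: MFPT_i is the sum over neighbors j of P_ij times (t_ij + MFPT_j), with MFPT set to 0 in the final state. The sketch below solves these equations by repeated sweeps. The per-edge transition times `T[i][j]` and all names are assumed inputs for illustration, and every state must have a row in `P`.

```python
def mean_first_passage_time(P, T, final, tol=1e-12, max_iter=1000000):
    """Solve m_i = sum_j P[i][j] * (T[i][j] + m_j) with m = 0 on `final`."""
    mfpt = {i: 0.0 for i in P}
    interior = [i for i in P if i not in final]
    for _ in range(max_iter):
        delta = 0.0
        for i in interior:
            new = sum(p * (T[i][j] + mfpt[j]) for j, p in P[i].items())
            delta = max(delta, abs(new - mfpt[i]))
            mfpt[i] = new
        if delta < tol:
            break
    return mfpt
```

On a three-state chain where state 0 always moves to state 1, state 1 moves to 0 or 2 with equal probability, and every step takes one time unit, the equations give MFPT values of 4 and 3 for states 0 and 1. The Pfold equations have the same shape with the time term dropped and the boundary values set to 0 and 1.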
Energy landscape and initial pathway • 2-D energy landscape • Initial and final regions defined by circles around the two minima • Initial paths generated by Monte Carlo or Langevin dynamics [Figure: 2-D energy landscape with the initial region I and final region F marked]
Results - Pfold • Compare Pfold values to those from many direct simulations • Correlation coefficients are 0.99 for both
Results - MFPT • Compare MFPT at different temperatures to those from 10,000 direct simulations
Results – Trp zipper β-hairpin • Analyzed existing simulation data of a small, 12-residue protein • 1750 trajectories, each 10-450 ns, resolution of 10 ns for non-folding and 250 ps for folding • Combine into roadmap • Depending on clustering cutoffs, MFPT = 2-9 μs • Agrees with experimental results of 2.47 ± 0.05 μs and previous analysis of simulation data of 4.5 μs
Conclusions • Graphical methods produce a network of possible protein pathways • These networks can be efficiently analyzed to compute kinetic properties • Very fast method for looking at simple protein models or analyzing existing molecular dynamics data