
  1. Deep learning enhanced Markov State Models (MSMs) Wei Wang Feb 20, 2019

  2. Outline • General protocol of building MSM • Challenges with MSM • VAMPnets • Time-lagged auto-encoder

  3. Revisit the protocol of building MSM

  4. Need a lot of expertise in biology & machine learning Wang, Cao, Zhu, Huang WIREs Comput. Mol. Sci., e1343, (2017)

  5. Criterion to choose a model: slowest dynamics Choose the MSM that best captures the slowest transitions of the system Wang, Cao, Zhu, Huang WIREs Comput. Mol. Sci., e1343, (2017)

  6. Choose the model with the slowest transitions (figure: implied timescales, μs) Da, Pardo, Xu, Zhang, Gao, Wang, Huang, Nature Communications, 7, 11244 (2016)
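For reference, the implied timescales plotted in such figures follow directly from the eigenvalues of the transition probability matrix (TPM), t_i = -τ / ln λ_i. A minimal numpy sketch (the 3-state TPM below is a made-up example, not data from the paper):

```python
import numpy as np

# Toy 3-state transition probability matrix (rows sum to 1); illustrative only.
T = np.array([[0.97, 0.02, 0.01],
              [0.03, 0.95, 0.02],
              [0.01, 0.03, 0.96]])
lag = 10.0  # lag time tau, in the trajectory's saving interval units (e.g., ns)

eigvals = np.sort(np.linalg.eigvals(T).real)[::-1]
# Skip the stationary eigenvalue lambda_1 = 1; the rest give relaxation timescales.
timescales = -lag / np.log(eigvals[1:])
print(timescales)  # implied timescales t_i = -tau / ln(lambda_i)
```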

  7. Performing this cumbersome work: search • Propose good clustering algorithms & features • Parametric search using good strategies (a generic sketch follows below) http://msmbuilder.org/osprey/1.1.0
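As referenced above, a generic sketch of such a hyper-parameter search in plain Python (the scorer is a labeled stand-in, not a real Osprey or MSMBuilder call; in practice it would build the tICA → clustering → MSM pipeline and return a variational score):

```python
import random
from itertools import product

# Stand-in scorer: in practice this would construct the full MSM pipeline for the
# given hyper-parameters and return a variational score (e.g., GMRQ or VAMP-2).
def score_msm(n_clusters, n_tics, lag):
    return random.random()  # placeholder so the sketch runs end to end

grid = list(product([100, 200, 400],   # number of microstates
                    [2, 4, 8],         # tICA components kept
                    [5, 10, 20]))      # MSM lag time (frames)
best = max(grid, key=lambda params: score_msm(*params))
print("best (n_clusters, n_tics, lag):", best)
```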

  8. Challenges: parametric space is too large: Collective Variable (CV) Need to propose good features http://homepages.laas.fr/jcortes/algosb13/sutto-ALGO13-META.pdf

  9. Challenges: parametric space is too large: CV Need to propose good features http://homepages.laas.fr/jcortes/algosb13/sutto-ALGO13-META.pdf

  10. Challenges: parametric space is too large: CV Need to propose good features; otherwise the clustering stage suffers (figure: ground truth vs. tICA projection) Wehmeyer and Noé, J. Chem. Phys. 148, 241703 (2018)

  11. Challenges: parametric space is too large: clustering Zhang et al., Methods in Enzymology, 578, 343-371 (2016)

  12. Essence of these operations • Linearly/nonlinearly transform the protein configurations into state vectors, e.g., (1, 0, 0, 0) or (0, 0, 1, 0) Husic and Pande, J. Am. Chem. Soc. 2018, 140, 2386−2396
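A minimal numpy sketch of that final mapping, frames → one-hot state vectors (the cluster labels here are toy values):

```python
import numpy as np

n_states = 4
labels = np.array([0, 2, 2, 1, 3])        # cluster assignment of each frame (toy values)
state_vectors = np.eye(n_states)[labels]  # one-hot encoding, one row per frame
print(state_vectors[0])  # frame in state 0 -> (1, 0, 0, 0)
print(state_vectors[1])  # frame in state 2 -> (0, 0, 1, 0)
```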

  13. Deep learning can greatly help: powerful • In the mathematical theory of artificial neural networks, the universal approximation theorem states that a feed-forward network with a single hidden layer containing a finite number of neurons can approximate continuous functions on compact subsets of R^n, under mild assumptions on the activation function. • Deep learning has been widely applied in numerous fields (figure: classifier output, e.g., Dog: 0.99 / Cat: 0.01) https://en.wikipedia.org/wiki/Universal_approximation_theorem
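In symbols, the one-hidden-layer approximant (a standard statement of the theorem, added here for reference):

```latex
% For continuous f on a compact set K \subset \mathbb{R}^n and any \varepsilon > 0,
% there exist N, \alpha_i \in \mathbb{R}, w_i \in \mathbb{R}^n, b_i \in \mathbb{R} such that
\left| f(x) - \sum_{i=1}^{N} \alpha_i \,\sigma\!\left(w_i^{\top} x + b_i\right) \right| < \varepsilon
\qquad \text{for all } x \in K,
% for suitable activations \sigma (e.g., sigmoidal ones).
```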

  14. Deep learning can greatly help MSM (figure: an image classifier outputs Dog: 0.99 / Cat: 0.01; analogously, a network can output state probabilities Macro1: 0.990, Macro2: 0.005, Macro3: 0.005)

  15. Outline • General protocol of building MSM • Challenges with MSM • VAMPnets • Time-lagged auto-encoder

  16. VAMPnets for deep learning of molecular kinetics • VAMPnets employ the variational approach for Markov processes (VAMP) to build a deep learning framework for molecular kinetics with neural networks. The network encodes the entire mapping from molecular coordinates to Markov states, combining the whole data-processing pipeline into a single end-to-end framework. (figure: coordinates → state vector; the training objective is related to the implied-timescale plot and is maximized) Mardt et al., Nature Communications, 9, 5 (2018)
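To make the two-lobe architecture concrete, a minimal PyTorch sketch (layer sizes, data, and the weight-sharing choice are illustrative assumptions, not the authors' code): two weight-sharing networks map frames x(t) and x(t+τ) to soft state assignments, which are then scored with VAMP-2 (see the score sketch further below).

```python
import torch
import torch.nn as nn

class Lobe(nn.Module):
    """Maps molecular coordinates to soft state assignments (softmax output)."""
    def __init__(self, n_in, n_states, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_in, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, n_states), nn.Softmax(dim=1),
        )
    def forward(self, x):
        return self.net(x)

lobe = Lobe(n_in=30, n_states=6)     # e.g., xyz of 10 heavy atoms -> 6 states
x_t   = torch.randn(1000, 30)        # frames at time t (random stand-in data)
x_tau = torch.randn(1000, 30)        # the same trajectory shifted by the lag tau
chi_t, chi_tau = lobe(x_t), lobe(x_tau)   # both lobes share weights here
# chi_t and chi_tau feed the VAMP-2 score, which is maximized by gradient
# ascent on the network weights.
```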

  17. Understanding VAMPnets • The basic structure of a neural network • What the VAMP score is

  18. Basic structure of neural network

  19. Forward propagation Where can we get the weights?
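As an illustration of the forward pass, a minimal numpy sketch (the weights here are random stand-ins; training provides the real ones, as the next slides discuss):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(5, 3)), np.zeros(5)   # input (3) -> hidden (5)
W2, b2 = rng.normal(size=(2, 5)), np.zeros(2)   # hidden (5) -> output (2)

x = np.array([0.2, -0.1, 0.7])      # one input vector
h = sigmoid(W1 @ x + b1)            # hidden activations
y = sigmoid(W2 @ h + b2)            # network output
print(y)
```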

  20. Backpropagation to update the weights Define an objective function; weights are updated by stepping along its gradient http://www.saedsayad.com/images/ANN_4.png

  21. Backpropagation to update the weights https://independentseminarblog.files.wordpress.com/2017/12/giphy.gif

  22. Backpropagation to update the weights Define an objective function; weights are updated by stepping along its gradient. In VAMPnets, the objective is the VAMP-2 score (a sketch of one update rule follows below) http://www.saedsayad.com/images/ANN_4.png
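A minimal numpy sketch of the gradient update (a toy least-squares loss stands in for the real objective; in VAMPnets one maximizes the VAMP-2 score, i.e., minimizes its negative):

```python
import numpy as np

# Toy objective: L(w) = ||X @ w - y||^2, with analytic gradient.
rng = np.random.default_rng(1)
X, y = rng.normal(size=(50, 3)), rng.normal(size=50)
w = np.zeros(3)
lr = 0.005                           # learning rate

for step in range(200):
    grad = 2 * X.T @ (X @ w - y)     # dL/dw
    w -= lr * grad                   # step against the gradient (steepest descent)
print(w)
```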

  23. VAMP-2 score: objective function χ(x): state vector, e.g., χ(x) = (0, 1, 0, …) if x belongs to state 2 Mardt et al., Nature Communications, 9, 5 (2018)

  24. VAMP-2 score: related to the TPM The VAMP-2 score is the sum of squared eigenvalues of the estimated TPM; it is related to the implied-timescale plot, and we want to maximize it. χ(x): state vector, e.g., χ(x) = (0, 1, 0, …) if x belongs to state 2 Mardt et al., Nature Communications, 9, 5 (2018)
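A minimal numpy sketch of the VAMP-2 score from (soft) state assignments, following the standard covariance-matrix formulation (mean-centering and regularization details are simplified; this is not the authors' exact code):

```python
import numpy as np

def vamp2_score(chi_t, chi_tau, eps=1e-10):
    """VAMP-2 score = squared Frobenius norm of C00^-1/2 @ C01 @ C11^-1/2."""
    n = chi_t.shape[0]
    C00 = chi_t.T @ chi_t / n        # instantaneous covariance at time t
    C01 = chi_t.T @ chi_tau / n      # time-lagged covariance
    C11 = chi_tau.T @ chi_tau / n    # instantaneous covariance at t + tau

    def inv_sqrt(C):
        vals, vecs = np.linalg.eigh(C)
        vals = np.maximum(vals, eps)             # regularize tiny eigenvalues
        return vecs @ np.diag(vals ** -0.5) @ vecs.T

    K = inv_sqrt(C00) @ C01 @ inv_sqrt(C11)
    return np.sum(K ** 2)            # sum of squared singular values

# Usage with the lobe outputs from the earlier VAMPnet sketch (hypothetical):
# score = vamp2_score(chi_t.detach().numpy(), chi_tau.detach().numpy())
```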

  25. VAMPnets: example on alanine dipeptide Input: xyz coordinates of the 10 heavy atoms (30 numbers); lump into 6 states; output: 6 state probabilities Mardt et al., Nature Communications, 9, 5 (2018)

  26. VAMPnets: example on alanine dipeptide • Visualizing the outputs (soft assignments) • Once we have the state vectors, we can calculate the TPM and obtain the kinetics (a sketch of the TPM estimate follows below) Mardt et al., Nature Communications, 9, 5 (2018)
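A minimal numpy sketch of estimating the TPM from the (soft) state vectors, using the standard count-matrix estimator T = C00⁻¹ C01 (a simplification of the estimators used in practice):

```python
import numpy as np

def estimate_tpm(chi_t, chi_tau):
    """Estimate the transition probability matrix from (soft) state vectors."""
    C00 = chi_t.T @ chi_t            # instantaneous (weighted) counts
    C01 = chi_t.T @ chi_tau          # time-lagged (weighted) counts
    T = np.linalg.solve(C00, C01)    # T = C00^-1 @ C01
    return T / T.sum(axis=1, keepdims=True)   # enforce row-stochasticity
```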

  27. Comparison with the traditional way to build MSMs • Advantages • No need to hand-pick features for tICA or choose clustering algorithms • Inputs are simple: aligned trajectories • Finds the variationally optimal model • Disadvantages • Easy to overfit the data • Easy to get trapped in local optima (figure: alanine dipeptide) Mardt et al., Nature Communications, 9, 5 (2018)

  28. Outline • General protocol of building MSM • Challenges with MSM • VAMPnets • Time-lagged auto-encoder

  29. Other applications of deep learning in MSMs: CVs • Improve PCA/tICA through nonlinear transformations trained by a (time-lagged) auto-encoder • PCA/tICA: find the directions that maximize the variance / time-lagged covariance

  30. PCA: minimizing reconstruction error http://alexhwilliams.info/itsneuronalblog/2016/03/27/pca/
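To make the equivalence concrete, a numpy sketch of the rank-k PCA projection and its reconstruction error (toy random data):

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(500, 10))
X = X - X.mean(axis=0)               # PCA assumes mean-free data

U, S, Vt = np.linalg.svd(X, full_matrices=False)
k = 2
P = Vt[:k]                           # top-k principal directions
X_rec = (X @ P.T) @ P                # project down, then reconstruct
err = np.mean(np.sum((X - X_rec) ** 2, axis=1))
print(err)                           # minimized over all rank-k linear maps
```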

  31. PCA: linear version of an auto-encoder (figure: original data → reconstructed data) Wehmeyer and Noé, J. Chem. Phys. 148, 241703 (2018)
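A minimal PyTorch sketch of that linear auto-encoder (toy data; at the optimum, decoder∘encoder spans the same top-k subspace PCA finds):

```python
import torch
import torch.nn as nn

X = torch.randn(500, 10)
X = X - X.mean(dim=0)

encoder = nn.Linear(10, 2, bias=False)     # E: data -> latent
decoder = nn.Linear(2, 10, bias=False)     # D: latent -> reconstruction
opt = torch.optim.Adam(list(encoder.parameters()) + list(decoder.parameters()), lr=1e-2)

for step in range(500):
    loss = ((decoder(encoder(X)) - X) ** 2).mean()   # reconstruction error
    opt.zero_grad(); loss.backward(); opt.step()
# At the optimum, D(E(.)) reconstructs X as well as rank-2 PCA does.
```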

  32. Improving tICA using the time-lagged auto-encoder Time-lagged autoencoder: reconstruct the next frame from the current frame; in tICA, the decoder D and encoder E are constant matrices Wehmeyer and Noé, J. Chem. Phys. 148, 241703 (2018)

  33. Improving tICA using the time-lagged auto-encoder Time-lagged autoencoder: D, E are constant matrices in tICA (a sketch follows below) Wehmeyer and Noé, J. Chem. Phys. 148, 241703 (2018)
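A minimal PyTorch sketch of the nonlinear time-lagged auto-encoder (toy data): the only change from the plain auto-encoder above is that the target is the frame one lag time ahead.

```python
import torch
import torch.nn as nn

traj = torch.randn(1000, 10)         # stand-in trajectory, 10 features per frame
lag = 5
x_t, x_tau = traj[:-lag], traj[lag:] # current frames and frames one lag ahead

encoder = nn.Sequential(nn.Linear(10, 32), nn.Tanh(), nn.Linear(32, 2))
decoder = nn.Sequential(nn.Linear(2, 32), nn.Tanh(), nn.Linear(32, 10))
opt = torch.optim.Adam(list(encoder.parameters()) + list(decoder.parameters()), lr=1e-3)

for step in range(200):
    pred = decoder(encoder(x_t))             # predict the future frame from the present
    loss = ((pred - x_tau) ** 2).mean()      # time-lagged reconstruction error
    opt.zero_grad(); loss.backward(); opt.step()
```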

  34. The time-lagged autoencoder improves over tICA (example system: villin) Wehmeyer and Noé, J. Chem. Phys. 148, 241703 (2018)

  35. Summary • Deep learning improves MSM building by reducing the amount of prior knowledge required • However, deep learning may overfit the data when sampling is insufficient
