bio-modeling
course layout • introduction • molecular biology • biotechnology • bioMEMS • bioinformatics • bio-modeling • cells and e-cells • transcription and regulation • cell communication • neural networks • dna computing • fractals and patterns • the birds and the bees ….. and ants
far and away in the past • Newton’s equations of motion (17th-18th century) • Basis of molecular dynamics (MD) • Boltzmann’s statistics (19th century) • Basis of Monte Carlo (MC) • Schrödinger/Heisenberg quantum mechanics (20th century)
birth of simulation in chemistry • 1950s: done by hand (or with a mechanical calculator)! • Attempts to solve Newton’s equations of motion for small systems (e.g. a three-atom system) • Computers arrived soon afterwards • 1970s: the age of punch cards • 1980s: better I/O devices; workstations dominated as research platforms
first generation (1980’s – 1990’s) • Gas phase reaction • (e.g.) H + H2 → H2 + H • MD (figure: potential energy surface with axes R_A-B and R_B-C)
first generation (1980’s – 1990’s) • Liquid simulation • (e.g.) Lennard-Jones Fluid • MD/MC
first generation (1980’s – 1990’s) • Proteins on lattice • MC
first generation (1980’s – 1990’s) • Quantum mechanical structure calculation (semi-empirical, ab initio, …)
revolution (~ 1995) • Workstation-like PCs • 100 hr of Cray time vs. a 64 MB / 150 MHz Pentium PC • “Cheap and fast” • Impacts • Two directions • More accurate methods • Larger systems • Start of bio-simulations
impact on “non-bio” simulations • Better potential energy surfaces • Revisions of existing surfaces • Dynamics on quantum mechanical surfaces • Quantum wavepacket dynamics • Time-dependent Schrödinger equation instead of Newton’s equation • Fully quantum treatment of the nuclei (the most accurate dynamics possible on a given surface) • Still used by some groups for hydride/proton transfer in enzyme dynamics
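Wavepacket dynamics replaces Newton’s equation with the time-dependent Schrödinger equation, iħ ∂ψ/∂t = Hψ. A common way to propagate it on a grid is the split-operator method, sketched below in one dimension. This is an illustration under stated assumptions, not the method used in any specific study: reduced units with ħ = m = 1, a periodic grid, a hypothetical Gaussian barrier standing in for a proton-transfer barrier, and helper names (`make_grid`, `gaussian_packet`, `propagate`) chosen here for illustration.

```python
import numpy as np


def make_grid(n=2048, length=200.0):
    """Periodic 1D grid and matching angular wavenumbers (reduced units, hbar = m = 1)."""
    x = np.linspace(-length / 2, length / 2, n, endpoint=False)
    dx = x[1] - x[0]
    k = 2 * np.pi * np.fft.fftfreq(n, d=dx)  # same ordering as np.fft.fft output
    return x, dx, k


def gaussian_packet(x, x0, sigma, k0, dx):
    """Normalized Gaussian wavepacket centred at x0 with mean wavenumber k0."""
    psi = np.exp(-((x - x0) ** 2) / (2 * sigma**2) + 1j * k0 * x)
    return psi / np.sqrt(np.sum(np.abs(psi) ** 2) * dx)


def propagate(psi, V, k, dt, n_steps, mass=1.0, hbar=1.0):
    """Strang splitting: half step in V, full kinetic step in k-space, half step in V."""
    half_v = np.exp(-0.5j * V * dt / hbar)
    kinetic = np.exp(-0.5j * hbar * k**2 * dt / mass)  # exp(-i (hbar k)^2/(2m) dt / hbar)
    for _ in range(n_steps):
        psi = half_v * psi
        psi = np.fft.ifft(kinetic * np.fft.fft(psi))
        psi = half_v * psi
    return psi


if __name__ == "__main__":
    x, dx, k = make_grid()
    psi0 = gaussian_packet(x, x0=-10.0, sigma=1.0, k0=2.0, dx=dx)
    V = 2.0 * np.exp(-x**2)  # hypothetical barrier, comparable to the packet's kinetic energy
    psi = propagate(psi0, V, k, dt=0.01, n_steps=1000)
    transmitted = np.sum(np.abs(psi[x > 0]) ** 2) * dx  # includes tunnelling contributions
    print(f"transmission probability ~ {transmitted:.3f}")
```

Because each factor is a pure phase, the scheme is unitary and conserves the norm; a classical particle with the same energy would either always cross or always be reflected, whereas the wavepacket splits.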
impacts on bio-simulations • Proteins broke free from the lattice! • Off-lattice models (still, each residue as a bead) • United-atom approach (e.g. CH3 treated as one atom) • All-atom approach • With water (explicit solvent) • Without water (implicit solvent) • What to look at? • Kinetics: dynamic characteristics (e.g. folding simulations) • Thermodynamics: equilibrium characteristics (e.g. binding affinity between a protein and a drug)
solvent models • Implicit solvent • Solvent accessible surface area (SASA) → solvation free energy • Cheaper than explicit solvent • Discrete nature of the solvent not included • Different methods for the SASA/free-energy calculation • Generalized Born model (GB/SA) • Poisson-Boltzmann model (PB/SA) • Distance-dependent dielectric (DD/SA)
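The SASA step of an implicit-solvent model can be computed numerically. Below is a minimal sketch of the Shrake-Rupley point-counting method, which is one standard way to get SASA but not necessarily the one behind the GB/SA, PB/SA or DD/SA variants listed above. Each atom’s radius is expanded by a probe radius (1.4 Å is a common choice for water), test points are placed on that sphere, and the fraction not buried inside any neighbour’s expanded sphere gives the accessible area. The nonpolar term ΔG = γ · SASA uses a hypothetical γ; real parameters depend on the force field.

```python
import numpy as np


def sphere_points(n):
    """Approximately uniform points on the unit sphere (golden-spiral lattice)."""
    i = np.arange(n) + 0.5
    z = 1.0 - 2.0 * i / n                     # evenly spaced in z gives equal-area bands
    r_xy = np.sqrt(1.0 - z**2)
    theta = np.pi * (3.0 - np.sqrt(5.0)) * i  # golden-angle increments
    return np.column_stack([r_xy * np.cos(theta), r_xy * np.sin(theta), z])


def shrake_rupley_sasa(coords, radii, probe=1.4, n_points=960):
    """Per-atom solvent accessible surface area (length units of coords, squared)."""
    coords = np.asarray(coords, dtype=float)
    expanded = np.asarray(radii, dtype=float) + probe  # surface traced by the probe centre
    unit = sphere_points(n_points)
    areas = np.empty(len(coords))
    for i in range(len(coords)):
        pts = coords[i] + expanded[i] * unit
        accessible = np.ones(n_points, dtype=bool)
        for j in range(len(coords)):
            if j == i:
                continue
            d2 = np.sum((pts - coords[j]) ** 2, axis=1)
            accessible &= d2 >= expanded[j] ** 2   # buried if inside a neighbour's sphere
        areas[i] = 4.0 * np.pi * expanded[i] ** 2 * accessible.mean()
    return areas


def nonpolar_solvation_energy(areas, gamma=0.005):
    """Hypothetical surface-area term: Delta G_np = gamma * total SASA."""
    return gamma * float(np.sum(areas))


areas = shrake_rupley_sasa([[0.0, 0.0, 0.0], [0.0, 0.0, 3.0]], [1.5, 1.5], probe=1.5)
print(areas, nonpolar_solvation_energy(areas))
```

The accuracy is controlled by `n_points`; the double loop is O(N² · n_points), so production codes use neighbour lists.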
solvent models • Explicit solvent • Water as individual molecules • Expensive calculation • Periodic boundary conditions usually necessary • Rigid/flexible, polarizable/non-polarizable • SPC, TIP3P, TIP4P, TIP5P, …
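Periodic boundary conditions are usually implemented with the minimum-image convention: each pair interacts through its nearest periodic copy. The sketch below applies it to the Lennard-Jones fluid mentioned among the first-generation simulations. Assumptions: a cubic box, reduced units (ε = σ = 1), a plain truncation at `r_cut` with no energy shift or tail correction, and function names chosen for illustration.

```python
import numpy as np


def minimum_image(dr, box):
    """Map displacement vectors to their nearest periodic image in a cubic box."""
    return dr - box * np.round(dr / box)


def lj_energy(positions, box, epsilon=1.0, sigma=1.0, r_cut=2.5):
    """Total truncated Lennard-Jones energy under periodic boundary conditions."""
    if r_cut > box / 2:
        raise ValueError("minimum image requires r_cut <= box / 2")
    pos = np.asarray(positions, dtype=float)
    energy = 0.0
    for i in range(len(pos) - 1):
        dr = minimum_image(pos[i + 1:] - pos[i], box)  # each pair counted once
        r2 = np.sum(dr**2, axis=1)
        r2 = r2[r2 < r_cut**2]
        inv6 = (sigma**2 / r2) ** 3                     # (sigma / r)^6
        energy += float(np.sum(4.0 * epsilon * (inv6**2 - inv6)))
    return energy


# Two particles near opposite faces of a box of side 10 are neighbours through the boundary.
r_min = 2 ** (1 / 6)  # position of the LJ minimum, where the pair energy is -epsilon
print(lj_energy([[0.2, 0.0, 0.0], [0.2 - r_min + 10.0, 0.0, 0.0]], box=10.0))
```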
impacts on bio-simulations (continued) • Remember, proteins are still big!
off-lattice Gō model • Developed from the lattice model: the “funnel concept” • Evolution has shaped proteins to fold • So proteins can be modeled to fold • Native contacts → energy surface • Agrees with experimental observations
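In its simplest form, a Gō-type energy rewards only contacts that are present in the native structure, which produces the funnel: the more native contacts formed, the lower the energy. The sketch below counts native contacts between residue beads; the 8 Å cutoff, the minimum sequence separation of 3, and the uniform contact energy ε are illustrative assumptions (published Gō models typically use smooth pair potentials instead of a hard count). It also returns Q, the fraction of native contacts formed, a common folding coordinate.

```python
import numpy as np


def contact_map(coords, cutoff=8.0, min_seq_sep=3):
    """Set of residue pairs (i, j), j - i >= min_seq_sep, closer than cutoff."""
    coords = np.asarray(coords, dtype=float)
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    i, j = np.triu_indices(len(coords), k=min_seq_sep)
    return {(int(a), int(b)) for a, b in zip(i, j) if d[a, b] < cutoff}


def go_energy(coords, native_contacts, cutoff=8.0, min_seq_sep=3, epsilon=1.0):
    """Energy = -epsilon per native contact formed; also returns Q (fraction formed)."""
    formed = contact_map(coords, cutoff, min_seq_sep) & native_contacts
    return -epsilon * len(formed), len(formed) / len(native_contacts)


# Hypothetical 10-bead hairpin (native) versus a fully extended chain.
native = [[3.8 * i, 0.0, 0.0] for i in range(5)] + [[3.8 * (9 - i), 5.0, 0.0] for i in range(5, 10)]
extended = [[3.8 * i, 0.0, 0.0] for i in range(10)]
native_contacts = contact_map(native)
print(go_energy(native, native_contacts), go_energy(extended, native_contacts))
```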
united atom/implicit solvent model folding • “Statistical folding” • Start from many independent trajectories • The lucky trajectories fold • $N_\text{folded} / N_\text{total} \approx k_\text{fold}\, t$ (valid while $k_\text{fold}\, t \ll 1$)
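The statistical-folding estimate can be checked on synthetic data. The sketch assumes two-state (single-exponential) folding, so each trajectory folds after an exponentially distributed waiting time; the fraction folded by time t is then 1 − exp(−k t), which reduces to the linear k t for short runs. The rate, run length and trajectory count are hypothetical numbers, not results from any study.

```python
import numpy as np


def kfold_linear(n_folded, n_total, t):
    """Estimate N_folded / N_total ~ k_fold * t, valid only while k_fold * t << 1."""
    return n_folded / (n_total * t)


def kfold_exponential(n_folded, n_total, t):
    """Same data under an assumed single-exponential model: fraction = 1 - exp(-k t)."""
    return -np.log(1.0 - n_folded / n_total) / t


# Hypothetical: true rate 1e-4 per ns (mean folding time 10 microseconds),
# 10,000 independent trajectories of 500 ns each.
rng = np.random.default_rng(0)
k_true, t_sim, n_total = 1e-4, 500.0, 10_000
fold_times = rng.exponential(1.0 / k_true, size=n_total)  # each trajectory's folding time
n_folded = int(np.sum(fold_times < t_sim))                # the "lucky" ones
print(n_folded, kfold_linear(n_folded, n_total, t_sim), kfold_exponential(n_folded, n_total, t_sim))
```

No single trajectory comes close to the mean folding time; the rate is recovered only from the ensemble.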
all-atom unfolding • Folding inferred from unfolding • At high temperature, unfolding is fast (~1 ns) • Full atomistic detail along the path from the folded to the unfolded state
binding free energy: docking • “Molecular modeling” • Binding free energy is estimated from the shapes of the ligand and the protein • Application: drug design
binding free energy (ΔF): more accurate versions • Free energy: potential energy + entropic contribution • P + L ⇌ PL • Thermodynamic integration (TI) • Free energy perturbation (FEP) • Jarzynski’s equality • Extremely expensive calculations
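Free energy perturbation rests on the Zwanzig relation ΔF = −kT ln⟨exp(−ΔU/kT)⟩₀, where ΔU = U₁ − U₀ is evaluated on configurations sampled in state 0; Jarzynski’s equality has the same exponential-average form with nonequilibrium work W in place of ΔU. The sketch below implements only the estimator and checks it on synthetic Gaussian ΔU samples, for which the exact answer μ − σ²/(2kT) is known. No force field or sampling engine is involved; the numbers are hypothetical.

```python
import numpy as np


def fep_delta_f(delta_u, kT=1.0):
    """Zwanzig estimator: Delta F = -kT ln < exp(-Delta U / kT) >_0.

    delta_u holds U_1(x) - U_0(x) for configurations x sampled from state 0.
    """
    x = -np.asarray(delta_u, dtype=float) / kT
    m = x.max()  # log-sum-exp shift to avoid overflow in exp
    return -kT * (m + np.log(np.mean(np.exp(x - m))))


# Synthetic check: Gaussian Delta U with mean mu, std s gives Delta F = mu - s**2 / (2 kT).
rng = np.random.default_rng(1)
mu, s = 2.0, 1.0
samples = rng.normal(mu, s, size=1_000_000)
print(fep_delta_f(samples), mu - s**2 / 2)
```

The exponential average is dominated by rare low-ΔU samples, which is one reason these calculations are so expensive: in practice the transformation is split into many small windows.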
free energy landscape method • Kinetic information is inferred from the free energy surface • A rough free energy surface can be obtained faster by parallelization • “Trajectory by intuition”
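A free energy surface along a chosen coordinate x follows from its sampled probability distribution, F(x) = −kT ln P(x) + const, and many independent (parallel) runs can simply be pooled into one histogram. The sketch below shows that step only; the synthetic harmonic samples stand in for simulation data, and inferring kinetics from the resulting surface is beyond it.

```python
import numpy as np


def free_energy_profile(samples, bins, kT=1.0):
    """F(x) = -kT ln P(x) + const from a histogram of sampled values of a coordinate x."""
    p, edges = np.histogram(samples, bins=bins, density=True)
    centers = 0.5 * (edges[:-1] + edges[1:])
    with np.errstate(divide="ignore"):
        f = -kT * np.log(p)       # empty (never visited) bins become +inf
    f -= f[np.isfinite(f)].min()  # shift so the lowest visited bin is zero
    return centers, f


# Synthetic check: samples distributed as exp(-x^2 / 2) should give F(x) = x^2 / 2 (kT = 1).
rng = np.random.default_rng(2)
x = rng.normal(size=1_000_000)
centers, f = free_energy_profile(x, bins=np.linspace(-3, 3, 61))
```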
current limitations • Accuracy of models • Force fields • Solvent models • Speed • For small proteins (<50 amino acids): 1 ns of simulation ≈ 1 day of computing • Biologically relevant events take > 1 ms • Size • Many proteins are not just large: they are huge!
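The speed gap is easiest to see as arithmetic. Taking the slide’s figures at face value (1 ns per day, events longer than 1 ms), a single straightforward simulation would need the following:

```python
ns_per_day = 1.0   # slide's figure for a small protein (<50 amino acids)
target_ns = 1.0e6  # 1 ms = 10^6 ns, the lower end of biologically relevant timescales
days = target_ns / ns_per_day
years = days / 365.25
print(f"{days:,.0f} days, about {years:,.0f} years")  # 1,000,000 days, about 2,738 years
```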
responses to the challenges • Accuracy: blend with quantum mechanical calculations • QM/MM, QM-trajectory methods (e.g. Car-Parrinello MD, CPMD) • Speed • E.g. computing on graphics cards (GPUs) • Size • E.g. umbrella sampling
computational biology • Biological systems are complex; thus, a combination of experimental and computational approaches is needed.
computational biology • Computational biology vs. bioinformatics • More than sequences, database searches, statistics or image analysis • A part of computational science • Using mathematical modeling, simulation and visualization • Complementing theory and experiment
simplest chemical reaction A → B • irreversible, one-molecule reaction • examples: all sorts of decay processes, e.g. radioactive decay, fluorescence, an activated receptor returning to its inactive state • any metabolic pathway can be described by a combination of processes of this type (including reversible reactions and, in some respects, multi-molecule reactions)
simplest chemical reaction A → B • various levels of description: • homogeneous system, large numbers of molecules = ordinary differential equations, kinetics • small numbers of molecules = probabilistic equations, stochastics • spatial heterogeneity = partial differential equations, diffusion • small number of heterogeneously distributed molecules = single-molecule tracking (e.g. cytoskeleton modelling)
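For the "small numbers of molecules" level, A → B is usually simulated stochastically. Below is a minimal sketch using Gillespie’s stochastic simulation algorithm: with n molecules of A the total decay propensity is k·n, so the waiting time to the next decay is exponentially distributed with that rate. Averaged over many runs, the count follows N₀ e^(−kt), while single runs fluctuate. Function names and parameter values are illustrative.

```python
import numpy as np


def gillespie_decay(n0, k, t_max, rng):
    """Exact stochastic simulation (Gillespie) of A -> B with rate constant k per molecule."""
    t, n = 0.0, n0
    times, counts = [t], [n]
    while n > 0:
        t += rng.exponential(1.0 / (k * n))  # waiting time; total propensity is k * n
        if t > t_max:
            break
        n -= 1                               # one A molecule became B
        times.append(t)
        counts.append(n)
    return np.array(times), np.array(counts)


def count_at(times, counts, t):
    """Number of A molecules remaining at time t (trajectory is piecewise constant)."""
    return counts[np.searchsorted(times, t, side="right") - 1]


rng = np.random.default_rng(3)
times, counts = gillespie_decay(n0=10, k=1.0, t_max=5.0, rng=rng)
print(list(zip(times.round(3), counts)))
```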
kinetic description • Imagine a box containing N molecules. How many will decay during a short time Δt? $k N \Delta t$ • Imagine two boxes containing N/2 molecules each. How many decay in total? Still $k N \Delta t$ • Imagine two boxes containing N molecules each. How many decay? $2 k N \Delta t$ • In general: $\frac{dN}{dt} = -kN$, a differential equation (ordinary, linear, first-order) • Exact solution: $N(t) = N_0 e^{-kt}$ (in more complex cases replaced by a numerical approximation)
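The "numerical approximation" used when no exact solution exists can be illustrated on this case, where the exact answer is known. The sketch below applies the simplest scheme, explicit Euler, which literally removes k·N·Δt molecules per step, and compares it with N₀ e^(−kt). Parameter values are arbitrary.

```python
import numpy as np


def euler_decay(n0, k, dt, n_steps):
    """Explicit Euler integration of dN/dt = -k N."""
    n = np.empty(n_steps + 1)
    n[0] = n0
    for i in range(n_steps):
        n[i + 1] = n[i] - k * n[i] * dt  # decays in one step: k * N * dt
    return n


def exact_decay(n0, k, t):
    """Analytic solution N(t) = N0 * exp(-k t)."""
    return n0 * np.exp(-k * np.asarray(t, dtype=float))


dt, n_steps = 0.01, 200
numeric = euler_decay(1000.0, 0.5, dt, n_steps)
analytic = exact_decay(1000.0, 0.5, dt * n_steps)
print(numeric[-1], analytic)  # about 366.96 vs 367.88
```

Euler gives N₀(1 − kΔt)ⁿ, which always lies slightly below the exact curve; the error shrinks in proportion to Δt.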
biological building blocks • DNA: GAA GTT GAA AAT CAG GCG AAC CCA CGA CTG • RNA: GAA GUU GAA AAU CAG GCG AAC CCA CGA CUG • PROTEIN: GLU VAL GLU ASN GLN ALA ASN PRO ARG LEU
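The DNA → RNA → protein mapping above can be reproduced in a few lines. This sketch treats the given DNA as the coding (sense) strand, so transcription just replaces T with U, and it includes only the nine codons that occur here, not the full 64-entry genetic code.

```python
# Partial codon table: only the codons used in this example.
CODON_TABLE = {
    "GAA": "GLU", "GUU": "VAL", "AAU": "ASN", "CAG": "GLN", "GCG": "ALA",
    "AAC": "ASN", "CCA": "PRO", "CGA": "ARG", "CUG": "LEU",
}


def transcribe(dna):
    """Coding-strand DNA to mRNA: thymine (T) becomes uracil (U)."""
    return dna.replace("T", "U")


def translate(rna):
    """Read the mRNA three bases at a time and look up each codon."""
    codons = [rna[i:i + 3] for i in range(0, len(rna) - len(rna) % 3, 3)]
    return [CODON_TABLE[c] for c in codons]


dna = "GAAGTTGAAAATCAGGCGAACCCACGACTG"
print(" ".join(translate(transcribe(dna))))  # GLU VAL GLU ASN GLN ALA ASN PRO ARG LEU
```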
protein folding (figure: the chain GLU VAL GLU ASN GLN ALA ASN PRO ARG LEU . . . drawn folding back on itself)
some fundamental questions • Question #1: Given a protein or DNA molecule, what is its geometric structure? • Question #2: Why and how does a protein fold into a unique three-dimensional structure? • Question #3: Given a set of distances between pairs of atoms, how can we determine the coordinates of the atoms? • Question #4: Given the magnitudes of the structure factors of a protein, how can we determine the phases of the structure factors? • Question #5: Given two proteins, how can we compare their geometric structures? • Question #6: • …
methods for structure prediction and determination • Protein X-ray Crystallography • Nuclear Magnetic Resonance • Potential Energy Minimization • Molecular Dynamics Simulation • Homology Modeling • Fold Recognition • Inverse Protein Folding
empirical structure determination • Two major experimental methods for determining protein structure • X-ray crystallography • Requires growing a crystal of the protein (impossible for some, never easy) • The diffraction pattern can be inverse-Fourier transformed to obtain the electron density, but only once the phases are known (the phase problem) • Nuclear magnetic resonance (NMR) spectroscopy • Provides distance constraints, but it can be hard to find a structure consistent with them • Works only for relatively small proteins
X-ray crystallography • X-rays are used because their wavelength (about 1 Å) is comparable to the distance between bonded carbon atoms (about 1.5 Å) • Maps electron density, not atoms directly • A crystal provides many spatially aligned copies of the molecule, which strengthens the diffracted signal • The structure is obtained by inverting a Fourier transform, but only the amplitudes are measured, not the phases
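The phase problem can be demonstrated with a one-dimensional toy "unit cell". The sketch below builds a hypothetical electron density from three Gaussian atoms, takes its discrete Fourier transform as stand-in structure factors, and shows that inverting with the true phases recovers the density, while inverting the measured amplitudes alone (phases set to zero) does not. Real crystallography is three-dimensional, and the positions, heights and widths here are made up.

```python
import numpy as np

x = np.linspace(0.0, 1.0, 256, endpoint=False)  # one unit cell, fractional coordinate


def atom(center, height, width=0.02):
    """Gaussian electron-density blob using the periodic (unit-cell) distance."""
    d = (x - center + 0.5) % 1.0 - 0.5
    return height * np.exp(-(d**2) / (2 * width**2))


density = atom(0.20, 1.0) + atom(0.35, 0.6) + atom(0.70, 1.5)  # hypothetical "molecule"

F = np.fft.fft(density)  # complex structure factors
amplitudes = np.abs(F)   # measurable: diffraction intensities are |F|^2
phases = np.angle(F)     # lost in the experiment

with_phases = np.fft.ifft(amplitudes * np.exp(1j * phases)).real
without_phases = np.fft.ifft(amplitudes).real  # naive guess: all phases zero

print(np.allclose(with_phases, density), np.allclose(without_phases, density))  # True False
```

Shifting the whole density changes only the phases, not the amplitudes, so amplitudes alone cannot even fix where the atoms are.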
X-ray crystallography computing • In X-ray crystallography, the protein first needs to be purified and crystallized, which can take months or years and may fail altogether. • The protein crystal is then placed in an X-ray beam to record a diffraction image, which can be used to determine the three-dimensional structure of the protein. • The process is time-consuming, and some proteins cannot be crystallized at all.
X-ray crystallography computing • A mathematical problem, called the phase problem, must be solved before any crystal structure can be fully determined from the diffraction data. • 80% of the structures in the Protein Data Bank (PDB) were determined by X-ray crystallography.
NMR structure determination • The NMR approach is based on the fact that atomic nuclei have spin and generate magnetic fields. When two nuclei are close, their spins interact, and the strength of the interaction depends on the distance between them. Therefore, distances between certain pairs of atoms (usually protons) can be estimated from the intensities of nuclear Overhauser effect (NOE) signals, which arise from through-space spin-spin (dipolar) interactions and fall off roughly as the inverse sixth power of the distance. • The distance data obtained from the NMR experiment can be used to deduce structural information for the molecule. One way of achieving this is molecular distance geometry.
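Because the NOE signal between two protons scales roughly as r⁻⁶, distances are usually calibrated against a proton pair whose separation is already known: r = r_ref · (I_ref / I)^(1/6). The sketch below assumes this isolated-spin-pair approximation. The ±20% window that turns an estimate into lower and upper bounds is a hypothetical choice for illustration; real protocols use their own calibration classes.

```python
def noe_distance(intensity, ref_intensity, ref_distance):
    """Distance from NOE intensity assuming I proportional to r**-6.

    ref_intensity and ref_distance come from a proton pair of known separation.
    """
    return ref_distance * (ref_intensity / intensity) ** (1.0 / 6.0)


def distance_bounds(r, tolerance=0.2):
    """Hypothetical +/- 20% window turning an estimate into lower/upper bounds."""
    return (1.0 - tolerance) * r, (1.0 + tolerance) * r


# A signal 64 times weaker than the reference implies a distance twice as long.
print(f"{noe_distance(1 / 64, 1.0, 2.5):.2f}")  # 5.00
```

The sixth root makes the estimate forgiving of intensity errors but also makes weak signals hard to detect, which is why only short distances are observed.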
NMR structure determination • Not all distances between pairs of atoms can be detected, and in practice often only lower and upper bounds on the distances can be obtained. • The structure can be determined by solving a distance geometry problem with the distance data from the NMR experiments. • 15% of the structures in the Protein Data Bank (PDB) were determined by NMR spectroscopy.
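In the idealized case where every pairwise distance is known exactly, the distance geometry problem has a closed-form solution: classical multidimensional scaling turns the squared-distance matrix into a Gram matrix whose top three eigenvectors give coordinates, unique up to rotation, reflection and translation. The sketch below shows only that idealized core. Real NMR data are sparse bounds, so practical distance-geometry programs add bound smoothing, random embedding and refinement, none of which is attempted here. The "true" coordinates are random and hypothetical.

```python
import numpy as np


def embed_from_distances(D, dim=3):
    """Classical MDS: coordinates from a complete, exact pairwise-distance matrix."""
    D = np.asarray(D, dtype=float)
    n = len(D)
    J = np.eye(n) - np.ones((n, n)) / n  # centring matrix
    G = -0.5 * J @ (D**2) @ J            # Gram matrix of centred coordinates
    w, V = np.linalg.eigh(G)             # eigenvalues in ascending order
    idx = np.argsort(w)[::-1][:dim]      # keep the `dim` largest
    return V[:, idx] * np.sqrt(np.clip(w[idx], 0.0, None))


def pairwise_distances(X):
    return np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)


rng = np.random.default_rng(5)
X = rng.normal(size=(10, 3)) * 5.0  # hypothetical "true" atom positions
Y = embed_from_distances(pairwise_distances(X))
print(np.allclose(pairwise_distances(Y), pairwise_distances(X)))  # True
```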
potential energy minimization • Hypothesis: the native structure of a protein has the lowest, or almost the lowest, potential energy. It can therefore be located at the global minimum of the protein’s potential energy.
potential energy minimization • A reasonably accurate potential energy function needs to be constructed. • Given such a function, a local minimum is easy to find, but the global one is hard to find, especially when the function has many local minima. No completely satisfactory algorithm for the global minimization of protein energy functions has been developed yet. • Potential energy minimization has, however, been used successfully for structure refinement.
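The local-versus-global difficulty already appears in one dimension. The sketch uses a hypothetical rugged function with many local minima and a single global minimum at x = 0. Gradient descent stops in whichever basin it starts in, and a simple multi-start strategy, which is a heuristic rather than a general solution, finds the global minimum here only because one start lands in its basin. A real protein energy has thousands of dimensions, where such enumeration is hopeless.

```python
import numpy as np


def energy(x):
    """Hypothetical rugged 1D 'energy': many local minima, global minimum E = -1 at x = 0."""
    return 0.1 * x**2 - np.cos(2.0 * x)


def gradient(x):
    return 0.2 * x + 2.0 * np.sin(2.0 * x)


def local_minimize(x, lr=0.05, n_steps=2000):
    """Plain gradient descent: converges to the minimum whose basin contains the start."""
    for _ in range(n_steps):
        x = x - lr * gradient(x)
    return x


def multistart(starts):
    """Run local minimization from many starts and keep the lowest minimum found."""
    minima = [local_minimize(x0) for x0 in starts]
    best = min(minima, key=energy)
    return best, energy(best)


x_local = local_minimize(3.0)
print(x_local, energy(x_local))                # trapped in a local minimum near x = 3
print(multistart(np.linspace(-4.0, 4.0, 17)))  # finds x ~ 0, E = -1 here
```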
molecular dynamics • Folding can be simulated by following the motion of the atoms in a protein according to Newton’s second law of motion.
molecular dynamics • The time step has to be small, on the order of femtoseconds, to achieve accuracy. • Current computing technology can reach only picoseconds to microseconds of simulated time, while protein folding may take seconds or even longer. • Molecular dynamics simulation has been used successfully to study other types of dynamical behavior of proteins.
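Following atoms "according to Newton’s second law" means repeatedly integrating m·a = F(x). The velocity Verlet scheme, sketched below, is a widely used integrator for this. To keep it self-contained, a single harmonic "bond" in reduced units replaces a protein force field; that force is an illustrative assumption. The harmonic case also shows why the time step is tied to the fastest motion: velocity Verlet is stable only while ω·Δt < 2, so stiff bond vibrations force femtosecond steps.

```python
import numpy as np


def velocity_verlet(x, v, force, mass, dt, n_steps):
    """Integrate m a = F(x) with the velocity Verlet scheme; returns trajectory and final v."""
    x = np.array(x, dtype=float)
    v = np.array(v, dtype=float)
    f = force(x)
    traj = [x.copy()]
    for _ in range(n_steps):
        v += 0.5 * dt * f / mass  # half kick with current forces
        x += dt * v               # drift
        f = force(x)              # forces at the new positions
        v += 0.5 * dt * f / mass  # second half kick
        traj.append(x.copy())
    return np.array(traj), v


def harmonic_force(x, k=1.0):
    """Toy 'bond': F = -k x (reduced units, angular frequency sqrt(k / m))."""
    return -k * x


traj, v = velocity_verlet([1.0], [0.0], harmonic_force, mass=1.0, dt=0.01, n_steps=628)
print(traj[-1], 0.5 * v**2 + 0.5 * traj[-1] ** 2)  # back near x = 1 after one period
```

With `dt=0.01` the energy stays within about 10⁻⁵ of its initial value over a period; rerunning with `dt=2.1` (ω·Δt > 2) makes the amplitude grow without bound.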
limitations of MD simulations • Full atomic representation → noise → difficulty in discerning the dominant mechanisms of motion → need for noise-filtering methods such as Essential Dynamics • Empirical force fields are limited by the accuracy of the potentials • Time steps are constrained by the fastest motion (bond-length vibrations occur on the femtosecond (fs) time scale and necessitate time steps of 1-5 fs) • Inefficient sampling of the complete conformational space • Limited to small proteins (100s of residues) and/or short times (subnanoseconds)
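Essential Dynamics is a principal component analysis of atomic fluctuations: diagonalize the covariance matrix of the coordinates, and the few eigenvectors with the largest eigenvalues describe the dominant collective motions, separated from small-amplitude noise. The sketch assumes the frames are already superimposed on a reference (that fitting step is omitted) and uses a synthetic trajectory with one planted collective mode instead of real MD output.

```python
import numpy as np


def essential_modes(traj):
    """PCA of positional fluctuations.

    traj: array (n_frames, 3 * n_atoms), frames assumed already superimposed.
    Returns variances (descending) and the corresponding eigenvectors (columns).
    """
    X = traj - traj.mean(axis=0)   # fluctuations about the average structure
    C = X.T @ X / (len(X) - 1)     # covariance matrix of the coordinates
    w, V = np.linalg.eigh(C)
    order = np.argsort(w)[::-1]
    return w[order], V[:, order]


# Synthetic trajectory: one large collective motion buried in small uncorrelated noise.
rng = np.random.default_rng(6)
n_frames, n_coords = 5000, 6
mode = np.array([1.0, -1.0, 0.5, 0.0, 0.5, -1.0])
mode /= np.linalg.norm(mode)
traj = 5.0 * rng.normal(size=(n_frames, 1)) * mode + 0.5 * rng.normal(size=(n_frames, n_coords))
w, V = essential_modes(traj)
print(abs(V[:, 0] @ mode), w[0] / w.sum())  # overlap with planted mode, variance fraction
```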
sequence structure alignment • Homology modeling: sequence to sequence • Fold recognition: structure to sequence • Inverse protein folding: sequence to structure • (figure: known sequences/structures → sequence-structure alignment → ranking of sequences/structures)
sequence structure alignment • Scoring functions may not be able to distinguish between good and bad matches. • Finding the optimal sequence-structure alignment (threading with pairwise interactions) is NP-hard in general when variable-length gaps are allowed. • The results are approximate and carry only a limited level of confidence.
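By contrast, plain sequence-to-sequence alignment, the first step of homology modeling, can be solved exactly in O(nm) time by dynamic programming when the score is a sum of per-position terms with a linear gap penalty. The sketch below is the Needleman-Wunsch global alignment score. The match/mismatch/gap values are hypothetical simplifications; practical tools use substitution matrices such as BLOSUM62 and affine gap penalties.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Optimal global alignment score of sequences a and b (linear gap penalty)."""
    n, m = len(a), len(b)
    S = [[0] * (m + 1) for _ in range(n + 1)]  # S[i][j]: best score of a[:i] vs b[:j]
    for i in range(1, n + 1):
        S[i][0] = i * gap
    for j in range(1, m + 1):
        S[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = S[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            S[i][j] = max(diag, S[i - 1][j] + gap, S[i][j - 1] + gap)
    return S[n][m]


print(needleman_wunsch("ACGT", "AGT"))  # 2
```

Threading is harder because pairwise contact terms make each residue’s score depend on where other residues are placed, which breaks this table-filling recursion.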
what is biomolecular modeling? • Application of computational models to understand the structure, dynamics, and thermodynamics of biological molecules • The models must be tailored to the question at hand: the Schrödinger equation is not the answer to everything! A purely reductionist view is bound to fail! • This implies that biomolecular modeling must be both multidisciplinary and multiscale
an odd remark • "Every attempt to employ mathematical methods in the study of chemical questions must be considered profoundly irrational and contrary to the spirit of chemistry. If mathematical analysis should ever hold a prominent place in chemistry - an aberration which is happily almost impossible - it would occasion a rapid and widespread degeneration of that science." A. Comte (1830)
a Nobel remark • 1992 Nobel Prize in Chemistry: Rudolph Marcus (theory of electron transfer) • 1998 Nobel Prize in Chemistry: John Pople (ab initio methods) and Walter Kohn (DFT: density functional theory)