
HDMR Methodology

GLOBAL SENSITIVITY ANALYSIS BY RANDOM SAMPLING - HIGH DIMENSIONAL MODEL REPRESENTATION (RS-HDMR). Herschel Rabitz, Department of Chemistry, Princeton University, Princeton, New Jersey 08544. HDMR Methodology. HDMR expresses a system output as a hierarchical correlated function expansion of inputs.


Presentation Transcript


  1. GLOBAL SENSITIVITY ANALYSIS BY RANDOM SAMPLING - HIGH DIMENSIONAL MODEL REPRESENTATION (RS-HDMR) • Herschel Rabitz • Department of Chemistry, Princeton University, Princeton, New Jersey 08544

  2. HDMR Methodology • HDMR expresses a system output as a hierarchical correlated function expansion of inputs:
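
The expansion itself was an image on the slide and did not survive in the transcript. As a reference point, the form commonly written in the HDMR literature, assuming n inputs x = (x_1, ..., x_n) (the notation here is an assumption, not taken from the slide), is:

```latex
f(\mathbf{x}) = f_0
  + \sum_{i=1}^{n} f_i(x_i)
  + \sum_{1 \le i < j \le n} f_{ij}(x_i, x_j)
  + \cdots
  + f_{12\ldots n}(x_1, x_2, \ldots, x_n)
```

Here f_0 is a constant, each f_i(x_i) is the independent contribution of input x_i, each f_ij(x_i, x_j) is the pairwise cooperative contribution of x_i and x_j, and so on up to the single term that involves all n inputs.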

  3. HDMR Methodology (Contd.) • HDMR component functions are optimally defined as: [equations not preserved] • where [symbols not preserved] are the unconditional and conditional probability density functions: [equations not preserved]

  4. RS (Random Sampling) – HDMR (Contd.) • RS-HDMR component functions are approximated by expansions in orthonormal polynomials • Inputs can be sampled independently and/or in a correlated fashion • Only one set of data is needed to determine all of the component functions • Statistical analysis (F-test) is used to determine the proper truncation of the RS-HDMR expansion
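
A minimal sketch of the idea in slide 4, under stated assumptions: inputs are independent and uniform on [0, 1], the basis is the orthonormal shifted Legendre polynomials up to degree 3, and only first-order component functions are fitted. The function names and the toy model are hypothetical, and the F-test truncation step mentioned on the slide is not shown. Each expansion coefficient is estimated as a Monte Carlo average over the same single sample set.

```python
import numpy as np

def legendre_basis(x):
    """Orthonormal shifted Legendre polynomials of degree 1..3 on [0, 1]."""
    return np.stack([
        np.sqrt(3.0) * (2 * x - 1),
        np.sqrt(5.0) * (6 * x**2 - 6 * x + 1),
        np.sqrt(7.0) * (20 * x**3 - 30 * x**2 + 12 * x - 1),
    ], axis=-1)

def fit_first_order(X, y):
    """Estimate f0 and the coefficients alpha[i, r] from one sample set.

    X has shape (N, n) with entries in [0, 1]; y has shape (N,).
    """
    f0 = y.mean()
    phi = legendre_basis(X)                      # shape (N, n, 3)
    # Monte Carlo estimate of the projection of (f - f0) onto each basis function.
    alpha = np.einsum("s,sir->ir", y - f0, phi) / len(y)
    return f0, alpha

def predict_first_order(X, f0, alpha):
    """Evaluate f0 + sum_i f_i(x_i) with f_i(x_i) = sum_r alpha[i, r] * phi_r(x_i)."""
    return f0 + np.einsum("sir,ir->s", legendre_basis(X), alpha)

# Toy model (hypothetical): the third input has no effect on the output.
rng = np.random.default_rng(0)
X = rng.uniform(size=(20000, 3))
y = 1.0 + 2.0 * X[:, 0] + X[:, 1] ** 2
f0, alpha = fit_first_order(X, y)
y_hat = predict_first_order(X, f0, alpha)
```

All coefficients come from the one array pair `(X, y)`, which is the point made on the slide: no separate sampling campaign is needed per component function.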

  5. Global Sensitivity Analysis by RS-HDMR • Individual RS-HDMR component functions have a direct statistical correlation interpretation, which permits the model output variance to be decomposed into its input contributions [equation not preserved] • where [symbols not preserved] are defined as the covariances of [symbols not preserved] with f(x), respectively
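
The covariance interpretation in slide 5 can be sketched as follows. This is an illustration under assumptions, not the authors' implementation: the component functions are obtained here by an ordinary least-squares fit of an additive polynomial model, each index is taken as Cov(f_i, f) / Var(f), and the split into a "structural" part Var(f_i) / Var(f) and a "correlative" remainder is an assumed convention. All function names and the toy data are hypothetical.

```python
import numpy as np

def component_values(X, y, degree=2):
    """Least-squares fit of an additive (first-order) polynomial model.

    Returns F with shape (N, n), where F[:, i] holds the centred values of
    the fitted component function f_i(x_i) at the sample points.
    """
    N, n = X.shape
    Xc = X - X.mean(axis=0)
    cols = [Xc[:, i] ** d for i in range(n) for d in range(1, degree + 1)]
    A = np.column_stack([np.ones(N)] + cols)
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    F = np.empty((N, n))
    for i in range(n):
        block = slice(1 + i * degree, 1 + (i + 1) * degree)
        F[:, i] = A[:, block] @ coef[block]
    return F - F.mean(axis=0)

def sensitivity_indexes(F, y):
    """Covariance-based indexes: total, structural, and correlative parts."""
    var_y = y.var()
    total = np.array(
        [np.cov(F[:, i], y, bias=True)[0, 1] for i in range(F.shape[1])]
    ) / var_y
    structural = F.var(axis=0) / var_y
    correlative = total - structural
    return total, structural, correlative

# Toy data (hypothetical): two correlated inputs and a linear output.
rng = np.random.default_rng(1)
x0 = rng.uniform(size=5000)
x1 = 0.6 * x0 + 0.4 * rng.uniform(size=5000)
X = np.column_stack([x0, x1])
y = 2.0 * x0 + x1

F = component_values(X, y)
total, structural, correlative = sensitivity_indexes(F, y)
```

Because the inputs are correlated in this toy example, the correlative parts are nonzero, and the total indexes still sum to the fraction of output variance explained by the fitted model.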

  6. A Propellant Ignition Model • Figure: calculated profiles of temperature and major mole fractions for the ignition and combustion of the M10 solid propellant

  7. A Propellant Ignition Model • 10 independent and 44 cooperative contributions of inputs were identified as significant

  8. A Propellant Ignition Model • Nonlinear global sensitivity indexes efficiently identified all significant contributions of inputs

  9. Trichloroethylene (TCE) Microenvironmental/Pharmacokinetic Modeling • Figure: microenvironmental/exposure/dose modeling system • Figure: structure of the TCE-PBPK model (adapted from Fisher et al., 1998)

  10. Example: Trichloroethylene (TCE) Microenvironmental/Pharmacokinetic Modeling
      • The coupled microenvironmental/pharmacokinetic model:
        • Three exposure routes (inhalation, ingestion, and dermal absorption)
        • Release of TCE from water into the air within the residence
        • Activities of individuals and physiological uptake processes
      • Seven input variables are used to construct the RS-HDMR orthonormal polynomials: age (x1), tap water concentration (x2), shower stall volume (x3), drinking water consumption rate (x4), shower flow rate (x5), shower time (x6), time in bathroom after shower (x7)
      • Target outputs: the total internal doses from intake (inhalation and ingestion) and uptake (dermal absorption)
      • The amount inhaled or ingested: [equation not preserved]
      • The amount absorbed: [equation not preserved]
      • C(t): exposure concentration, IR(t): inhalation or ingestion rate, Kp: permeability coefficient, SA(t): surface area exposed
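
The two dose equations on slide 10 were images and are missing from the transcript. A plausible reading, given the listed symbols and the usual time-integrated form used in exposure assessment, is sketched below. The symbols D_intake and D_uptake and the integration limits t_1 and t_2 are assumptions and do not come from the slide.

```latex
D_{\text{intake}} = \int_{t_1}^{t_2} C(t)\, IR(t)\, dt,
\qquad
D_{\text{uptake}} = \int_{t_1}^{t_2} K_p\, C(t)\, SA(t)\, dt
```

In this reading the intake dose accumulates concentration times inhalation or ingestion rate, and the dermal uptake accumulates concentration times exposed surface area, scaled by the permeability coefficient.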

  11. Trichloroethylene (TCE) Microenvironmental/Pharmacokinetic Modeling • Inputs (x1, x2, x3, x4) have a uniform distribution, and inputs (x5, x6, x7) have a triangular distribution; 10,000 input-output data were generated • Figure: the data distributions for the uniformly distributed variable x1 and the triangularly distributed variable x5
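
The sampling design in slide 11 can be sketched as below. The slide gives only the distribution families and the sample count, so every numeric bound and mode here is a hypothetical placeholder. The rescaling to the unit hypercube is also an assumption, added because orthonormal polynomial bases are typically defined on a fixed interval.

```python
import numpy as np

rng = np.random.default_rng(42)
N = 10_000  # number of input samples, as stated on the slide

# Hypothetical (low, high) bounds for the uniform inputs x1..x4.
uniform_bounds = [(0.0, 70.0), (0.0, 5.0), (1.0, 3.0), (0.5, 2.5)]
# Hypothetical (left, mode, right) parameters for the triangular inputs x5..x7.
triangular_params = [(5.0, 8.0, 15.0), (3.0, 10.0, 30.0), (0.0, 5.0, 20.0)]

columns = [rng.uniform(low, high, size=N) for low, high in uniform_bounds]
columns += [rng.triangular(left, mode, right, size=N)
            for left, mode, right in triangular_params]
X = np.column_stack(columns)          # shape (N, 7): columns are x1..x7

# Rescale every input to [0, 1] before building polynomial expansions.
lower = np.array([b[0] for b in uniform_bounds] + [p[0] for p in triangular_params])
upper = np.array([b[1] for b in uniform_bounds] + [p[2] for p in triangular_params])
X_unit = (X - lower) / (upper - lower)
```

Each row of `X` would then be passed through the exposure/dose model once to produce the corresponding output value.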

  12. Trichloroethylene (TCE) Microenvironmental/Pharmacokinetic Modeling • Seven independent, fifteen 2nd order, and one 3rd order cooperative contributions of inputs were identified as significant • Table: first order sensitivity indexes

  13. Trichloroethylene (TCE) Microenvironmental/Pharmacokinetic Modeling • Nonlinear global sensitivity indexes (2nd order and above) efficiently identified all significant contributions of inputs • Table: the ten largest 2nd and 3rd order sensitivity indexes

  14. Identification of bionetwork model parameters
      • Characteristics of the problem:
        • System nonlinearity
        • Limited number and type of experiments
        • Considerable biological and measurement noise
        • Multiple solutions exist!
      • Problems with traditional identification methods:
        • Provide only one or a few solutions for each parameter
        • Assume linear propagation from data noise to parameter uncertainties
      • The closed-loop identification protocol (CLIP):
        • Extract the full parameter distribution by global identification
        • Iteratively look for the most informative experiments for minimizing parameter uncertainty

  15. General operation of CLIP • Pre-lab analysis and design of the most informative experiments • Iterative experiment optimization and data acquisition • Global parameter identification

  16. Isoleucyl-tRNA synthetase proofreading valyl-tRNAIle • Figure: reaction scheme (* Rate constants to be identified) • Okamoto and Savageau, Biochemistry, 23:1701-1709 (1984)

  17. The inversion module: identifying the rate constant distribution
      • The Genetic Algorithm (GA)
        • Mutation: 1101 1111 + 1100 0010 → 1101 1101 + 1100 0110
        • Crossover: 1101 1100 + 1111 0010 → 1101 0010 + 1111 1100
      • The inversion cost function: [equation not preserved]
      • Figure: typical rate constant distribution after random perturbation/control
      • Inversion quality index Q
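
The two GA operators on slide 17 can be reproduced with a short sketch. The function names are hypothetical, and the flipped positions and the crossover point are read off the slide's bit strings rather than stated there. A real GA would choose them at random, which the last helper illustrates.

```python
import random

def flip_bit(chromosome: str, index: int) -> str:
    """Mutation: flip the bit at the given 0-based position."""
    flipped = "1" if chromosome[index] == "0" else "0"
    return chromosome[:index] + flipped + chromosome[index + 1:]

def one_point_crossover(a: str, b: str, point: int) -> tuple[str, str]:
    """Crossover: swap the tails of two equal-length parents after `point`."""
    return a[:point] + b[point:], b[:point] + a[point:]

def random_mutation(chromosome: str, rate: float, rng: random.Random) -> str:
    """Flip each bit independently with probability `rate`."""
    return "".join(
        ("1" if bit == "0" else "0") if rng.random() < rate else bit
        for bit in chromosome
    )

# Reproduce the slide's examples (spaces in the slide are only visual grouping).
mutant_1 = flip_bit("11011111", 6)                                # 1101 1101
mutant_2 = flip_bit("11000010", 5)                                # 1100 0110
child_1, child_2 = one_point_crossover("11011100", "11110010", 4)  # 1101 0010, 1111 1100

# A randomly mutated chromosome keeps its length.
noisy = random_mutation("11011100", rate=0.1, rng=random.Random(0))
```

In the inversion module each bit string would encode a candidate set of rate constants, with the inversion cost function ranking candidates between generations.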

  18. The analysis module: estimating the most informative experiments • Estimate the best species for monitoring system behavior • Determine the best species for perturbing the system • Nonlinear sensitivity analysis by Random-Sampling High Dimensional Model Representation (RS-HDMR)

  19. Optimally controlled identification: squeezing on the rate constant distribution • The control cost function: [equation not preserved; surviving term labels: "Inversion quality", "Non-"] • Feng and Rabitz, Biophys. J., 86:1270-1281 (2004) • Feng, Rabitz, Turinici, and LeBris, J. Phys. Chem. A, 110:7755-7762 (2006)

  20. Network property optimization: • A. Identifying the best targeted network locations for intervention • B. Identifying the optimal network control • Figure: closed-loop diagram (labels: Initial Guess/Random Control, Biological System, Observed Response, Learning Algorithm, Control Objective, Control Design, Optimal Controls, Optimal Network Performance)

  21. A. Molecular target identification for network engineering • Random-sampling high dimensional model representation (RS-HDMR): randomly sample k • Advantages of RS-HDMR: global sensitivity analysis; nonlinear component functions; physically meaningful representation; favorable scalability • Li, Rosenthal, and Rabitz, J. Phys. Chem. A, 105:7765-7777 (2001)

  22. Laboratory data on the mutants • Figure panels: k10 ─ k13 fixed; k6 fixed • Feng, Hooshangi, Chen, Li, Weiss, and Rabitz, Biophys. J., 87:2195-2202 (2004)

  23. Example: Biochemical multi-component formulation mapping • Allosteric regulation of aspartate transcarbamoylase (ATCase) in vitro by all four ribonucleotide triphosphates (NTPs) • ATCase activity (output) was measured in the laboratory for 300 random NTP concentration combinations (inputs) • A second order RS-HDMR was constructed as an input → output map; its accuracy is comparable with the laboratory error • Figure: the absolute error of repeated measurements

  24. Biochemical multi-component formulation mapping • Figure: comparison of the laboratory data and the 2nd order RS-HDMR approximation for “used” and “test” data • Note: the two parallel lines mark an absolute error of ±0.2

  25. The s-space network identification procedure (SNIP) • Laboratory data on the transcriptional cascade (circuit diagram labels: IPTG, aTc, TetR, LacI, EYFP; genes tetR, lacI, eyfp; promoters p(lacIq), pL(tet), P(lac)) • aTc: x1; IPTG: x2; EYFP: y(x1, x2) • Encode: x1 → x1 m1(s), x2 → x2 m2(s) • Response measurement: y → y(s) • Decode: Fourier transform
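
The encode/measure/decode steps of slide 25 can be illustrated with a toy calculation. Everything specific here is an assumption: the slide does not give the modulation functions, so cosine modulations m1(s) and m2(s) at two distinct integer frequencies are used, and the hypothetical `toy_response` stands in for the laboratory measurement. The sketch only shows why a Fourier transform over s can separate independent and cooperative input effects.

```python
import numpy as np

def toy_response(u1, u2):
    """Hypothetical stand-in for the measured output y(x1, x2)."""
    return 1.0 * u1 + 0.5 * u2 + 2.0 * u1 * u2

S = 64                      # number of points along the auxiliary variable s
s = np.arange(S)
k1, k2 = 3, 7               # assumed modulation frequencies (cycles per S points)
m1 = np.cos(2 * np.pi * k1 * s / S)
m2 = np.cos(2 * np.pi * k2 * s / S)

x1, x2 = 0.8, 0.5           # nominal input levels (hypothetical)

# Encode: x1 -> x1*m1(s), x2 -> x2*m2(s); then record the response y(s).
y_s = toy_response(x1 * m1, x2 * m2)

# Decode: the amplitude spectrum over s separates the contributions.
amplitude = np.abs(np.fft.rfft(y_s)) * 2.0 / S

linear_x1 = amplitude[k1]                # independent effect of x1
linear_x2 = amplitude[k2]                # independent effect of x2
coop_sum = amplitude[k1 + k2]            # cooperative x1*x2 effect (sum frequency)
coop_diff = amplitude[abs(k2 - k1)]      # cooperative x1*x2 effect (difference frequency)
```

In this toy setting the independent effects show up at the modulation frequencies themselves, while the product term appears only at the sum and difference frequencies, so cooperative behavior is visible as spectral peaks away from k1 and k2.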

  26. Nonlinear property prediction by SNIP • Nonlinear, cooperative behavior revealed • Unmeasured region correctly predicted • Feng, Nichols, Mitra, Hooshangi, Weiss, and Rabitz, in preparation

  27. SNIP application to an intracellular signaling network • Laboratory single cell measurement data • Sachs et al., Science, 308:523-529 (2005)

  28. Identified network with predictive capability • Network connections identified by SNIP and Bayesian analysis • Reliable SNIP prediction of Akt levels

  29. Example: Ionospheric measured data • The ionospheric critical frequencies were determined from ground-based ionosonde measurements at Huancayo, Peru for the years 1957-1987 (8694 points) • Input: year, day, solar flux (f10.7), magnetic activity index (kp), geomagnetic field index (dst), previous day's value of foE • Output: ionospheric critical frequency foE • The inputs are not controllable and not independent; the pdf of the inputs is not separable and is not explicitly known

  30. Ionospheric measured data • Figure: the dependence of foE on the input “day” • Figure: ionosonde data distribution, showing the dependences between normalized input variables (year and f10.7, kp and dst) for the data at 12 UT

  31. Ionospheric measured data The accuracy of the 2nd order RS-HDMR expansion for the output, foE

  32. Quantitative molecular property prediction • Standard QSAR general strategy: molecular activity is a function of its chemical/physical/structural descriptors • Problems: overfitting (choice of descriptors); underlying physics • A simple solution: descriptor-free quantitative molecular property interpolation, y = f(x1, x2), x1 = 1, 2, …, N1, x2 = 1, 2, …, N2 (figure labels: X1, X2)

  33. Descriptor-free property prediction from an arbitrary substituent order

  34. Property prediction from the optimal substituent order • Cost function: [equation not preserved] • Complexity of the search: N1!·N2! = 14!·8! ≈ 3.5 × 10^15 • Shenvi, Geremia, and Rabitz, J. Phys. Chem. A, 107:2066 (2003)

  35. Application to a chromophore transition metal complex library • Figure panels: before reordering; after reordering • Cost function: [equation not preserved] • Outliers captured by the reordering algorithm • Liang, Feng, Lowry, and Rabitz, J. Phys. Chem. B, 109:5842-5854 (2005)

  36. Application to a drug compound library • >14,000 compounds • 15% of data • Cost function: [equation not preserved] • Figure labels: Reorder, Prediction

  37. THE MODERN WAY TO DO SCIENCE* • * Adaptively under high duty cycle and automated • “You should understand the physics, write down the correct equations, and let nature do the calculations.” Peter Debye
