GLOBAL SENSITIVITY ANALYSIS BY RANDOM SAMPLING - HIGH DIMENSIONAL MODEL REPRESENTATION (RS-HDMR)
Herschel Rabitz
Department of Chemistry, Princeton University, Princeton, New Jersey 08544
HDMR Methodology • HDMR expresses a system output as a hierarchical correlated function expansion of inputs:
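For reference, the hierarchical expansion is conventionally written as below. This is the standard form from the HDMR literature; the symbols (n inputs, constant term f_0, component functions f_i, f_ij, ...) are assumed notation rather than taken from the slide.

```latex
f(\mathbf{x}) = f_0
  + \sum_{i=1}^{n} f_i(x_i)
  + \sum_{1 \le i < j \le n} f_{ij}(x_i, x_j)
  + \cdots
  + f_{12\ldots n}(x_1, x_2, \ldots, x_n)
```

Here f_0 is the mean output, f_i(x_i) is the independent contribution of input x_i, and f_ij(x_i, x_j) is the pairwise cooperative contribution of x_i and x_j. In practice the expansion is usually truncated at low order.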
HDMR Methodology (Contd.)
• HDMR component functions are optimally defined as:
• The definitions involve the unconditional and conditional probability density functions of the inputs:
RS (Random Sampling) – HDMR (Contd.)
• RS-HDMR component functions are approximated by expansions of orthonormal polynomials
• Inputs can be sampled independently and/or in a correlated fashion
• Only one set of data is needed to determine all of the component functions
• Statistical analysis (F-test) is used to determine the proper truncation of the RS-HDMR expansion
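The bullets above can be illustrated with a minimal sketch, assuming independent inputs that are uniform on [0, 1]. Each first-order component function is expanded as f_i(x_i) ≈ Σ_r α_r^i φ_r(x_i), and all coefficients are estimated by Monte Carlo averages over a single random sample. The shifted Legendre basis, the `toy_model` function, and all names are assumptions made for illustration. The F-test truncation step is not shown.

```python
import numpy as np

def orthonormal_basis(x):
    """First three shifted Legendre polynomials, orthonormal for x ~ U(0, 1)."""
    return np.stack([
        np.sqrt(3.0) * (2.0 * x - 1.0),
        np.sqrt(5.0) * (6.0 * x**2 - 6.0 * x + 1.0),
        np.sqrt(7.0) * (20.0 * x**3 - 30.0 * x**2 + 12.0 * x - 1.0),
    ])

def fit_first_order(X, y):
    """Monte Carlo estimates of f0 and the coefficients alpha[i, r].

    alpha[i, r] approximates E[(f(x) - f0) * phi_r(x_i)], so that
    f_i(x_i) is approximated by sum_r alpha[i, r] * phi_r(x_i).
    """
    f0 = y.mean()
    resid = y - f0
    alpha = np.array([
        (orthonormal_basis(X[:, i]) * resid).mean(axis=1)
        for i in range(X.shape[1])
    ])
    return f0, alpha

def toy_model(X):
    """Hypothetical model: x3 has no influence on the output."""
    return X[:, 0] + 2.0 * X[:, 1] ** 2

rng = np.random.default_rng(0)
X = rng.uniform(size=(20000, 3))   # one random sample set for all components
y = toy_model(X)
f0, alpha = fit_first_order(X, y)

# With an orthonormal basis, the variance carried by f_i is sum_r alpha[i, r]**2.
first_order_variance = (alpha ** 2).sum(axis=1)
```

The same sample `(X, y)` serves every component function, which is the point of the third bullet. Higher-order terms would use products of the basis functions in the same way.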
Global Sensitivity Analysis by RS-HDMR
• Individual RS-HDMR component functions have a direct statistical correlation interpretation, which permits the model output variance to be decomposed into its input contributions
• The individual contributions are defined as the covariances of the corresponding component functions with f(x)
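A minimal sketch of the covariance-based decomposition follows. It assumes the component functions have already been evaluated on the sample, and it normalizes each covariance by the output variance, S_p = Cov(f_p, f) / Var(f). The toy model, its exact component functions, and the function name `covariance_indexes` are hypothetical.

```python
import numpy as np

def covariance_indexes(components, y):
    """Return S_p = Cov(f_p, f) / Var(f) for each sampled component function."""
    var_y = np.var(y)
    y_centered = y - y.mean()
    return {
        name: float(np.mean((values - values.mean()) * y_centered) / var_y)
        for name, values in components.items()
    }

rng = np.random.default_rng(1)
x1, x2 = rng.uniform(size=(2, 20000))
y = x1 + 2.0 * x2                      # hypothetical model output

# Exact first-order component functions of this toy model (mean removed).
components = {"f1": x1 - 0.5, "f2": 2.0 * (x2 - 0.5)}
S = covariance_indexes(components, y)
total = sum(S.values())
```

Because the component functions here sum to f(x) minus its mean, the indexes add up to one. With a truncated expansion the sum falls short of one by the unexplained variance fraction.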
A Propellant Ignition Model Calculated profiles of temperature and major mole fractions for the ignition and combustion of the M10 solid propellant
A Propellant Ignition Model • 10 independent and 44 cooperative contributions of inputs were identified as significant
A Propellant Ignition Model • Nonlinear global sensitivity indexes efficiently identified all significant contributions of inputs
Trichloroethylene (TCE) Microenvironmental/Pharmacokinetic Modeling
Microenvironmental/exposure/dose modeling system
Structure of TCE-PBPK model (adapted from Fisher et al., 1998)
Example: Trichloroethylene (TCE) Microenvironmental/Pharmacokinetic Modeling
• The coupled microenvironmental/pharmacokinetic model:
  • Three exposure routes (inhalation, ingestion, and dermal absorption)
  • Release of TCE from water into the air within the residence
  • Activities of individuals and physiological uptake processes
• Seven input variables are used to construct the RS-HDMR orthonormal polynomials: age (x1), tap water concentration (x2), shower stall volume (x3), drinking water consumption rate (x4), shower flow rate (x5), shower time (x6), and time in bathroom after shower (x7)
• Target outputs: the total internal doses from intake (inhalation and ingestion) and uptake (dermal absorption)
  • The amount inhaled or ingested:
  • The amount absorbed:
  • C(t): exposure concentration, IR(t): inhalation or ingestion rate, Kp: permeability coefficient, SA(t): surface area exposed
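The two dose expressions are commonly written as time integrals of the quantities defined above. The forms below are a plausible reading consistent with those symbol definitions, not a transcription of the slide. The dose symbols and the exposure interval [t_1, t_2] are assumed notation.

```latex
D_{\text{intake}} = \int_{t_1}^{t_2} C(t)\, IR(t)\, dt
\qquad
D_{\text{absorbed}} = \int_{t_1}^{t_2} K_p\, SA(t)\, C(t)\, dt
```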
Trichloroethylene (TCE) Microenvironmental/Pharmacokinetic Modeling
• Inputs (x1, x2, x3, x4) have uniform distributions and inputs (x5, x6, x7) have triangular distributions; 10,000 input-output data points were generated
The data distributions for the uniformly distributed variable x1 and the triangularly distributed variable x5
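A sampling scheme of this kind could be sketched as follows. The ranges and modes are placeholders on a normalized [0, 1] scale, not the values used in the TCE study, and the variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(42)
n_samples = 10_000

# Placeholder (low, high) ranges for the uniform inputs x1..x4.
uniform_ranges = [(0.0, 1.0)] * 4
# Placeholder (left, mode, right) parameters for the triangular inputs x5..x7.
triangular_params = [(0.0, 0.5, 1.0)] * 3

uniform_cols = [rng.uniform(low, high, n_samples) for low, high in uniform_ranges]
triangular_cols = [
    rng.triangular(left, mode, right, n_samples)
    for left, mode, right in triangular_params
]

# One row per sampled input vector (x1, ..., x7); the model would be run on each row.
X = np.column_stack(uniform_cols + triangular_cols)
```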
Trichloroethylene (TCE) Microenvironmental/Pharmacokinetic Modeling Seven independent, fifteen 2nd order and one 3rd order cooperative contributions of inputs were identified as significant First order sensitivity indexes
Trichloroethylene (TCE) Microenvironmental/Pharmacokinetic Modeling • Nonlinear global sensitivity indexes (2nd order and above) efficiently identified all significant contributions of inputs The ten largest 2nd and 3rd order sensitivity indexes
Identification of bionetwork model parameters
• Characteristics of the problem:
  • System nonlinearity
  • Limited number and type of experiments
  • Considerable biological and measurement noise
Multiple solutions exist!
• Problems with traditional identification methods:
  • Provide only one or a few solutions for each parameter
  • Assume linear propagation from data noise to parameter uncertainties
• The closed-loop identification protocol (CLIP):
  • Extract the full parameter distribution by global identification
  • Iteratively look for the most informative experiments for minimizing parameter uncertainty
General operation of CLIP
• Pre-lab analysis and design of the most informative experiments
• Iterative experiment optimization and data acquisition
• Global parameter identification
Isoleucyl-tRNA synthetase proofreading valyl-tRNA(Ile)
* Rate constants to be identified
Okamoto and Savageau, Biochemistry, 23:1701-1709 (1984)
The inversion module: identifying the rate constant distribution
• The Genetic Algorithm (GA)
  • Mutation: 1101 1111 + 1100 0010 → 1101 1101 + 1100 0110
  • Crossover: 1101 1100 + 1111 0010 → 1101 0010 + 1111 1100
• The inversion cost function
• Typical rate constant distribution after random perturbation/control
• Inversion quality index Q
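The two GA operators can be reproduced on the slide's bit strings with a short sketch. The function names are hypothetical. The crossover point and flipped positions are read off the examples. The inversion cost function and selection step are not shown.

```python
def crossover(parent_a, parent_b, point):
    """Single-point crossover: swap the tails of two bit strings at `point`."""
    child_a = parent_a[:point] + parent_b[point:]
    child_b = parent_b[:point] + parent_a[point:]
    return child_a, child_b

def flip_bit(bits, index):
    """Mutation: flip the single bit at position `index`."""
    flipped = "1" if bits[index] == "0" else "0"
    return bits[:index] + flipped + bits[index + 1:]

# Mutation example: one bit flipped in each string.
mutant_a = flip_bit("11011111", 6)
mutant_b = flip_bit("11000010", 5)

# Crossover example: the last four bits are exchanged.
child_a, child_b = crossover("11011100", "11110010", 4)
```

In a full GA, each bit string would encode a candidate set of rate constants. Mutation positions would be drawn at random rather than fixed as they are here.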
The analysis module: estimating the most informative experiments
• Estimate the best species for monitoring system behavior
• Determine the best species for perturbing the system
• Nonlinear sensitivity analysis by Random-Sampling High Dimensional Model Representation (RS-HDMR)
Optimally controlled identification: squeezing on the rate constant distribution
• The control cost function
[Figure labels: Inversion quality; Non-]
Feng and Rabitz, Biophys. J., 86:1270-1281 (2004)
Feng, Rabitz, Turinici, and LeBris, J. Phys. Chem. A, 110:7755-7762 (2006)
Network property optimization:
• A. Identifying the best targeted network locations for intervention
• B. Identifying the optimal network control
[Diagram labels: Initial Guess/Random Control, Control Design, Biological System, Observed Response, Learning Algorithm, Control Objective, Optimal Controls, Optimal Network Performance]
A. Molecular target identification for network engineering
Random-sampling high dimensional model representation (RS-HDMR)
Randomly sample k
• Advantages of RS-HDMR:
  • Global sensitivity analysis
  • Nonlinear component functions
  • Physically meaningful representation
  • Favorable scalability
Li, Rosenthal, and Rabitz, J. Phys. Chem. A, 105:7765-7777 (2001)
Laboratory data on the mutants
[Figure panels: "k10 ─ k13 fixed" and "k6 fixed"; axis labels: k6 and k10 ─ k13]
Feng, Hooshangi, Chen, Li, Weiss, and Rabitz, Biophys. J., 87:2195-2202 (2004)
Example: Biochemical multi-component formulation mapping
• Allosteric regulation of aspartate transcarbamoylase (ATCase) in vitro by all four ribonucleotide triphosphates (NTPs)
• ATCase activity (output) was measured in the laboratory for 300 random combinations of NTP concentrations (inputs)
• A second order RS-HDMR was constructed as an input → output map; its accuracy is comparable to the laboratory error
The absolute error of repeated measurements
Biochemical multi-component formulation mapping The comparison of the laboratory data and the 2nd order RS-HDMR approximation for “used” and “test” data Note: The two parallel lines are absolute error ±0.2
The s-space network identification procedure (SNIP)
Laboratory data on the transcriptional cascade
[Diagram labels: inducers aTc and IPTG; proteins TetR, LacI, EYFP; genes tetR, lacI, eyfp; promoters p(lacIq), pL(tet), P(lac)]
• aTc: x1, IPTG: x2, EYFP: y(x1,x2)
• Encode: x1→x1m1(s), x2→x2m2(s)
• Response measurement: y→y(s)
• Decode: Fourier transform
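The encode/measure/decode idea can be illustrated with a toy sketch. It assumes cosine modulation functions m1(s) and m2(s) with distinct integer frequencies and a simple polynomial stand-in for the measured response. None of these choices come from the slides, and the published SNIP procedure may differ in detail.

```python
import numpy as np

def toy_response(x1, x2):
    """Hypothetical stand-in for the measured response y(x1, x2)."""
    return 0.5 + 2.0 * x1 - 1.0 * x2 + 3.0 * x1 * x2

n_points = 64
s = np.arange(n_points)
w1, w2 = 3, 7                      # assumed modulation frequencies (integer bins)

# Encode: modulate each input level along the s coordinate.
m1 = np.cos(2.0 * np.pi * w1 * s / n_points)
m2 = np.cos(2.0 * np.pi * w2 * s / n_points)
x1_level, x2_level = 1.0, 1.0

# Response measurement: y as a function of s.
y_s = toy_response(x1_level * m1, x2_level * m2)

# Decode: the Fourier transform separates terms by their frequency signature.
spectrum = np.fft.rfft(y_s) / n_points
amplitude = 2.0 * np.abs(spectrum)
amplitude[0] /= 2.0                # the constant term is not doubled

first_order_x1 = amplitude[w1]     # linear x1 term appears at w1
first_order_x2 = amplitude[w2]     # linear x2 term appears at w2
cooperative = amplitude[w1 + w2] + amplitude[w2 - w1]   # x1*x2 term at sum and difference
```

The amplitudes recover the magnitudes of the individual and cooperative terms; signs would require the phase of the spectrum. Frequencies must be chosen so that the sum and difference bins do not collide with the first-order bins.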
Nonlinear property prediction by SNIP Nonlinear, cooperative behavior revealed Unmeasured region correctly predicted Feng, Nichols, Mitra, Hooshangi, Weiss, and Rabitz, In preparation
SNIP application to an intracellular signaling network
Laboratory single cell measurement data
Sachs et al., Science, 308:523-529 (2005)
Identified network with predictive capability Network connections identified by SNIP and Bayesian analysis Reliable SNIP prediction of Akt levels
Example: Ionospheric measured data
• The ionospheric critical frequencies determined from ground-based ionosonde measurements at Huancayo, Peru, for the years 1957-1987 (8694 points)
• Input: year, day, solar flux (f10.7), magnetic activity index (kp), geomagnetic field index (dst), previous day's value of foE
• Output: ionospheric critical frequency foE
• The inputs are neither controllable nor independent; the pdf of the inputs is not separable and is not explicitly known
Ionospheric measured data The dependence of foE on the input “day” Ionosonde data distribution: the dependences between normalized input variables: year and f10.7, kp and dst for the data at 12 UT
Ionospheric measured data The accuracy of the 2nd order RS-HDMR expansion for the output, foE
Quantitative molecular property prediction
Standard QSAR
• General strategy: Molecular activity is a function of its chemical/physical/structural descriptors
• Problems:
  • Overfitting (choice of descriptors)
  • Underlying physics
A simple solution: descriptor-free quantitative molecular property interpolation
• y=f(x1,x2), x1=1,2,…,N1, x2=1,2,…,N2
[Figure labels: X1, X2]
Descriptor-free property prediction from an arbitrary substituent order
Property prediction from the optimal substituent order
• Cost function:
• Complexity of the search: N1!·N2! = 14!·8! ≈ 10^15
Shenvi, Geremia, and Rabitz, J. Phys. Chem. A, 107:2066 (2003)
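The size of the search space, and the idea of reordering substituents so that the property table becomes smooth, can be sketched as follows. The roughness cost below is a hypothetical stand-in (the cost function on the slide is not reproduced here). The brute-force search is feasible only for tiny tables, which is why a 14 × 8 library needs a smarter optimizer.

```python
import itertools
import math
import numpy as np

# Number of candidate orderings for N1 = 14 and N2 = 8 substituents.
search_space = math.factorial(14) * math.factorial(8)

def roughness(table):
    """Hypothetical cost: sum of squared differences between neighboring entries."""
    return float(np.sum(np.diff(table, axis=0) ** 2) + np.sum(np.diff(table, axis=1) ** 2))

def best_reordering(table):
    """Exhaustively search row and column orderings for the smoothest table."""
    n_rows, n_cols = table.shape
    best_cost, best_rows, best_cols = math.inf, None, None
    for rows in itertools.permutations(range(n_rows)):
        for cols in itertools.permutations(range(n_cols)):
            cost = roughness(table[np.ix_(rows, cols)])
            if cost < best_cost:
                best_cost, best_rows, best_cols = cost, rows, cols
    return best_cost, best_rows, best_cols

# Toy library: a smooth property surface with substituent labels scrambled.
smooth = np.add.outer(np.arange(4.0), 2.0 * np.arange(3.0))
scrambled = smooth[np.ix_([2, 0, 3, 1], [1, 2, 0])]
best_cost, best_rows, best_cols = best_reordering(scrambled)
reordered = scrambled[np.ix_(best_rows, best_cols)]
```

Once the table is smooth in the reordered indices, missing entries can be interpolated from their neighbors without any molecular descriptors.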
Application to a chromophore transition metal complex library
[Figure panels: Before reordering; After reordering]
• Cost function:
• Outliers captured by the reordering algorithm
Liang, Feng, Lowry, and Rabitz, J. Phys. Chem. B, 109:5842-5854 (2005)
Application to a drug compound library 15% of data >14,000 compounds Cost function: Reorder Prediction
THE MODERN WAY TO DO SCIENCE*
* Adaptively under high duty cycle and automated
“You should understand the physics, write down the correct equations, and let nature do the calculations.”
Peter Debye