Article
Open access
Published: 02 July 2024

Identifying Bayesian optimal experiments for uncertain biochemical pathway models

Scientific Reports volume 14, Article number: 15237 (2024) Cite this article

189 Accesses
Metrics details

Subjects

Abstract

Pharmacodynamic (PD) models are mathematical models of cellular reaction networks that include drug mechanisms of action. These models are useful for studying predictive therapeutic outcomes of novel drug therapies in silico. However, PD models are known to possess significant uncertainty with respect to constituent parameter data, leading to uncertainty in the model predictions. Furthermore, experimental data to calibrate these models is often limited or unavailable for novel pathways. In this study, we present a Bayesian optimal experimental design approach for improving PD model prediction accuracy. We then apply our method using simulated experimental data to account for uncertainty in hypothetical laboratory measurements. This leads to a probabilistic prediction of drug performance and a quantitative measure of which prospective laboratory experiment will optimally reduce prediction uncertainty in the PD model. The methods proposed here provide a way forward for uncertainty quantification and guided experimental design for models of novel biological pathways.

CurveCurator: a recalibrated F-statistic to assess, classify, and explore significance of dose–response curves

Article Open access 30 November 2023

Using predictive machine learning models for drug response simulation by calibrating patient-specific pathway signatures

Article Open access 27 October 2021

Rationalizing general limitations in assessing and comparing methods for compound potency prediction

Article Open access 19 October 2023

Introduction

The intricate and lengthy process of discovering novel drugs stands to gain significant advantages through the application of computational methods¹. These methods provide efficient and cost-effective avenues for exploring expansive chemical spaces, predicting molecular properties, and optimizing potential drug candidates. Dynamic models of biological systems (e.g., pharmacokinetic (PK) or pharmacodynamic (PD) models) are useful tools for studying biomolecular processes. Such models have been applied to study inter- and intra-cellular phenomena, including regulatory, metabolic, and signalling processes within human cells^2,3,4. These models are derived from first-principles approximations of the complex dynamics that occur in vivo. The utility of these biological models lies in their simplicity; tractably capturing qualitative system behaviors at the expense of prediction accuracy. Although these dynamic models are often used in preclinical drug development to determine in vivo drug response⁵, they are rarely applied in drug discovery. Biological pharmacodynamic models are valuable tools in both the optimal design and validation phases of identifying novel drug candidates, making them crucial for computational drug discovery. Furthermore, regulatory agencies are beginning to accept in silico studies as part of the validation and testing of new medical therapies and technologies^6,7. This underscores a need to formalise approaches for uncertainty quantification and model improvement for pharmacodynamic models.

Uncertainty quantification is a critical analysis that is often overlooked in computational drug design, despite its relevance and available software tools⁸. Previous studies have applied methods of uncertainty quantification to models for calibrating algorithmic parameters in automated diabetes treatment systems⁹, a PK/PD cancer model for the antivascular endothelial growth factor in the present of a cancer therapeutic agent¹⁰, the calibration of dose-response relationships for the $\gamma$-H2AX assay to predict radiation exposure¹¹, and peptide-MHC affinity prediction using a data-driven model¹². In the aforementioned studies, experimental or clinical data was readily available for the proposed uncertainty quantification procedures. For many PD models, the scarcity of quantitative data for calibrating these models has led to significant challenges in uncertainty quantification. This limited availability of experimental data results in non-unique and/or unconstrained parameter estimations, leading to issues of nonidentifiability¹³. Researchers have acknowledged the presence of significant parameter correlations, nonidentifiability, and parameter sloppiness in dynamic biology models, where parameters can vary over orders of magnitude without significantly affecting model output¹⁴. Given these limitations, it is evident that incorporating uncertainty quantification to biological dynamic models, even in the absence of available data, is crucial if they are to be used for computational drug discovery and evaluation.

Optimal experimental design (OED) has been proposed as a method for improving model parameter estimates in biological pharmacodynamic models. In OED, existing measured data is used to determine what new experiment(s) should be done to best reduce parameter uncertainties¹⁵. Several works have applied different flavors of optimal experimental design to dynamic biological models. Work done by Liepe et al.¹⁶ demonstrates OED to maximize expected information gained in a signaling pathway model featuring 6 differential equations, 4 uncertain parameters, and selecting between 5 experimental outcomes. Researchers in Bandara et al.¹⁷ showed that a sequential experimental design for a cell signaling model, coupled with fluorescence microscopy-based experimental data, was able to reduce uncertainty in some parameters, while it increased in others due to nonidentifiability. This work also provides an example of selecting optimal experiments based on an experimentally relevant metric. This approach is in opposition to the more common purely information-theoretic approach. We note that the model used in this study is relatively small, only featuring 4 differential equations, 4 uncertain rate constants, and selecting between 3 experimental possibilities. Researchers in Eriksson et al.¹⁸ utilized approximate Bayesian computation to identify approximate posterior distributions for uncertain model parameters. This work presents a larger scale problem, with 34 reactions for 25 species, 34 uncertain parameters, and 6 experimental possibilities from which to select. This approximate approach is applicable to scenarios where the the data-generation distribution (likelihood function) is not available in analytic form. There are other methods for informed experimental design in systems biology that have utilized in the literature¹⁹. However, unlike Bayesian data inference approaches, these methods do not incorporate prior knowledge (i.e., expert biological insights) into the inference process. They also do not provide a probabilistic result, but rather a point estimate for model parameters.

In this paper, we utilize methods from Bayesian optimal experimental design (BOED) using exact Bayesian inference and Hamiltonian Monte Carlo (HMC) to recommend experiments for mechanistic model improvement. This framework allows for quantitative decision-making within the classical “experiment-theory-experiment” loop, wherein experiments are used to first deduce theoretical relationships and then to validate the theory. The goal is to identify which experiment, or sequence of experiments, will provide the most “useful” data in either constructing or validating the theory. The definition of “useful” may differ depending on the goals of the practitioner, but are often related to a measure of model uncertainty reduction. The focus of the present work is on a dynamic model of programmed cell death, i.e., apoptosis. This model predicts synthetic lethality in cancer in the presence of a PARP1 inhibitor. Therefore, when we refer to selecting an optimal experimental design, we mean “which species should we measure from a PARP1-inhibited cell experiment to improve confidence in simulated lethality predictions of a given inhibitor?” The goal is to first understand the impact of parameter uncertainty on the predictive reliability of this model. Then, the aim is to recommend experimental measurements that can be done in the lab to maximize confidence in simulated lethality predictions.

To do this, we conduct parameter inference using HMC for large volumes of data generated from simulated experiments. This way, we can determine which experimental data, in expectation, would most reduce uncertainty in modeled drug performance predictions. Such an approach for using synthetic data to model potential outcomes is particularly useful when there are too many potential experiments and replicates to conduct within time and resource constraints. Employing the proposed BOED approach provides researchers with a quantitative method for linking physical experiment design to model improvement in the absence of a necessary abundance of existing data.

It is important to note that while our optimal experimental design study is conducted with simulated experimental data, and the results are thus subject to our modeling assumptions, the dynamic model underpinning our analysis contains biologically relevant and measurable parameters, thus offering insights that can guide future experimental endeavors. Furthermore, the framework we lay out here is general and applicable to any relevant biological modeling setting where experimental data is obtainable. To ensure that the recommended experiments made in BOED are actionable, it is essential that practitioners develop theoretical models that only include feasible experimental designs. Nevertheless, the optimization might occasionally recommend currently infeasible experiments, which can spur the development of new experimental methods to enable these designs. For the current application, our optimization specifically recommends experiments involving species that can be measured in the lab with existing methods^20,21.

Formally, the contributions of this work are threefold: (1) Applying Bayesian optimal experimental design and high-performance computing (HPC) to a large-scale coupled ODE system of 23 equations and with 11 uncertain parameters and choosing between 5 experimental designs — the largest PD system considered thus far in a fully Bayesian OED framework of which the authors are aware; (2) Proposing novel decision-relevant metrics for quantifying the uncertainty in therapeutic performance of novel inhibitor drugs at a given concentration ($\text {IC}_{\text {50 }}$); and (3) Identifying optimal experiments that minimize the uncertainty in therapeutic performance (i.e., simulated lethality) as a function of the inhibitor dosage of interest.

As we will show, there is preference for measuring activated caspases at low $\text {IC}_{\text {50 }}$to reduce uncertainty in the probability of cell death. At larger $\text {IC}_{\text {50 }}$, our results show that this uncertainty is maximally reduced by collecting data the mRNA-Bax concentrations. Therefore, we conclude that the decision as to which species to experimentally measure must be selected by considering trade-offs between the predicted BOED objectives and the inhibitor viability in vitro. The approach and results presented here thus bridge model-guided optimal design with uncertainty quantification, advancing drug discovery through uncertainty-aware decision-making.

Methods

The problem addressed in this work is to understand the impact of model parameter uncertainty on the predictive capacity of the PD model for PARP1-inhibited cellular apoptosis; a representative model for cancer drug evaluation. And given the effects of uncertainty and lack of experimental data, we wish reestablish a predictive model by recommending measurements that can be made in the lab to better constrain the model parameters.

The workflow to achieve the stated goals is as follows:

1.
Acquire many (synthetic) experimental measurements for each prospective measurable species in the model
2.
Construct prior probability distributions (i.e., beliefs regarding model parameter distributions in the absence of data)
3.
Conduct parameter estimation via Bayesian inference using the model, data, and prior probabilities for each prospective experiment and over multiple data samples
4.
Compute expected drug performance predicted by the model given posterior probability distributions (i.e., updated parameter distribution beliefs after incorporating data)
5.
Rank and recommend experiments based on a metric that quantifies reliability in model predictions.

A graphical depiction comparing traditional parameter calibration to the proposed approach using simulated experimental data is shown in Fig. 1.

Traditional model parameter calibration via Bayesian inference is shown in Fig. 1a. The model-derived prior predictive distribution of concentration profiles for Species “A” is constrained by experimental data (in black). Here, we only show the final time point being used to constrain the time-series lab measurements. In general, the collected laboratory time-series data could be used jointly for Bayesian inference. Conducting inference produces an updated probability density function for the model parameters $\theta$ referred to as the posterior distribution. The posterior distribution is then used to quantify inhibitor performance via probability of achieving apoptosis in simulation across an array of $\text {IC}_{\text {50 }}$values. We can then compare the expected inhibitor drug performance between the prior and posterior uncertainty. More accurate predictions by the posterior are thus expected because real data was used to update the model.

The methodology utilized in the present work is presented in Fig. 1b. Here, we do not have access to real, measured data to constrain our model. However, we can use the PD model and an expert-derived error model to generate an ensemble of simulated experimental data. In simulating and calibrating to a large volume of simulated experimental data, we account for uncertainty in what could be measured in reality (in black). This then leads to an ensemble of posterior distribution predictions for the parameters $\theta$. We obtain ensembles of posterior parameter distributions for several potential measurable species in the model (e.g., Species “B” and Species “C”). From these, we can compute an ensemble of inhibitor performances as measured by probability of apoptosis across a range of $\text {IC}_{\text {50 }}$values. This implies a distribution on expected inhibitor performances as predicted by “measuring” different species and using the simulated measurements for calibration. By comparing the distributions in expected inhibitor performance across different measured species, we can identify which experiment is “optimal” by selecting the one that leads to the greatest reduction in uncertainty.

Mathematical model

The general form of a pharmakodynamic model is shown in Eq. (1), which represents a set of coupled ordinary differential equations.

$$\begin{aligned} \begin{aligned} \displaystyle \dot{y}(t)&= f\left( y(t); u, \kappa \right) \\ y(t_0)&= y_0 \end{aligned} \end{aligned}$$

(1)

In the rate-based modeling framework used here, variables $y(t) \in \mathbb {R}^n$ are state variables representing the time-varying concentrations of the n chemical species of interest within the cell signalling pathway. The parameters $\kappa \in \mathbb {R}^m$ are the parameter data, which includes the kinetic rate constants for the reaction network. Initial concentrations of species are those at $t_0$, specified by the value of the vector ${y}_0 \in \mathbb {R}^n$. The system parameters and time-varying species concentrations are related by the functions $f : \mathbb {R}^n \rightarrow \mathbb {R}$, indexed by $\forall i \in \{1,\dots ,n\}$. Also, we separate the input parameters in Eq. (1) into two vectors: u, the control variables, and $\kappa$ the uncertain model parameters. The pre-specified input parameters u are any experimental controls set a priori. In the model considered here, this is the half maximal inhibitory concentration ($\text {IC}_{\text {50 }}$) for a novel PARP1 inhibitor molecule. $\text {IC}_{\text {50 }}$is a commonly used metric in experimental biology for quantifying a molecule’s ability to inhibit a specific biological function.

Bayesian optimal experimental design

Bayesian optimal experimental design is a statistical framework that aims to identify the most informative experimental conditions or measurements to improve parameter estimation and model predictions in the presence of uncertainty^22,23. It involves iteratively selecting experiments^24,25 that maximize the expected information gain or other objectives, such as mean objective cost of uncertainty^26,27, given prior knowledge, model uncertainty, and the available resources. The goal is to strategically design experiments that provide the most valuable information via uncertainty reduction in the calibrated model predictions.

In our application, we want to identify an experimental design $\xi$ (i.e., a specific species we can measure from the cell in the presence of a PARP1 inhibitor) from the set of feasible designs, $\Xi$, which optimizes information gain regarding uncertain parameters, $\theta$, given the outcome of that experiment, y. Moving forward, we assumed the set of experiments we search over $\xi \in \Xi$ are all equally easy to conduct in the laboratory. In a more general case, additional constraints can be incorporated to this approach to prioritize ease of implementation. Let us denote the set of all uncertain parameters within the ODE system, including initial conditions and rate constants, as the set $\Theta$. For tractability reasons, we may choose to only consider a subset of the parameters which are deemed most significant via global sensitivity analysis and refer to the vector of key, significant parameters as $\theta \in \Theta$. We also denote the vector of experimental designs being considered as $\xi \in \mathcal{X} \subset \Xi$, and $|\mathcal{X}| = N_{\xi }$ is the number of design decisions to choose from in optimization. Here, the individual experiments represent a species to measure in an experiment representing the process modeled in the PARP1-inhibited cell apoptosis PD model. We assume prior distributions on the uncertain parameters $p(\theta )$, and a likelihood of the data conditional on the parameters and controls variables, $p(y|\theta , \xi , u)$. For a general objective function of interest, $J(\xi ; u)$, where $\xi$ are the decision variables and u inputs to Eq. (1), the Bayesian optimal experimental design task is described by Eq. (2).

$$\begin{aligned} \xi ^{*} = \arg \max _{\xi \in \Xi }J(\xi ;u) \end{aligned}$$

(2)

Traditionally, this objective ${J}$ is the expected information gain (EIG), a quantitative measure of entropy reduction between the prior and posterior probability distributions of $\theta$^28,29. Conceptually, EIG is used to measure how much useful knowledge we expect to gain from an experiment before actually performing it. EIG computes the difference in the entropy, or randomness, of a system before and after obtaining new data. This metric helps in choosing experiments that are predicted to be the most informative, based on current knowledge. Essentially, it guides decision-making by prioritizing experiments that maximize the clarity or accuracy of the model’s outputs, thereby making the research process more efficient and targeted. Under the EIG formulation, computing the objective function of the optimal experimental design problem becomes a nested expectation computation under the marginal distribution of experimental outcomes. The posterior distributions are often then estimated using a Markov Chain Monte Carlo (MCMC) algorithm to compute the information gained. For this application, we are not interested in optimal experiments which simply minimize relative entropy in the prior and posterior distributions in $\theta$, as is the standard practice in BOED. Instead, we wish to select optimal experimental designs that minimize uncertainty in lethality predictions made by the ODE model under different inhibitor drug $\text {IC}_{\text {50 }}$, i.e., different $u \in \mathcal{U}$. In this model setting, lethality (i.e., probability of cell apoptosis) can be viewed as a probabilistic analog to the traditional dose-response curve and is defined in Eq. (3).

To that end, we devise two metrics for quantifying uncertainty in cell lethality predictions at different $\text {IC}_{\text {50 }}$values. The first is the uncertainty in the probability of triggering cell apoptosis at a given $\text {IC}_{\text {50 }}$, represented as $\sigma _{apop}(u)$. This quantity defines a standard deviation on the predicted probability of apoptosis at varying levels of the inhibitor concentration, denoted as $u = IC_{50}$. One can see in Fig. 2a that $\sigma _{apop}(u)$ quantifies how certain or uncertain we are about the model’s prediction at each concentration of inhibitor. A lower $\sigma _{apop}(u)$ at a given IC indicates a higher confidence in the model’s prediction of apoptosis probability at that concentration. This allows researchers to identify the concentrations at which the model’s predictions regarding whether the cells will live or not are most reliable. One may choose to only consider the $\sigma _{apop}$ at very small $\text {IC}_{\text {50 }}$, since we are interested in drug molecules which are most effective at low doses.

The second metric quantifies the uncertainty in determining the IC$_{50}$ (i.e., the concentration of the inhibitor) that achieves a specified probability, $\mathbb {T}$, of triggering apoptosis. Figure 2b shows a graph of $\sigma _{IC_{50}}(\mathbb {T})$ as a function of IC$_{50}$ for a fixed $\mathbb {T}$ (e.g., $\mathbb {T}$ = 0.75, meaning 75% probability of apoptosis) and illustrates the variability or width of the IC$_{50}$ estimate. A smaller $\sigma _{IC_{50}}(\mathbb {T})$ at a given $\mathbb {T}$ indicates more precision in identifying the IC$_{50}$ required to reach the desired probability of apoptosis.

In essence, each of these metrics serve to quantitatively assess the robustness and reliability of the model in predicting effective doses and expected cellular responses. By quantifying the uncertainties, we can better understand the limitations of our model and identify the ranges of inhibitor concentrations where predictions are most and least certain. More detail on these metrics is discussed in Appendix “Uncertainty metrics”. In addition, we discuss the multi-objective treatment of these two metrics, and the sensitivity of the optimal recommended experiment to that treatment, in section “Results and discussion”.

Application

We consider a dynamic PARP1-inhibited cell apoptosis models for both wild-type human cells published in³⁰. This model represents the programmed cell death pathway, particularly in cancer cells exhibiting DNA double strand breaks (DSBs). The fate of these cells hinges on two key processes: (1) Caspase-mediated cell death or (2) DNA repair. In case (1) The activation of caspases, which are proteases, are known to lead to apoptosis if they reach a threshold concentration value³¹, effectively eliminating cells with damaged DNA. This, in turn, slows the proliferation of cancerous cells. In case (2), cells have the capability to repair DSBs, potentially surviving despite the initial damage. Importantly, our model includes the interaction of these pathways with a PARP1 inhibitor. PARP1 (Poly (ADP-ribose) polymerase 1) is an enzyme crucial for the repair process of DSBs. By inhibiting PARP1, our model explores the potential to prevent the repair of DSBs in cancerous cells, thus promoting apoptosis through unresolved DNA damage. In the context of our optimal experimental design, we aim to investigate the effects of varying levels of PARP1 inhibition on the balance between cell death and DNA repair. The goal is to identify experimental conditions that maximize our confidence in how effectively PARP1 inhibitors shift this balance towards apoptosis in cancer cells, thereby providing insights into potential therapeutic strategies. Thus, for the present study, the process that this model represents is the inhibition of PARP1 as a therapy for cancer. By applying an inhibitor via a specified $\text {IC}_{\text {50 }}$, inhibition of PARP1 halts the DNA strand break repair process, reducing damage repair pathways and potentially leading to cell apoptosis. Further information on the model and the complete system of differential equations is available in Eq. (S1).

The model consists of a set of 23 ordinary differential equations (ODE) wherein a subset of the 27 rate constant parameters are considered uncertain. This model takes as inputs (1) an $\text {IC}_{\text {50 }}$value and (2) initial conditions and (3) kinetic parameter values. In the present study, we only consider output at a specific time point, (i.e., the final time point) when performing inference. This means that the likelihood we supply for the data follows a Normal distribution in the model prediction at $t=\tau$, where $\tau$ is the time limit for the numerical integrator used to solve the ODE system at a given $\text {IC}_{\text {50 }}$, and with a known $\sigma$ that is computed with a $10\%$ error relative to the model prediction. Information for the uncertain parameters considered in this study was first elicited from expert understanding of the system and relevant sources. Nominal parameter values and confidence interval information are shown in Table S1. Details regarding the construction of all prior distributions on uncertain parameters considered in the present study are provided in Table S2. In addition, because experimental data for species concentrations within a single cell undergoing PARP1-inhibited apoptosis are not readily available, we utilize simulated experimental data for our study that is generated via the procedure described in Appendix section “Simulated experimental data”.

Objective function specification

In this study, we prioritize a goal-oriented objective function based on model prediction accuracy, considering this as the primary aim of our experimental design. This approach is chosen over metrics like expected information gain (EIG), which, while effective in reducing parameter uncertainty, do not necessarily align with our core objective of enhancing the reliability and accuracy of PD model predictions. As noted previously, the quantity-of-interest in assessing the performance of PARP1 inhibitor is the predicted lethality. The predicted lethalities are then used to compute our uncertainty metrics or objective functions, $\sigma _{apop}(u)$ and $\sigma _{IC_{50}}(\mathbb {T})$. Here, we specify the method for computing lethality, i.e., probability of triggering cell death. It has been shown that a activated caspase molecule count in the cell above a threshold of 1,000 is sufficient to trigger cell apoptosis¹⁶. Therefore, we can compute whether or not cell apoptosis has been irreversibly triggered by the introduction of a PARP1 inhibitor by computing $y_{k}^{max} := \displaystyle \max _{t \in [1,\dots ,\tau ]} y_{k}(t)$ where $k=$Casp-act. Thus, lethality, or P(apoptosis) can be computed by sampling the prior (posterior) probability distributions of the uncertain parameters, $\theta ^i \sim p(\theta ) \ \forall i \in [1,\dots , N_s]$, specifying a value for $\text {IC}_{\text {50 }}$, and integrating the system of ODEs. Given the ensemble of $N_s$ maximum-caspase concentrations, P(apoptosis) can be calculated as shown in Eq. (3).

$$\begin{aligned}&P(apoptosis) := \frac{1}{N_{s}} \sum _{i=1}^{N_s} \mathbbm {1}_{y_{Casp,i}^{max} \ge 1000} \end{aligned}$$

(3)

Here, $N_s$ is the Monte Carlo sample count. When computed at all $\text {IC}_{\text {50 }}$of interest, we retrieve a vector of P(apoptosis|u) predictions $\forall u \in \mathcal {U}$, producing a lethality curve for a given posterior parameter estimate.

Incorporating biological constraints

In addition to the expert-elicited prior probability information in Table S1, there is other biologically and therapeutically relevant information that we can incorporate into Bayesian parameter estimation to further constrain our posterior predictions. First, we know that for PARP1 inhibitor drugs with sufficiently high $\text {IC}_{\text {50 }}$values (i.e., low efficacy), we expect the repair signal from PARP1 to win out over the programmed cell death induced by p53. This means that at high $\text {IC}_{\text {50 }}$, the DNA double strand break will be repaired and the cell will live. We also know the opposite to be true: for low $\text {IC}_{\text {50 }}$, programmed cell death will be induced and the cell will die. In mathematical terms, we can say:

$$\begin{aligned} P(\text {apoptosis}) = {\left\{ \begin{array}{ll} 0,&{}\text { if }\text {IC}_{\text {50 }} \ge 100\\ 1, &{}\text { if }\text {IC}_{\text {50 }} \le 0.001\\ \text {Equation}~3 &{} \text {otherwise} \end{array}\right. } \end{aligned}$$

(4)

This prior knowledge regarding the effect of a drug on our quantity of interest does not directly translate to knowledge regarding the uncertain model parameters. However, because the quantity of interest is computed directly from an output of the model (i.e., concentration of activated caspases) we can still include this prior knowledge in Bayesian inference in three equivalent ways: in the likelihood, the prior, or through post facto filtering of the estimated posterior chain. For the present application, we choose the posterior filtering approach due to numerical challenges in the HMC sampling algorithm used to estimate the posterior distribution.

The posterior filtering approach used involves sampling a larger chain (in this study, 50,000 samples) for each simulated experimental data point. If a given parameter sample in the posterior chain leads to the violation of the constraints in Eq. (4) when specified in the forward simulation model under the nominal $\text {IC}_{\text {50 }}$(i.e., Talazoparib, see Appendix section “Simulated experimental data”), that sample is removed from the chain.

Results and discussion

In this section, we provide the key findings of our study applying Bayesian optimal experimental design to the PARP1-inhibited cell apoptosis model. First, we conduct a prior sensitivity analysis, follow by posterior estimation and computation of our optimal experimental design objectives.

Prior sensitivity

We conducted a prior sensitivity analysis computed for a quantity of interest, $P(\text {apoptosis})$. The purpose of this study is to understand how much reduction in uncertainty is required to achieve desired lethality curve predictions. The result of this study is shown in Fig. 3.

Here, the $P(\text {apoptosis})$ is computed by (1) drawing 1,000 random samples from prior probability distributions for each uncertain parameter, then (2) solving the model in Eq. (S1) at those parameter sample values and (3) computing $P(\text {apoptosis})$ via Eq. (3) for all $\text {IC}_{\text {50 }}$= [0.001, 0.01, 0.032, 0.1, 0.32, 1.0, 3.2, 10.0, 100.0]. To determine the model sensitivity against the priors, we scale $\sigma$ for each parameter by a multiplicative factor to narrow the support of the prior distribution around its nominal mean. In Fig. 3a, we neglect our prior constraints and just solve the model under the unconstrained prior samples. Under these conditions, we see that given the nominal prior uncertainty (i.e., $1.0\sigma )$, the effect of $\text {IC}_{\text {50 }}$on lethality vanishes (i.e., it is a more-or-less flat line). From the results, it appears that the model requires at minimum a reduction in prior uncertainty of two orders of magnitude (i.e., $0.01\sigma$) before a predictive model is restored via sigmoidal curvature in the prior lethality curve. This implies that achieving a reduction in uncertainty spanning two orders of magnitude across all uncertain parameters is imperative to differentiate therapeutic performance between different inhibitors using the PD model.

In Fig. 3b, we conduct the same sensitivity analysis while also applying the constraints in Eq. (4) to restrict the model-predicted lethality values using biological insights. Under these conditions, we obtain lethality curve predictions that have the desired sigmoidal shape, showing low therapeutic performance at large $\text {IC}_{\text {50 }}$and high therapeutic performance at small $\text {IC}_{\text {50 }}$. However, it is also evident that significant reduction in parameter uncertainty (0.0001$\sigma$) is required to achieve apoptotic switch behavior at low $\text {IC}_{\text {50 }}$¹⁶.

It is worth noting that this sensitivity analysis involved altering all prior $\sigma$ values by the same amount, and the existence of parameter correlations could potentially yield alternative outcomes in terms of the necessary uncertainty reduction. However. these results underscores the need for substantial reduction in parameter uncertainty in PD models to re-establish predictive ability.

Posterior estimation

We estimate posterior probability distributions for the uncertain parameters using the proposed Bayesian optimal experimental design approach. Details regarding the underlying algorithm are provided in Appendix section “Algorithmic approach”. The summary statistics for (1) mean posterior parameter estimates, (2) effective sample size per parameter, and (3) filtered chain length are shown in Tables S7–S16. The filtering procedure leads to non-uniform posterior chain lengths and lower effective sample sizes, as shown in the summary statistics. We note that only approximately 1% of samples in the posterior chains pass the constraints enforced via Eq. (4). This is indicative of a known challenge in applying feasibility constraints after sampling: that the vast majority of the parameter space explored is ultimately infeasible, even for sufficiently large chains. Therefore, in future work, we aim to explore methods such as Bayesian constraint relaxation³² to enforce relaxed constraints on our probabilistic model while maintaining numerical stability in standard Hamiltonian Monte Carlo sampling algorithms. Still, the average ESS per parameter is still close to the average chain length for each species, implying a high number of non-correlated samples in the posterior chains we computed. Additionally, we note that the assumption of Log-Normal prior probability distributions for all the model parameters is an assumption with ramifications on the final derived posterior distributions.

Value of information estimation

The ensemble of predicted lethality curves for each experimental design considered (i.e., species) are shown in Fig. 4. It is clear from the plots that certain species induce less spread in lethality predictions across $\text {IC}_{\text {50 }}$(e.g., mRNA-Bax) while other exhibit more variability (e.g., Casp-pro). To quantitatively understand these trends, we compute the the values of $\sigma _{apop}(u)$ and $\sigma _{IC_{50}}(\mathbb {T})$. The plots in Fig. 5a,b show how these uncertainty metrics evolve over u and $\mathbb {T}$, respectively. Tabulated results are available in Tables S4 and S5.

It is seen in Fig. 5a that there is a regime at small $\text {IC}_{\text {50 }}$(which is the regime with the greatest therapeutic viability) wherein the minimizer of $\sigma _{apop}$ changes. In particular, at $\text {IC}_{\text {50 }}$= 0.01 and $\text {IC}_{\text {50 }}$= 0.03, the $\xi$ which minimizes the $\sigma _{apop}$ objective is Casp-act. At $\text {IC}_{\text {50 }}$> 0.03, the minimizer becomes mRNA-Bax for the rest of the $\text {IC}_{\text {50 }}$range considered. This can also be seen in Table S4, where the minimum value of $\sigma _{apop}$ in each column (i.e., for each $\text {IC}_{\text {50 }}$) is demarcated with an $*$.

When considering the $\sigma _{apop}$ objective alone, and when only placing significance in reducing uncertainty at low $\text {IC}_{\text {50 }}$, the optimal decision $\xi ^{*}$ = Casp-act. At $\text {IC}_{\text {50 }}$= 0.01, the potential reduction in uncertainty in P(apoptosis) under experimental designs (i.e., measurements) $\xi ^{*} =$ Casp-act is 24% when compared to Bad-Bcl-xL, the species with the highest $\sigma _{apop}(0.01)$. We note again that reduction in $\sigma _{apop}$ directly translates to a reduction in the uncertainty regarding the predicted P(apoptosis) for a given inhibitor (i.e., at a given $\text {IC}_{\text {50 }}$). This is analogous to reducing uncertainty in a given inhibitor drug’s expected performance. In summary, this result implies that it is optimal to acquire and calibrate to experimental data regarding Casp-act concentrations if we wish to maximize confidence in the expected lethalities of inhibitors with with low $\text {IC}_{\text {50 }}$.

In the case of $\sigma _{IC_{50}}$, as shown in Fig. 5b, the trend of which $\xi$ minimizes the objective remains consistent across threshold values $\mathbb {T}=[0.5, 0.6, 0.7, 0.8, 0.9]$. Thus, when considering the $\sigma _{IC_{50}}$ objective alone, and for any threshold, the optimal $\xi ^{*}$ = mRNA-Bax. We note that reduction in $\sigma _{IC_{50}}$ directly translates to a reduction in the uncertainty regarding which $\text {IC}_{\text {50 }}$achieves the target P(apoptosis) threshold, $\mathbb {T}$. This is analogous to reducing uncertainty in a which inhibitor drug achieves the desired expected performance. For a potent inhibitor with $\mathbb {T} =$ 0.9, selecting the optimal experimental design $\xi ^{*}$ = mRNA-Bax over Casp-act (i.e., the species with the highest $\sigma _{IC_{50}}(0.9)$) results in a substantial 57% reduction in uncertainty regarding the specific $\text {IC}_{\text {50 }}$associated with achieving this cell death probability. In summary, this result implies that it is optimal to acquire and calibrate to experimental data regarding mRNA-Bax concentrations if we wish to maximize model prediction confidence in which inhibitor dosage achieves a given expected lethality.

When combining these two metrics to determine an optimal experimental design (i.e., which protein to measure in the lab to reduce uncertainty in lethality curve predictions) we provide an optimal design $\xi ^{*}$ against the following objectives: (1) $\sigma _{IC_{50}}$ at a threshold of 0.90, (2) $\sigma _{apop}$ at $\text {IC}_{\text {50 }}$= 0.01, and (3) weighted averages of both. In the case of the weighted averages, we take the objective to be defined as: $w_1 \sigma _{IC_{50}} + w_2 \sigma _{apop}$ wherein $w_1 + w_2 = 1$. For the purposes of illustrating the sensitivity of the combined objective to the selection of weights $w_1, w_2$, we show this weighted average objective under $w_1 = [0.01, 0.1]$. These results are tabulated in Table S6. The results of the weighted average show that at $w_1 = 0.01$, Casp-act is optimal, while at $w_1 = 0.1$, mRNA-Bax is preferred. Although the weights used here are arbitrary, domain knowledge can be integrated in the assignment of weights, leading to optimal experiments that align with what is understood regarding the underlying biological pathways.

In each of the objectives considered, activated caspase or Bax mRNA are identified as the optimal experimental measurements that could maximally reduce model prediction uncertainty if used in calibration. Here, we provide some insight as to why this may be the case given the present study. A possible explanation for the switch from $\xi ^{*} =$ Casp-act to $\xi ^{*} =$ mRNA-Bax at low to intermediate $\text {IC}_{\text {50 }}$may be because of the time dynamics. If a higher $\text {IC}_{\text {50 }}$is needed to kill a cancer cell via apoptosis, then measuring a species that represent the precursor steps to triggering apoptosis may more informative than the activated caspase itself. It is also possible that mRNA-Bax performs best across a wider range of drug dosages because it is more proximal to the mechanism of action. Once PARP is inhibited, only a few reactions are needed to trigger mRNA-Bax formation, and ultimately, the activation of caspase.

These differing outcomes across the different experimental design objectives underscore the essential role of biological insights in differentiating among optimal experimental designs.

Conclusions

In this work, we utilized a realistic PD model for PARP1-inhibited cell apoptosis in a model-guided optimal experimental design workflow to reduce uncertainty in the performance of cancer drug candidates. We first showed that, under prior uncertainty, the ability for the model to discriminate between PARP1-inhibitor therapeutic performance disappears. Through extensive simulations and Bayesian inference, we have demonstrated the potential of our methodology to reduce parameter uncertainties and reestablish a discriminatory PD model for drug discovery. Additionally, our exploration of the PARP1 inhibited cell apoptosis model has yielded novel insights into species-specific dependencies with PARP1 inhibitor $\text {IC}_{\text {50 }}$, informing optimal measurement choices. Introducing therapeutic performance metrics based on the half maximal inhibitory concentration ($\text {IC}_{\text {50 }}$), we provide a comprehensive evaluation of drug efficacy under parameter uncertainty. Specifically, we identified optimal experiments under three distinct objectives: (1) $\sigma _{IC_{50}}$ at a threshold of 0.90, (2) $\sigma _{apop}$ at IC50=0.01, and (3) a weighted average of both metrics. At low $\text {IC}_{\text {50 }}$, our results indicate that measuring activated caspases is optimal, with the potential to significantly reduce uncertainty in model-predicted cell death probability. And at high $\text {IC}_{\text {50 }}$, Bax mRNA emerges as the favored contender for experimental measurement. We note that our algorithmic approach, which includes collecting a large number of posterior samples and then rejecting parameter samples that violate biological constraints, leads to a majority of samples being rejected. Therefore, in future work, we will explore alternatives for enforcing biological constraints in Bayesian inference, such as relaxed constraints in the data likelihood. Additionally, it would be beneficial to identify lab experiments that can be conducted to acquire data which is compatible with our model so that we can validate the results of our OED and further constrain the model parameters. It is also important to note that while the present study considered a relatively small set of experimental designs (i.e., 5), this approach easily generalizes to larger design spaces. However, for larger problems, challenges may arise due to the computational resources required in computing posterior probability distributions. For this reason, it will be necessary to consider high performance computing acceleration, as was done in the present study (see section “Implementation”). For very large experimental design systems, further advancements via parallelization of HMC methods or heuristic approaches may be required to reduce computational burden.

In summary, our combined metric-based approach not only elucidates the importance of tailored species measurements to minimize uncertainty in lethality curve predictions but also introduces a flexible framework for optimal experimental design using novel biological pathway models. Thus, our contributions bridge the critical gap between model-guided optimal design and uncertainty quantification, advancing the realm of computational drug discovery.

Data availability

Further information and requests for data and software should be directed to and will be fulfilled by the lead contact, Natalie M. Isenberg (natalie.isenberg@pnnl.gov).

References

Sabe, V. T. et al. Current trends in computer aided drug design and a highlight of drugs discovered via computational techniques: A review. Eur. J. Med. Chem. 224, 113705 (2021).
Article CAS PubMed Google Scholar
Breitling, R., Gilbert, D., Heiner, M. & Orton, R. A structured approach for the engineering of biochemical network models, illustrated for signalling pathways. Brief. Bioinform. 9(5), 404–421 (2008).
Article CAS PubMed Google Scholar
Chen, W. W., Niepel, M. & Sorger, P. K. Classic and contemporary approaches to modeling biochemical reactions. Genes Dev. 24(17), 1861–1875 (2010).
Article CAS PubMed PubMed Central Google Scholar
Frederik Nijhout, H., Best, J. A. & Reed, M. C. Systems biology of robustness and homeostatic mechanisms. Wiley Interdiscip. Rev. Syst. Biol. Med. 11(3), e1440 (2019).
Article PubMed Google Scholar
Braakman, S., Pathmanathan, P. & Moore, H. Evaluation framework for systems models. CPT Pharmacomet. Syst. Pharmacol. 11(3), 264–289 (2022).
Article CAS Google Scholar
Viceconti, M. et al. In silico trials: Verification, validation and uncertainty quantification of predictive models used in the regulatory evaluation of biomedical products. Methods 185, 120–127 (2021).
Article CAS PubMed PubMed Central Google Scholar
Pappalardo, F., Russo, G., Tshinanu, F. M. & Viceconti, M. In silico clinical trials: Concepts and early adoptions. Brief. Bioinform. 20(5), 1699–1708 (2019).
Article CAS PubMed Google Scholar
Mitra, E. D. & Hlavacek, W. S. Parameter estimation and uncertainty quantification for systems biology models. Curr. Opin. Syst. Biol. 18, 9–18 (2019).
Article PubMed PubMed Central Google Scholar
Reenberg, A. T., Ritschel, T. K. S., Dammann, B. & Jørgensen, J. B. High-performance uncertainty quantification in large-scale virtual clinical trials of closed-loop diabetes treatment. In 2022 American Control Conference (ACC), pp. 1367–1372. IEEE (2022).
Stephen, W., Angelikopoulos, P., Beck, J. L. & Koumoutsakos, P. Hierarchical stochastic model in Bayesian inference for engineering applications: Theoretical implications and efficient approximation. ASCE-ASME J. Risk Uncertain. Eng. Syst. Part B Mech. Eng. 5(1), 011006 (2019).
Article Google Scholar
Einbeck, J. et al. A statistical framework for radiation dose estimation with uncertainty quantification from the $\gamma$-h2ax assay. PLoS ONE 13(11), e0207464 (2018).
Article PubMed PubMed Central Google Scholar
Zeng, H. & Gifford, D. K. Quantification of uncertainty in peptide-MHC binding prediction improves high-affinity peptide selection for therapeutic design. Cell Syst. 9(2), 159–166 (2019).
Article CAS PubMed PubMed Central Google Scholar
Raue, A. et al. Structural and practical identifiability analysis of partially observed dynamical models by exploiting the profile likelihood. Bioinformatics 25(15), 1923–1929 (2009).
Article CAS PubMed Google Scholar
Chis, O.-T., Villaverde, A. F., Banga, J. R. & Balsa-Canto, E. On the relationship between sloppiness and identifiability. Math. Biosci. 282, 147–161 (2016).
Article MathSciNet Google Scholar
Raue, A., Becker, V., Klingmüller, U. & Timmer, J. Identifiability and observability analysis for experimental design in nonlinear dynamical models. Chaos Interdiscip. J. Nonlinear Sci. 20(4), 045105 (2010).
Article CAS Google Scholar
Liepe, J., Filippi, S., Komorowski, M. & Stumpf, M. P. H. Maximizing the information content of experiments in systems biology. PLoS Comput. Biol. 9(1), e1002888 (2013).
Article ADS MathSciNet CAS PubMed PubMed Central Google Scholar
Bandara, S., Schlöder, J. P., Eils, R., Bock, H. G. & Meyer, T. Optimal experimental design for parameter estimation of a cell signaling model. PLoS Comput. Biol. 5(11), e1000558 (2009).
Article ADS PubMed PubMed Central Google Scholar
Eriksson, O. et al. Uncertainty quantification, propagation and characterization by Bayesian analysis combined with global sensitivity analysis applied to dynamical intracellular pathway models. Bioinformatics 35(2), 284–292 (2019).
Article CAS PubMed Google Scholar
Faller, D., Klingmüller, U. & Timmer, J. Simulation methods for optimal experimental design in systems biology. Simulation 79(12), 717–725 (2003).
Article Google Scholar
Bustin, S. A. Absolute quantification of mRNA using real-time reverse transcription polymerase chain reaction assays. J. Mol. Endocrinol. 25(2), 169–193 (2000).
Article CAS Google Scholar
Wiśniewski, J. R., Hein, M. Y., Cox, J. & Mann, M. A “proteomic ruler’’ for protein copy number and concentration estimation without spike-in standards. Mol. Cell. Proteom. 13(12), 3497–3506 (2014).
Article Google Scholar
Overstall, A. M., Woods, D. C. & Parker, B. M. Bayesian optimal design for ordinary differential equation models with application in biological science. J. Am. Stat. Assoc. (2019).
Rainforth, T., Foster, A., Ivanova, D. R. & Smith, F. B. Modern Bayesian experimental design. arXiv preprint arXiv:2302.14545 (2023).
Dehghannasiri, R., Yoon, B. J. & Dougherty, E. R. Optimal experimental design for gene regulatory networks in the presence of uncertainty. IEEE/ACM Trans. Comput. Biol. Bioinf. 12(4), 938–950 (2014).
Article Google Scholar
Hong, Y., Kwon, B. & Yoon, B. J. Optimal experimental design for uncertain systems based on coupled differential equations. IEEE Access 9, 53804–53810 (2021).
Article Google Scholar
Yoon, B. J., Qian, X. & Dougherty, E. R. Quantifying the objective cost of uncertainty in complex dynamical systems. IEEE Trans. Signal Process. 61(9), 2256–2266 (2013).
Article ADS Google Scholar
Yoon, B. J., Qian, X. & Dougherty, E. R. Quantifying the multi-objective cost of uncertainty. IEEE Access 9, 80351–80359 (2021).
Article Google Scholar
Lindley, D. V. On a measure of the information provided by an experiment. Ann. Math. Stat. 27(4), 986–1005 (1956).
Article MathSciNet Google Scholar
Huan, X. & Marzouk, Y. M. Simulation-based optimal Bayesian experimental design for nonlinear systems. J. Comput. Phys. 232(1), 288–317 (2013).
Article ADS MathSciNet CAS Google Scholar
Mertins, S. D., Isenberg, N. M., Reyes, K. R., Yoon, B. J., Urban, N., Jogalekar, M. P., Diolaiti, M. E., Weil, M. R. & Stahlberg, E. A. Pharmacodynamic model of parp1 inhibition and global sensitivity analyses can lead to cancer biomarker discovery. bioRxiv (2023).
Bogdał, M. N., Hat, B., Kochańczyk, M. & Lipniacki, T. Levels of pro-apoptotic regulator bad and anti-apoptotic regulator bcl-xl determine the type of the apoptotic logic gate. BMC Syst. Biol. 7, 1–17 (2013).
Article Google Scholar
Duan, L. L., Young, A. L., Nishimura, A. & Dunson, D. B. Bayesian constraint relaxation. Biometrika 107(1), 191–204 (2020).
Article MathSciNet PubMed Google Scholar
Bezanson, J., Karpinski, S., Shah, V. B. & Alan, E. Julia: A fast dynamic language for technical computing. arXiv preprint arXiv:1209.5145 (2012)
Ge, H., Xu, K. & Ghahramani, Z. Turing: A language for flexible probabilistic inference. In International Conference on Artificial Intelligence and Statistics, pp 1682–1690. PMLR (2018).
Rackauckas, C. & Nie, Q. Differentialequations. jl-a performant and feature-rich ecosystem for solving differential equations in julia. J. Open Res. Softw. 5(1), 15–15 (2017).
Article Google Scholar
Faeder, J. R., Blinov, M. L. & Hlavacek, W. S. Rule-based modeling of biochemical systems with bionetgen. Syst. Biol. 113–167 (2009).

Download references

Acknowledgements

The authors would like to acknowledge Manasi P. Jogalekar and Morgan E. Diolaiti for providing experimental data related to the $\text {IC}_{\text {50 }}$for Talazoparib used in this work. In addition, we acknowledge the roles of Naomi Ohashi and Eric A. Stahlberg for their project management roles at the Frederick National Laboratory for Cancer Research and ATOM Research Alliance. The research published here was conducted at Brookhaven National Laboratory.

Funding

This work was supported by the New York Empire State Development Big Data Science Capital Project, the DOE Advanced Scientific Computing Research Applied Mathematics Fellowship, and the Brookhaven Laboratory Directed Research and Development DEDUCE Project (NMI, KRR, BJY, NMU). Additional support was provided by the Energy Earthshots Initiative as part of the DOE Office of Biological and Environmental Research. PNNL is operated by DOE by the Battelle Memorial Institute under Contract DE-A06-76RLO 1830. (NMI). Funding sources include federal funds from the National Cancer Institute, National 793 Institutes of Health, and the Department of Health and Human Services, Leidos Biomedical 794 Research Contract No. 75N91019D00024, Task Order 75N91019F00134 through the 795 Accelerating Therapeutics for Opportunities in Medicine (ATOM) Consortium under 796 CRADA TC02349. (SDM).

Author information

Authors and Affiliations

Pacific Northwest National Laboratory, Richland, WA, 99354, USA
Natalie M. Isenberg
Fredrick National Laboratory for Cancer Research, Fredrick, MD, 21702, USA
Susan D. Mertins
Texas A &M University, College Station, TX, 77843, USA
Byung-Jun Yoon
University at Buffalo, Buffalo, NY, 14260, USA
Kristofer G. Reyes
Brookhaven National Laboratory, Upton, NY, 11973, USA
Byung-Jun Yoon, Kristofer G. Reyes & Nathan M. Urban

Authors

Natalie M. Isenberg
View author publications
You can also search for this author in PubMed Google Scholar
Susan D. Mertins
View author publications
You can also search for this author in PubMed Google Scholar
Byung-Jun Yoon
View author publications
You can also search for this author in PubMed Google Scholar
Kristofer G. Reyes
View author publications
You can also search for this author in PubMed Google Scholar
Nathan M. Urban
View author publications
You can also search for this author in PubMed Google Scholar

Contributions

Conceptualization of problem, N.M.I., S.D.M., B.J.Y., K.R., and N.M.U.; Methodology, N.M.I., S.D.M., B.J.Y., K.R., and N.M.U.; Implementation, N.M.I.; Analysis, N.M.I., S.D.M., B.J.Y., K.R., and N.M.U.; Writing—original draft, N.M.I.; Writing—reviewing and editing, N.M.I., S.D.M., B.J.Y., K.R., and N.M.U. Revising—N.M.I.

Corresponding author

Correspondence to Natalie M. Isenberg.

Ethics declarations

Competing interests

SDM is Founder and CEO of BioSystems Strategies, LLC. NMI, BJY, KR, and NMU declare no competing interests.

Additional information

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Supplementary Information.

Appendix

Simulated experimental data

Simulated experimental data was generated for each of the $N_{\xi }$ observable proteins by taking random parameter samples from their respective prior predictive distributions (see Fig. S1) and modifying these samples via a relative error of $10\%$ across all parameters. The value of $10\%$ was selected as it is the typical standard deviations found in experimental measurements in the laboratory. First, $N_d$ random draws from the parameter prior distributions are made independently for each uncertain parameter, i.e. $\theta ^{sim}_i \sim p(\theta ) \quad \forall i = [1,\ldots ,N_d]$. Next, the $\theta ^{sim}_i$ are used to solve the forward model in Eq. (S1) at an $\text {IC}_{\text {50 }}$=0.005083 $\mu M$, which is the measured $\text {IC}_{\text {50 }}$for WT cells under the PARP1-inhibitor Talazoparib, as shown in Fig. 6. This generates a “true” y.

The noisy $y^{sim}_{k,i} \quad \forall k = [1,\ldots ,N_{\xi }], \forall i = [1,\ldots ,N_d]$ used in inference in this work is generated by applying a multiplicative error $\epsilon$, where we assume that the error is log-normally distributed. Therefore,

$$\begin{aligned}&y^{sim}_{k,i} = y_{k,i} \epsilon ,&\quad \mathop {log}(\epsilon ) \sim \text{Normal}(0, \sigma ^2) \quad \forall k = [1,\ldots ,N_{\xi }], \forall i = [1,\ldots ,N_d] \end{aligned}$$

Where the standard deviation $\sigma$ is 0.1 for all species k. It follows that:

$$\begin{aligned}&\mathop {log}(y^{sim}_{k,i}) = \mathop {log}(y_{k,i}) + \mathop {log}(\epsilon )&\\&y^{sim}_{k,i} \sim \mathop {exp}(\mathop {log}(y_{k,i}) + \mathop {log(\epsilon )}), \end{aligned}$$

$\forall k = [1,\ldots ,N_{\xi }], \forall i = [1,\ldots ,N_d]$. To avoid numerical issues at 0 concentrations (i.e., $y_{k,i} = 0$), we employ a parameter a defined such that $a = b - \mathop {min}\{y_{k,i}\}$, where b is a small tolerance (in this study, $b=0.001$). Leading to the final form of our simulated experimental data as:

$$\begin{aligned}&y^{sim} \sim \mathop {exp}(\mathop {log}(y + a) + log(\epsilon )),&\quad \mathop {log}(\epsilon ) \sim \text{Normal}(0, \sigma ^2) \end{aligned}$$

Algorithmic approach

The workflow diagram depicted in Fig. 7 outlines the systematic procedure employed for conducting Bayesian Optimal Experimental Design (BOED) within the context of the PARP1-inhibited cell apoptosis model.

The process begins by specifying prior uncertainty descriptions for the key model parameters ($p(\theta ) \ \forall \theta \in \Theta$) determined by global sensitivity analysis results presented in³⁰. This set of uncertain parameters is shown in Table S1. Next, synthetic experimental data $(y^{sim}_{k,i})$ are generated for each experimental design (indexed by $k = \{1,\ldots ,N_{\xi }\}$) under consideration, as listed in Table S3, and for $i = \{1,\ldots ,N_d\}$ samples from the prior distributions and a known error model. Prior distribution information can be found in Table S2. We note that we select the experimental designs listed in Table S3 because these species exhibited high prior predictive uncertainty, as seen in Fig. S1. These synthesized data are used to constrain the uncertain model parameters via Bayesian Inference, where the posterior parameter probability distributions ($p(\theta |y_{k,i}^{sim}, \xi _k)$) are estimated using a HMC algorithm (i.e., a type of MCMC algorithm for sampling from an unknown distribution) for each $k = \{1,\ldots ,N_{\xi }\}$ and for each datum $i = \{1,\ldots ,N_d\}$. Next, constraints based on biological feasibility are then applied to reject samples in the Markov chains that violate the expected behavior of the cell under extreme PARP1 inhibitor dosages. This leads to modified posterior estimates ($p'(\theta |y_{k,i}^{sim}, \xi _k)$). These posterior distributions are then utilized to compute the Quantity of Interest (QOI), specifically the probability of cell apoptosis (P(apoptosis|u)), which is assessed for various inhibitor $\text {IC}_{\text {50 }}$values ($u \in \mathcal{U}$). These QOI values serve as the basis for computing the uncertainty metrics: $\sigma _{apop}(u)$ and $\sigma _{IC_{50 k}}(\mathbb {T})$ for each experimental design. Finally, the uncertainty metrics are combined to make optimal experimental decisions, thereby providing recommendations for the most effective experimental designs. This comprehensive workflow guides the process of leveraging BOED to enhance our understanding of how we can optimally reduce uncertainty in the model predictions made by the PARP1-inhibited cell apoptosis model under different drug dosage conditions.

Uncertainty metrics

The calculated lethality serves as the basis for deriving our fundamental experimental design objectives: the $\sigma _{IC_{50}}(\mathbb {T})$ and $\sigma _{apop}(u)$ uncertainty metrics. These metrics offer a quantifiable means of assessing the reduction in uncertainty associated with the various experimental designs, $\xi$.

The $\sigma _{IC_{50}}(\mathbb {T})$ metric is determined by fitting a 4-parameter sigmoidal curve, specifically the form $P(apoptosis|IC_{50})_i = d_i + \frac{a_i-d_i}{1+(\frac{IC{50}}{c_i})^{b_i}}$, to the computed $P(apoptosis|u)_i$ values for each $\text {IC}_{\text {50 }}$value and for each $i \in [1,\ldots ,N_d]$. Subsequently, the intersection points of the sigmoidal fits with pre-specified P(apoptosis) thresholds (e.g., $\mathbb {T}$ = [0.5, 0.6, 0.7, 0.8, 0.9]) determine the associated $\text {IC}_{\text {50 }}$values. The $\sigma {IC_{50}}(\mathbb {T})$ metric is then defined as the standard deviation over $\text {IC}_{\text {50 }}$values at which the sigmoidal fits intersect the threshold.

The $\sigma _{apop}$ metric is computed by taking the standard deviation over all predicted probabilities of apoptosis at a given $\text {IC}_{\text {50 }}$, i.e., $P(apoptosis|u)_i, \quad \forall i = [1,\ldots ,N_d], \forall u \in \mathcal {U}$. This is defined in Eq. (5), where $\bar{P}(apoptosis|u)$ is the arithmetic mean of the predicted probabilities of cell death at the specified inhibitor $\text {IC}_{\text {50 }}$, or u.

$$\sigma _{{apop}} (u): = \sqrt {\frac{1}{{N_{d} - 1}}\sum\limits_{{i = 1}}^{{N_{d} }} {\left( {P\left( {apoptosis\left| u \right.} \right)_{i} - \bar{P}\left( {apoptosis\left| u \right.} \right)} \right)^{2} } } \quad \forall u \in {\mathcal{U}}$$

(5)

Implementation

The entirety of the workflow presented here was implemented using the programming language Julia³³. Specifically, we utilize the probabilistic programming library Turing.jl³⁴ to implement our probabilistic model for Bayesian inference. To sample the posterior chain and obtain posterior probability distributions, we use the No U-Turn Sampler (NUTS), a variant of HMC, with an acceptance rate of 0.65. The ODE model evaluations within the probabilistic model are conducted via the DifferentialEquations.jl library³⁵.

The implementation of BOED we propose here exploits several elements of high-performance computing approaches to accelerate the optimization process. First, the implementation of the ODE model was translated into the low-level programming language, Julia, from the rule-based modeling language, BioNetGen³⁶. When performing ODE model simulations at the nominal parameter settings, the BioNetGen implementation requires 0.10 seconds of CPU time. In contrast, solving the ODE model under the same nominal parameter settings using Julia takes only 0.0018 seconds of CPU time. Consequently, our Julia-based model implementation exhibits an impressive 98% reduction in CPU time compared to the BioNetGen implementation. Although these individual execution times may already appear low, the acceleration of individual model evaluations is critically important in the context of Bayesian optimal experimental design. This is because these model evaluations are used in estimating the data likelihood across various parameter values, and as the number of these evaluations increases, the cumulative effect on computational time of Bayesian inference becomes more pronounced. Secondly, we accelerate the computation of each of the 100 posterior estimates per experimental design by using 100 CPU cores in parallel. This means that we can compute 100 posteriors simultaneously, decreasing the time-to-solution over a sequential approach. Thus, we show that the combined utilization of these high-performance computing strategies enhances the efficiency and practical applicability of our proposed Bayesian optimal experimental design framework within the context of a complex PD model. All instances of posterior sampling through NUTS were solved on the Institutional Cluster at the Scientific Data and Computing Center at Brookhaven National Library. Runs were done in parallel using 3 nodes at a time. Each node used consists of 2 CPUs Intel(R) Xeon(R) CPU E5-2695 v4 @ 2.10 GHz with 18 cores each.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Reprints and permissions

About this article

Cite this article

Isenberg, N.M., Mertins, S.D., Yoon, BJ. et al. Identifying Bayesian optimal experiments for uncertain biochemical pathway models. Sci Rep 14, 15237 (2024). https://doi.org/10.1038/s41598-024-65196-w

Download citation

Received: 22 December 2023
Accepted: 18 June 2024
Published: 02 July 2024
DOI: https://doi.org/10.1038/s41598-024-65196-w

Keywords

Comments

By submitting a comment you agree to abide by our Terms and Community Guidelines. If you find something abusive or that does not comply with our terms or guidelines please flag it as inappropriate.

Subjects

Abstract

Similar content being viewed by others

CurveCurator: a recalibrated F-statistic to assess, classify, and explore significance of dose–response curves

Using predictive machine learning models for drug response simulation by calibrating patient-specific pathway signatures

Rationalizing general limitations in assessing and comparing methods for compound potency prediction

Introduction

Methods

Mathematical model

Bayesian optimal experimental design

Application

Objective function specification

Incorporating biological constraints

Results and discussion

Prior sensitivity

Posterior estimation

Value of information estimation

Conclusions

Data availability

References

Acknowledgements

Funding

Author information

Authors and Affiliations

Contributions

Corresponding author

Ethics declarations

Competing interests

Additional information

Publisher's note

Supplementary Information

Supplementary Information.

Appendix

Appendix

Simulated experimental data

Algorithmic approach

Uncertainty metrics

Implementation

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Comments

Search

Quick links