The purpose of this document is to introduce you how to use the MOSAICbioacc application. This application is based on the R software1 and especially the rjags library (version 4.10)2 andthe jagsUI library (version 1.5.1)3, to estimate parameters of Toxico-Kinetic (TK) models under a Bayesian framework. MOSAICbioacc is developed as an R-Shiny interface (version 1.6.0)4. If you want to be kept informed, please email us: sandrine.charles@univ-lyon1.fr.
The MOSAICbioacc application is a turn-key web tool providing bioaccumulation metrics (BCF/BMF/BSAF) from a TK model fitted to accumulation and depuration data. It is designed to fulfill the requirements of regulators when examining applications for market authorization of active substances.
Toxico-Kinetic/Toxico-Dynamic (TKTD) models are used to describe and predict the toxicity and the effects of chemical substances on individual traits based on experimental data. The TK part describes the relationship between the exposure medium and the organism, considering various processes such as ADME (accumulation, depuration, metabolization and excretion)5. Regulation No 283/2013 (EU)6 defines the data requirements for active substances of plant protection products in marketing authorization applications. In particular, a bioaccumulation study on fish is required following OECD guideline 3057. Achieved in agreement with EFSA’s scientific opinion on good modeling practices8,9, this ready-to-use on-line service allows to easily estimate BCF/BMF/BSAF as required in a regulatory framework, accounting for bioaccumulation of parent compounds and their metabolites through biotransformation. MOSAICbioacc does not expect any input besides the accumulation-depuration data sets according to exposure concentrations. The service automatically fits the TK model, initially defined from the appropriately user data and optimizes the estimation of its parameters. Then, the service provides the corresponding bioaccumulation metrics, as well as all goodness-of-fit criteria required to carefully check the reliability of the results10. All calculations are based on the JAGS software and its companion R packages rjags2,11 and jagsUI3.
If you use the web interface (https://mosaic.univ-lyon1.fr/bioacc), you don’t need to install anything.
However, if you want to run the R script (downloadable from the application) by yourself, you need to install:
rjags package2. You can install it directly from the R software > Tools > Install Packages > rjags or from the CRAN website http://cran.r-project.org/web/packages/rjags/index.html.jagsUI package3. You can install it directly from the R software > Tools > Install Packages > jagsUI or from the CRAN website https://cran.r-project.org/web/packages/jagsUI/index.html.tidyverse, gridExtra, ggmcmc, GGally, ggmcmc, stringr and DT.Here is an example of the R code to install the required packages:
if(is.element('rjags', installed.packages()[,1]) == FALSE)
{install.packages('rjags')}
if(is.element('jagsUI', installed.packages()[,1]) == FALSE)
{install.packages('jagsUI')}
if(is.element('tidyverse', installed.packages()[,1]) == FALSE)
{install.packages('tidyverse')}
if(is.element('gridExtra', installed.packages()[,1]) == FALSE)
{install.packages('gridExtra')}
if(is.element('ggmcmc', installed.packages()[,1]) == FALSE)
{install.packages('ggmcmc')}
if(is.element('GGally', installed.packages()[,1]) == FALSE)
{install.packages('GGally')}
if(is.element('stringr', installed.packages()[,1]) == FALSE)
{install.packages('stringr')}
if(is.element('DT', installed.packages()[,1]) == FALSE)
{install.packages('DT')}
When using MOSAICbioacc, the first step is to upload input data (Fig. 1.1):
Figure 1.1: Data uploading and user information to enter.
You can upload your own data (click on “Browse”) by taking care about the format specification of your file. MOSAICbioacc expects to receive data as a .txt file or a .csv file (comma, semicolon or tabular separator). Each line of the table corresponds to a time point for a given replicate and a given exposure concentration of the contaminant. The table must contain the four following columns, with exact header names (Table 1.1):
According to your data, further columns can be added in the file:
Then be careful to the units:
| time | conc | expw | replicate |
|---|---|---|---|
| 0 | 0.000 | 0.0044 | 2 |
| 3 | 0.225 | 0.0044 | 2 |
| 7 | 0.355 | 0.0044 | 2 |
| 14 | 0.553 | 0.0044 | 2 |
| 21 | 0.658 | 0.0044 | 2 |
| 28 | 0.785 | 0.0044 | 2 |
Once the upload is complete, you have to manually select the corresponding separator, the appropriate time unit and the duration of the accumulation phase.
Instead of using your own data, you can try MOSAICbioacc from several example files that are provided. Three data sets use a simple TK model (one exposure route and one elimination process), whereas two data sets consider a more complex TK model, accounting for metabolization and growth (Fig. 1.2).
Figure 1.2: Example files available in MOSAICbioacc.
Please note that more data are, the more the TK model is complex, and the more the calculations take time. Thus, we indicated the approximative mean time calculation for each example data sets (Fig.1.2).
In case you upload a data set with several exposure concentrations, select the one for which you want to see the results. We propose two types of visualization to check if the file has correctly been uploaded: as a table or a plot (Fig. 1.3 and 1.4).
Figure 1.3: Table of the uploaded data at the selected exposure concentration.
Figure 1.4: Plot of the uploaded data at the selected exposure concentration.
According to your data, you can visualize the plot for the parent compound, the metabolite(s) and/or for growth (Fig. 1.5).
Figure 1.5: Plot of the uploaded data (metabolites) at the selected exposure concentration.
Don’t forget to select the appropriate duration of the accumulation phase and the growth unit (if required) before to continue (Fig. 1.3 and 1.4, left side).
The most complete model and its corresponding parameters are automatically selected according to the experimental design as given within the uploaded data set (Fig. 2.1). Users can also deselect some of the parameters (based on biological hypotheses related to the most probable exposure route or by neglecting one elimination process, for example). These choices lead to the automatic building of a nested TK sub-model to fit again on the data in clicking on the ‘Refresh’ button. The equations of the most complete model are provided to the users on-line (Fig. 2.2).
Figure 2.1: Example of a model choice.
Figure 2.2: Example of equations of the model corresponding to Fig. 2.1.
For the accumulation phase:
For the depuration phase:
Also, \(\sigma\) are the expected standard deviations of the measured contaminant concentration or growth of the organisms:
A mathematical model is composed of two parts: the deterministic and the stochastic parts. In the case of TK models fitted to concentration measurements, it can be written as follows (Eq. (1)): \[
y = f(x, \theta) + \varepsilon \quad (1)
\] with \(f\) the function describing the mean relationship between \(x\) and \(y\), \(y\) the observed variable, \(x\) the controlled variable, \(\theta\) the parameter vector to estimate and \(\varepsilon\) the random variable describing the variability of the data around the mean tendency.
The deterministic part
The organisms are here considered as single compartments for which a generic first-order kinetic bioaccumulation model can be expressed as follows12 (Eqs. (2) and (3) for the accumulation phase and Eqs. (4) and (5) for the elimination phase):
\(\left\{\begin{array}{l}\frac{d C_p(t)}{d t} = U - (E + M) C_p(t) \quad (2) \\ \frac{d C_{m_\ell}(t)}{d t} = k_{m_\ell} C_p(t) - k_{e_\ell} C_{m_\ell}(t) \quad \forall \ell=1 \ldots M \quad (3) \end{array}\right. \quad \text { for } 0 \leqslant t \leqslant t_{c}\)
\(\left\{\begin{array}{l}\frac{d C_p(t)}{d t} = - (E + M) C_p(t) \quad (4)\\ \frac{d C_{m_\ell}(t)}{d t} = k_{m_\ell} C_p(t) - k_{e_\ell} C_{m_\ell}(t) \quad \forall \ell=1\ldots M \quad (5)\end{array}\right.\quad\; \ \ \ \ \ \text { for } t>t_{c}\)
with:
| Symbol | Meaning |
|---|---|
| \(I\) | total number of exposure sources |
| \(J\) | total number of elimination processes |
| \(L\) | total number of metabolites |
| \(i\) | index of exposure sources, \(i = 1 \dots I\) |
| \(j\) | index of elimination processes, \(j = 1 \dots J\) |
| \(\ell\) | index of metabolites, \(\ell = 1 \dots L\) |
| \(t\) | time (expressed in time units) |
| \(c_i\) | exposure concentration of route \(i\) (in \(\mu g.m L^{-1}\)) |
| \(C_p(t)\) | internal concentration of the parent compound at time \(t\) (in \(\mu g.g^{-1}\)) |
| \(C_{m_\ell}(t)\) | internal concentration of metabolite \(\ell\) (in \(\mu g.g^{-1}\)) |
| \(k_{u_i}\) | uptake rate of exposure source \(i\) (expressed per time units) |
| \(k_{e_j}\) | elimination rates of elimination process \(j\) (expressed per time units) |
| \(k_{e_{m\ell}}\) | elimination rates of metabolite \(\ell\) (expressed per time units) |
| \(k_{m_\ell}\) | metabolization rate of metabolite \(\ell\) (expressed per time units) |
| \(t_c\) | duration of the accumulation phase |
| \(U = \sum_{i=1}^{I} k_{u_i} c_i\) | sum of all uptake terms |
| \(E = \sum_{j=1}^{J} k_{e_j}\) | sum of all elimination terms for the parent compound |
| \(M = \sum_{\ell=1}^{L} k_{m_\ell}\) | sum of all elimination terms for metabolite \(\ell\) |
The simplest model in MOSAICbioacc application considers only one exposure route (for example by water, parameter \(k_{u_w}\)) with the corresponding elimination rate (excretion, parameter \(k_{e_e}\)), as given by Eq. (6):
\[
\left \{\begin{array}{l}
\frac{dC_p(t)}{dt} = k_{u_w} c_{w} - k_{e_e} C_p(t) \quad \text { for } 0 \leqslant t \leqslant t_{c} \\
\frac{dC_p(t)}{dt} = - k_{e_e} C_p(t) \quad\quad\quad\quad\quad\: \: \: \text { for } t>t_{c}
\end{array}\right. \quad (6)
\]
where \(k_{u_w}\) is the uptake rate from water (\(time^{-1}\)), \(c_w\) is the mean contaminant concentration in water (\(\mu g.mL^{-1}\)) and \(k_{e_e}\) is the rate related to the excretion process (\(time^{-1}\)).
The more complex model in MOSAICbioacc application considers a maximum of four exposure routes (water: \(k_{u_w}\), pore water: \(k_{u_{pw}}\), sediment: \(k_{u_s}\) and food: \(k_{u_f}\)) and a maximum of three elimination processes (excretion, \(k_{e_e}\), biotransformation: \(k_{m\ell}\), growth dilution: \(k_{e_g}\)), as given by Eq. (7) for parent compound, Eq. (8) for metabolite \(\ell\) and Eq. (9) for growth (Von Bertalanffy equations, classically used for fishes):
\(\left \{\begin{array}{l} \frac{dC_p(t)}{dt} = k_{u_w} c_{w} + k_{u_{pw}} c_{pw} + k_{u_s} c_{s} + k_{u_f} c_{f} - (k_{e_e} + k_{e_g} + k_{m_\ell}) C_p(t) \quad \text { for } 0 \leqslant t \leqslant t_{c} \\ \frac{dC_p(t)}{dt} = - (k_{e_e} + k_{e_g} + k_{m_\ell}) C_p(t) \quad\quad\quad\quad\quad\: \: \: \text { for } t>t_{c}\end{array}\right. \:(7)\)
\(\frac{dC_{m_\ell}(t)}{dt} = k_{m_\ell} C_p(t) - k_{e_{m\ell}} C_{m_\ell}(t) \quad \quad (8)\)
\(\frac{dG(t)}{dt} = k_{e_g} (g_{max}- G(t)) \quad (9)\)
with \(G(t)\) the measured growth of the organism at time \(t\) (in growth unit) and \(g_{max}\) the asymptotic growth \(\ell\) (in \(\mu g.g^{-1}\)).
More details and especially the exact solutions of these equations are given here.
The stochastic part
We assumed a Gaussian (normal) probability distribution of the concentration measurements within the organism as follows (Eqs. (10) to (12)):
\(\quad\quad\quad\quad\quad\quad\quad\quad\quad\quad\quad\quad\quad\quad\quad C_{obs_p}(t) \sim \mathcal{N}(C_p(t),\sigma_p^2) \quad (10)\)
\(\quad\quad\quad\quad\quad\quad\quad\quad\quad\quad\quad\quad\quad\quad\quad C_{obs_{m_\ell}}(t) \sim \mathcal{N}(C_{m_\ell}(t),\sigma_{m_\ell}^2) \quad (11)\)
\(\quad\quad\quad\quad\quad\quad\quad\quad\quad\quad\quad\quad\quad\quad\quad G_{obs}(t) \sim \mathcal{N}(G(t),\sigma_g^2) \quad (12)\)
where \(\mathcal{N}\) stands for the normal law, \(C_{obs_p}(t)\) and \(C_{obs_{m\ell}}(t)\) correspond to the measured parent and metabolite \(\ell\) concentrations within the organism measured at time \(t\), \(C_p(t)\) is the parent compound concentration at time \(t\) predicted by the model based on to Eqs. (2) and (4), \(C_{m_\ell}(t)\) is the concentration of metabolite \(\ell\) at time \(t\) predicted by the model based on Eqs. (3) and (5), \(G_{obs}(t)\) is the measured growth of the organism at time \(t\), \(G(t)\) is the growth at time \(t\) predicted by the model based on Eq. (9), \(\sigma\) are the expected standard deviations of the measured contaminant concentration or growth of the organisms (\(\mu g.g^{-1}\) or growth unit).
A Directed Acyclic Graph (DAG) is given on Fig. 2.3, which symbolize the deterministic links between parameters and variables for the generic TK model (Eqs.(2) to (5)) and the stochastic links between the observed and predicted data (Eqs. (11) to (13)).
Figure 2.3: Directed acyclic graph of the generic TK model. Observed variables, such as the contaminant concentration in organisms and medium \(i\) and growth data, are represented by rectangle nodes. Model parameters and variables are represented by circular nodes. Dotted arrows represent deterministic links (Eqs. (2) to (5)), while solid arrows represent stochastic links between predicted and observed data (Eqs. (10) to (12)).
In MOSAICbioacc, prior choice is hidden to the user. However, we give here some information to help the user to understand the model behind. Before conducting an experimental study, prior distributions are defined for each parameter according to information available from the literature, expert knowledge and/or previous experiments. Depending on the sources where the information come from, informative, semi-informative or non-informative prior distributions can be used. If a parameter was already estimated in previous studies or if previous data are available, a prior distribution can easily be characterized with an appropriate probability distribution. However, if no information is available but an order of magnitude is (positive only, for example), it is possible to use a weakly informative prior. If any information is available on the order of magnitude of a parameter, its prior is preferably defined on a decimal logarithm scale in order to consider with equal probability both low and high expected values.
As MOSAICbioacc application has to be the most generic as possible, priors were assumed to be non-informative log10-uniform distributions within [-5, 5] for all uptake and elimination rate constants. For growth, a uniform prior distribution [\(10 \times mean(G)/4\), \(10 \times mean(G)\)] was assumed for parameter \(g_{max}\) and a uniform prior distribution [\(0\), \((10 \times mean(G))/(8-10^{-10})\)] for parameter \(g_{0}\). We assumed a non-informative [\(0\),\(A\)] uniform prior for all \(\sigma\) with a large A, here defined as five times the maximum internal measured concentration, which is then removed from the data set, as usually proceeded13.
The Bayesian approach considers that data are fixed but that the parameters are unknown random variables following a probabilistic distribution. This leads to the following practical implications: (i) the Bayesian process optimises the probability of parameter vector \(\theta\) given the data set \(\mathrm{Y}\) used for calibration (the so-called posterior distribution) not only the likelihood (see below); (ii) there is a need to provide reasonable prior information, then updating this information by accounting for the data. Below is a short introduction to Bayesian principles14.
In short, the Bayesian approach requires the following steps:
• Choose the prior distributions on parameters based on previous results, literature or expert knowledge (without looking at the data to fit): \(\mathrm{P}(\theta)\);
• Define the probabilistic model from the data, that is the random variable whose data would be one realisation assuming known values of parameters, namely the likelihood: \(\mathrm{P(Y} \, \mid\, \theta)\);
• Calculate the joint posterior distribution of the parameters given the data via the Bayes formula: \(\mathrm{P}(\theta \, \mid \, \mathrm{Y})\);
• Provide statistical summaries of parameter estimates (namely, appropriate quantiles);
• Get any function of the parameter estimates as posterior probability distribution, like for example BCF calculations or predictions of new observations.
Basic principles
The keystone of the Bayesian approach is the Bayes formula (Eq. (13): \[
\mathrm{P}(\theta \mid \mathrm{Y})=\frac{\mathrm{P}(\theta) \mathrm{P}(\mathrm{Y} \mid \theta)}{\mathrm{P}(\mathrm{Y})} \quad (13)
\] where Y are the observed data; \(\mathrm{P}(\theta \mid \mathrm{Y})\) is the joint posterior distribution of parameter vector ; \(\mathrm{P}(\mathrm{Y} \mid \theta)\) is the likelihood of the data given the parameters; \(\mathrm{P}(\theta)\) is the joint prior distribution of parameter vector \(\theta\). Given that \(\mathrm{P(Y)}\) is known and fixed, it is often not considered as it does not depend on and will not influence the posterior distribution. Hence (Eq. (14)): \[
\mathrm{P}(\theta \mid \mathrm{Y}) \simeq \mathrm{P}(\theta) \mathrm{P}(\mathrm{Y} \mid \theta) \quad (14)
\]
with \(\mathrm{P}(\theta)\mathrm{P}(Y|\theta)\) the unormalised posterior density leading to (Eq. (15)):
\[
\mathrm{P(Y)}=\int \mathrm{P}(\theta) \mathrm{P(Y} \mid \theta) d \theta \quad (15)
\] The prior distribution \(\mathrm{P}(\theta)\) expresses the available parameter information without knowing the observed data, while the posterior distribution \(\mathrm{P}(\theta \, \mid\, \mathrm{Y})\) combines this prior information (which may be more or less informative depending on what is known about the value of the parameters beforehand) with evidence from the data (expressed through the likelihood) into a joint posterior density probability distribution for the parameters. The overall expectation is to get a narrower posterior distribution compared to the prior one after the computing of the Bayesian algorithm: the difference between the two distributions reflects the information provided by the data. When the non-informative prior is vague (translated for example into a flat uniform distribution), and the data sufficiently informative, the results are similar than those obtained under a frequentist approach.
Joint posterior distribution
The joint posterior distribution has the dimension of the number of parameters times the number of iterations within the Monte Carlo Markov Chain (MCMC) chains. It can be plotted in planes of parameter pairs to visualise correlations between parameters. In an example case with two binormally distributed parameters, the joint posterior distribution can be plotted in the 2D-parameter space as illustrated by ellipses on Fig. 2.4; in this example, parameters \(\theta_1\) and \(\theta_2\) appear slightly positively correlated. From the joint posterior distribution, the marginal posterior distributions for each parameter (as illustrated by grey normal distributions on bottom and left sides of Fig. 2.4) can be extracted. Then, from the marginal posterior distributions, some statistical summaries on parameter estimates can be extracted, usually the median (illustrated by vertical and horizontal plain grey lines on Fig. 2.4) as well as 2.5% and 97.5% quantiles to serve as 95% credible intervals (illustrated by vertical and horizontal dotted grey lines on Fig. 2.4). Another advantage of having the joint posterior distribution is that the posterior distribution of any function of the parameters can also be obtained.
Figure 2.4: Theoretical binormal joint posterior distribution of parameter vector (\(\theta_1\), \(\theta_2\)). Ellipses correspond to isoclines of the joint posterior distribution; grey distributions are marginal posterior distributions of both parameters; solid horizontal and vertical lines correspond to the medians of these marginal distributions; dashed horizontal and vertical lines correspond to the 2.5% and 97.5% quantiles of the marginal distributions.
Parameter uncertainties
One implication of adopting a Bayesian approach is that the uncertainty on a parameter can easily be expressed as a probability distribution from which a credible interval (also called a Bayesian confidence interval) can be extracted. For example, the 95% credible interval delimits a range of values where the parameters lies with a 95% probability.
Numerical computation
Many numerical methods have been developed to approximately compute the joint posterior distribution, mainly based on simulations by Monte Carlo Markov Chain (MCMC) sampling methods used to generate random numbers from complex joint distributions. MCMC algorithms are a general method based on drawing values of parameter vector \(\theta\) from approximate distributions and then correcting those draws to better approximate the target posterior distribution \(\mathrm{P}(\theta \, \mid\, \mathrm{Y})\). The sampling is done sequentially, with the distribution of the sampled draws depending only on the last value drawn; hence, the draws form a Markov Chain. The key to the method success, however, is not the Markov property but rather the fact that the approximate distributions are improved at each step of the simulation, in the sense that it finally converges to the target posterior distribution after an enough number of iterations. Indeed, with MCMC algorithms, the simulation process must run long enough so that the distribution of the current draws is close enough to the desired target posterior distribution.
MCMC algorithms use random walk algorithms, among which the Metropolis algorithm (and its generalisation, the Metropolis–Hasting algorithm) as an adaptation of a random walk with an acceptance/rejection rule to converge to the specified target distribution15,16. The Gibbs sampler is a special case of the Metropolis–Hastings algorithm applicable when the joint distribution is not known explicitly, or when it is difficult to directly sample from, while the conditional distribution of each parameter is known and it is easy (or at least, easier) to sample from17.
Several tools are available to automatically perform these computations. In MOSAICbioacc, JAGS11 (version 4.3.0. (2017-08-10)) and R software1 (version 4.0.2 (2020-06-22)) are used. The models are fitted to bioaccumulation data using Bayesian inference via Monte Carlo Markov Chain (MCMC) sampling based on a Gibbs-type algorithm. For each model, we start by running a short sampling phase with three chains (5,000 iterations after a burn-in phase of 10,000 iterations) then using the Raftery and Lewis18 method to set the necessary thinning and number of iterations to reach an accurate estimation of the joint posterior distribution.
Once data uploaded and the model stated, you can click on button “Calculate and Display.” We recommend you to pay attention to the selected exposure concentration before to proceed to calculations. Besides, calculations can take several minutes for the most complicated models.
You will find at the end of this user guide an appendix (section 5) which gathers together other types of results which can be obtained with other data sets and how to interpret them.
To illustrate the result section, here we used the example file ‘Pimephales_two.csv,’ where Pimephales promelas are exposed to a highly hydrophobic substance (logKow = 9) spiked water at \(0.0044\) \(\mu g.mL^{-1}\) and where the duration of the accumulation phase is equal to 49 days.
As a first result, we provide the bioaccumulation metrics (BCF, BCFpw, BSAF and BMF). Please, note that this section is reactive according to your data. If required, change tab to get the other bioaccumulation metrics (Fig. 3.1).
Figure 3.1: Example of reactive tabs for bioaccumulation metric section.
As a first result, we provide the kinetic bioconcentration factor (\(BCF_{k}\)). The BCF at steady-state (\(BCF_{ss}\)) can be asked if you consider a steady-state is almost reached at the end of the accumulation phase. These factors are mathematically given by Equations (16) and (17) respectively:
\[ BCF_{k} =\frac{k_{u_w}}{\sum_{j=1}^{J} k_{e_j} + \sum_{\ell=1}^{L} k_{m_\ell}} \quad (16) \]
\[ BCF_{ss} =\frac{C_p(t_{c})}{c_{w}} \quad (17) \]
where \(C_p(t_c)\) is the contaminant concentration within the organism at steady-state that is at the end of the accumulation phase (\(\mu g. g^{-1}\)) and \(c_w\) is the contaminant concentration in water (\(\mu g . mL^{-1}\)).
BCF are given as probability distributions (Fig. 3.2) and summarized with their median and their 95% uncertainty limits (95% credible intervals, Table 3.1).
Figure 3.2: Probability density of \(BCF_k\) and \(BCF_{ss}\). The middle dotted line represents the median value, and the dotted lines on left and right sides are the 2.5 and 97.5% quantiles. These results were obtained with the example file Pimephales_two.csv.
| 2.5% | 50% | 97.5% | |
|---|---|---|---|
| \(BCF_k\) | 232 | 276 | 339 |
| \(BCF_{ss}\) | 147 | 232 | 315 |
In the above example, the steady-state is almost reached at time \(t_c\) (Fig. 3.3), thus it is reasonable to ask for the \(BCF_{ss}\).
Note that the OECD guideline 3057 reports that “greater emphasis must be on kinetic BCF estimate (when possible) next to estimating the BCF at steady-state.” Thus we recommend to preferentially consider the \(BCF_k\) rather than the \(BCF_{ss}\).
When appropriate, we provide the kinetic pore water bioconcentration factor (\(BCF_{pw_{k}}\)).Again, the corresponding BCF at steady-state (\(BCF_{pw_{ss}}\)) can be asked. These factors are mathematically given by Equations (18) and (19) respectively:
\[ BCF_{pw_{k}} =\frac{k_{u_{pw}}}{\sum_{j=1}^{J} k_{e_j} + \sum_{\ell=1}^{L} k_{m_\ell}} \quad (18) \]
\[ BCF_{pw_{ss}} =\frac{C_p(t_{c})}{c_{pw}} \quad (19) \]
where \(C_p(t_c)\) is the contaminant concentration within the organism at time \(t_c\) (\(\mu g. g^{-1}\)) and \(c_{pw}\) is the contaminant concentration in pore water (\(\mu g . mL^{-1}\)). BCFpw are given as probability distributions and summarized with their median and their 95% uncertainty limits (95% credible intervals).
When appropriate, we provide the kinetic biota-sediment factor (\(BSAF_{k}\)) and the BSAF at steady-state (\(BSAF_{ss}\)) can be asked. These factors are mathematically given by Equations (20) and (21) respectively:
\[ BSAF_{k} =\frac{k_{u_s}}{\sum_{j=1}^{J} k_{e_j} + \sum_{\ell=1}^{L} k_{m_\ell}} \quad (20) \]
\[ BSAF_{ss} =\frac{C_p(t_{c})}{c_{s}} \quad (21) \]
where \(C_p(t_c)\) is the contaminant concentration within the organism at time \(t_c\) (\(\mu g. g^{-1}\)) and \(c_s\) is the contaminant concentration in sediment (\(\mu g . g^{-1}\)). BSAF are given as probability distributions and summarized with their median and their 95% uncertainty limits (95% credible intervals).
When appropriate, we provide the kinetic biomagnification factor (\(BMF_{k}\)) and the BMF at steady-state (\(BSAF_{ss}\)) can be asked. These factors are mathematically given by Equations (22) and (23) respectively:
\[ BMF_{k} =\frac{k_{u_f}}{\sum_{j=1}^{J} k_{e_j} + \sum_{\ell=1}^{L} k_{m_\ell}} \quad (22) \]
\[ BMF_{ss} =\frac{C_p(t_{c})}{c_{f}} \quad (23) \]
where \(C_p(t_c)\) is the contaminant concentration within the organism at time \(t_c\) (\(\mu g. g^{-1}\)) and \(c_f\) is the contaminant concentration in food (\(\mu g . g^{-1}\)). BMF are given as probability distributions and summarized with their median and their 95% uncertainty limits (95% credible intervals).
A coefficient of variation (\(CV\)) for each bioaccumulation metric can be calculated as follows (Eq. (24)):
\[ CV =\frac{Q_{k}97.5-Q_{k}2.5}{4 \times Q_{k}50} \quad (24) \]
with \(Q_{k}2.5\), \(Q_{k}50\) and \(Q_{k}97.5\) the 2.5%, 50% and 97.5% quantiles of the kinetic bioaccumulation metric. Based on our experiment, the coefficient of variation is expected to not exceed \(0.5\).
If the bioaccumulation metric at steady-state is asked, the corresponding \(CV\) is given (Eq. (25)):
\[ CV =\frac{Q_{ss}97.5-Q_{ss}2.5}{4 \times Q_{ss}50} \quad (25) \]
with \(Q_{ss}2.5\), \(Q_{ss}50\) and \(Q_{ss}97.5\) the 2.5%, 50% and 97.5% quantiles of the steady-state bioaccumulation metric.
We first provide the fitted curve superimposed to the observations (Fig. 3.3, black dots): the orange plain line is the median curve, the gray zone is the uncertainty band delimited by 2.5% and 97.5% quantiles of predictions in orange dotted lines. This section is reactive according to your data: if there is biotransformation or growth, the fitted curve superimposed to the observations for parent compound, metabolite(s) and/or growth data are given (for example from the example file “Male_Gammarus_seanine.csv,” Fig. 3.4).
Figure 3.3: Measured (black dots) and predicted contaminant concentrations in the organism (µg.g\(^{-1}\)). Median predictions are symbolized by the orange plain line and the uncertainty bands by the gray zone which is delimited by the 2.5% and 97.5% quantiles in orange dotted lines. The black dotted vertical line indicates the separation between the accumulation phase and the depuration phase. These results were obtained with the example file Pimephales_two.csv.
Figure 3.4: Measured (black dots) and predicted contaminant concentrations in the organism (µg.g\(^{-1}\)). Median predictions are symbolized by the orange plain line and the uncertainty bands by the gray zone which is delimited by the 2.5% and 97.5% quantiles in orange dotted lines. The black dotted vertical line indicates the separation between the accumulation phase and the depuration phase. These results were obtained with the example file Male_Gammarus_seanine.csv.
From the joint posterior distribution, we can obtain the marginal posterior distributions for each parameter, which can be summarized by their medians and their 95% credible intervals (Table 3.2).
| 2.5% | 50% | 97.5% | ||
|---|---|---|---|---|
| \(k_{u_w}\) | 7.307 | 10.55 | 15.34 | \(d^{-1}\) |
| \(k_{e_e}\) | 0.02288 | 0.03837 | 0.06033 | \(d^{-1}\) |
| \(\sigma_p\) | 0.1243 | 0.1685 | 0.2464 | \(\mu g.g^{-1}\) |
Goodness-of-fit criteria are given below in our prioritised order; the Posterior Predictive Check (PPC) and the prior-posterior comparison are the most important to check; if they do not correspond to the expectation, you must consider your results with an even more particular attention. As an indication, if at least two criteria are fulfilled, the results obtained can be considered as good enough. We suggest that you refer to the appendix at the end of the document for more information about cases not in accordance with the classical expectations (section 5).
The PPC shows the observed values against their corresponding estimated predictions (black dots), along with their 95% credible interval (vertical segments). If the fit is correct, we expect to see 95% of the data within the intervals. Ideally, observations and predictions should coincide, so we would expect to see black dots along the first bisector \(y = x\) (plain black line). The 95% credible intervals are colored in green if they overlap this line, in red otherwise. In the following example (Fig. 3.5), 95.24% of the measured concentrations (\(n=20/21\)) are in the 95% credible intervals of their predictions.
Figure 3.5: Example of a PPC plot: predicted against observed concentrations (black dots) and predicted 95% credible intervals (vertical green and red segments). These results were obtained with the example file Pimephales_two.csv.
An example of prior and posterior distributions is illustrated in Fig. 3.6. The prior distribution is represented by the gray area and the posterior distribution by the orange area. The precision of the model parameter estimation can be visualized by comparing prior and posterior distributions: the overall expectation is to get a narrower posterior distribution compared to the prior one, what reflects that data contributed enough to precisely estimate parameters. In the example of Fig. 3.6, marginal posterior distributions for \(k_{u_w}\) and \(k_{e_e}\) are narrower (orange area) than their respective prior distributions (grey area). Within the application, you can chose to visualize either the deterministic or the stochastic parameters by selecting the corresponding tab.