# Fitting two models in sequence with parameters in common

• I
I wonder if anyone can please help / point me to some info on how to solve this problem. I posted the same question on another website, and so far there is no conclusive answer.

I have some pharmacokinetic data for a molecule that was administered in rat, first IV (intra-venously), then PO (orally). The data are in both cases of the form (time,concentration).

I fit (in R) a first model using the IV data, obtaining estimates for parameters 'CL' and 'V'. Then I fit a second model using the PO data, obtaining estimates for parameters 'ka' and 'F'.

The crucial aspect that is causing all the trouble is that the second model uses the best estimate of CL and V obtained from the first model, but in my implementation the variances of CL and V, calculated when I run the first model, do not enter in any way in the second model, which sounds wrong to me.
Also, the fitting package I am using (FME) calculates the covariance matrix, which I need for other calculations, but of course as this matrix is made only for the modifiable parameters of each model, the one obtained from the second model does not have CL and V in it. But I need to know the covariance between the IV parameters and PO parameters.

I asked a colleague at work, and she suggested to run a single model with all 4 parameters (CL, V, ka, F), where the residuals are calculated using both the IV and PO data.
This is possible in FME, by using two distinct dependent variables, and probably a good idea in general, except that I specifically need: A) CL and V to be regressed only using the IV data, and B) ka and F only using the PO data. "B" happens automatically, because the function that predicts the IV data does not contain either ka or F, so when ka and F vary, the residuals on the IV data are constant, thus only the ones on the IV data are affected. However I think "A" is not possible in this case, because as I explained the function that predicts the PO data uses all 4 parameters, so if CL and V vary, all residuals will vary, both the ones on the IV data and the ones on the PO data. So ultimately the PO data would have an impact on the value of CL and V, which is what I must avoid, for reasons that are long to explain but make sense.

Can you please suggest a procedure by which I can achieve what I described? I.e. find the best estimate and variances of parameters CL and V using IV data alone, then of parameters ka and F using PO data alone, and then a covariance matrix for the full set of all 4 parameters.

See below the current method and some sample data in R code format. [You will notice that I used transformations for the 4 parameters, in some cases logarithmic, in some cases by a trigonometric function, in order to limit their ranges to reasonable values. This can be in principle be done by FME using upper and lower bounds, but I found the fitting is more stable if I do it myself].

IV data fitting:

based on function: $ln ( C_{IV} ) = ln ( dose_{iv} ) - ln ( V) - \frac {CL} {V} \cdot t$

Code:
if (length(find.package(package="FME",quiet=TRUE))==0) install.packages("FME")
require(FME)
dfiv <- data.frame(t1=c(0.05,0.25,0.5,1,3,6),c1=c(250,150,125,77,9.5,0.5))
dfiv_log <- data.frame(t1=dfiv$t1,lc1=log(dfiv$c1))
dose_iv <- 1000
log_C_iv <- function(p,t1) {CL=exp(p[]); V=exp(p[]); log(dose_iv)-log(V)-t1*CL/V}
IV_cost <- function(p) { mo = data.frame(t1=dfiv_log$t1,lc1=log_C_iv(p,dfiv_log$t1)); modCost(model=mo, obs = dfiv_log, x="t1") }
p0iv <- c(pCL=1.57,pV=1.17)
mivlog_FME <- modFit(f=IV_cost,p=p0iv)
mivlog_FME_s <- summary(mivlog_FME)
ps_iv <- mivlog_FME_s$par[,"Estimate"] mivlog_FME_s CL <- exp(ps_iv[["pCL"]]) V <- exp(ps_iv[["pV"]]) plot(dfiv_log) lines(dfiv_log$t1,log_C_iv(ps_iv,dfiv_log$t1)) PO data fitting: based on function: $C_{PO} = F \cdot \frac {dose_{po}} {V} \cdot \frac {ka} {ka - \frac {CL} {V} } \cdot ( e^{- \frac {CL} {V} \cdot t } - e^{-ka \cdot t})$ Code: dfpo <- data.frame(t1=c(0,0.25,0.5,1,3,6,24),c1=c(0,130,100,65,30,2,0)) dose_po <- 5000 C_po <- function(p,t1) {ka=exp(p[]); F=0.5+atan(p[])/pi; F*dose_po/V*ka/(ka-CL/V)*(exp(-t1*CL/V)-exp(-t1*ka))} PO_cost <- function(p) { mo = data.frame(t1=dfpo$t1,c1=C_po(p,dfpo$t1)); modCost(model=mo, obs = dfpo, x="t1") } p0po <- c(pka=0,aF=0) mpo_FME <- modFit(f=PO_cost,p=p0po) mpo_FME_s <- summary(mpo_FME) ps_po <- mpo_FME_s$par[,"Estimate"]
mpo_FME_s
ka <- exp(ps_po[["pka"]])
F <- 0.5+atan(ps_po[["aF"]])/pi
plot(dfpo)
lines(dfpo$t1,C_po(mpo_FME$par,dfpo\$t1))

Related Set Theory, Logic, Probability, Statistics News on Phys.org
I'm very bored, and I think I have a good way to do this, so give me an hour or two and I'll get back to you with some R code that I think will do what you want.

EDIT: Can you explain your "condition A" in more detail? If the PO data provides information about the parameters CL and V, it would make sense to make use of it when estimating them. Unless there's a very good reason for doing it your way, it would probably be better to fit a composite model, where both models are fit simultaneously, and CL and V are constrained to be identical across models. Alternatively, if you want to "use the variance" of CL and V (which can mean several things), you could fit the IV model, get a posterior distribution for CL and V, and then use it as a prior distribution for CL and V in the PO model.

Last edited:
mfb
Mentor
I'm not sure what OP wants to do is mathematically well-defined.

If you fix parameters, they cannot have an uncertainty in the fit procedure, otherwise you have to allow the fit to deviate from your input values. You can impose a penalty for deviations from the input parameters (an additional term in the expression that is minimized), but then in general your best fit will end up with different values for these parameters, and the deviation depends on the strength of the penalty.

Normally the covariance matrix tells you something about the relation of the variables to the fit quality at the minimum. But if you have to fix two parameters to different values, you are not at the minimum. So what exactly is this covariance matrix supposed to represent?

I'm not sure what OP wants to do is mathematically well-defined.

If you fix parameters, they cannot have an uncertainty in the fit procedure, otherwise you have to allow the fit to deviate from your input values. You can impose a penalty for deviations from the input parameters (an additional term in the expression that is minimized), but then in general your best fit will end up with different values for these parameters, and the deviation depends on the strength of the penalty.

Normally the covariance matrix tells you something about the relation of the variables to the fit quality at the minimum. But if you have to fix two parameters to different values, you are not at the minimum. So what exactly is this covariance matrix supposed to represent?
I was confused about this as well. You can account for uncertainty in the parameters estimated in model one in a few ways (e.g. using their posterior distributions as priors in model two, or marginalizing over their uncertainty in model two), but none of them involve fixing them to exact values. I'd still like to know exactly why the OP wants to fix them in the second model.

As for the covariance matrix, I think the OP wants the inverse negative Hessian of the likelihood function at the maximum, which can be used to construct standard errors for the estimates.

Last edited:
Thank you both for your replies!

I feared that my requirement to fit CL and V to only some of the data would cause trouble. But I'm hopeful, given @Number Nine 's explanation, that it may perhaps be done.

The reason why I'd like to do the fitting this way, if possible, is that the equation for C_IV is much more likely to model the IV data correctly than C_PO is to model the PO data correctly, so I am worried that letting the PO data influence the values of CL and V, via a less accurate model, will 'steer' them the wrong way and 'pollute' the more reliable values obtained from IV.

This is in fact also what people usually do when analysing PK data in the 'traditional' way.
IV data are used to calculate CL and V (and it's not done by direct fitting, but by integration; you can see from the formula that the definite integral of C_IV with respect to t, going from 0 to +inf, is equal to dose_iv/CL; another integral formula is used for V).
Then PO data are used to see if the bioavailability (again, determined by integration, because the same integral as before, but of C_PO, is equal to F*dose_po/CL, so once you know CL, you can calculate F) is in line with the one roughly predicted from CL (F, in special cases, should be close to 1-CL/Q, where Q is a physiological constant that depends on the animal species used). ka can be roughly estimated from the PO data, knowing that the time at which the maximal PO concentration is reached is, for this simple model, LN(ka/(CL/V))/(ka-CL/V). And so on. See for instance http://www.pkquest.com/

C_PO a worse model than C_IV for the data because C_IV contains most of the parameters determining the IV plasma concentration time course, whereas C_PO has heavy approximations and assumptions. The 'real' PO time course does not depend so simply only on the absorption rate constant ka and the bioavailability F. Intestinal solubility should be used in the model, which is also pH-dependent, hence time-dependent as the drug transits through the GI tract. The transit through GI tract could be modelled using several compartments, in each of which the drug can be partially dissolved and partially undissolved, partially ionised and partially non-ionised. F itself is in fact a product of 3 factors, one of which is a function of CL, and the other two depend on many other parameters related to GI transit, ionisation, etc. This kind of modelling is often called 'PBPK', and is based on large systems of differential equations that are solved numerically. It can be done with R's FME and deSolve, and I actually did it, with encouraging results; now however I would like to try out a simpler model for which analytical expressions are available, but with the limitation that the formulae, and especially the one for C_PO, are extremely simplified compared to reality, so extracting ka and F from the PO data using such a model is in essence an attempt to measure broadly averaged parameters and avoid getting lost in excessively complicated models that can only work with very accurate and large datasets.

My colleague had also suggested a 'MCMC' approach. If I got what she meant, different values of CL and V from the first model, with appropriate weights derived from their distribution, would be put into the second model, to see what influence they have on the distribution of ka and F. This is quite beyond my current level of statistical modelling, so although I know that FME has such MCMC capabilities, so far I did not manage to study that in detail or even understand if my case could be handled this way.

The MCMC approach is almost certainly (and the FME documentation confirms this) fitting a Bayesian model. The approach your colleague is suggesting is, I think, similar to what I alluded to in a previous post, where you would derive a posterior distribution for the parameters in the first model, and then marginalize over that posterior (basically, averaging over the uncertainty in the model 1 parameters) when estimating model 2. This would, in general, result in a larger uncertainty in the model 2 parameters when the model 1 parameters are more uncertain. This is fairly easy to do, but is probably not what you want to do.

If the PO model is a bad model, the best course of action is probably to find a better one, but if that's impossible, I still think fitting both models simultaneously is the best approach. It might be worthwhile to include an additional time dependent term in the PO model, just to "soak up" some of the residual effects.

How much data do you have? If you have a fair bit, I might be able to suggest a few ways to account for some of the missing effects in the PO model without making it too much more complex.

mfb
Mentor
You can give one function a larger weight in a simultaneous fit. The result will (in general) depend on this weight, so it might be useful to compare the results for different values of it.

You can give one function a larger weight in a simultaneous fit. The result will (in general) depend on this weight, so it might be useful to compare the results for different values of it.
OP: You can do this pretty easily in R. Just write a function
$$f(CL, V, F, ka) = \alpha E_\text{iv}(CL, V) + (1-\alpha) E_\text{po}(CL, V, F, ka)$$
where $E_\text{iv}$ is a function returning the sum of squared errors for the IV model given parameters CL, V, and $E_\text{po}$ is the same thing for the PO model. The value of alpha (fixed between 0 and 1) controls the weight given to the IV model. Pass this function to one of R's optimizers. I know the optim() function can return an approximate Hessian, which is what you'll need to compute the covariance matrix. Alternately, it's might not be too difficult to derive the Hessian analytically. I would strongly recommend taking the log of the PO model, though, since it's going to be extremely poorly behaved when some of those denominators are small.

Last edited:
This would, in general, result in a larger uncertainty in the model 2 parameters when the model 1 parameters are more uncertain. This is fairly easy to do, but is probably not what you want to do.
Don't I? Sounds exactly like what I wanted to do I had indeed already tried (but with nlm, not with FME) the regression using all 4 parameters. Trouble there is that basically alpha is determined more by the data than by what I decide. I found that I needed to scale the sum of squared residuals in a specific way according to the values of the concentrations and on whether it was the log ratio or the simple difference that I squared and summed. I don't say that from a statistical point of view, just from a practical one: if you don't scale that way, you get a horrible fit for one of the two curves (namely the IV one, if you use for instance alpha=0.5 and the IV residuals are log ratios and the PO residuals are simple differences). It's obvious in a way, because IV and PO concentration are roughly of the same order of magnitude, so applying the log only to IV makes the average of the IV data much smaller compared to the average of the PO data.
I also tried to log the PO data. Not good, I'm afraid, because the initial part of the PO curve and its maximum are badly fitted in that case (even without the confounding effect of trying to regress CL and V at the same time). PO data must be fitted more closely where the concentration is higher, because ka and F usually strongly affect that part of the curve. What happens at a long time after administration is usually more affected by CL and V (because the CL/V ratio is usually smaller than ka, and you can see that the terminal slope of log(C_PO) in that case is -ka).

When I say it's probably not what you want, I mean it's probably a bad idea, and it doesn't really have much statistical justification. All of your problems stem from the fact that the PO model isn't a good model. You don't use it to estimate CL and V because it doesn't provide trustworthy information about those parameters, but then why would you trust the estimates of ka and F? And what do the estimates even mean, if the model is so inaccurate? You're trying to bandage a bad model, but what you need is a good one.

Stephen Tashi
I'm not sure what OP wants to do is mathematically well-defined.

If you fix parameters, they cannot have an uncertainty in the fit procedure, otherwise you have to allow the fit to deviate from your input values.
I haven't studied this thread in detail but its worth noting that many library functions that fit equations to data also output a variance estimate ("uncertainty") for the parameters that are calculated. For example if a linear regression package determines the best fit of y = Ax + B to data is y = 3.2 x + 17 it may output a estimate for the "uncertainty" of the A and B, which some would interpret as an estimate of the "uncertainty" in the 3.2 and 17.

The ways of estimating the variances of the fit parameters involve a lot of assumptions. Generally speaking, we assume the sample means, variances, and covariances (of the data) are accurate estimates of the population parameters for the distribution from which the sample is taken. Then, using a population with that distribution, we find the variability of the best-fit parameters on a selection of random samples taken from that population. (This certainly seems like circular reasoning. There are various ways to justify it.)

Since modern computers are fast, on can take a Monte-Carlo approach by assuming the best fit gives a correct model for the population. We generating batches of pseudo-data using the best-fit equation,, fit an possibly different equation to each bath of pseudo-data, and using the differing values of the parameters obtained on the pseudo-data,. we estimate variance in the fit parameters.

There is also the "linear asymptotic" approach. Approximate the formulae for the best fit parameters by linear functions in the variables representing the data. For example, in a least squares fit of Y = Ax + B to data there are specific formulae for parameters A and B as functions of the data. The linear approximation is chosen to be one that is accurate when the data variables take the values that we actually observed in the sample. (Thus we assume the sample data is a "pretty good" representation of the population.)

Let the function that computes A be approximated as ## A = C_0 + \sum C_i d_i ## then to compute ##Var(A)## we have the familiar problem of computing the variance of a random variable ##A## that is defined as sum of other random variables. By assuming we know the population parameters for the data ##d_i## we have enough "given" information to compute ##Var(A)##.

mfb
Mentor
I haven't studied this thread in detail but its worth noting that many library functions that fit equations to data also output a variance estimate ("uncertainty") for the parameters that are calculated. For example if a linear regression package determines the best fit of y = Ax + B to data is y = 3.2 x + 17 it may output a estimate for the "uncertainty" of the A and B, which some would interpret as an estimate of the "uncertainty" in the 3.2 and 17.
It gives you an uncertainty on A and B because they are fit parameters.
Let it fit y = Ax2 + B and it won't give you an uncertainty on the 2.

When I say it's probably not what you want, I mean it's probably a bad idea, and it doesn't really have much statistical justification. All of your problems stem from the fact that the PO model isn't a good model. You don't use it to estimate CL and V because it doesn't provide trustworthy information about those parameters, but then why would you trust the estimates of ka and F? And what do the estimates even mean, if the model is so inaccurate? You're trying to bandage a bad model, but what you need is a good one.
Sure, I knew you meant that, hence the smiles I do have 'better' PO models, in the sense that they can use more parameters.
One (let's call it 'A') models the GI tract with 3 parameters describing gastric emptying, rate of absorption in the small intestine, small intestine emptying, and assumes complete solubility; for the rest it's a 2-compartment model (central compartment and 'tissue'). It's loosely based on this article https://bmcpharmacoltoxicol.biomedcentral.com/articles/10.1186/2050-6511-14-34. The differential equations for this model can be solved analytically.
Another model (let's call it 'B') is more sophisticated still for the GI tract. It's based on this article https://www.ncbi.nlm.nih.gov/pubmed/18300298, to which I added pH effects on solubility, and for the rest it's a 2-compartment model with the liver kept separate. The differential equations in this case are solved numerically using deSolve (from the same author as FME).
All very nice and much more 'accurate' as far as 'direct' modelling I definitely use those.
Trouble is, the data points we get for a single compound are typically like the ones I put in the OP.
What is the reliability of fitting a model with, say, 8 adjustable parameters to 12 data points (IV+PO data together)? Wouldn't it a bit like trying to fit a line to 3 points?
In fact, FME has functions to evaluate sensitivity and 'identifiability' (via the collinearity) of regression parameters. I tried that, and even model 'A', that has got only 4 parameters for the PO fitting, is apparently ridiculously non-identifiable (a parameter that should be at most 15 turned out to be 47453132.8 with my data!).

But maybe you can suggest a way around the issue...? I'm still hopeful 