# Fitting a curve to data with error bars on curve parameters

## Main Question or Discussion Point

I have several data points with error bars, and the error bars are a different size for each point. I'd like to fit a model function with non-linear parameters to them, and be able to get error bars on the model parameters. I.e., if my model is something like $f(t) = A + B(t-t_c)^\alpha$, I want to be able to get error bars on $A$, $B$, $t_c$, and $\alpha$.

I was thinking of using a Monte Carlo algorithm, but this seems like it might be unnecessarily slow, unless someone has pointers on how to make good guesses of the model parameters so the Monte Carlo rejects fewer moves. I've done this sort of thing with discrete variables before, but not continuous ones, so I'm a little confused about how to make good MC moves for a continuous variable when you don't know a priori what level of precision you need in that variable.

**Stephen Tashi:**
> I'm a little confused on how to make good MC moves for a continuous variable when you don't know a priori what level of precision you need in the variable.

What do you mean by a "move"?

A move in Monte Carlo is a decision about how to change your parameters. For example, in the Hirsch-Fye algorithm you have a set of Ising spins which can take on values of +1 or -1, so the simplest move is easy: choose a spin at random and flip its sign. Once you've decided on a move, you compute the ratio of the partition function before and after the move and use that ratio to compute a probability with which you accept or reject the move. If you accept it, you update your Ising spins and go to the next iteration; if not, you discard the move and go to the next iteration.
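To make the accept/reject pattern concrete, here is a toy sketch of a single-spin-flip Metropolis step on a 1D Ising chain. This is my own stand-in example, not the Hirsch-Fye algorithm itself; the function name and energy convention ($H = -J\sum_i s_i s_{i+1}$, periodic boundaries) are illustrative choices:

```python
import math
import random

def metropolis_spin_flip(spins, beta, J=1.0, n_steps=10000, rng=None):
    """Single-spin-flip Metropolis on a 1D Ising chain with periodic
    boundaries. A toy stand-in for the accept/reject pattern described
    above, not the Hirsch-Fye algorithm itself."""
    rng = rng or random.Random(0)
    n = len(spins)
    for _ in range(n_steps):
        i = rng.randrange(n)                         # move: pick a spin to flip
        # energy change of flipping spin i (nearest-neighbour coupling)
        dE = 2.0 * J * spins[i] * (spins[(i - 1) % n] + spins[(i + 1) % n])
        # accept with probability min(1, exp(-beta * dE))
        if dE <= 0 or rng.random() < math.exp(-beta * dE):
            spins[i] = -spins[i]                     # accept: update the state
        # otherwise reject: keep the current configuration
    return spins
```

The point is just the shape of the loop: propose, compute a weight ratio, accept or reject, repeat.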

For continuous variables, it's not so easy. If the proper result has a very small error bar for a particular parameter, then you might have a situation where, say, only moves of +/- 0.001 have a decent chance of being accepted, so if you're making moves on the order of 0.1 then you will never have a move which gets accepted. On the other hand, you might have a situation where, for a particular parameter, your current state is 0.5 away from the "correct" value, and if you're making moves of 0.001 it will take a long time to make it there.
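One common workaround for the step-size problem (my suggestion, not something stated in the thread) is to tune the proposal width on the fly: shrink it when too few moves are accepted, grow it when too many are, aiming for some target acceptance rate. A rough sketch for a single continuous variable, with an illustrative function name and a simple heuristic tuning schedule:

```python
import math
import random

def tuned_metropolis(log_prob, x0, n_steps=5000, target_accept=0.4,
                     adapt_every=100, rng=None):
    """Random-walk Metropolis on one continuous variable, with the proposal
    width tuned during sampling toward a target acceptance rate.

    `log_prob` is the log of the (unnormalised) target density. The tuning
    rule here is a crude heuristic, not a rigorously vetted adaptive scheme."""
    rng = rng or random.Random(0)
    x, lp = x0, log_prob(x0)
    step, accepted = 1.0, 0
    samples = []
    for t in range(1, n_steps + 1):
        x_new = x + rng.gauss(0.0, step)              # propose a Gaussian move
        lp_new = log_prob(x_new)
        # Metropolis accept test on the log scale
        if lp_new >= lp or rng.random() < math.exp(lp_new - lp):
            x, lp = x_new, lp_new
            accepted += 1
        samples.append(x)
        if t % adapt_every == 0:                      # crude step-size tuning
            rate = accepted / adapt_every
            step *= 1.1 if rate > target_accept else 0.9
            accepted = 0
    return samples, step
```

Strictly speaking, adapting the step size during the run perturbs detailed balance, so in practice one usually freezes the step after a burn-in period and keeps only the samples drawn after that.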

**Stephen Tashi:**
> A move in Monte Carlo is a decision about how to change your parameters. For example, in the Hirsch-Fye algorithm, ...
I'd think the straightforward way to Monte-Carlo a distribution of the parameters of the curve fit would be to make independent random selections for each data point using whatever distribution you are assuming the measurements to have - for example, a normal distribution with mean equal to the center of the interval and a standard deviation equal to the half-width of the interval. Then fit the curve using whatever method of fitting you are using. That gives one realization of the parameters of the fit.
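The resample-and-refit scheme described above might look like the following sketch, assuming normally distributed measurement errors and using `scipy.optimize.curve_fit` for the fitting step. The function names are mine, and the `np.abs` in the model is just a guard against raising a negative number to a non-integer power:

```python
import numpy as np
from scipy.optimize import curve_fit

def model(t, A, B, tc, alpha):
    # the model from the original post, f(t) = A + B*(t - tc)**alpha;
    # abs() guards against negative bases with non-integer alpha
    return A + B * np.abs(t - tc) ** alpha

def param_draws(t, y, sigma, p0, n_draws=500, rng=None):
    """Monte Carlo over the *data*: resample each point from its assumed
    normal error distribution, refit, and collect one parameter set per
    resample. Returns an (n_kept, 4) array of fitted (A, B, tc, alpha)."""
    rng = rng or np.random.default_rng(0)
    draws = []
    for _ in range(n_draws):
        y_sim = rng.normal(y, sigma)              # one realization of the data
        try:
            p, _ = curve_fit(model, t, y_sim, p0=p0, sigma=sigma,
                             absolute_sigma=True, maxfev=5000)
            draws.append(p)
        except RuntimeError:                      # fit failed to converge
            continue
    return np.asarray(draws)
```

The spread of the columns of the returned array then gives the error bars (and correlations) on the parameters.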

How are you doing the curve fit?

Hmm, I see what you mean. That might work; I'll have to think about it.

I haven't written the curve-fitting code yet. Since the model functions are nonlinear, it'll be an iterative method, maybe steepest descent, although I'm not sure I want to be calculating gradients.

Quick and dirty: you can do a weighted least squares fit, weighting each observation by the inverse of its noise variance, and then estimate the uncertainty on the fit coefficients by dropping random points from the data set and seeing how the fit moves.

I would think the result would be similar to the monte-carlo method described by Stephen Tashi.

If you don't know anything about the distribution of the parameters, just bootstrap it.
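A sketch of the weighted-fit-plus-bootstrap idea. This version uses case resampling (resampling whole data points with replacement) rather than dropping points, and passes the per-point errors through `curve_fit`'s `sigma` argument; the function name and details are illustrative:

```python
import numpy as np
from scipy.optimize import curve_fit

def bootstrap_fit(model, t, y, sigma, p0, n_boot=500, rng=None):
    """Case-resampling bootstrap around a weighted least-squares fit.

    Each replicate resamples the (t, y, sigma) triples with replacement,
    refits with the per-point errors as weights, and the spread of the
    replicate fits estimates the parameter uncertainty."""
    rng = rng or np.random.default_rng(0)
    n = len(t)
    fits = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)          # resample with replacement
        try:
            p, _ = curve_fit(model, t[idx], y[idx], p0=p0,
                             sigma=sigma[idx], absolute_sigma=True,
                             maxfev=5000)
            fits.append(p)
        except RuntimeError:                      # skip non-converged fits
            continue
    fits = np.asarray(fits)
    return fits.mean(axis=0), fits.std(axis=0)    # point estimate, error bars
```

Since the bootstrap only uses the observed scatter, it makes no assumption about the shape of the parameter distribution, which is exactly the appeal here.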

ParticleGrl has a great suggestion.

I'd add that, to leading order, your parameters will have a multivariate Gaussian distribution around their optimal values. Once you find the best fit, compute the Hessian matrix of the negative log posterior there: its inverse gives the full set of variances and covariances of your parameters.

(I cannot recall how to compute the Hessian at the moment, but let me just assume that you can for now. :-) )

Once you have the Hessian, you can directly generate independent draws of parameter values. You can then reweight them by the ratio of their actual probability to the Gaussian probability. This way, you'll never reject any draws, and you don't have to worry about mixing or tuning your jumps.

You are trying to measure uncertainty in your parameters. This cries out for a Bayesian analysis.

In more detail:

What you have is a likelihood: $P(\text{data} | \text{params})$. (If you knew the true values of $\left\{A, B, t_c, \alpha\right\}$, you could compute the probability of seeing any particular data values, including the ones you actually saw; this is where your individual error bars come in.)

What you want is a posterior: $P(\text{params} | \text{data})$. (You wish you could compute the probability (density) for any arbitrary values of $\left\{A, B, t_c, \alpha\right\}$, given the data which you actually saw. If you had $P(\text{params} | \text{data})$, you could compute any desired quantity from it: the "best" parameter values, uncertainty ranges for each parameter, correlations between parameters, etc.)

What you need is Bayes' rule:
$$P(\text{params} | \text{data}) \propto P(\text{data} | \text{params}) P(\text{params}).$$
This is a fundamental formula from probability theory. It means you need one more quantity, the prior: $P(\text{params})$. For any given set of parameter values, how plausible are they a priori? You will need to explain and defend your choice here (some choices more than others).

Once you have all these ingredients, here is how you apply them:
1. Find the parameters $\hat{p} \equiv \left\{\hat{A}, \hat{B}, \hat{t_c}, \hat{\alpha}\right\}$ which maximize the posterior, $P(\text{params} | \text{data})$. Using least squares is a good idea.
• Your prior, $P(\text{params})$, will show up as penalty terms (which might be zero depending on the choice of prior).
2. Find the Hessian (the second-derivative matrix) of $-\log P(\text{params} | \text{data})$ in the neighbourhood of $\hat{p}$; its inverse is the covariance of the Gaussian approximation.
• Note that all the first derivatives are 0, since $\hat{p}$ is a local optimum!
• If you quit here, you're already doing pretty well! But if you want to continue...
3. Take independent Monte Carlo draws from the Gaussian approximation to your posterior, and weight each draw by a factor of $\frac{P(\text{params} | \text{data})}{P_\text{Gauss}(\text{params})}$. Now you can take weighted averages over your Monte Carlo draws to compute whatever statistical properties your heart desires.
• This is probably a small correction on step 2. However, you might find that your parameters are highly non-Gaussian in this neighbourhood; and you'd be glad you checked.
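Under the stated assumptions (a smooth, unimodal posterior), steps 1-3 might be sketched like this for a generic log posterior. The finite-difference Hessian and all function names are my own choices, not anything prescribed in the thread:

```python
import numpy as np
from scipy.optimize import minimize

def laplace_then_reweight(log_post, p0, n_draws=2000, eps=1e-4, rng=None):
    """Steps 1-3 above, for a generic log posterior `log_post(p)`.

    1. maximize log_post (the MAP estimate),
    2. finite-difference Hessian of -log_post at the MAP,
    3. draw from the Gaussian approximation and importance-weight each draw
       by posterior(p) / gaussian(p). Weights are returned unnormalised."""
    res = minimize(lambda p: -log_post(p), p0)        # step 1: MAP
    p_hat = res.x
    k = len(p_hat)
    H = np.zeros((k, k))                              # step 2: FD Hessian
    for i in range(k):
        for j in range(k):
            def f(si, sj):
                p = p_hat.copy()
                p[i] += si
                p[j] += sj
                return log_post(p)
            H[i, j] = -(f(eps, eps) - f(eps, -eps)
                        - f(-eps, eps) + f(-eps, -eps)) / (4.0 * eps ** 2)
    cov = np.linalg.inv(H)                            # Gaussian covariance
    rng = rng or np.random.default_rng(0)
    draws = rng.multivariate_normal(p_hat, cov, size=n_draws)   # step 3
    # importance weights: true posterior over the Gaussian approximation
    diff = draws - p_hat
    log_gauss = -0.5 * np.einsum('ni,ij,nj->n', diff, H, diff)
    log_w = np.array([log_post(p) for p in draws]) - log_post(p_hat) - log_gauss
    w = np.exp(log_w - log_w.max())
    return p_hat, cov, draws, w
```

Weighted averages over `draws` with weights `w` then give posterior means, variances, and correlations; if the weights vary wildly, that is the signal that the Gaussian approximation is poor in this neighbourhood.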

In case you're interested, I wrote a paper doing basically this kind of analysis a few years ago. I explained my priors and likelihood; ran Monte Carlo; and presented the resulting uncertainty.

(I didn't take a Gaussian approximation to the posterior. That might have been easier than the Markov Chain I did use, but my stats knowledge wasn't the best at the time.)

Here is the issue it appeared in (together with discussion of the paper), and here is a direct link to the PDF. The model building happens in Section 3.

That's interesting stuff, I'll look at it later when I have more time.

As I examine this problem in greater detail, I'm finding that there are multiple local minima which provide pretty good fits, and one probably cannot argue that $P(\text{params})$ differs much between them. So the probability distribution for the fit parameters turns out to be multimodal. I don't really care about $A$ and $B$, but for $t_c$ and $\alpha$, knowing where the best few (2, 3, 5?) local minima are is probably more important than error bars.

Currently I'm using a genetic algorithm to do the fit. It's great because it's pretty good at not getting stuck in local minima, but now I'm not sure that's what I want, since I need information about the local minima themselves.

In that case, I'd suggest finding the local minima by whatever method doesn't get stuck in them (e.g., genetic algorithm, as you're using now). Then, once you have them -- however you got them -- analyze each individually.

Also, FYI, there is a technical term for parameters you don't care about: "nuisance parameters". You can just integrate over them to get a distribution which only focuses on the others (a so-called "marginal distribution").
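With Monte Carlo draws of the parameters in hand, marginalising over the nuisance parameters is trivial: simply ignore the $A$ and $B$ columns when computing weighted statistics of $t_c$ and $\alpha$. A small illustrative helper (names and column order are my own assumptions):

```python
import numpy as np

def marginal_summary(draws, weights, keep=(2, 3)):
    """Marginalise nuisance parameters out of weighted Monte Carlo draws.

    `draws` is an (n, 4) array of (A, B, tc, alpha) samples. Marginalising
    over A and B amounts to ignoring those columns and computing weighted
    statistics of the rest."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()                               # normalise the weights
    sub = draws[:, keep]                          # keep only tc and alpha
    mean = w @ sub                                # weighted means
    var = w @ (sub - mean) ** 2                   # weighted variances
    return mean, np.sqrt(var)
```

For a multimodal posterior, of course, a mean and standard deviation are not very informative; a weighted histogram of the kept columns, showing each mode separately, would serve better.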