Weighting data points with fitted curve in Matlab

Steven Thomas · Aug 1, 2017

Hi all,

I'm currently in the middle of performing an experiment for the final project of my MSc, and I have a question about how I should go about weighting the data when fitting a curve to it using the MATLAB fitting tool.

Firstly, a bit of background about the problem.

I am investigating how low-temperature plasmas can be used to dissociate carbon dioxide (CO₂) into carbon monoxide (CO). To see how much CO there is after the plasma, I use an FTIR spectrometer and absorption spectroscopy to measure the area of part of the spectrum. I am currently trying to rerun the CO calibration.

To do this, I pass known admixtures of argon and CO, then measure the area. So I have ten values of CO percentage, 0.1% to 1.0% in 0.1% increments, and ten corresponding areas, each calculated as the mean of 10 readings at that admixture, so I also have a corresponding standard deviation (σ) and standard error (SE) for each admixture.

Normally I would just fit a function with the admixture along the x-axis and the area on the y-axis, weighting each point with either 1/σ² or 1/SE². However, since in my experiment I will be recording areas of CO curves and want the corresponding percentage out, it is more convenient for me to fit the data the other way around, with percentage along the y-axis and area on the x-axis. With the data this way round I am fitting the function f(x) = a*x² + b*x.

Without specifying any weights to my data points, I get the fit below:

with error on the coefficients as: a ±10.8%, and b ±13.4%.

With 1/σ² as the weights I get:

and with 1/SE² I get:

both with errors on the coefficients of a ±12.8% and b ±16.5%. Interestingly, the fitted function is the same in the last two cases, with the same R-square and adjusted R-square values but different SSE and RMSE values. The SSE differs by a factor of 10 and the RMSE by a factor of √10, which I assume comes from the 10 readings behind each mean: SE = σ/√10, so 1/SE² is 10 times 1/σ². What puzzles me is why the fit is now worse when I've specified which data points it should fit more closely.
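The identical coefficients for the 1/σ² and 1/SE² cases can be reproduced outside MATLAB. The following is a minimal sketch in plain Python, not the MATLAB fitting tool itself: the function names and all the numbers are made up for illustration, and it assumes ordinary weighted least squares on f(x) = a*x² + b*x. It shows that multiplying every weight by the same constant leaves a and b unchanged and only rescales the weighted SSE.

```python
import math

def fit_quadratic_through_origin(x, y, w):
    """Weighted least-squares fit of y = a*x**2 + b*x via the 2x2 normal equations."""
    s44 = sum(wi * xi**4 for xi, wi in zip(x, w))
    s33 = sum(wi * xi**3 for xi, wi in zip(x, w))
    s22 = sum(wi * xi**2 for xi, wi in zip(x, w))
    t2 = sum(wi * xi**2 * yi for xi, yi, wi in zip(x, y, w))
    t1 = sum(wi * xi * yi for xi, yi, wi in zip(x, y, w))
    det = s44 * s22 - s33**2
    a = (t2 * s22 - s33 * t1) / det
    b = (s44 * t1 - s33 * t2) / det
    return a, b

def weighted_sse(x, y, w, a, b):
    """Sum of weight * squared residual for the fitted curve."""
    return sum(wi * (yi - (a * xi**2 + b * xi))**2 for xi, yi, wi in zip(x, y, w))

# Hypothetical calibration data (NOT the values from this thread).
n_readings = 10
area = [0.5, 1.0, 1.4, 1.9, 2.3, 2.6, 3.0, 3.3, 3.6, 3.9]
percent = [0.1 * k for k in range(1, 11)]
sigma = [0.02, 0.03, 0.02, 0.04, 0.05, 0.03, 0.06, 0.05, 0.07, 0.06]
se = [s / math.sqrt(n_readings) for s in sigma]

w_sigma = [1.0 / s**2 for s in sigma]
w_se = [1.0 / s**2 for s in se]  # every entry is 10x the matching w_sigma entry

a_sigma, b_sigma = fit_quadratic_through_origin(area, percent, w_sigma)
a_se, b_se = fit_quadratic_through_origin(area, percent, w_se)
sse_sigma = weighted_sse(area, percent, w_sigma, a_sigma, b_sigma)
sse_se = weighted_sse(area, percent, w_se, a_se, b_se)

print("a, b with 1/sigma^2:", a_sigma, b_sigma)
print("a, b with 1/SE^2:   ", a_se, b_se)
print("SSE ratio:", sse_se / sse_sigma)
```

Only the relative sizes of the weights affect where the curve goes, which is consistent with the same function coming back in both cases while the SSE changes by a factor of 10.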

So finally I come to my question: why? Have I misinterpreted how to use the weighting function in MATLAB, and should the points I want to fit more closely have the lower numbers?

Or is it because I have flipped the axes round on my data, so if I were just to plot the data with error bars, they would now be horizontal rather than vertical?

I suspect the second is true, as I have tried fitting the function using just σ² and SE² as the weights, and I get the following respectively:

again, both with the same values of the coefficients with errors a ±9.4%, and b ±10.7%. So the errors are lower, the SSE and RMSE are lower, and the R-square values are closer to 1 too.

Which way should I be doing it? Or have I got this all wrong and shouldn't be using absolute standard errors to weight with, but instead should be using percentage standard errors?

I know this is a bit of a long post, so if you've read this far I really thank you, but I've been going through this for a couple of days now and would like some advice from someone better at statistics than I am.

Thanks in advance,

Steve.

RUber · Aug 2, 2017

I have not used weights in Matlab before, but I think there is a need to normalize them.
Using 1/SE seems to make the most sense, since SE is the standard error of the mean, which is what you are trying to plot.
To normalize, I would recommend something like
## w = \frac{1}{1+|\overline{SE} - SE|} ## where ##\overline{SE}## is your mean of the standard errors.
Another option would be to use
##S = \frac{|\overline{SE} - SE|}{\overline{SE}}##
And let weight be
## w = \frac{1}{1+S} ##
The effect of weights like these would be that if all your SE terms were the same, you would have all the weights be equal to 1.
Depending on the scale of your original SE, relative weights could vary a lot by just using 1/SE.
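A minimal sketch of the two normalised weightings above, in plain Python rather than MATLAB. The function name and the sample SE values are hypothetical, chosen only to show the behaviour.

```python
def normalised_weights(se):
    """Return (w_abs, w_rel) for a list of standard errors.

    w_abs = 1 / (1 + |mean(SE) - SE|)
    w_rel = 1 / (1 + |mean(SE) - SE| / mean(SE))
    """
    mean_se = sum(se) / len(se)
    w_abs = [1.0 / (1.0 + abs(mean_se - s)) for s in se]
    w_rel = [1.0 / (1.0 + abs(mean_se - s) / mean_se) for s in se]
    return w_abs, w_rel

# Made-up standard errors, for illustration only.
equal_se = [0.02, 0.02, 0.02, 0.02, 0.02]
mixed_se = [0.01, 0.02, 0.03]

w_abs_equal, w_rel_equal = normalised_weights(equal_se)
w_abs_mixed, w_rel_mixed = normalised_weights(mixed_se)

print(w_abs_equal, w_rel_equal)  # all (approximately) 1 when every SE is equal
print(w_abs_mixed, w_rel_mixed)
```

As written, both formulas depend only on the distance of each SE from the mean, so every weight lies between 0 and 1 and an SE below the mean is down-weighted by the same amount as one equally far above it.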
Hope this helps.

Steven Thomas · Aug 2, 2017

Hi @RUber
Thanks for your reply.

I have tried your method and it gives me much better error percentages on the coefficients. It also reduces the SSE and RMSE and moves the R-square and adjusted R-square closer to 1, so that's good.

I've since also tried using 1/(percentage error)², where the percentage error is the standard error / value of the data point * 100. This gives me even better errors on the coefficients, but not as good an SSE, RMSE, R-square and adjusted R-square as your methods do.
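For concreteness, the percentage-error weighting can be written out as below. This is a sketch in plain Python with a hypothetical function name and invented numbers, and it assumes "value of the data point" means the mean area at each admixture.

```python
def percent_error_weights(values, se):
    """w_i = 1 / (percentage error_i)**2, where percentage error = SE / value * 100."""
    return [1.0 / (100.0 * s / v) ** 2 for v, s in zip(values, se)]

# Made-up example: two points with the same absolute SE but different values.
values = [1.0, 2.0]
se = [0.1, 0.1]

w_percent = percent_error_weights(values, se)
w_absolute = [1.0 / s**2 for s in se]

print(w_percent)   # the larger value gets the larger weight
print(w_absolute)  # equal weights when the absolute SE is equal
```

Compared with 1/SE², this scheme shifts weight towards points with larger values, because the same absolute SE is a smaller percentage of a bigger number. The factor of 100 multiplies every weight by the same constant, so it does not change which points are favoured relative to the others.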

I really don't know, this is why I hate stats stuff, there's no correct way to do it, but the different methods will give me completely different CO percentages and yields in my experiment.

Can I ask you about the fact I have switched my axes around so that I'm fitting a function that would now have the error bars horizontal as opposed to vertical? Do you think this means I need to change the way I would usually pick my weights for the fit?

Steven.

RUber · Aug 2, 2017

Steven Thomas said:

I really don't know, this is why I hate stats stuff, there's no correct way to do it, but the different methods will give me completely different CO percentages and yields in my experiment.
Steven.

Fitting curves really is a black art. You can apply some standard methods, but when you want to apply some additional finesse, everyone will tell you it depends, so there are few authoritative standards.
By definition, the un-weighted model will minimize your un-weighted error. By changing the weights, you are allowing for more error around points with more variability and hoping for less error around points with smaller variability. I think this is a smart application of the weighted model. Choosing the weights should be done in a way that helps. It seems like your goal is to still have a pretty good fit overall, so you do not want your weights to be too far out of proportion.

Depending on your data, I would try to keep the weights between 0.5 and 1.5, so that the underweighted points aren't completely disregarded by the model.
For comparison, if one point is weighted 1.5 and another has weight of 0.5, your model would be willing to trade up to 3 units of error around the lighter point in favor of 1 less unit error on the heavier point. If you have weights ranging from 0.1 to 10, then imagine how much total error you are allowing around the "unimportant" points, maybe to get just a fraction closer to your important points.
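The trade-off described above can be checked with a few lines of arithmetic. This sketch assumes the usual weighted least-squares objective, the sum of weight times squared residual, and reads "units of error" as units of squared residual; that reading is an interpretation, and the function name is hypothetical.

```python
import math

def weighted_cost(weight, residual):
    """Contribution of one point to the weighted sum of squared errors."""
    return weight * residual ** 2

# A squared residual of 1 on the heavy point (w = 1.5) costs the same as
# a squared residual of 3 on the light point (w = 0.5).
heavy = weighted_cost(1.5, 1.0)
light = weighted_cost(0.5, math.sqrt(3.0))
print(heavy, light)

# The exchange rate between two points is simply the ratio of their weights.
ratio_narrow = 1.5 / 0.5   # weights kept between 0.5 and 1.5
ratio_wide = 10.0 / 0.1    # weights ranging from 0.1 to 10
print(ratio_narrow, ratio_wide)
```

With weights from 0.1 to 10, the fit would accept roughly a hundred times more squared error at the lightest point than at the heaviest, compared with a factor of three for the narrower range.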

Steven Thomas said:

Can I ask you about the fact I have switched my axes around so that I'm fitting a function that would now have the error bars horizontal as opposed to vertical? Do you think this means I need to change the way I would usually pick my weights for the fit?

I don't think that should matter.

Steven Thomas · Aug 2, 2017

That's a really clear and coherent answer, thank you. I think I now know how I'll proceed. I'd like to more or less quote it in my lab book as a rationale for my choice. Did you get that from any particular book, or are you a lecturer or anything that I can put down?

RUber · Aug 2, 2017

You can quote my username and PF as the source. I am currently an Asst. Prof of Mathematics.

StandardsApplied · Apr 10, 2024

Adding weights to your data isn't always the solution. In this case, the residuals show a clear trend. What you want to see in a good fit is a residual plot that is pure noise, with no smooth curves like that. What your residual plot is telling you is that there is still information in your data that isn't being captured by your fit equation.
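One crude way to put a number on "trend versus noise" is to count how often the residuals, ordered by x, change sign. The sketch below is plain Python with a hypothetical function name, and both residual lists are invented for illustration; they are not the residuals from this thread.

```python
def sign_changes(residuals):
    """Count how often consecutive non-zero residuals switch sign."""
    signs = [1 if r > 0 else -1 for r in residuals if r != 0]
    return sum(1 for s0, s1 in zip(signs, signs[1:]) if s0 != s1)

# Invented residuals, ordered by increasing x.
trended = [0.04, 0.02, -0.01, -0.03, -0.04, -0.03, -0.01, 0.02, 0.05, 0.08]
noisy = [0.02, -0.03, 0.01, 0.04, -0.02, -0.01, 0.03, -0.04, 0.02, -0.01]

print(sign_changes(trended))  # 2: long runs of one sign, a smooth curve
print(sign_changes(noisy))    # 7: frequent switching, closer to noise
```

Very few sign changes for the number of points suggests the model is systematically above the data in some regions and below it in others. With only ten points this is a rough indication at best, not a formal test.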

To find an equation that better fits your data, try our nonlinear curve fitting tool over at: https://www.standardsapplied.com/nonlinear-curve-fitting-calculator.html
It will try fitting all 74 built-in functions to your data and return a ranked list of the best functions, plotting the best.
