Weighting data points with fitted curve in Matlab

In summary: using weights when fitting data in Matlab can improve the fit, but how much depends on the scale of the original standard errors. It may be worth trying more than one weighting scheme and seeing which gives the best results.
  • #1
Steven Thomas
Hi all,

I'm currently in the middle of performing an experiment for the final project of my MSc, and I have a question about how I should go about weighting the data when fitting a curve to it using the MATLAB fitting tool.

Firstly, a bit of background about the problem.

I am investigating how low-temperature plasmas can be used to dissociate CO2 (carbon dioxide) into CO (carbon monoxide). To see how much CO there is after the plasma, I use FTIR absorption spectroscopy to measure the area of part of the spectrum. I am currently trying to rerun the CO calibration.

To do this, I pass known admixtures of argon and CO through, then measure the area. So I have ten values of CO percentage, 0.1% to 1.0% in 0.1% increments, and ten corresponding areas, each calculated as the mean of 10 readings at that admixture, so I also have a corresponding standard deviation (σ) and standard error (SE) for each admixture.

Normally I would just fit a function with the admixture along the x-axis and the area on the y-axis, weighting each point with either 1/σ² or 1/SE². However, as in my experiment I will be recording areas of CO curves and want a corresponding percentage out, it is better for me to fit the data the other way around, with percentage along the y-axis and area on the x-axis. With the data this way round I am fitting the function f(x) = a*x^2 + b*x.

Without specifying any weights to my data points, I get the fit below:
1.png

with error on the coefficients as: a ±10.8%, and b ±13.4%.

With 1/σ² as the weights I get:
2.png

and with 1/SE² I get:
3.png

both with errors on the coefficients of a ±12.8% and b ±16.5%. Interestingly, the function returned is the same in the last two cases, with the same R-square and adjusted R-square values but different SSE and RMSE values. The SSE differs by a factor of 10 and the RMSE by a factor of √10, which I assume is because SE = σ/√10 for the 10 readings behind each mean, so the two sets of weights differ only by a constant factor of 10. What puzzles me is why the fit is now worse when I've specified which data points it should fit more closely.
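The factor-of-10 observation can be checked with a small sketch. This is plain Python rather than the MATLAB fitting tool, the data values are invented for illustration, and `fit_quadratic_through_origin` is a hypothetical helper that solves the weighted least-squares normal equations for f(x) = a*x^2 + b*x directly. It only illustrates the arithmetic, not how MATLAB computes its fit internally.

```python
import math

def fit_quadratic_through_origin(x, y, w):
    """Weighted least-squares fit of y = a*x**2 + b*x.

    Minimises sum(w_i * (y_i - a*x_i**2 - b*x_i)**2) by solving the
    2x2 normal equations. Returns (a, b, weighted_sse).
    """
    s4 = sum(wi * xi**4 for xi, wi in zip(x, w))
    s3 = sum(wi * xi**3 for xi, wi in zip(x, w))
    s2 = sum(wi * xi**2 for xi, wi in zip(x, w))
    t2 = sum(wi * xi**2 * yi for xi, yi, wi in zip(x, y, w))
    t1 = sum(wi * xi * yi for xi, yi, wi in zip(x, y, w))
    det = s4 * s2 - s3 * s3
    a = (t2 * s2 - t1 * s3) / det
    b = (s4 * t1 - s3 * t2) / det
    sse = sum(wi * (yi - a * xi**2 - b * xi) ** 2
              for xi, yi, wi in zip(x, y, w))
    return a, b, sse

# Invented example data (NOT the measurements from this thread).
area = [1.0, 2.0, 3.0, 4.0, 5.0]          # x: measured area
percent = [0.11, 0.23, 0.38, 0.52, 0.70]  # y: CO percentage
sigma = [0.02, 0.03, 0.05, 0.04, 0.08]    # standard deviation per point
n_readings = 10

# SE = sigma / sqrt(n), so 1/SE^2 is exactly n times 1/sigma^2.
w_sd = [1.0 / s**2 for s in sigma]
w_se = [1.0 / (s / math.sqrt(n_readings)) ** 2 for s in sigma]

a_sd, b_sd, sse_sd = fit_quadratic_through_origin(area, percent, w_sd)
a_se, b_se, sse_se = fit_quadratic_through_origin(area, percent, w_se)

print("1/sigma^2 weights:", a_sd, b_sd, sse_sd)
print("1/SE^2 weights:   ", a_se, b_se, sse_se)
print("SSE ratio:", sse_se / sse_sd)
```

Multiplying every weight by the same constant scales both sides of the normal equations equally, so the coefficients are unchanged and only the weighted SSE (and hence the RMSE) is rescaled.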

So finally I come to my question: why? Have I misinterpreted how to use the weighting function in MATLAB, and should the points I want to fit more closely have the lower numbers?

Or is it because I have flipped the axes round on my data, so that if I were just to plot the data with error bars, they would now be horizontal rather than vertical?

I suspect the second is true, as I have tried fitting the function using just σ² and SE² as the weights, and I get the following, respectively:
4.png

5.png

again, both with the same values of the coefficients with errors a ±9.4%, and b ±10.7%. So the errors are lower, the SSE and RMSE are lower, and the R-square values are closer to 1 too.

Which way should I be doing it? Or have I got this all wrong and shouldn't be using absolute standard errors to weight with, but instead should be using percentage standard errors?

I know this is a bit of a long post, so if you've read this far I really thank you, but I've been going through this for a couple of days now and would like some advice from someone better at statistics than I am.

Thanks in advance,

Steve.
 
  • #2
I have not used weights in Matlab before, but I think there is a need to normalize them.
Using 1/SE seems to make the most sense, since SE is the standard error of the mean, which is what you are trying to plot.
To normalize, I would recommend something like
## w = \frac{1}{1+|\overline{SE} - SE|} ## where ##\overline{SE}## is your mean of the standard errors.
Another option would be to use
##S = \frac{|\overline{SE} - SE|}{\overline{SE}}##
And let weight be
## w = \frac{1}{1+S} ##
The effect of weights like these would be that if all your SE terms were the same, you would have all the weights be equal to 1.
Depending on the scale of your original SE, relative weights could vary a lot by just using 1/SE.
Hope this helps.
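As a rough sketch of the two weighting formulas above, here is how they could be computed. This is plain Python rather than MATLAB, the function names are hypothetical, and the SE values are invented for illustration.

```python
def normalised_weights(se):
    """w = 1 / (1 + |mean(SE) - SE|) for each point."""
    mean_se = sum(se) / len(se)
    return [1.0 / (1.0 + abs(mean_se - s)) for s in se]

def relative_weights(se):
    """w = 1 / (1 + S), with S = |mean(SE) - SE| / mean(SE)."""
    mean_se = sum(se) / len(se)
    return [1.0 / (1.0 + abs(mean_se - s) / mean_se) for s in se]

# Invented standard errors (NOT the values from this thread).
se_equal = [0.5, 0.5, 0.5]
se_spread = [0.1, 0.2, 0.3]

print(normalised_weights(se_equal))   # every weight is 1 when all SE match
print(normalised_weights(se_spread))
print(relative_weights(se_spread))
```

One property worth noticing before using these: because of the absolute value, a point whose SE is below the mean is down-weighted by the same amount as a point whose SE is equally far above it, so the most precise points do not get the largest weights. All weights also stay between 0 and 1.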
 
  • #3
Hi @RUber
Thanks for your reply.

I have tried your method and it gives me much better error percentages on the coefficients. It also reduces the SSE and RMSE and moves the R-square and adjusted R-square closer to 1, so that's good.

I've since also tried using 1/(percentage error)², where the percentage error is the standard error / value of the data point * 100. This gives me even better errors on the coefficients, but not as good an SSE, RMSE, R-square and adjusted R-square as your methods do.

I really don't know, this is why I hate stats stuff, there's no correct way to do it, but the different methods will give me completely different CO percentages and yields in my experiment.

Can I ask you about the fact that I have switched my axes around, so that I'm fitting a function that would now have the error bars horizontal as opposed to vertical? Do you think this means I need to change the way I would usually pick my weights for the fit?

Steven.
 
  • #4
Steven Thomas said:
I really don't know, this is why I hate stats stuff, there's no correct way to do it, but the different methods will give me completely different CO percentages and yields in my experiment.
Steven.
Fitting curves really is a black art. You can apply some standard methods, but when you want to apply some additional finesse, everyone will tell you it depends, so there are few authoritative standards.
By definition, the un-weighted model will minimize your un-weighted error. By changing the weights, you are allowing for more error around points with more variability and hoping for less error around points with smaller variability. I think this is a smart application of the weighted model. Choosing the weights should be done in a way that helps. It seems like your goal is to still have a pretty good fit overall, so you do not want your weights to be too far out of proportion.

Depending on your data, I would try to keep the weights between 0.5 and 1.5, so that the underweighted points aren't completely disregarded by the model.
For comparison, if one point is weighted 1.5 and another has weight of 0.5, your model would be willing to trade up to 3 units of error around the lighter point in favor of 1 less unit error on the heavier point. If you have weights ranging from 0.1 to 10, then imagine how much total error you are allowing around the "unimportant" points, maybe to get just a fraction closer to your important points.
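For reference, a weighted least-squares fit is usually described as minimising the weighted sum of squared residuals. In notation that is not used elsewhere in this thread, with ##r_i## the residual at point ##i## and ##w_i## its weight:
## S = \sum_i w_i r_i^2, \qquad r_i = y_i - f(x_i) ##
With ##w_1 = 1.5## and ##w_2 = 0.5##, a change of ##\Delta## in ##r_1^2## changes ##S## by ##1.5\Delta##, which is matched by a change of ##3\Delta## in ##r_2^2##. So the 3-to-1 trade-off described above applies to squared error, assuming the fit minimises ##S## in this form.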

Steven Thomas said:
Can I ask you about the fact that I have switched my axes around, so that I'm fitting a function that would now have the error bars horizontal as opposed to vertical? Do you think this means I need to change the way I would usually pick my weights for the fit?
I don't think that should matter.
 
  • #5
That's a really clear and coherent answer, thank you. I think I now know how I'll proceed. I'd like to more or less quote it in my lab book as a rationale for my choice. Did you get that from any particular book, or are you a lecturer or anything that I can put down?
 
  • #6
You can quote my username and PF as the source. I am currently an Asst. Prof of Mathematics.
 
  • #7
Adding weights to your data isn't always the solution. In this case, the residuals show a clear trend. What you want to see in a good fit is a residual plot that is just pure noise, with no smooth curves like that. What your residual plot is telling you is that there is more information still in your data, and that information isn't being captured by your fit equation.
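One simple way to put a number on "the residuals show a trend" is to count how often consecutive residuals change sign. The sketch below is plain Python with invented data (a pure cubic, which the model a*x^2 + b*x cannot represent), and both helper names are hypothetical. Counting sign changes is just one rough diagnostic, not a formal test.

```python
def fit_quadratic_through_origin(x, y):
    """Unweighted least-squares fit of y = a*x**2 + b*x; returns (a, b)."""
    s4 = sum(xi**4 for xi in x)
    s3 = sum(xi**3 for xi in x)
    s2 = sum(xi**2 for xi in x)
    t2 = sum(xi**2 * yi for xi, yi in zip(x, y))
    t1 = sum(xi * yi for xi, yi in zip(x, y))
    det = s4 * s2 - s3 * s3
    return (t2 * s2 - t1 * s3) / det, (s4 * t1 - s3 * t2) / det

def count_sign_changes(values):
    """Number of times consecutive non-zero values switch sign."""
    signs = [v > 0 for v in values if v != 0]
    return sum(1 for s, t in zip(signs, signs[1:]) if s != t)

# Invented data: y depends on x**3, which the fitted model lacks.
x = [float(i) for i in range(1, 11)]
y = [xi**3 for xi in x]

a, b = fit_quadratic_through_origin(x, y)
residuals = [yi - (a * xi**2 + b * xi) for xi, yi in zip(x, y)]
sign_changes = count_sign_changes(residuals)

print("residuals:", [round(r, 2) for r in residuals])
print("sign changes:", sign_changes, "out of", len(residuals) - 1, "gaps")
```

Residuals that are pure noise would be expected to change sign in roughly half of the gaps between neighbouring points. Long runs of the same sign, as here, suggest the model is missing a term.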

To find an equation that better fits your data, try our nonlinear curve fitting tool over at: https://www.standardsapplied.com/nonlinear-curve-fitting-calculator.html
It will try fitting all 74 built-in functions to your data and return a ranked list of the best functions, plotting the best one.
 

1. How does weighting data points affect the fitted curve in Matlab?

The weights assigned to data points in Matlab affect how much each point contributes to the overall shape of the fitted curve. Points with higher weights will have a greater influence on the curve, while points with lower weights will have less influence. This can be useful in cases where certain data points are more reliable or important than others.

2. How do I assign weights to data points in Matlab?

To assign weights to data points in Matlab, pass a vector of weights, one per data point, to the fit function through the 'Weights' name-value argument. Alternatively, you can set the Weights option on a fitoptions object and pass that object to fit.

3. Can I use different weighting schemes for different data points in Matlab?

Yes, you can assign different weights to different data points in Matlab. The weight vector must have the same length as the data, and each entry corresponds to a specific data point, so individual points can be weighted according to whatever rule you choose. As above, the vector can also be supplied through the Weights option in fitoptions.

4. How do I know if the weights assigned to data points in Matlab are appropriate?

To judge whether the weights are appropriate, plot the fitted curve with and without the weights and compare the two, and inspect the residuals: a good fit leaves residuals that look like noise, with no trend. Be careful when comparing SSE or RMSE between fits that use different weights, because these statistics are computed from the weighted residuals and scale with the weights. As the thread above shows, multiplying every weight by 10 leaves the fitted function unchanged but multiplies the SSE by 10.

5. Can I use a custom weighting function in Matlab?

Yes, in the sense that you can compute the weights however you like. The Weights option expects a vector of weights rather than a function, so a custom scheme is normally implemented by writing your own function that takes the data and returns one weight per data point, then passing the resulting vector to the fit. This can be useful for cases where a simple fixed formula such as 1/σ² is not sufficient.
