Extrapolating data points using models

The discussion focuses on extrapolating data from a laser's current versus optical power measurements, specifically predicting output power at 9 amps. It highlights the challenge of balancing underfitting and overfitting when using polynomial models, noting that higher-order polynomials fit the data better but may not be ideal for extrapolation. Participants emphasize the importance of graphical residual analysis and the reduced chi-squared test to evaluate model fit, cautioning that a high R² value does not guarantee a good model. They suggest that systematic deviations in residuals indicate the need for a non-linear model, despite potential theoretical expectations of linearity. Ultimately, the consensus leans towards using the higher-order model for better extrapolation, while also considering the experimental context.
roam

Homework Statement


I have made a number of measurements of current against optical power for a given laser. As shown below, my measurements only go up to 8 amps. I am trying to use the data to predict the output power at 9 amps.

JpLDtLA.png


In the ideal case, the behaviour is expected to be linear, but here higher order polynomials fit the data better.

I would like to know if there is a way to find a proper balance between underfitting and overfitting these data. Also, I want to know if there are better methods to extrapolate this data point.

Homework Equations



The Attempt at a Solution



Clearly, the two models give different predictions of what the power would be at 9 amps (the difference being ~ 600 mW).
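To make the disagreement between the two models concrete, here is a minimal sketch of the extrapolation step. The arrays `current` and `power` are made-up placeholder values, not the measurements in the plot, so substitute the real data. NumPy's `polyfit` and `polyval` are assumed for the least-squares fits.

```python
import numpy as np

# Hypothetical placeholder data (NOT the measurements from the plot above).
current = np.array([1, 2, 3, 4, 5, 6, 7, 8], dtype=float)   # A
power = np.array([1068, 2106, 3187, 4289, 5422, 6594, 7783, 9012],
                 dtype=float)                               # mW

def extrapolate(degree, x_new=9.0):
    """Least-squares polynomial fit of the given degree, evaluated at x_new."""
    coeffs = np.polyfit(current, power, degree)
    return np.polyval(coeffs, x_new)

p_lin = extrapolate(1)    # straight-line model
p_quad = extrapolate(2)   # quadratic model

print(f"linear prediction at 9 A:    {p_lin:.0f} mW")
print(f"quadratic prediction at 9 A: {p_quad:.0f} mW")
print(f"difference:                  {p_quad - p_lin:.0f} mW")
```

The gap between the two predictions grows the further the evaluation point lies outside the measured range, which is why the choice of model matters more for extrapolation than for interpolation.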

Here are the corresponding ##r^2## values for the various fits:

$$
\begin{array}{c|c}
\text{degree} & r^{2}\\
\hline 1 & 0.9977\\
2 & 0.9998\\
3 & 1.0000\\
4 & 1.0000
\end{array}
$$

Is it possible to decide which model to use based on these values? Can you determine if the flexibility of the model is too high so that it's modeling noise? :confused:
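One way to see why these values alone cannot answer the question: for nested least-squares polynomial fits, ##r^2## can only stay the same or increase as the degree goes up, whether the extra term describes real behaviour or noise. The sketch below illustrates this with made-up placeholder data (not the measurements in this thread) and NumPy's `polyfit`.

```python
import numpy as np

# Hypothetical placeholder data (NOT the measurements from this thread).
current = np.array([1, 2, 3, 4, 5, 6, 7, 8], dtype=float)   # A
power = np.array([1068, 2106, 3187, 4289, 5422, 6594, 7783, 9012],
                 dtype=float)                               # mW

def r_squared(degree):
    """Coefficient of determination for a least-squares polynomial fit."""
    coeffs = np.polyfit(current, power, degree)
    residuals = power - np.polyval(coeffs, current)
    ss_res = np.sum(residuals ** 2)                 # unexplained variation
    ss_tot = np.sum((power - power.mean()) ** 2)    # total variation
    return 1.0 - ss_res / ss_tot

r2 = {degree: r_squared(degree) for degree in range(1, 5)}
for degree, value in r2.items():
    print(f"degree {degree}: r^2 = {value:.6f}")
```

Because the sequence never decreases, a value that rounds to 1.0000 says nothing about whether the last added term was justified.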

Any suggestions are greatly appreciated.
 

roam said:
Is it possible to decide which model to use based on these values?
Yes. Hold the plot horizontal and look 'along the line'. The deviation from a straight line is clearly systematic.
A measure of this is the reduced chi square = chi square/degrees of freedom.
link from this thread said:
Stephen Tashi
In your case it should reduce sharply from linear to quadratic and not much from 2nd to 3rd order.
I'm not so familiar with ##R^2## -- except that it comes with excel fits :wink:. But I suppose the improvement from quadratic to 3rd order shows that the latter is not worth it.
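A rough sketch of that comparison, under loud assumptions: the data are made-up placeholders (not the measurements in this thread), and `sigma` is an assumed, equal uncertainty on every power reading. The reduced chi square is only meaningful if a realistic uncertainty is supplied.

```python
import numpy as np

# Hypothetical placeholder data (NOT the measurements from this thread).
current = np.array([1, 2, 3, 4, 5, 6, 7, 8], dtype=float)   # A
power = np.array([1068, 2106, 3187, 4289, 5422, 6594, 7783, 9012],
                 dtype=float)                               # mW
sigma = 5.0  # mW, ASSUMED uncertainty per point (hypothetical value)

def reduced_chi_square(degree):
    """chi square / degrees of freedom for a polynomial fit of this degree."""
    coeffs = np.polyfit(current, power, degree)
    residuals = power - np.polyval(coeffs, current)
    chi2 = np.sum((residuals / sigma) ** 2)
    dof = len(current) - (degree + 1)   # data points minus fitted parameters
    return chi2 / dof

red = {degree: reduced_chi_square(degree) for degree in (1, 2, 3)}
for degree, value in red.items():
    print(f"degree {degree}: reduced chi square = {value:.2f}")
```

With these placeholder numbers the value collapses between the linear and quadratic fits; how it behaves for real data depends on the actual measurements and their error bars.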

[edit] google is our friend
 
roam said:
Is it possible to decide which model to use based on these values?

A high ##R^2## value does not guarantee that the model fits the data well. As remarked by @BvU: use your eyes to look 'along the line', or perform a graphical residual analysis to check whether the data-point deviations are randomly distributed around the fitted curve.
[PDF]
Curve Fitting Made Easy
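As a numerical companion to looking at the residual plot, one simple check is to count runs of same-sign residuals: a few long runs point to a systematic trend, while residuals that flip sign often look more like scatter. This is only a sketch with made-up placeholder data (not the measurements in this thread), and the helper names are hypothetical.

```python
import numpy as np

# Hypothetical placeholder data (NOT the measurements from this thread).
current = np.array([1, 2, 3, 4, 5, 6, 7, 8], dtype=float)   # A
power = np.array([1068, 2106, 3187, 4289, 5422, 6594, 7783, 9012],
                 dtype=float)                               # mW

def residuals_for(degree):
    """Measured power minus the least-squares polynomial fit."""
    coeffs = np.polyfit(current, power, degree)
    return power - np.polyval(coeffs, current)

def count_sign_runs(residuals):
    """Number of consecutive same-sign stretches in the residual sequence."""
    signs = np.sign(residuals)
    return 1 + int(np.count_nonzero(signs[1:] != signs[:-1]))

runs = {degree: count_sign_runs(residuals_for(degree)) for degree in (1, 2)}
for degree, n_runs in runs.items():
    print(f"degree {degree}: signs {np.sign(residuals_for(degree)).astype(int)}, "
          f"{n_runs} runs")
```

With only 8 points this count is a crude indicator, so it complements the plot rather than replacing it.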
 
Hi @BvU and @Lord Jestocost,

I have a few follow-up questions. Here is a plot of my residuals:

tMNtmn5.png


The blue line shows the deviations from the straight line (linear fit). The residuals for quadratic and cubic also appear to be non-random. What does this mean?

Regarding the reduced chi-squared test, as I understand it, the smaller the value of ##\chi^{2}/\text{degrees of freedom}##, the better the fit. But if the improvement from one model to the next is small, then we should stay with the current model?

To calculate this I need to find the number of degrees of freedom for this data set. The reference in BvU's post gives this definition:

$$\text{Number of data points} - \text{Number of parameters calculated from the data points}$$

I've got 8 data points. What would be the "number of parameters calculated from the data points"? :confused:
 

roam said:
quadratic and cubic also appear to be non-random
There is a clear 2nd order term in the residuals for the linear fit. What non-random behaviour do you see in the other two ?
roam said:
What would be the "number of parameters calculated from the data points"?
For an average that is 1, for a straight line 2, for a parabola 3, etc.
You have 8 data points, so you could exactly calculate a seventh order polynomial through all points: zero degrees of freedom. But then you have basically modeled the noise, not the actual behaviour. In addition, that 'model' is useless for extrapolation.
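The parameter counting and the seventh-order case can be sketched as follows. The data are made-up placeholders (not the measurements in this thread), and `degrees_of_freedom` is a hypothetical helper name. `numpy.polynomial.Polynomial.fit` is used here because it rescales the x-axis internally, which keeps the high-degree fit numerically well behaved.

```python
import numpy as np
from numpy.polynomial import Polynomial

# Hypothetical placeholder data (NOT the measurements from this thread).
current = np.array([1, 2, 3, 4, 5, 6, 7, 8], dtype=float)   # A
power = np.array([1068, 2106, 3187, 4289, 5422, 6594, 7783, 9012],
                 dtype=float)                               # mW

def degrees_of_freedom(n_points, degree):
    """Data points minus fitted parameters (degree d has d + 1 coefficients)."""
    return n_points - (degree + 1)

fit2 = Polynomial.fit(current, power, 2)   # quadratic: 3 parameters
fit7 = Polynomial.fit(current, power, 7)   # passes through all 8 points

max_residual7 = np.max(np.abs(power - fit7(current)))
pred2, pred7 = fit2(9.0), fit7(9.0)

for degree in (1, 2, 7):
    print(f"degree {degree}: dof = {degrees_of_freedom(len(current), degree)}")
print(f"largest residual of the degree-7 fit: {max_residual7:.2e} mW")
print(f"prediction at 9 A, degree 2: {pred2:.0f} mW")
print(f"prediction at 9 A, degree 7: {pred7:.0f} mW")
```

The degree-7 curve reproduces every point, scatter included, so its value at 9 A is driven largely by that scatter rather than by the underlying trend.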

roam said:
But if the improvement from one model to the next is small, then we should stay with the current model?
Yes. The reduced chi square ##\chi^2/N## has a distribution that depends on the number of degrees of freedom ##N##:
redchidensity.jpg

(Picture from https://www.chem.purdue.edu/courses/chm621/text/stat/funcs/sampling/sampling.htm; curves for n = 3, 5, 10, 20 are shown.)
With higher N it becomes sharper and more symmetric around 1. In other words: a deviation from 1 becomes more and more unlikely.

Read up a bit on that until you understand a phrase like :
The area under the reduced chi squared distribution, from the ##\chi^2_R## found, to ##\infty## is the probability you would find a higher ##\chi^2_R## if you would repeat the experiment.
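For anyone who wants to put a number on that area, here is a small sketch using only the Python standard library. It relies on the closed-form expression for the upper tail of the chi-squared distribution with an integer number of degrees of freedom; the function names are hypothetical. The result is only as meaningful as the uncertainties that went into ##\chi^2##.

```python
import math

def chi2_tail_probability(chi2_value, dof):
    """P(chi-squared with `dof` degrees of freedom >= chi2_value)."""
    if dof < 1 or chi2_value < 0:
        raise ValueError("need dof >= 1 and chi2_value >= 0")
    x = chi2_value / 2.0
    # Start from the known tail for dof = 1 (odd case) or dof = 2 (even case).
    if dof % 2 == 1:
        a, q = 0.5, math.erfc(math.sqrt(x))
    else:
        a, q = 1.0, math.exp(-x)
    # Step up two degrees of freedom at a time:
    # Q(a + 1, x) = Q(a, x) + x**a * exp(-x) / Gamma(a + 1)
    while a < dof / 2.0:
        q += x ** a * math.exp(-x) / math.gamma(a + 1.0)
        a += 1.0
    return q

def reduced_chi2_tail(chi2_reduced, dof):
    """Tail area starting from a reduced chi-squared value."""
    return chi2_tail_probability(chi2_reduced * dof, dof)

# Example: chance of a reduced chi-squared of at least 2 with 5 degrees of freedom
print(f"{reduced_chi2_tail(2.0, 5):.3f}")
```

A very small tail area suggests the model (or the assumed uncertainties) is inadequate; a tail area very close to 1 suggests the uncertainties were overestimated.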

Remember though, that this is statistics -- for an experimentalist the physics takes precedence.

Note to self: I omit a treatise on internal/external errors which may be essential for the ##\int_{\chi^2}^\infty## phrase
 

I would be interested to see the error bars on your data points.
 
Because the deviations from the linear model are so systematic, they do not look like random errors to me. The regression models should statistically support the inclusion of the non-linear term. IMHO, you should use the higher-order model. That being said, if the theory strongly suggests a linear relationship, then you should ask yourself whether something about your experiment or measurement method is introducing the non-linear term. Even if that is true, the best extrapolation of the entire experiment and measurement process is the non-linear model.
 