Extrapolating data points using models

Homework Help Overview

The discussion revolves around extrapolating data points from measurements of current against optical power for a laser, specifically predicting output power at 9 amps based on existing data that only goes up to 8 amps. The original poster notes that while linear behavior is expected, higher order polynomials appear to fit the data better, raising questions about the balance between underfitting and overfitting.

Discussion Character

  • Exploratory, Assumption checking, Conceptual clarification

Approaches and Questions Raised

  • Participants discuss the implications of different polynomial fits and their corresponding r² values, questioning how to decide which model to use. There are inquiries about the significance of residuals and the reduced chi-squared test in assessing model fit. Some participants suggest graphical analysis and the importance of examining residuals for randomness.

Discussion Status

The discussion is active, with participants offering insights into model evaluation techniques and the interpretation of statistical measures. There is a recognition of the need to consider both statistical fit and theoretical expectations, particularly regarding the potential influence of experimental methods on the observed non-linear behavior.

Contextual Notes

Participants note the constraints of having only 8 data points and the implications for degrees of freedom in model fitting. There is also mention of the systematic deviations observed in the residuals, which raises questions about the nature of errors in the measurements.

roam

Homework Statement


I have made a number of measurements of current against optical power for a given laser. As shown below, my measurements only go up to 8 amps. I am trying to use the data to predict the output power at 9 amps.

JpLDtLA.png


In the ideal case, the behaviour is expected to be linear, but here higher order polynomials fit the data better.

I would like to know if there is a way to find a proper balance between underfitting and overfitting these data. Also, I want to know if there are better methods to extrapolate this data point.

Homework Equations



The Attempt at a Solution



Clearly, the two models give different predictions of what the power would be at 9 amps (the difference being ~ 600 mW).

Here are the corresponding ##r^2## values for the various fits:

$$
\begin{array}{c|c}
\text{degree} & r^{2}\\
\hline 1 & 0.9977\\
2 & 0.9998\\
3 & 1.0000\\
4 & 1.0000
\end{array}
$$

Is it possible to decide which model to use based on these values? Can you determine if the flexibility of the model is too high so that it's modeling noise? :confused:

Any suggestions are greatly appreciated.
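A rough sketch of why the ##r^2## table alone cannot settle the question: for least-squares polynomial fits, adding a term can never lower ##r^2##, so the highest degree always "wins" on that measure. The data below are hypothetical (a straight line with a small quadratic term and some scatter), not the measurements from the plot, and the helper names `polyfit`, `polyval` and `r_squared` are made up for this illustration.

```python
# Hypothetical data, NOT the measurements from this thread.
current = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]                      # amps
power = [493.0, 956.0, 1412.0, 1845.0, 2244.0, 2641.0, 3008.0, 3364.0]  # mW


def polyfit(x, y, degree):
    """Least-squares polynomial coefficients, lowest order first.

    Solves the normal equations by Gauss-Jordan elimination with
    partial pivoting; adequate for a handful of points and low degree.
    """
    n = degree + 1
    # Augmented matrix [A^T A | A^T y] for the Vandermonde matrix A.
    m = [[sum(xi ** (r + c) for xi in x) for c in range(n)]
         + [sum(yi * xi ** r for xi, yi in zip(x, y))]
         for r in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [a - f * b for a, b in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def polyval(coeffs, xi):
    """Evaluate the polynomial at xi (coefficients lowest order first)."""
    return sum(c * xi ** k for k, c in enumerate(coeffs))


def r_squared(x, y, coeffs):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - polyval(coeffs, xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


results = {}
for degree in range(1, 5):
    coeffs = polyfit(current, power, degree)
    # Store r^2 and the extrapolated value at 9 A for each degree.
    results[degree] = (r_squared(current, power, coeffs), polyval(coeffs, 9.0))
    print(f"degree {degree}: r^2 = {results[degree][0]:.6f}, "
          f"P(9 A) = {results[degree][1]:.1f} mW")
```

The printout shows ##r^2## creeping upward with every added term while the extrapolated value at 9 A shifts, which is why some other criterion is needed to pick the degree.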
 

roam said:
Is it possible to decide which model to use based on these values?
Yes. Hold the plot horizontal and look 'along the line'. The deviation from a straight line is clearly systematic.
A measure of this is the reduced chi square = chi square/degrees of freedom.
link from this thread said:
Stephen Tashi
In your case it should reduce sharply from linear to quadratic and not much from 2nd to 3rd order.
I'm not so familiar with ##R^2## -- except that it comes with Excel fits :wink:. But I suppose the small improvement from quadratic to 3rd order shows that the latter is not worth it.

[edit] google is our friend
 
roam said:
Is it possible to decide which model to use based on these values?

A high ##R^2## value does not guarantee that the model fits the data well. As remarked by @BvU: use your eyes to look 'along the line', or perform a graphical residual analysis to check whether the data-point deviations are randomly distributed around the fitted curve.
[PDF]
Curve Fitting Made Easy
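One crude way to put a number on "looking along the line" is to count how often consecutive residuals change sign. Random scatter flips sign often (roughly every other point), whereas a systematic curvature gives long runs of the same sign. The sketch below uses hypothetical data, not the poster's measurements, and the names `linear_fit` and `sign_changes` are invented for the example; it is a quick check, not a formal test.

```python
# Hypothetical data, NOT the measurements from this thread.
current = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]                      # amps
power = [493.0, 956.0, 1412.0, 1845.0, 2244.0, 2641.0, 3008.0, 3364.0]  # mW


def linear_fit(x, y):
    """Ordinary least-squares straight line; returns (slope, intercept)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def sign_changes(residuals):
    """Number of times consecutive non-zero residuals differ in sign."""
    signs = [r > 0 for r in residuals if r != 0]
    return sum(a != b for a, b in zip(signs, signs[1:]))


slope, intercept = linear_fit(current, power)
residuals = [yi - (slope * xi + intercept) for xi, yi in zip(current, power)]

for xi, ri in zip(current, residuals):
    print(f"I = {xi:.0f} A: residual = {ri:+.1f} mW")
print("sign changes:", sign_changes(residuals), "out of", len(residuals) - 1)
```

With only 8 points the count is a weak indicator, so it complements the residual plot rather than replacing it.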
 
Hi @BvU and @Lord Jestocost,

I have a few follow-up questions. Here is a plot of my residuals:

tMNtmn5.png


The blue line shows the deviations from the straight line (linear fit). The residuals for the quadratic and cubic fits also appear to be non-random. What does this mean?

Regarding the reduced chi-squared test, as I understand it, the smaller the value of ##\chi^{2}/\text{degrees of freedom}##, the better the fit is. But if the improvement from one model to the next is small, then we should stay with the current model?

To calculate this I need to find the number of degrees of freedom for this data set. The reference in BvU's post gives this definition:

$$\text{Number of data points} - \text{Number of parameters calculated from the data points}$$

I've got 8 data points. What would be the "number of parameters calculated from the data points"? :confused:
 

roam said:
quadratic and cubic also appear to be non-random
There is a clear 2nd order term in the residuals for the linear fit. What non-random behaviour do you see in the other two ?
roam said:
What would be the "number of parameters calculated from the data points"?
For an average that is 1, for a straight line 2, for a parabola 3, etc.
You have 8 data points, so you could exactly calculate a seventh order polynomial (8 parameters) through all points: zero degrees of freedom. But then you have basically modeled the noise, not the actual behaviour. In addition, that 'model' is useless for extrapolation.
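The parameter counting above can be sketched in a few lines. The function names are made up for the illustration, and `sigma` (a single measurement uncertainty shared by all points) is an assumption: the thread does not state the error bars, and without them the absolute value of the reduced chi square has no meaning.

```python
def degrees_of_freedom(n_points, poly_degree):
    """Data points minus fitted parameters; a degree-d polynomial has d + 1."""
    return n_points - (poly_degree + 1)


def reduced_chi_square(residuals, sigma, n_params):
    """chi^2 / degrees of freedom for residuals with a common uncertainty sigma.

    sigma is an ASSUMED measurement uncertainty, in the same units as
    the residuals.
    """
    dof = len(residuals) - n_params
    if dof <= 0:
        raise ValueError("no degrees of freedom left: the fit is exact")
    chi_square = sum((r / sigma) ** 2 for r in residuals)
    return chi_square / dof


# With 8 data points, as in this thread:
for degree in (1, 2, 3, 7):
    print(f"degree {degree}: {degrees_of_freedom(8, degree)} degrees of freedom")

# Hypothetical residuals (mW) from a straight-line fit, assumed sigma = 5 mW.
example_residuals = [-6.0, 4.0, 5.0, -3.0, 2.0, -5.0, 6.0, -3.0]
print(reduced_chi_square(example_residuals, sigma=5.0, n_params=2))
```

A value near 1 says the scatter is about what the assumed error bars predict; a value much larger than 1 points to a poor model or underestimated errors.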

roam said:
But if the improvement from one model to the next is small, then we should stay with the current model?
Yes. The ##\chi^2/N## has a distribution that depends on N
redchidensity.jpg

(picture from https://www.chem.purdue.edu/courses/chm621/text/stat/funcs/sampling/sampling.htm, shown for n = 3, 5, 10, 20)
With higher N it becomes sharper and more symmetric around 1. In other words: a deviation from 1 becomes more and more unlikely.

Read up a bit on that until you understand a phrase like:
The area under the reduced chi squared distribution, from the ##\chi^2_R## found, to ##\infty##, is the probability that you would find a higher ##\chi^2_R## if you repeated the experiment.
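The tail area in that phrase can be computed without a statistics package. The sketch below uses the standard recurrence for the upper incomplete gamma function, which is exact for integer degrees of freedom; the function names `chi2_tail` and `p_value` are invented here, and the result is only meaningful if the error bars used to form ##\chi^2## are realistic.

```python
import math


def chi2_tail(chi_square, dof):
    """P(X >= chi_square) for a chi-squared variable with integer dof >= 1."""
    x = chi_square / 2.0
    if dof % 2 == 0:
        a, q = 1.0, math.exp(-x)              # Q(1, x) = exp(-x)
    else:
        a, q = 0.5, math.erfc(math.sqrt(x))   # Q(1/2, x) = erfc(sqrt(x))
    # Recurrence: Q(a + 1, x) = Q(a, x) + x**a * exp(-x) / Gamma(a + 1)
    while a < dof / 2.0:
        q += x ** a * math.exp(-x) / math.gamma(a + 1.0)
        a += 1.0
    return q


def p_value(reduced_chi_square, dof):
    """Probability of a larger reduced chi square on repeating the experiment."""
    return chi2_tail(reduced_chi_square * dof, dof)


# Example: 8 points and a parabola leave 5 degrees of freedom.
for chi2_r in (0.5, 1.0, 2.0, 4.0):
    print(f"chi2_R = {chi2_r}: p = {p_value(chi2_r, 5):.4f}")
```

A very small probability suggests the model (or the assumed errors) is wrong; a probability very close to 1 suggests the errors were overestimated or the model is fitting noise.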

Remember though, that this is statistics -- for an experimentalist the physics takes precedence.

Note to self: I omit a treatise on internal/external errors which may be essential for the ##\int_{\chi^2}^\infty## phrase
 

I would be interested to see the error bars on your data points.
 
Because the deviations from the linear model are so systematic, they do not look like random errors to me. The regression statistics should support the inclusion of the non-linear term. IMHO, you should use the higher-order model. That being said, if the theory strongly suggests a linear relationship, then you should ask yourself whether something about your experiment or measurement method is introducing the non-linear term. Even if that is true, the best extrapolation of the entire experiment and measurement process is the non-linear model.
 