Extrapolating data points using models

In summary, the thread discusses using current–power measurements to predict the output power of a given laser at 9 amps. Although a linear relationship is expected in the ideal case, higher-order polynomials fit the data better. The discussion covers finding a proper balance between underfitting and overfitting, using the reduced chi-square value and graphical residual analysis, and recommends the higher-order model if the theory supports it, while also considering possible experimental or measurement errors.
  • #1
roam

Homework Statement


I have made a number of measurements of current against optical power for a given laser. As shown below, my measurements only go up to 8 amps. I am trying to use the data to predict the output power at 9 amps.

[Figure: measured optical power vs. drive current up to 8 A, with linear and higher-order polynomial fits]


In the ideal case, the behaviour is expected to be linear, but here higher order polynomials fit the data better.

I would like to know if there is a way to find a proper balance between underfitting and overfitting these data. Also, I want to know if there are better methods for extrapolating to this data point.

Homework Equations



The Attempt at a Solution



Clearly, the two models give different predictions of what the power would be at 9 amps (the difference being ~ 600 mW).

Here are the corresponding ##R^2## values for the various fits:

$$
\begin{array}{c|c}
\text{degree} & R^{2}\\
\hline 1 & 0.9977\\
2 & 0.9998\\
3 & 1.0000\\
4 & 1.0000
\end{array}
$$
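The pattern in this table can be reproduced with a short script. The data below are made up (the thread's actual measurements are not reproduced), so the exact values differ, but the behaviour is the same: ##R^2## rises sharply up to the degree that captures the curvature and then saturates.

```python
import numpy as np

# Hypothetical current/power data with slight curvature, standing in for the
# thread's 8 measurements (the real values are not reproduced here).
current = np.arange(1.0, 9.0)                    # 1..8 amps
power = 0.9 * current + 0.02 * current**2 + 0.1  # watts, made-up coefficients

def r_squared(x, y, degree):
    """Coefficient of determination for a least-squares polynomial fit."""
    y_hat = np.polyval(np.polyfit(x, y, degree), x)
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return 1.0 - ss_res / ss_tot

for deg in (1, 2, 3, 4):
    print(deg, round(r_squared(current, power, deg), 4))
```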

Is it possible to decide which model to use based on these values? Can you determine if the flexibility of the model is too high so that it's modeling noise? :confused:

Any suggestions are greatly appreciated.
 

  • #2
roam said:
Is it possible to decide which model to use based on these values?
Yes. Hold the plot horizontal and look 'along the line'. The deviation from a straight line is clearly systematic.
A measure of this is the reduced chi square = chi square/degrees of freedom.
In your case it should reduce sharply from linear to quadratic and not much from 2nd to 3rd order.
I'm not so familiar with ##R^2## -- except that it comes with excel fits :wink:. But I suppose the improvement from quadratic to 3rd order shows that the latter is not worth it.

[edit] google is our friend
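A minimal sketch of the reduced chi-square comparison described above. The data, the noise level, and the error bar ##\sigma## are all assumptions, not the thread's values; the point is the shape of the result, a sharp drop from linear to quadratic and little change after that.

```python
import numpy as np

current = np.arange(1.0, 9.0)                  # 8 hypothetical data points
rng = np.random.default_rng(0)
sigma = 0.01                                   # assumed measurement error (W)
power = 0.9 * current + 0.02 * current**2 + rng.normal(0.0, sigma, current.size)

def reduced_chi2(degree):
    """Chi-square per degree of freedom for a polynomial fit of given degree."""
    residuals = power - np.polyval(np.polyfit(current, power, degree), current)
    dof = current.size - (degree + 1)          # N points minus fitted parameters
    return np.sum((residuals / sigma) ** 2) / dof

for degree in (1, 2, 3):
    print(degree, reduced_chi2(degree))
```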
 
  • #3
roam said:
Is it possible to decide which model to use based on these values?

A high ##R^2## value does not guarantee that the model fits the data well. As remarked by @BvU: Use your eyes to look 'along the line' or perform a graphical residual analysis to check whether the data-point deviations are randomly distributed around the fitted curve.
[PDF]
Curve Fitting Made Easy
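One way to make "look along the line" quantitative is to count runs of consecutive same-signed residuals: a systematic deviation produces only a few long runs, while random scatter produces many short ones. A sketch on made-up data (not the thread's measurements):

```python
import numpy as np

current = np.arange(1.0, 9.0)              # hypothetical 8-point data set
power = 0.9 * current + 0.02 * current**2  # slight curvature, no noise

def residual_runs(degree):
    """Count runs of consecutive same-signed residuals after a polynomial fit."""
    res = power - np.polyval(np.polyfit(current, power, degree), current)
    signs = np.sign(res)
    return 1 + int(np.sum(signs[1:] != signs[:-1]))

# For this data, the linear fit's residuals trace a parabola: the signs come
# out +,+,-,-,-,-,+,+  -- only 3 runs, a clear systematic pattern.
print(residual_runs(1))
```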
 
  • #4
Hi @BvU and @Lord Jestocost,

I have a few follow-up questions. Here is a plot of my residuals:

[Figure: residuals of the linear, quadratic, and cubic fits]


The blue line shows the deviations from the straight line (linear fit). The residuals for the quadratic and cubic fits also appear to be non-random. What does this mean?

Regarding the reduced chi-squared test, as I understand it, the smaller the value of ##\chi^{2}/\text{degrees of freedom}##, the better the fit. But if the improvement from one model to the next is small, then we should stay with the current model?

To calculate this I need to find the number of degrees of freedom for this data set. The reference in BvU's post gives this definition:

$$\text{Number of data points} - \text{Number of parameters calculated from the data points}$$

I've got 8 data points. What would be the "number of parameters calculated from the data points"? :confused:
 

  • #5
roam said:
quadratic and cubic also appear to be non-random
There is a clear 2nd order term in the residuals for the linear fit. What non-random behaviour do you see in the other two?
roam said:
What would be the "number of parameters calculated from the data points"?
For an average that is 1, for a straight line 2, for a parabola 3, etc.
You have 8 data points, so you could exactly calculate a seventh-order polynomial through all points: zero degrees of freedom. But then you have basically modeled the noise, not the actual behaviour. In addition, that 'model' is useless for extrapolation.
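Concretely, for the 8-point data set, using the usual convention that a degree-##d## polynomial fit determines ##d+1## parameters:

```python
n_points = 8
for degree in (1, 2, 3, 7):
    n_params = degree + 1            # slope + intercept for a line, etc.
    dof = n_points - n_params
    print(degree, dof)               # 1 -> 6, 2 -> 5, 3 -> 4, 7 -> 0 (exact fit)
```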

roam said:
But if the improvement from one model to the next is small, then we should stay with the current model?
Yes. The ##\chi^2/N## has a distribution that depends on N
[Figure: probability density of the reduced chi-square for n = 3, 5, 10, 20 (picture from https://www.chem.purdue.edu/courses/chm621/text/stat/funcs/sampling/sampling.htm)]
With higher N it becomes sharper and more symmetric around 1. In other words: a deviation from 1 becomes more and more unlikely.

Read up a bit on that until you understand a phrase like :
The area under the reduced chi-squared distribution, from the ##\chi^2_R## found to ##\infty##, is the probability that you would find a higher ##\chi^2_R## if you repeated the experiment.

Remember though, that this is statistics -- for an experimentalist the physics takes precedence.
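That tail area has a closed form when the number of degrees of freedom is even, which is enough for a sketch; the observed reduced chi-square of 1.8 below is hypothetical, not a value from this thread.

```python
import math

def chi2_tail(x, dof):
    """P(chi-square > x) for even dof, via the closed-form series."""
    assert dof % 2 == 0, "closed form shown only for even dof"
    return math.exp(-x / 2) * sum((x / 2) ** i / math.factorial(i)
                                  for i in range(dof // 2))

dof = 4                  # e.g. 8 points minus the 4 parameters of a cubic fit
chi2_red = 1.8           # hypothetical observed reduced chi-square
p = chi2_tail(chi2_red * dof, dof)
print(p)                 # probability of a larger value on a repeat experiment
```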

Note to self: I omit a treatise on internal/external errors which may be essential for the ##\int_{\chi^2}^\infty## phrase
 

  • #6
I would be interested to see the error bars on your data points.
 
  • #7
Because the deviations from the linear model are so systematic, they do not look like random errors to me. The regression models should statistically support the inclusion of the non-linear term. IMHO, you should use the higher-order model. That being said, if the theory strongly suggests a linear relationship, then you should ask yourself if there may be something about your experiment or measurement methods that is introducing the non-linear term. Even if that is true, the best extrapolation of the entire experiment and measurement process is the non-linear model.
 

1. What is extrapolation and how is it used in data analysis?

Extrapolation is the process of estimating values beyond the range of known data points using a mathematical model. It is commonly used in data analysis to make predictions or projections based on existing data.

2. What are the potential risks of extrapolating data points using models?

One of the main risks of extrapolation is the assumption that the underlying trend will continue beyond the known data points. This may not always be the case and can lead to inaccurate predictions. Additionally, extrapolation can also be influenced by outliers or errors in the data, which can further impact the accuracy of the results.

3. How do scientists choose the appropriate model for extrapolation?

Scientists typically choose a model based on the type of data and the underlying trend. For example, linear regression models are commonly used for data with a linear trend, while exponential or logarithmic models may be more suitable for data with a non-linear trend. The choice of model also depends on the purpose of the extrapolation and the level of accuracy required.

4. Can extrapolation be used to make accurate predictions in all situations?

No, extrapolation is not always a reliable method for making predictions. It is most effective when there is a clear and consistent trend in the data, and when the model used is appropriate for the data. In situations where there is a lot of variability or uncertainty in the data, extrapolation may not yield accurate results.

5. How can scientists account for potential errors or limitations when extrapolating data points using models?

Scientists can account for potential errors or limitations by using multiple models and comparing the results, or by incorporating a margin of error in the extrapolation. It is also important to carefully evaluate the data and consider any potential biases or anomalies that may affect the accuracy of the extrapolation.
