Fitting functions based on imperfect data

In summary, the conversation revolves around finding functions to fit a set of data. The data is floored at its last significant digit; columns v1 and v3 have a linear relationship, while column v2 has a higher-order relationship. The input parameter is always one order of magnitude less than the results. The suggested method for fitting the data is linear regression, particularly using Excel's LINEST tool or other statistical packages. There is also mention of fitting a parabola, or of taking the log of the v2 values for a better fit.
  • #1
I have a set of values and I'm trying to come up with functions to fit that data.
Code:
x    v1    v2    v3
1    112    14.4    52
2    120    15    53
3    128    15.5    54
4    136    16.2    55
5    144    16.9    56
6    152    17.6    57
7    160    18.5    58
8    168    19.4    59
9    176    20.4    60
10    184    21.5    61

Here is what I know about the data:
It is rounded down / floored at the last significant digit (i.e. a step of 1 for v1 and v3, 0.1 for v2).
Columns v1 and v3 look linear (i.e. a first-order polynomial).
Column v2 looks higher order (probably second, but I'm not sure).
I think (but am not sure) that the input parameter is always one order of magnitude less than the results.

Now, for v1 and v3, this makes things fairly straightforward.
Our function is
y = ax + b
a = (vmax - vmin) / 9
b = vmin - a

Because the values are floored, a could potentially vary. The true value can exceed each floored value by up to 1, so we have
a = (vmax - vmin) / 9 ± (1/9)
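The endpoint estimate above can be written out in a few lines; this is a hypothetical Python sketch using the v1 column from the table (the variable names are mine, not from the thread):

```python
# Endpoint estimate for a floored linear column (v1 from the table).
# Assumes x runs over the exact integers 1..10, as stated later in the thread.
v1 = [112, 120, 128, 136, 144, 152, 160, 168, 176, 184]

a = (v1[-1] - v1[0]) / 9   # slope from the two endpoints
b = v1[0] - a              # intercept so that y(1) == v1[0]
a_err = 1 / 9              # flooring can shift each endpoint by up to 1

print(f"y = {a}x + {b}, with a known only to within ±{a_err:.3f}")
```

For v1 this happens to reproduce every value exactly (y = 8x + 104), but in general a is only pinned down to the stated ±1/9 band.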

2 things give me trouble:
1. Column v2
2. A good way to make the function fit the data exactly without brute forcing it, i.e. messing around until I find values for a and b that fit exactly.

Notes:
  • This isn't a homework question.
  • I don't have a strong mathematical background (I know the basics of calculus, but not enough for practical use). That said, not averse to trying to tackle complex subjects.
 
  • #2
Excel has the necessary functions built in (slope and intercept).
 
  • #3
And LINEST.

Also, do you remember that old Microsoft helicopter joke?

Technically true, and decidedly unhelpful.
 
  • #4
Truncation reduces a number, on average, by half a step of the last retained place. If a value in the table is 5.6, I would use the value 5.65; always place a 5 in the first truncated digit. You do not say whether the x values are truncated or exact; deal with them appropriately.

Other than that, this looks like a standard linear regression problem. The Excel LINEST tool can be used. There are several other statistical packages with linear regression tools. IMHO, they would be much more acceptable and standard than the process that you suggest.
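As a sketch of that suggestion (my own minimal Python/NumPy version of what LINEST computes, not FactChecker's actual workflow), fitting v1 after adding back half of the truncation step:

```python
import numpy as np

# Ordinary least-squares line fit, as LINEST would produce, after adding
# half of the truncation step (0.5 for v1/v3, which are floored to integers).
x = np.arange(1, 11)
v1 = np.array([112, 120, 128, 136, 144, 152, 160, 168, 176, 184]) + 0.5

slope, intercept = np.polyfit(x, v1, 1)
print(slope, intercept)  # the +0.5 only shifts the intercept, not the slope
```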
 
  • #5
FactChecker said:
Truncation reduces a number, on average, by half a step of the last retained place. If a value in the table is 5.6, I would use the value 5.65; always place a 5 in the first truncated digit. You do not say whether the x values are truncated or exact; deal with them appropriately.

Other than that, this looks like a standard linear regression problem. The Excel LINEST tool can be used. There are several other statistical packages with linear regression tools. IMHO, they would be much more acceptable and standard than the process that you suggest.
I get it already: linear regression. I'm already using LINEST (it seems to give more consistent results than my method).

On x: Exact integers from 1 to 10.

My problem isn't the linear relationship. There are tools for it; it has a name and a long wiki article.

My biggest problem is that one column that doesn't have a linear relationship.
 
  • #6
You can try taking the log of the v2 values or including a second variable in the linear regression whose values are the x values squared. Either approach may give a better fit to the curved v2 data.

PS. I just looked at the plot of ##x^2## and I don't think it will be helpful. But log(v2) looks better.
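Both options can be tried directly. Here is my own Python/NumPy comparison (a sketch, not the plots FactChecker made) of fitting v2 with an added x² regressor versus fitting a line to log(v2):

```python
import numpy as np

x = np.arange(1, 11)
v2 = np.array([14.4, 15.0, 15.5, 16.2, 16.9, 17.6, 18.5, 19.4, 20.4, 21.5])

# Option 1: add x^2 as a regressor, i.e. fit a quadratic in x.
quad = np.polyfit(x, v2, 2)
quad_resid = v2 - np.polyval(quad, x)

# Option 2: fit a line to log(v2), i.e. an exponential model for v2.
loglin = np.polyfit(x, np.log(v2), 1)
log_resid = v2 - np.exp(np.polyval(loglin, x))

# The quadratic residuals stay within the 0.1 truncation step;
# the exponential fit turns out noticeably rougher on this data.
print(np.abs(quad_resid).max(), np.abs(log_resid).max())
```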
 
Last edited:
  • #7
Better in what way? Sorry, as I said, I don't have a good math background.
 
  • #8
FactChecker said:
You can try taking the log of the v2 values or including a second variable in the linear regression whose values are the x values squared. Either approach may give a better fit to the curved v2 data.

PS. I just looked at the plot of ##x^2## and I don't think it will be helpful. But log(v2) looks better.
What is wrong with a parabola? It can fit all the numbers nicely within the rounding errors.
 
  • #9
mfb said:
What is wrong with a parabola? It can fit all the numbers nicely within the rounding errors.
I stand corrected. I did the regression and I agree that it looks good. I guess that trying to "eyeball" it was a bad mistake.
 
  • #10
The curve fit below, for X vs V2, was done with SIECLATOR Ver. 4.1. The original hosting site is gone, but a Google search seems to find some copies that I haven't tried.
As of April 2009, the author's email was: revelator.programs@hotmail.com

Coefficients of Fit
1.40133333333334e+01 . X ^0
3.97272727272719e-01 . X ^1
3.48484848484855e-02 . X ^2
# Ydata Ycalc. Diff.
1 1.44000e+01 1.44454e+01 4.5e-02
2 1.50000e+01 1.49472e+01 -5.2e-02
3 1.55000e+01 1.55187e+01 1.8e-02
4 1.62000e+01 1.61600e+01 -4.0e-02
5 1.69000e+01 1.68709e+01 -2.9e-02
6 1.76000e+01 1.76515e+01 5.1e-02
7 1.85000e+01 1.85018e+01 1.8e-03
8 1.94000e+01 1.94218e+01 2.1e-02
9 2.04000e+01 2.04115e+01 1.1e-02
10 2.15000e+01 2.14709e+01 -2.9e-02

Cheers,
Tom
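The posted coefficients don't require SIECLATOR; any least-squares tool reproduces them. A minimal Python/NumPy check (my own sketch):

```python
import numpy as np

x = np.arange(1, 11)
v2 = np.array([14.4, 15.0, 15.5, 16.2, 16.9, 17.6, 18.5, 19.4, 20.4, 21.5])

# Degree-2 least-squares fit; polyfit returns the highest power first.
c2, c1, c0 = np.polyfit(x, v2, 2)
print(c0, c1, c2)  # matches the X^0, X^1, X^2 coefficients posted above
```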
 
  • #11
Tom.G said:
Coefficients of Fit
1.40133333333334e+01 . X ^0
3.97272727272719e-01 . X ^1
3.48484848484855e-02 . X ^2
Good. My only recommended change would be to add 0.005 to the constant term. That will split the errors caused by the truncation of the data to two digits after the decimal point. If @martix plans to do much work like this, I recommend that he get a good statistical package. I like R, which is free and well established. R has a learning curve.
 
  • #12
Tom.G said:
The curve fit below, for X vs V2, was done with SIECLATOR Ver. 4.1. The original hosting site is gone, but a Google search seems to find some copies that I haven't tried.
As of April 2009, the author's email was: revelator.programs@hotmail.com

Coefficients of Fit
1.40133333333334e+01 . X ^0
3.97272727272719e-01 . X ^1
3.48484848484855e-02 . X ^2
# Ydata Ycalc. Diff.
1 1.44000e+01 1.44454e+01 4.5e-02
2 1.50000e+01 1.49472e+01 -5.2e-02
3 1.55000e+01 1.55187e+01 1.8e-02
4 1.62000e+01 1.61600e+01 -4.0e-02
5 1.69000e+01 1.68709e+01 -2.9e-02
6 1.76000e+01 1.76515e+01 5.1e-02
7 1.85000e+01 1.85018e+01 1.8e-03
8 1.94000e+01 1.94218e+01 2.1e-02
9 2.04000e+01 2.04115e+01 1.1e-02
10 2.15000e+01 2.14709e+01 -2.9e-02

Cheers,
Tom
I'll take a look at the program; it will probably be useful in general. And I've been reading up on "least squares regression", which is what we're using here.
But yes, Wolfram Alpha reports the exact same result, and has the benefit of offering a neat visualisation and alternate forms.
As @FactChecker said, I need to add a bit to the constant, though I get the best result with +0.05.
I just can't quite make it fit; there always remains a small difference on one or two datapoints...
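The "can't quite make it fit" observation can be checked mechanically. This is a hypothetical Python sketch of the test I believe martix is describing (my names and assumptions): shift the constant term by +0.05, floor the fitted values back to one decimal, and count disagreements.

```python
import numpy as np

x = np.arange(1, 11)
v2 = np.array([14.4, 15.0, 15.5, 16.2, 16.9, 17.6, 18.5, 19.4, 20.4, 21.5])

c2, c1, c0 = np.polyfit(x, v2, 2)
fitted = np.polyval([c2, c1, c0 + 0.05], x)  # constant term shifted by +0.05

# Floor each fitted value to one decimal, mirroring how the data was floored.
floored = np.floor(fitted * 10) / 10
mismatches = int(np.sum(floored != v2))
print(mismatches)  # a couple of points still refuse to line up exactly
```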
 
  • #13
martix said:
though I get best result with +0.05.
I just can't quite make it fit, there always remains a small difference on one or two datapoints...
Do you mean 0.005? That is what I would expect. Since the amount that is truncated is only known to average 0.005, there is no way to be exact. If the "small difference" is less than 0.005, that is as good as you can hope for.
 
  • #14
FactChecker said:
Do you mean 0.005? That is what I would expect. Since the amount that is truncated is only known to average 0.005, there is no way to be exact. If the "small difference" is less than 0.005, that is as good as you can hope for.
No. Look at the column (v2): the values are truncated to the first decimal place, meaning the truncated amount averages 0.05.
Otherwise you are correct.

And Excel (LINEST) can do higher-order polynomials as well, as I found out here:
https://stackoverflow.com/questions/10857252/quadratic-and-cubic-regression-in-excel
I'm having some difficulty deciphering the inner workings of Excel, but it does work.
 
  • #15
martix said:
No. Look at the column (v2): the values are truncated to the first decimal place, meaning the truncated amount averages 0.05.
Oh, sorry. I must have been looking at some other data.
And Excel (LINEST) can do higher-order polynomials as well, as I found out here:
https://stackoverflow.com/questions/10857252/quadratic-and-cubic-regression-in-excel
I'm having some difficulty deciphering the inner workings of Excel, but it does work.
Yes. Practically any statistical package can do multiple linear regression; some hand-held calculators may not. I am not very familiar with Excel's LINEST, but it looks like it does what is called stepwise regression with backward elimination. (See https://support.office.com/en-us/article/linest-function-84d7d0d9-6e50-4101-977a-fa7abf772b6d and https://en.wikipedia.org/wiki/Stepwise_regression )
 
  • #16
That SIECLATOR program originally ran with a 21-term polynomial fit! After looking at the plot and playing with the number of terms, I posted the 3-term result; it seemed "close enough." :oldeyes:

Cheers,
Tom
 
  • #17
Tom.G said:
That SIECLATOR program originally ran with a 21-term polynomial fit!
That is kind of disturbing since there are only 10 data points.
 
  • #18
In addition to the data points it also fitted a cat.

All the different tools here give the same fit results for the parabola; I would expect all of them to use the same analytic formula.
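That "same analytic formula" is the closed-form normal-equations solution of ordinary least squares, which is why every tool lands on an identical parabola. A small Python/NumPy illustration (my own sketch, not from the thread):

```python
import numpy as np

x = np.arange(1, 11)
v2 = np.array([14.4, 15.0, 15.5, 16.2, 16.9, 17.6, 18.5, 19.4, 20.4, 21.5])

# Design matrix with columns [1, x, x^2]; the least-squares parabola is the
# unique solution of the normal equations (X^T X) beta = X^T y.
X = np.vander(x, 3, increasing=True)
beta = np.linalg.solve(X.T @ X, X.T @ v2)
print(beta)  # [constant, linear, quadratic], identical to any OLS tool's fit
```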
 
  • #19
Ha! Funny cat data fit.

I agree that the different tools would give the same results for the parabola with the same independent variables. But if they use different criteria and methods to determine which independent variables to include, the results may differ. SIECLATOR must be very different if it considered a polynomial of degree 20.
 
Last edited:
  • #20
That 21-term fit in SIECLATOR was probably a leftover from the last time it was used. The documentation states the default as 4 terms, which is much more reasonable in the real world!
 
