# Fitting functions based on imperfect data

I have a set of values and I'm trying to come up with functions to fit that data.
```
x     v1     v2    v3
1    112   14.4    52
2    120   15      53
3    128   15.5    54
4    136   16.2    55
5    144   16.9    56
6    152   17.6    57
7    160   18.5    58
8    168   19.4    59
9    176   20.4    60
10   184   21.5    61
```

Here is what I know about the data:
• It is rounded down / floored to the nearest significant digit (i.e. 1 for v1 and v3, 0.1 for v2).
• Columns v1 and v3 look linear (i.e. a first-order polynomial).
• Column v2 looks higher order (probably second-order, but not sure).
• I think (but am not sure) that the input parameters are always one order of magnitude less than the results.

Now, for v1 and v3, this makes things fairly straightforward.
Our function is
##y = ax + b##
##a = (v_{max} - v_{min}) / 9##
##b = v_{min} - a##

Because the values are floored, ##a## could potentially vary. The biggest possible difference will be within 1 of our values, so we have
##a = (v_{max} - v_{min}) / 9 \pm 1/9##

Two things give me trouble:
1. Column v2.
2. Finding a good way to make the function fit the data exactly without brute-forcing it, i.e. messing around until I find values for a and b that fit exactly.
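The endpoint formulas above can be checked directly without any brute force: since each value was floored, any candidate ##(a, b)## with ##\lfloor ax + b \rfloor = v## at every ##x## is an exact fit. A minimal sketch of that check (assuming Python with NumPy is available; the variable names and the floor-check idea are mine, not from the thread):

```python
import numpy as np

x = np.arange(1, 11)
v1 = np.array([112, 120, 128, 136, 144, 152, 160, 168, 176, 184])

# Point estimate from the endpoints, as in the formulas above.
a = (v1[-1] - v1[0]) / 9   # slope
b = v1[0] - a              # intercept

# Because each value is floored, (a, b) is an exact fit iff
# floor(a*x + b) reproduces every tabulated value.
exact = np.all(np.floor(a * x + b) == v1)
print(a, b, exact)  # 8.0 104.0 True
```

If `exact` came out `False`, the ##\pm 1/9## interval on ##a## tells you how far the candidate slope may be perturbed while still agreeing with the floored data.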

Notes:
• This isn't a homework question.
• I don't have a strong mathematical background (I know the basics of calculus, but not enough for practical use). That said, not averse to trying to tackle complex subjects.

Svein
Excel has the necessary functions built in (SLOPE and INTERCEPT).

And LINEST.

Also, do you remember that old Microsoft helicopter joke?

FactChecker
Gold Member
The truncation of numbers on average reduces the number by half of the next place. If a value in the table is 5.6, I would use the value 5.65. Always place 5 in the first truncated digit. You do not say if the x values are truncated or if they are exact. Deal with them appropriately.

Other than that, this looks like a standard linear regression problem. The Excel LINEST tool can be used. There are several other statistical packages with linear regression tools. IMHO, they would be much more acceptable and standard than the process that you suggest.
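The half-step correction plus ordinary least squares can be sketched in a few lines (assuming Python with NumPy; `np.polyfit` plays the role of LINEST here, and the variable names are mine):

```python
import numpy as np

x = np.arange(1, 11)
v3 = np.array([52, 53, 54, 55, 56, 57, 58, 59, 60, 61])

# Undo the average floor bias by adding half the truncation step
# (step = 1 for v3), then fit an ordinary least-squares line.
step = 1.0
slope, intercept = np.polyfit(x, v3 + step / 2, 1)
print(slope, intercept)  # 1.0 51.5
```

The same call with `step = 0.1` handles a column truncated to one decimal place.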

> The truncation of numbers on average reduces the number by half of the next place. If a value in the table is 5.6, I would use the value 5.65. Always place 5 in the first truncated digit. You do not say if the x values are truncated or if they are exact. Deal with them appropriately.
>
> Other than that, this looks like a standard linear regression problem. The Excel LINEST tool can be used. There are several other statistical packages with linear regression tools. IMHO, they would be much more acceptable and standard than the process that you suggest.
I get it already. Linear regression. I'm already using LINEST (it seems to give more consistent results than my method).

On x: exact integers from 1 to 10.

My problem isn't the linear relationship. There are tools for it, it has a name, and a long wiki article.

My biggest problem is that one column that doesn't have a linear relationship.

FactChecker
Gold Member
You can try taking the log of the v2 values or including a second variable in the linear regression whose values are the x values squared. Either approach may give a better fit to the curved v2 data.

PS. I just looked at the plot of ##x^2## and [CORRECTION] I don't think it will be helpful. But log(v2) looks good.

Better in what way? Sorry, as I said, I don't have a good math background.

mfb
Mentor
> You can try taking the log of the v2 values or including a second variable in the linear regression whose values are the x values squared. Either approach may give a better fit to the curved v2 data.
>
> PS. I just looked at the plot of ##x^2## and I don't think it will be helpful. But log(v2) looks better.
What is wrong with a parabola? It can fit to all numbers nicely within the rounding errors.
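The parabola fit is easy to reproduce; a minimal sketch (assuming Python with NumPy, which was not used in the thread itself):

```python
import numpy as np

x = np.arange(1, 11)
v2 = np.array([14.4, 15.0, 15.5, 16.2, 16.9, 17.6, 18.5, 19.4, 20.4, 21.5])

# Ordinary least-squares parabola; polyfit returns the highest power first.
c2, c1, c0 = np.polyfit(x, v2, 2)
resid = v2 - np.polyval([c2, c1, c0], x)
print(c2, c1, c0)
print(np.abs(resid).max())  # every residual is well inside the 0.1 truncation step
```

The largest residual is about 0.05, i.e. comfortably within the rounding error mfb mentions.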

FactChecker
Gold Member
> What is wrong with a parabola? It can fit to all numbers nicely within the rounding errors.
I stand corrected. I did the regression and I agree that it looks good. I guess that trying to "eyeball" it was a bad mistake.

Tom.G
The curve fit below for x vs. v2 was done with SIECLATOR Ver. 4.1. The original hosting site is gone, but a Google search seems to find some copies that I haven't tried.
As of April 2009, the author's email was revelator.programs@hotmail.com.

```
Coefficients of Fit
1.40133333333334e+01 . X ^0
3.97272727272719e-01 . X ^1
3.48484848484855e-02 . X ^2

 #      Ydata        Ycalc        Diff.
 1   1.44000e+01  1.44454e+01   4.5e-02
 2   1.50000e+01  1.49472e+01  -5.2e-02
 3   1.55000e+01  1.55187e+01   1.8e-02
 4   1.62000e+01  1.61600e+01  -4.0e-02
 5   1.69000e+01  1.68709e+01  -2.9e-02
 6   1.76000e+01  1.76515e+01   5.1e-02
 7   1.85000e+01  1.85018e+01   1.8e-03
 8   1.94000e+01  1.94218e+01   2.1e-02
 9   2.04000e+01  2.04115e+01   1.1e-02
10   2.15000e+01  2.14709e+01  -2.9e-02
```

Cheers,
Tom

FactChecker
Gold Member
> Coefficients of Fit
> 1.40133333333334e+01 . X ^0
> 3.97272727272719e-01 . X ^1
> 3.48484848484855e-02 . X ^2
Good. My only recommended change would be to add 0.005 to the constant term. That will split the errors caused by the truncation of the data to two digits after the decimal point. If @martix plans to do much work like this, I recommend that he get a good statistical package. I like R, which is free and well established. R has a learning curve.

> Below curve fit for X vs V2 was done with SIECLATOR Ver. 4.1 […]
> Coefficients of Fit
> 1.40133333333334e+01 . X ^0
> 3.97272727272719e-01 . X ^1
> 3.48484848484855e-02 . X ^2
I'll take a look at the program; it will probably be useful in general. And I've been reading up on "least squares regression", which is what we're using here.
But yes, Wolfram Alpha reports the exact same result, and has the benefit of offering a neat visualisation and alternate forms.
As @FactChecker said, I need to add a bit to the constant, though I get the best result with +0.05.
I just can't quite make it fit; there always remains a small difference on one or two data points...

FactChecker
Gold Member
> though I get best result with +0.05.
> I just can't quite make it fit, there always remains a small difference on one or two datapoints....
Do you mean 0.005? That is what I would expect. Since the amount that is truncated is only known to average 0.005, there is no way to be exact. If the "small difference" is less than 0.005, that is as good as you can hope for.

> Do you mean 0.005? That is what I would expect. Since the amount that is truncated is only known to average 0.005, there is no way to be exact. If the "small difference" is less than 0.005, that is as good as you can hope for.
No. Look at the column (v2) - the values are truncated to the first decimal place, meaning the truncated amount averages to 0.05.
Otherwise you are correct.
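The +0.05 shift on the constant term can be checked directly; a sketch (assuming Python with NumPy, not a tool from the thread):

```python
import numpy as np

x = np.arange(1, 11)
v2 = np.array([14.4, 15.0, 15.5, 16.2, 16.9, 17.6, 18.5, 19.4, 20.4, 21.5])

# Fit the floored data, then add half the 0.1 truncation step to the
# constant so the model runs through the middle of the rounding band.
c2, c1, c0 = np.polyfit(x, v2, 2)
pred = c2 * x**2 + c1 * x + (c0 + 0.05)
# Each true value lies somewhere in [v2, v2 + 0.1), so a small leftover
# difference on individual points is expected and unavoidable.
print(np.round(pred - v2, 3))
```

This is exactly the residual behaviour described above: the shifted model can never match every floored point exactly, only stay within the half-step band on average.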

And Excel (LINEST) can do higher-order polynomials as well, as I found out.
I'm having some difficulty deciphering the inner workings of Excel, but it does work.

FactChecker
Gold Member
> No. Look at the column (v2) - the values are truncated to the first decimal place, meaning the truncated amount averages to 0.05.
Oh, sorry. I must have been looking at some other data.
> And Excel (LINEST) can do higher-order polynomials as well, as I found out.
> I'm having some difficulty deciphering the inner workings of Excel, but it does work.
Yes. Practically any statistical package can do multiple linear regression. Some hand-held calculators may not. I am not very familiar with Excel's LINEST, but it looks like it does what is called stepwise regression with backward elimination (see https://support.office.com/en-us/article/linest-function-84d7d0d9-6e50-4101-977a-fa7abf772b6d and https://en.wikipedia.org/wiki/Stepwise_regression ).

Tom.G
That SIECLATOR program originally ran with a 21-term polynomial fit! After looking at the plot and playing with the number of terms, I posted the 3-term result; seemed "close enough."

Cheers,
Tom

FactChecker
Gold Member
> That SIECLATOR program originally ran with a 21-term polynomial fit!
That is kind of disturbing since there are only 10 data points.

mfb
Mentor
In addition to the data points it also fitted a cat.

All the different tools here give the same fit results for the parabola; I would expect all of them to use the same analytic least-squares formula.
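That agreement is no accident: the least-squares parabola is the unique solution of the normal equations, so every tool lands on the same coefficients. A sketch of the closed form (assuming Python with NumPy; the thread's tools were SIECLATOR, LINEST, and Wolfram Alpha, not this code):

```python
import numpy as np

x = np.arange(1, 11, dtype=float)
v2 = np.array([14.4, 15.0, 15.5, 16.2, 16.9, 17.6, 18.5, 19.4, 20.4, 21.5])

# Design matrix with columns [1, x, x^2]; solving the normal equations
# X^T X c = X^T y yields the one and only least-squares parabola.
X = np.vander(x, 3, increasing=True)
c = np.linalg.solve(X.T @ X, X.T @ v2)
print(c)  # c0, c1, c2 agree with np.polyfit(x, v2, 2)
```

Any correct implementation, whether it solves the normal equations or uses a QR/SVD factorisation, must return these same three numbers.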
