What would happen if you try to fit an 'n' degree polynomial to (n+1) data points?

In summary: Suppose further that you know the function at exactly n+1 points, and you know those n+1 points "exactly" (e.g., they have no measurement error). Then you can use Lagrange interpolation to find the unique degree n polynomial which fits those data points. As you noticed, as n gets larger, these polynomials can get very large very quickly. So even if your data points are not "noisy", your polynomial can be huge. Now suppose you did this for a very large n, so that your polynomial is huge. Then you take your Lagrange polynomial and you plot it. So you plot a...
  • #1
whuzzwhuzz
I couldn't understand these lines from my book, so I will reproduce them here.

Warning: In step 1, if you use a computer to fit a polynomial to the data, it could lead to disaster. For example, consider fitting a sixth-degree polynomial to the seven data points, or, more generally, an (n-1) degree polynomial to n points.

Could you please explain it to me?
Thanks a lot!
 
  • #2
Two points are needed to define a line, three for a parabola, and so on. In general, n+1 points are needed to define a degree n polynomial (or n points are needed to define a degree n-1 polynomial, same thing).
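As a quick illustration of that counting rule (my own sketch in Python with numpy, not something from the thread), three exact points are enough to recover a parabola:

```python
# Minimal sketch: 3 exact points determine a degree-2 polynomial.
import numpy as np

# Three points taken from the parabola y = 2x^2 - 3x + 1 (made-up example).
x = np.array([0.0, 1.0, 2.0])
y = 2 * x**2 - 3 * x + 1

# Fitting a degree-2 polynomial to 3 points recovers the coefficients exactly
# (up to floating-point rounding), in descending powers of x.
coeffs = np.polyfit(x, y, deg=2)
print(coeffs)   # approximately [ 2. -3.  1.]
```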

Example: Suppose you collect 71 measured data points from a relationship that is known to be quadratic. The measurements are inherently noisy. So what happens if you try to fit those 71 points to a 70th degree polynomial instead of a quadratic like you should have? You will get an exact fit at those 71 data points. Outside the range of those points this polynomial will zoom off to infinity faster than fast. It's a 70th degree polynomial, and x^70 grows very, very quickly. So that's one disaster: The forced fit has zero extrapolation power. Now look what happens between your data points. This 70th degree polynomial will have as many as 69 extrema. Most of these extrema will be between your sampled data. The noise in the data is inevitably going to make those extreme values be very, very large. The fit has zero interpolative power.
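To put a number on how fast x^70 blows up outside the data range, here is a tiny sketch (assuming Python; the sample values are arbitrary):

```python
# Quick numeric illustration of why extrapolating a degree-70 fit is hopeless:
# x**70 explodes as soon as |x| exceeds 1.
for x in (1.0, 1.1, 1.5, 2.0, 10.0):
    print(f"x = {x:5.1f}   x**70 = {x**70:.3e}")
# Prints roughly 1.0e+00, 7.9e+02, 2.1e+12, 1.2e+21, 1.0e+70.
```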

This latter problem is an even bigger disaster than the lack of extrapolative power. You always take your chances when extrapolating outside the domain of applicability of some model. On the other hand, a reasonable model should have reasonable interpolative capabilities. An overfit model has neither extrapolative nor interpolative capabilities. In short, it's worthless.
 
  • #3
It is not simply that x^70 grows very rapidly. If all of your data points were for values of x between -0.5 and +0.5 then x^70 would be quite small.

But if the values of x^n in your polynomial are very small, that means that the coefficients on those terms will have to be very large in order to exactly fit a large number of noisy measurements.

Either way -- whether due to huge values of the x^n terms or huge values of the coefficients on those terms -- the graph of the so-called "interpolating polynomial" will tend to become more and more pathological and less and less smooth as the number of fitted points increases, just as D H has indicated.
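To see why the coefficients are forced to be huge when the data sit in [-0.5, 0.5], a short sketch (assuming Python; not from the thread):

```python
# At |x| <= 0.5, the x**70 basis value is tiny, so an exact fit must pair it
# with an enormous coefficient for that term to contribute anything visible.
x = 0.5
print(x**70)        # about 8.47e-22
print(1 / x**70)    # about 1.18e+21 -- coefficient size needed for an O(1) contribution
```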
 
  • #4
jbriggs444 said:
It is not simply that x^70 grows very rapidly. If all of your data points were for values of x between -0.5 and +0.5 then x^70 would be quite small.
If the only possible values for x are between -0.5 and +0.5, then yes, extrapolation is not a problem for the simple reason that there is no extrapolation. On the other hand, if measurements are restricted to (-0.5, 0.5), but this is but a small part of the overall domain, the growth in x^70 for |x|>1 is a huge problem with regard to extrapolation. The region between ±(0.5, 1.0) is also going to be problematic. Extrapolation is completely shot with an overfit curve.

Shoot, extrapolation is oftentimes suspect even if the model is correct. It's the failure of interpolative power that's the real killer. That and the fact that a 70th degree polynomial almost certainly is not the right model. For example, if the true model is a sinusoid, using a polynomial model is just wrong.
 
  • #5
Thanks a lot D H and jbriggs! :) Now it's very clear to me. We just shouldn't try to fit an n-degree polynomial to n+1 data points: we won't get a smooth curve, and interpolation becomes unreliable.

But jbriggs, imho there shouldn't be any problem with huge values of the coefficients if I find them from the graph. Or will there be a problem?

Also D H, why will there be a problem in the region ±(0.5, 1.0)?

Thanks again you guys! :)
 
  • #6
whuzzwhuzz said:
Thanks a lot D H and jbriggs! :) Now it's very clear to me. We just shouldn't try to fit an n-degree polynomial to n+1 data points: we won't get a smooth curve, and interpolation becomes unreliable.
An overfit curve will be smooth mathematically for the simple reason that every polynomial is smooth. It will not be smooth to the eye.

Here's an example from http://pmtk3.googlecode.com/svn/trunk/docs/tutorial/html/tutRegr.html .

Suppose you have a set of 21 data samples, and the underlying problem is quadratic in nature. Here's that data fit to a quadratic:
http://pmtk3.googlecode.com/svn/trunk/docs/tutorial/html/tutRegr_04.png
Nice and smooth. Not quite perfect, but a fairly good fit and besides there's no reason to expect perfection.


Here's the same data fit to a 20th order polynomial:
http://pmtk3.googlecode.com/svn/trunk/docs/tutorial/html/tutRegr_06.png
It's too bad the plot just shows the fit going off scale. The fit is "perfect" in the sense that it hits every data point. It is far, far from perfect in any other sense. And it isn't "smooth".
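In case those image links no longer load, here is a rough sketch (my own, assuming Python with numpy and matplotlib; the data are simulated, not the PMTK data) that produces a similar pair of fits:

```python
# Sketch: 21 noisy samples from a quadratic, fit with degree 2 and degree 20.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
x = np.linspace(0.0, 20.0, 21)                    # 21 sample locations
y_true = 0.1 * (x - 10.0) ** 2                    # underlying quadratic
y = y_true + rng.normal(scale=2.0, size=x.size)   # noisy measurements

xx = np.linspace(0.0, 20.0, 500)                  # dense grid for plotting

p2 = np.polynomial.Polynomial.fit(x, y, deg=2)    # sensible quadratic fit
p20 = np.polynomial.Polynomial.fit(x, y, deg=20)  # 21 points, degree 20: essentially an exact fit

plt.plot(x, y, "ko", label="noisy data")
plt.plot(xx, p2(xx), label="degree 2 fit")
plt.plot(xx, p20(xx), label="degree 20 fit")
plt.ylim(-20, 40)   # the degree-20 curve shoots off scale near the ends, much as described above
plt.legend()
plt.show()
```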


Also D H, why will there be a problem in the region ±(0.5, 1.0)?
Those coefficients will be *huge* with an exact fit. Each polynomial coefficient has equal weight at x = ±1: at x = 1 every power x^k equals 1, so p(1) is just the plain sum of the coefficients, and p(-1) is their alternating sum. The odds are that those sums will be huge as well, so the fit is fantastically off at ±1. The error is going to decrease as one moves toward ±0.5, but that's just because the fit is spot on at ±0.5.



A perhaps crude visualization: Suppose you have a friend who's a bit overweight. This person is fun to be with; that's what counts. Nonetheless, do you want to see this person in an ultra-tight spandex outfit in which the shirt and pants leave a gap at the waist for things to bulge out? No! TMI!

Overfitting is TMI.
 
  • #7
whuzzwhuzz said:
But jbriggs, imho there shouldn't be any problem with huge values of the coefficients if I find them from the graph. Or will there be a problem?

I may not understand correctly what you mean by finding values of coefficients from a graph. So let me take a guess...

Suppose that you had an unknown function and a perfectly accurate graph. You picked 71 points on this graph and figured out the interpolating polynomial (of degree 70) that exactly fitted those points.

If the original function were actually a polynomial of degree 70 or less then you are correct -- the coefficients would not get particularly huge. The interpolating polynomial that you came up with would be identical to the original function. It would have exactly the same coefficients.

But if the original function were a non-polynomial function such as sin(x) then you would be likely to find that the coefficients on your interpolating polynomial would need to be quite large and that their values would vary wildly depending on exactly what points you chose to fit.

Well, let's try it out... let's find the degree 70 interpolating polynomial for sin(x) at x = 0.01, 0.02, 0.03... up through x = 0.70

... took me a while to write the code and verify the result ...
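jbriggs444 doesn't say what language or method he used, so the following is only a guess at how such a computation could be set up (assuming Python with numpy and scipy; scipy's lagrange helper is itself numerically unstable at this degree, which is rather the point). Also, the post lists 70 sample points (0.01 through 0.70) while the output below has 71 coefficients; an exact degree-70 fit needs 71 points, so this sketch assumes x runs up to 0.71:

```python
# Sketch only: compute a degree-70 interpolating polynomial for sin(x).
import numpy as np
from scipy.interpolate import lagrange

x = np.arange(1, 72) / 100.0     # 0.01, 0.02, ..., 0.71  (71 points, assumed)
y = np.sin(x)

poly = lagrange(x, y)            # exact interpolating polynomial, degree 70

# Print coefficients from the x^70 term down to the constant term.
for power, coef in zip(range(poly.order, -1, -1), poly.coef):
    print(f"{coef:+.6e} * x^{power}")
```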

Degree 70 interpolating polynomial is 1.78565320710274e+041x^70 + -4.48494832583492e+042x^69 + 5.528038125205e+043x^68 + -4.45584495836681e+044x^67
+ 2.64093918402752e+045x^66 + -1.22707558670997e+046x^65 + 4.65368376059699e+046x^64 + -1.48108190165085e+047x^63
+ 4.036364463932e+047x^62 + -9.56519374605964e+047x^61 + 1.99485822911897e+048x^60 + -3.69692156624871e+048x^59
+ 6.13640239194275e+048x^58 + -9.18312285681255e+048x^57 + 1.24588258279778e+049x^56 + -1.53965291085099e+049x^55
+ 1.74014763648891e+049x^54 + -1.80503906388373e+049x^53 + 1.7236324441617e+049x^52 + -1.51918970809288e+049x^51
+ 1.23878579464513e+049x^50 + -9.36440424303407e+048x^49 + 6.57412172779752e+048x^48 + -4.29286045569417e+048x^47
+ 2.61095511730387e+048x^46 + -1.48084919372112e+048x^45 + 7.84018686366066e+047x^44 + -3.87817675552892e+047x^43
+ 1.79364634770811e+047x^42 + -7.76106818613555e+046x^41 + 3.14338341699398e+046x^40 + -1.19215036135871e+046x^39
+ 4.23489040039878e+045x^38 + -1.40930501591765e+045x^37 + 4.39389222302018e+044x^36 + -1.28339880860131e+044x^35
+ 3.51143434657881e+043x^34 + -8.99740949528726e+042x^33 + 2.15831918045861e+042x^32 + -4.84494984318825e+041x^31
+ 1.01718994248303e+041x^30 + -1.99604065320896e+040x^29 + 3.65810310650505e+039x^28 + -6.25565439758534e+038x^27
+ 9.97179926971518e+037x^26 + -1.47996135224111e+037x^25 + 2.04234220400722e+036x^24 + -2.61673679689642e+035x^23
+ 3.10753813341245e+034x^22 + -3.41412767675128e+033x^21 + 3.46283385990336e+032x^20 + -3.23475069542121e+031x^19
+ 2.77553881949394e+030x^18 + -2.18094038032037e+029x^17 + 1.56405682945229e+028x^16 + -1.01975758414449e+027x^15
+ 6.01819398976705e+025x^14 + -3.19867562980926e+024x^13 + 1.52223607779542e+023x^12 + -6.44261194784544e+021x^11
+ 2.40578287779307e+020x^10 + -7.85160597893375e+018x^9 + 2.21417167463036e+017x^8 + -5.32010164379294e+015x^7
+ 107006949846776x^6 + -1760883078610.78x^5 + 22983570475.5373x^4 + -227625047.555063x^3
+ 1596470.25219974x^2 + -7005.64106476407x^1 + 14.3140976948127
 

1. What is the significance of fitting a degree 'n' polynomial to (n+1) data points?

Fitting a degree 'n' polynomial to (n+1) data points means choosing the unique polynomial that passes exactly through all of the points (polynomial interpolation). When the points are known exactly, this can be useful for reconstructing the underlying function; as the discussion above shows, though, it is usually a poor choice for noisy measurements.

2. Can a degree 'n' polynomial perfectly fit (n+1) data points?

Yes. As long as the (n+1) points have distinct x-values, there is exactly one polynomial of degree at most n that passes through all of them (for example, via Lagrange interpolation), so the fit at those points is exact, with zero residual. The problem discussed in this thread is that such an exact fit to noisy data behaves very badly between and beyond the data points.

3. What happens if the degree of the polynomial is less than n?

If the degree is less than n, the polynomial generally cannot pass through all (n+1) points exactly, so the fit becomes a least-squares approximation with nonzero residuals. If the degree is also too low for the underlying relationship, the model underfits the data: high bias, low variance, and poor predictions. The degree should be chosen to match the underlying relationship rather than the number of data points.

4. Is it possible to overfit the data when using a degree 'n' polynomial?

Yes, it is possible to overfit the data when using a degree 'n' polynomial. This can happen when the model is too complex and tries to fit the noise and outliers in the data, resulting in low bias and high variance. Regularization techniques can be used to prevent overfitting.

5. What is the difference between using a higher degree polynomial and a lower degree polynomial to fit the same data?

A higher degree polynomial may provide a better fit to the data points, resulting in lower error and higher accuracy. However, it can also lead to overfitting, as mentioned earlier. On the other hand, a lower degree polynomial may have higher bias but can result in a more generalizable and stable model. The choice between the two depends on the specific data and the goals of the analysis.
