What would happen if you try to fit an 'n' degree polynomial to (n+1) data points?

In summary: Suppose further that you know the function at exactly n+1 points, and you know those n+1 points "exactly" (e.g., they have no measurement error). Then you can use Lagrange interpolation to find the unique degree n polynomial which fits those data points. As you noticed, as n gets larger, these polynomials can get very large very quickly. So even if your data points are not "noisy", your polynomial can be huge. Now suppose you did this for a very large n, so that your polynomial is huge. Then you take your Lagrange polynomial and you plot it. So you plot a...
  • #1
whuzzwhuzz
I couldn't understand these lines from my book, so I will reproduce them here.

Warning: In step 1, if you use a computer to fit a polynomial to the data, it could lead to disaster. For example, consider fitting a sixth-degree polynomial to the seven data points, or, more generally, an (n-1) degree polynomial to n points.

Could you please explain it to me?
Thanks a lot!
 
  • #2
Two points are needed to define a line, three for a parabola, and so on. In general, n+1 points are needed to define a degree n polynomial (or n points are needed to define a degree n-1 polynomial, same thing).
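As a quick illustration of that counting rule (my own sketch in Python with numpy, not something from the thread), three exact points are enough to recover a parabola:

```python
# Minimal sketch: 3 exact points determine a degree-2 polynomial.
import numpy as np

# Three points taken from the parabola y = 2x^2 - 3x + 1 (made-up example).
x = np.array([0.0, 1.0, 2.0])
y = 2 * x**2 - 3 * x + 1

# Fitting a degree-2 polynomial to 3 points recovers the coefficients exactly
# (up to floating-point rounding), in descending powers of x.
coeffs = np.polyfit(x, y, deg=2)
print(coeffs)   # approximately [ 2. -3.  1.]
```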

Example: Suppose you collect 71 measured data points from a relationship that is known to be quadratic. The measurements are inherently noisy. So what happens if you try to fit those 71 points to a 70th degree polynomial instead of a quadratic like you should have? You will get an exact fit at those 71 data points. Outside the range of those points this polynomial will zoom off to infinity faster than fast. It's a 70th degree polynomial, and x^70 grows very, very quickly. So that's one disaster: The forced fit has zero extrapolation power. Now look what happens between your data points. This 70th degree polynomial will have as many as 69 extrema. Most of these extrema will be between your sampled data. The noise in the data is inevitably going to make those extreme values be very, very large. The fit has zero interpolative power.
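To put a number on how fast x^70 blows up outside the data range, here is a tiny sketch (assuming Python; the sample values are arbitrary):

```python
# Quick numeric illustration of why extrapolating a degree-70 fit is hopeless:
# x**70 explodes as soon as |x| exceeds 1.
for x in (1.0, 1.1, 1.5, 2.0, 10.0):
    print(f"x = {x:5.1f}   x**70 = {x**70:.3e}")
# Prints roughly 1.0e+00, 7.9e+02, 2.1e+12, 1.2e+21, 1.0e+70.
```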

This latter problem is an even bigger disaster than the lack of extrapolative power. You always take your chances when extrapolating outside the domain of applicability of some model. On the other hand, a reasonable model should have reasonable interpolative capabilities. An overfit model has neither extrapolative nor interpolative capabilities. In short, it's worthless.
 
  • #3
It is not simply that x^70 grows very rapidly. If all of your data points were for values of x between -0.5 and +0.5 then x^70 would be quite small.

But if the values of x^n in your polynomial are very small, that means that the coefficients on those terms will have to be very large in order to exactly fit a large number of noisy measurements.

Either way -- whether due to huge values of the x^n terms or huge values of the coefficients on those terms -- the graph of the so-called "interpolating polynomial" will tend to become more and more pathological and less and less smooth as the number of fitted points increases, just as D H has indicated.
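To see why the coefficients are forced to be huge when the data sit in [-0.5, 0.5], a short sketch (assuming Python; not from the thread):

```python
# At |x| <= 0.5, the x**70 basis value is tiny, so an exact fit must pair it
# with an enormous coefficient for that term to contribute anything visible.
x = 0.5
print(x**70)        # about 8.47e-22
print(1 / x**70)    # about 1.18e+21 -- coefficient size needed for an O(1) contribution
```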
 
  • #4
jbriggs444 said:
It is not simply that x^70 grows very rapidly. If all of your data points were for values of x between -0.5 and +0.5 then x^70 would be quite small.
If the only possible values for x are between -0.5 and +0.5, then yes, extrapolation is not a problem for the simple reason that there is no extrapolation. On the other hand, if measurements are restricted to (-0.5, 0.5), but this is but a small part of the overall domain, the growth in x^70 for |x|>1 is a huge problem with regard to extrapolation. The region between ±(0.5, 1.0) is also going to be problematic. Extrapolation is completely shot with an overfit curve.

Shoot, extrapolation is oftentimes suspect even if the model is correct. It's the failure of interpolative power that's the real killer. That and the fact that a 70th degree polynomial almost certainly is not the right model. For example, if the true model is a sinusoid, using a polynomial model is just wrong.
 
  • #5
Thanks a lot D H and jbriggs! :) Now it's very clear to me. We just shouldn't try to fit an n-degree polynomial to n+1 data points: we won't get a smooth curve, and interpolation becomes unreliable.

But jbriggs, imho there shouldn't be any problem with huge values of the coefficients if I find them from the graph. Or will there be a problem?

Also D H, why will there be a problem in the region ±(0.5, 1.0)?

Thanks again you guys! :)
 
  • #6
whuzzwhuzz said:
Thanks a lot D H and jbriggs! :) Now it's very clear to me. We just shouldn't try to fit an n-degree polynomial to n+1 data points: we won't get a smooth curve, and interpolation becomes unreliable.
An overfit curve will be smooth mathematically for the simple reason that every polynomial is smooth. It will not be smooth to the eye.

Here's an example from http://pmtk3.googlecode.com/svn/trunk/docs/tutorial/html/tutRegr.html .

Suppose you have a set of 21 data samples, and the underlying problem is quadratic in nature. Here's that data fit to a quadratic:
http://pmtk3.googlecode.com/svn/trunk/docs/tutorial/html/tutRegr_04.png
Nice and smooth. Not quite perfect, but a fairly good fit and besides there's no reason to expect perfection.


Here's the same data fit to a 20th order polynomial:
http://pmtk3.googlecode.com/svn/trunk/docs/tutorial/html/tutRegr_06.png
It's too bad the plot just shows the fit going off scale. The fit is "perfect" in the sense that it hits every data point. It is far, far from perfect in any other sense. And it isn't "smooth".
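In case those image links no longer load, here is a rough sketch (my own, assuming Python with numpy and matplotlib; the data are simulated, not the PMTK data) that produces a similar pair of fits:

```python
# Sketch: 21 noisy samples from a quadratic, fit with degree 2 and degree 20.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
x = np.linspace(0.0, 20.0, 21)                    # 21 sample locations
y_true = 0.1 * (x - 10.0) ** 2                    # underlying quadratic
y = y_true + rng.normal(scale=2.0, size=x.size)   # noisy measurements

xx = np.linspace(0.0, 20.0, 500)                  # dense grid for plotting

p2 = np.polynomial.Polynomial.fit(x, y, deg=2)    # sensible quadratic fit
p20 = np.polynomial.Polynomial.fit(x, y, deg=20)  # 21 points, degree 20: essentially an exact fit

plt.plot(x, y, "ko", label="noisy data")
plt.plot(xx, p2(xx), label="degree 2 fit")
plt.plot(xx, p20(xx), label="degree 20 fit")
plt.ylim(-20, 40)   # the degree-20 curve shoots off scale near the ends, much as described above
plt.legend()
plt.show()
```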


Also D H, why will there be a problem in the region ±(0.5, 1.0)?
Those coefficients will be *huge* with an exact fit. Each polynomial coefficient has equal weight at x = ±1: at x = 1 every power x^k equals 1, so p(1) is just the plain sum of the coefficients, and p(-1) is their alternating sum. The odds are that those sums will be huge as well, so the fit is fantastically off at ±1. The error is going to decrease as one moves toward ±0.5, but that's just because the fit is spot on at ±0.5.



A perhaps crude visualization: Suppose you have a friend who's a bit overweight. This person is fun to be with; that's what counts. Nonetheless, do you want to see this person in an ultra-tight spandex outfit in which the shirt and pants leave a gap at the waist for things to bulge out? No! TMI!

Overfitting is TMI.
 
  • #7
whuzzwhuzz said:
But jbriggs, imho there shouldn't be any problem with huge values of the coefficients if I find them from the graph. Or will there be a problem?

I may not understand correctly what you mean by finding values of coefficients from a graph. So let me take a guess...

Suppose that you had an unknown function and a perfectly accurate graph. You picked 71 points on this graph and figured out the interpolating polynomial (of degree 70) that exactly fitted those points.

If the original function were actually a polynomial of degree 70 or less then you are correct -- the coefficients would not get particularly huge. The interpolating polynomial that you came up with would be identical to the original function. It would have exactly the same coefficients.

But if the original function were a non-polynomial function such as sin(x) then you would be likely to find that the coefficients on your interpolating polynomial would need to be quite large and that their values would vary wildly depending on exactly what points you chose to fit.

Well, let's try it out... let's find the degree 70 interpolating polynomial for sin(x) at x = 0.01, 0.02, 0.03... up through x = 0.70

... took me a while to write the code and verify the result ...
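jbriggs444 doesn't say what language or method he used, so the following is only a guess at how such a computation could be set up (assuming Python with numpy and scipy; scipy's lagrange helper is itself numerically unstable at this degree, which is rather the point). Also, the post lists 70 sample points (0.01 through 0.70) while the output below has 71 coefficients; an exact degree-70 fit needs 71 points, so this sketch assumes x runs up to 0.71:

```python
# Sketch only: compute a degree-70 interpolating polynomial for sin(x).
import numpy as np
from scipy.interpolate import lagrange

x = np.arange(1, 72) / 100.0     # 0.01, 0.02, ..., 0.71  (71 points, assumed)
y = np.sin(x)

poly = lagrange(x, y)            # exact interpolating polynomial, degree 70

# Print coefficients from the x^70 term down to the constant term.
for power, coef in zip(range(poly.order, -1, -1), poly.coef):
    print(f"{coef:+.6e} * x^{power}")
```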

Degree 70 interpolating polynomial is 1.78565320710274e+041x^70 + -4.48494832583492e+042x^69 + 5.528038125205e+043x^68 + -4.45584495836681e+044x^67
+ 2.64093918402752e+045x^66 + -1.22707558670997e+046x^65 + 4.65368376059699e+046x^64 + -1.48108190165085e+047x^63
+ 4.036364463932e+047x^62 + -9.56519374605964e+047x^61 + 1.99485822911897e+048x^60 + -3.69692156624871e+048x^59
+ 6.13640239194275e+048x^58 + -9.18312285681255e+048x^57 + 1.24588258279778e+049x^56 + -1.53965291085099e+049x^55
+ 1.74014763648891e+049x^54 + -1.80503906388373e+049x^53 + 1.7236324441617e+049x^52 + -1.51918970809288e+049x^51
+ 1.23878579464513e+049x^50 + -9.36440424303407e+048x^49 + 6.57412172779752e+048x^48 + -4.29286045569417e+048x^47
+ 2.61095511730387e+048x^46 + -1.48084919372112e+048x^45 + 7.84018686366066e+047x^44 + -3.87817675552892e+047x^43
+ 1.79364634770811e+047x^42 + -7.76106818613555e+046x^41 + 3.14338341699398e+046x^40 + -1.19215036135871e+046x^39
+ 4.23489040039878e+045x^38 + -1.40930501591765e+045x^37 + 4.39389222302018e+044x^36 + -1.28339880860131e+044x^35
+ 3.51143434657881e+043x^34 + -8.99740949528726e+042x^33 + 2.15831918045861e+042x^32 + -4.84494984318825e+041x^31
+ 1.01718994248303e+041x^30 + -1.99604065320896e+040x^29 + 3.65810310650505e+039x^28 + -6.25565439758534e+038x^27
+ 9.97179926971518e+037x^26 + -1.47996135224111e+037x^25 + 2.04234220400722e+036x^24 + -2.61673679689642e+035x^23
+ 3.10753813341245e+034x^22 + -3.41412767675128e+033x^21 + 3.46283385990336e+032x^20 + -3.23475069542121e+031x^19
+ 2.77553881949394e+030x^18 + -2.18094038032037e+029x^17 + 1.56405682945229e+028x^16 + -1.01975758414449e+027x^15
+ 6.01819398976705e+025x^14 + -3.19867562980926e+024x^13 + 1.52223607779542e+023x^12 + -6.44261194784544e+021x^11
+ 2.40578287779307e+020x^10 + -7.85160597893375e+018x^9 + 2.21417167463036e+017x^8 + -5.32010164379294e+015x^7
+ 107006949846776x^6 + -1760883078610.78x^5 + 22983570475.5373x^4 + -227625047.555063x^3
+ 1596470.25219974x^2 + -7005.64106476407x^1 + 14.3140976948127
 

1. What is the significance of fitting a degree 'n' polynomial to (n+1) data points?

Fitting a degree 'n' polynomial to (n+1) data points means choosing the unique polynomial that passes exactly through all of the points (polynomial interpolation). When the points are known exactly, this can be useful for reconstructing the underlying function; as the discussion above shows, though, it is usually a poor choice for noisy measurements.

2. Can a degree 'n' polynomial perfectly fit (n+1) data points?

Yes. As long as the (n+1) points have distinct x-values, there is exactly one polynomial of degree at most n that passes through all of them (for example, via Lagrange interpolation), so the fit at those points is exact, with zero residual. The problem discussed in this thread is that such an exact fit to noisy data behaves very badly between and beyond the data points.

3. What happens if the degree of the polynomial is less than n?

If the degree is less than n, the polynomial generally cannot pass through all (n+1) points exactly, so the fit becomes a least-squares approximation with nonzero residuals. If the degree is also too low for the underlying relationship, the model underfits the data: high bias, low variance, and poor predictions. The degree should be chosen to match the underlying relationship rather than the number of data points.

4. Is it possible to overfit the data when using a degree 'n' polynomial?

Yes, it is possible to overfit the data when using a degree 'n' polynomial. This can happen when the model is too complex and tries to fit the noise and outliers in the data, resulting in low bias and high variance. Regularization techniques can be used to prevent overfitting.

5. What is the difference between using a higher degree polynomial and a lower degree polynomial to fit the same data?

A higher degree polynomial may provide a better fit to the data points, resulting in lower error and higher accuracy. However, it can also lead to overfitting, as mentioned earlier. On the other hand, a lower degree polynomial may have higher bias but can result in a more generalizable and stable model. The choice between the two depends on the specific data and the goals of the analysis.
