Solving polynomial coefficients to minimize square error

tomizzo · May 19, 2016

Hi there,

I'm working on a problem right now that relates to least squares error estimate for polynomial fitting.

I'm aware of techniques (iterative formulas) for finding the coefficients of a polynomial that minimizes the square error from a data set. So for example, for a data set that I believe is a first order polynomial, y(x) = a0 + a1*x, I have a method for finding the coefficients of a0 and a1.

However, what if I were to make an assumption about one of the coefficients, specifically, let's says I want a0 = 0. The question is then, let's find y(x) = a1*x, such that a1 minimizes the square error.

How would I go about solving this? The only method I know attempts to solve for a0 and a1, yet I'm making an assumption about a0 = 0, and am attempting to find a1.

Does anyone know how to solve this problem? Or if there is a general solution to solve this? I believe the problem is really a calculus of variation problems, and I simply do not know the math.

TLDR - What is the method for solving for polynomial coefficients (in a least squares sense), while being able to make assumptions about certain parameters.

Mark44 · May 19, 2016

tomizzo said:

Hi there,

I'm working on a problem right now that relates to least squares error estimate for polynomial fitting.

I'm aware of techniques (iterative formulas) for finding the coefficients of a polynomial that minimizes the square error from a data set. So for example, for a data set that I believe is a first order polynomial, y(x) = a0 + a1*x, I have a method for finding the coefficients of a0 and a1.

However, what if I were to make an assumption about one of the coefficients, specifically, let's says I want a0 = 0. The question is then, let's find y(x) = a1*x, such that a1 minimizes the square error.

How would I go about solving this? The only method I know attempts to solve for a0 and a1, yet I'm making an assumption about a0 = 0, and am attempting to find a1.

But is this a reasonable assumption (i.e., that a0 = 0)? If you had a set of data for which you could find the line of best fit y(x) = A + Bx, then it seems to me that the line y(x) = kx would not be a good fit. Or are you trying to find the line whose fit is "least bad"?

tomizzo said:

Does anyone know how to solve this problem? Or if there is a general solution to solve this? I believe the problem is really a calculus of variation problems, and I simply do not know the math.

TLDR - What is the method for solving for polynomial coefficients (in a least squares sense), while being able to make assumptions about certain parameters.

blue_leaf77 · May 19, 2016

The it becomes a one dimensional problem ##y-a_0 = a_1 x##, but this is not a general form of a line.

tomizzo · May 20, 2016

Mark44 said:

But is this a reasonable assumption (i.e., that a0 = 0)? If you had a set of data for which you could find the line of best fit y(x) = A + Bx, then it seems to me that the line y(x) = kx would not be a good fit. Or are you trying to find the line whose fit is "least bad"?

Hi Mark,

Thanks for the response. So I'm definitely not an expert in this area, but here's what I know (or that I believe I know).

You are correct in the sense that I could try to fit y(x) = A + Bx and solve for A and B. The resulting function I calculate would indeed better fit the collected data, however, it would not better fit the underlying true signal if I believe that signal to take on the form y(x) = Bx. Theoretically, I could try to fit an infinitely degree polynomial and would indeed reduce the square error, but it wouldn't fit the actual underly signal any better.

So in my case, I believe that the true underlying signal is of the form y(x) = Bx. Thus I should constraint the function as much as I can... But yeah, you could also say I'm trying to find a line that is 'least bad'.

So this question originally stemmed a from a specific instance, however, I'm now curious about the general case in which you can add constraints. I'm also curious if this a problem where Lagrange multipliers may be of use...

Mark44 · May 20, 2016

tomizzo said:

Hi Mark,

Thanks for the response. So I'm definitely not an expert in this area, but here's what I know (or that I believe I know).

You are correct in the sense that I could try to fit y(x) = A + Bx and solve for A and B. The resulting function I calculate would indeed better fit the collected data, however, it would not better fit the underlying true signal if I believe that signal to take on the form y(x) = Bx. Theoretically, I could try to fit an infinitely degree polynomial and would indeed reduce the square error, but it wouldn't fit the actual underly signal any better.

So in my case, I believe that the true underlying signal is of the form y(x) = Bx. Thus I should constraint the function as much as I can... But yeah, you could also say I'm trying to find a line that is 'least bad'.

If, as you said above, y = A + Bx "would indeed better fit the collected data" then why not use this line? If the data actually does lie close to the line y = Bx, then the regression line calculation should show that A is zero or close to it. I don't see the need for forcing A to be zero a priori.

tomizzo said:

So this question originally stemmed a from a specific instance, however, I'm now curious about the general case in which you can add constraints. I'm also curious if this a problem where Lagrange multipliers may be of use...

I don't know. It's been too many years since I've done anything with these, so I can't say. I'm not even sure they are applicable here.

tomizzo · May 20, 2016

Mark44 said:

If, as you said above, y = A + Bx "would indeed better fit the collected data" then why not use this line? If the data actually does lie close to the line y = Bx, then the regression line calculation should show that A is zero or close to it. I don't see the need for forcing A to be zero a priori.
I don't know. It's been too many years since I've done anything with these, so I can't say. I'm not even sure they are applicable here.

Yes it would better fit the collected data. However, it would not be a better approximation of the true underlying signal if I know that the signal has the form y = Bx. By trying to use a model of y = A + Bx, my coefficient approximations would be more influenced by noise in the data set.

Stephen Tashi · May 20, 2016

Mark44 said:

I don't see the need for forcing A to be zero a priori.

I think requiring A = 0 is related to implementing the Dickey-Fuller test. See the thread: https://www.physicsforums.com/threads/how-do-you-implement-the-dickey-fuller-test.871777/

Stephen Tashi · May 20, 2016

tomizzo said:

Does anyone know how to solve this problem? Or if there is a general solution to solve this? I believe the problem is really a calculus of variation problems, and I simply do not know the math.

Least squares fitting of a polynomial to data is the type of problem where you have a function f(A,B,C...) of several variables and you find the extrema of such a function by creating a system of linear equations. Each equation comes from setting the partial derivative of f with respect to one of its arguments equal to zero.

In the case of data ## (x_i, y_i)## and ## f(A) = \sum_{i=1}^n (y_i - Ax_i)^2 ## we have the single equation:

## 0 = \frac{\partial f}{\partial A} = \sum_{i=1}^n (\ (2)(y_i - Ax_i)(x_i) \ ) ##

= ## \sum_{i=1}^n ( 2 x_i y_i - 2A {x_i}^2 ) = 2 \sum_{i=1}^n x_i y_i - 2A \sum_{i=1}^n {x_i}^2 ##

so ## A = \frac{\sum_{i=1}^n x_i y_i}{ \sum_{i=1}^n {x_i}^2} ## is an extremum of ## f ##.

TLDR - What is the method for solving for polynomial coefficients (in a least squares sense), while being able to make assumptions about certain parameters.

It depends on the assumptions, but if you want to assume particular coefficients take particular constant values (such as zero) you don't compute any partial derivatives with respect to those coefficients since they aren't variables.

tomizzo · May 20, 2016

Stephen Tashi said:

Least squares fitting of a polynomial to data is the type of problem where you have a function f(A,B,C...) of several variables and you find the extrema of such a function by creating a system of linear equations. Each equation comes from setting the partial derivative of f with respect to one of its arguments equal to zero.

In the case of data ## (x_i, y_i)## and ## f(A) = \sum_{i=1}^n (y_i - Ax_i)^2 ## we have the single equation:

## 0 = \frac{\partial f}{\partial A} = \sum_{i=1}^n (\ (2)(y_i - Ax_i)(x_i) \ ) ##

= ## \sum_{i=1}^n ( 2 x_i y_i - 2A {x_i}^2 ) = 2 \sum_{i=1}^n x_i y_i - 2A \sum_{i=1}^n {x_i}^2 ##

so ## A = \frac{\sum_{i=1}^n x_i y_i}{ \sum_{i=1}^n {x_i}^2} ## is an extremum of ## f ##.

It depends on the assumptions, but if you want to assume particular coefficients take particular constant values (such as zero) you don't compute any partial derivatives with respect to those coefficients since they aren't variables.

Thank you for this derivation. This is the exact result I got, but I didn't have much confidence in it and assumed there was a general formula.

I was also able to stumble across this on the wikipedia page earlier this morning - https://en.wikipedia.org/wiki/Simple_linear_regression#Linear_regression_without_the_intercept_term

This outlines the exact problem and comes to an identical conclusion to our answer.

Solving polynomial coefficients to minimize square error

Discussion Overview

Discussion Character

Main Points Raised

Areas of Agreement / Disagreement

Contextual Notes

Similar threads

Undergrad Why ##a^0=1##?

Undergrad Finding the minimum distance between two curves

High School Straightforward integration…

High School Arc Length for Hyperbolic Sin

Undergrad Why is ##x=e^y## the inverse of ##y=\int_1^x \frac{1}{t} dt##?

Insights Revisiting the Velocity-Time Function

Insights Remote Operated Gate Control System

Insights AI Enriched Problem Solving

Insights Thinking Outside The Box Versus Knowing What’s In The Box

Insights Why Entangled Photon-Polarization Qubits Violate Bell’s Inequality

Insights Quantum Entanglement is a Kinematic Fact, not a Dynamical Effect