# Solving polynomial coefficients to minimize square error

• A
Hi there,

I'm working on a problem right now that relates to least squares error estimate for polynomial fitting.

I'm aware of techniques (iterative formulas) for finding the coefficients of a polynomial that minimizes the square error from a data set. So for example, for a data set that I believe is a first order polynomial, y(x) = a0 + a1*x, I have a method for finding the coefficients of a0 and a1.

However, what if I were to make an assumption about one of the coefficients, specifically, let's says I want a0 = 0. The question is then, let's find y(x) = a1*x, such that a1 minimizes the square error.

How would I go about solving this? The only method I know attempts to solve for a0 and a1, yet I'm making an assumption about a0 = 0, and am attempting to find a1.

Does anyone know how to solve this problem? Or if there is a general solution to solve this? I believe the problem is really a calculus of variation problems, and I simply do not know the math.

TLDR - What is the method for solving for polynomial coefficients (in a least squares sense), while being able to make assumptions about certain parameters.

Mark44
Mentor
Hi there,

I'm working on a problem right now that relates to least squares error estimate for polynomial fitting.

I'm aware of techniques (iterative formulas) for finding the coefficients of a polynomial that minimizes the square error from a data set. So for example, for a data set that I believe is a first order polynomial, y(x) = a0 + a1*x, I have a method for finding the coefficients of a0 and a1.

However, what if I were to make an assumption about one of the coefficients, specifically, let's says I want a0 = 0. The question is then, let's find y(x) = a1*x, such that a1 minimizes the square error.

How would I go about solving this? The only method I know attempts to solve for a0 and a1, yet I'm making an assumption about a0 = 0, and am attempting to find a1.
But is this a reasonable assumption (i.e., that a0 = 0)? If you had a set of data for which you could find the line of best fit y(x) = A + Bx, then it seems to me that the line y(x) = kx would not be a good fit. Or are you trying to find the line whose fit is "least bad"?
tomizzo said:
Does anyone know how to solve this problem? Or if there is a general solution to solve this? I believe the problem is really a calculus of variation problems, and I simply do not know the math.

TLDR - What is the method for solving for polynomial coefficients (in a least squares sense), while being able to make assumptions about certain parameters.

blue_leaf77
Homework Helper
The it becomes a one dimensional problem ##y-a_0 = a_1 x##, but this is not a general form of a line.

But is this a reasonable assumption (i.e., that a0 = 0)? If you had a set of data for which you could find the line of best fit y(x) = A + Bx, then it seems to me that the line y(x) = kx would not be a good fit. Or are you trying to find the line whose fit is "least bad"?
Hi Mark,

Thanks for the response. So I'm definitely not an expert in this area, but here's what I know (or that I believe I know).

You are correct in the sense that I could try to fit y(x) = A + Bx and solve for A and B. The resulting function I calculate would indeed better fit the collected data, however, it would not better fit the underlying true signal if I believe that signal to take on the form y(x) = Bx. Theoretically, I could try to fit an infinitely degree polynomial and would indeed reduce the square error, but it wouldn't fit the actual underly signal any better.

So in my case, I believe that the true underlying signal is of the form y(x) = Bx. Thus I should constraint the function as much as I can... But yeah, you could also say I'm trying to find a line that is 'least bad'.

So this question originally stemmed a from a specific instance, however, I'm now curious about the general case in which you can add constraints. I'm also curious if this a problem where Lagrange multipliers may be of use...

Mark44
Mentor
Hi Mark,

Thanks for the response. So I'm definitely not an expert in this area, but here's what I know (or that I believe I know).

You are correct in the sense that I could try to fit y(x) = A + Bx and solve for A and B. The resulting function I calculate would indeed better fit the collected data, however, it would not better fit the underlying true signal if I believe that signal to take on the form y(x) = Bx. Theoretically, I could try to fit an infinitely degree polynomial and would indeed reduce the square error, but it wouldn't fit the actual underly signal any better.

So in my case, I believe that the true underlying signal is of the form y(x) = Bx. Thus I should constraint the function as much as I can... But yeah, you could also say I'm trying to find a line that is 'least bad'.
If, as you said above, y = A + Bx "would indeed better fit the collected data" then why not use this line? If the data actually does lie close to the line y = Bx, then the regression line calculation should show that A is zero or close to it. I don't see the need for forcing A to be zero a priori.
tomizzo said:
So this question originally stemmed a from a specific instance, however, I'm now curious about the general case in which you can add constraints. I'm also curious if this a problem where Lagrange multipliers may be of use...
I don't know. It's been too many years since I've done anything with these, so I can't say. I'm not even sure they are applicable here.

If, as you said above, y = A + Bx "would indeed better fit the collected data" then why not use this line? If the data actually does lie close to the line y = Bx, then the regression line calculation should show that A is zero or close to it. I don't see the need for forcing A to be zero a priori.
I don't know. It's been too many years since I've done anything with these, so I can't say. I'm not even sure they are applicable here.

Yes it would better fit the collected data. However, it would not be a better approximation of the true underlying signal if I know that the signal has the form y = Bx. By trying to use a model of y = A + Bx, my coefficient approximations would be more influenced by noise in the data set.

Stephen Tashi
Does anyone know how to solve this problem? Or if there is a general solution to solve this? I believe the problem is really a calculus of variation problems, and I simply do not know the math.

Least squares fitting of a polynomial to data is the type of problem where you have a function f(A,B,C...) of several variables and you find the extrema of such a function by creating a system of linear equations. Each equation comes from setting the partial derivative of f with respect to one of its arguments equal to zero.

In the case of data ## (x_i, y_i)## and ## f(A) = \sum_{i=1}^n (y_i - Ax_i)^2 ## we have the single equation:

## 0 = \frac{\partial f}{\partial A} = \sum_{i=1}^n (\ (2)(y_i - Ax_i)(x_i) \ ) ##

= ## \sum_{i=1}^n ( 2 x_i y_i - 2A {x_i}^2 ) = 2 \sum_{i=1}^n x_i y_i - 2A \sum_{i=1}^n {x_i}^2 ##

so ## A = \frac{\sum_{i=1}^n x_i y_i}{ \sum_{i=1}^n {x_i}^2} ## is an extremum of ## f ##.

TLDR - What is the method for solving for polynomial coefficients (in a least squares sense), while being able to make assumptions about certain parameters.

It depends on the assumptions, but if you want to assume particular coefficients take particular constant values (such as zero) you don't compute any partial derivatives with respect to those coefficients since they aren't variables.

Least squares fitting of a polynomial to data is the type of problem where you have a function f(A,B,C...) of several variables and you find the extrema of such a function by creating a system of linear equations. Each equation comes from setting the partial derivative of f with respect to one of its arguments equal to zero.

In the case of data ## (x_i, y_i)## and ## f(A) = \sum_{i=1}^n (y_i - Ax_i)^2 ## we have the single equation:

## 0 = \frac{\partial f}{\partial A} = \sum_{i=1}^n (\ (2)(y_i - Ax_i)(x_i) \ ) ##

= ## \sum_{i=1}^n ( 2 x_i y_i - 2A {x_i}^2 ) = 2 \sum_{i=1}^n x_i y_i - 2A \sum_{i=1}^n {x_i}^2 ##

so ## A = \frac{\sum_{i=1}^n x_i y_i}{ \sum_{i=1}^n {x_i}^2} ## is an extremum of ## f ##.

It depends on the assumptions, but if you want to assume particular coefficients take particular constant values (such as zero) you don't compute any partial derivatives with respect to those coefficients since they aren't variables.

Thank you for this derivation. This is the exact result I got, but I didn't have much confidence in it and assumed there was a general formula.

I was also able to stumble across this on the wikipedia page earlier this morning - https://en.wikipedia.org/wiki/Simple_linear_regression#Linear_regression_without_the_intercept_term

This outlines the exact problem and comes to an identical conclusion to our answer.