How do I properly set up the matrix for polynomial regression?

In summary: with more data points and a higher-degree fit (say, degree 6 or more), the normal equations can become ill-conditioned and difficult to solve numerically for the coefficients; in that case, more stable methods such as a QR factorization of the design matrix are preferred.
  • #1
jjj888
I'm trying to understand the derivation of polynomial regression.

Given the data points [(-1,-1), (2,-1), (6,-2)], a 2nd-degree curve through them will be a concave-downward parabola. My calculator produces the equation -0.0357x^2 + 0.0357x - 0.9285, which fits the data well. But if I try to do it manually in matrix form, I run into an error. My summed-up matrix looks like this:

| 3   7    39  |  |a|     |  0.3436 |
| 7   39   233 |  |b|  =  |  0.1237 |
| 39  233  1311|  |c|     | -0.1512 |

This leads me to think the equation should look like 0.3436x^2 + 0.1237x - 0.1512, which is of course wrong. Obviously I must have set the matrix up wrong. Can anyone offer any clarification?

Thanks
 
  • #2
I wrote the matrix down wrong. Here is what I had.

| 3   7    39  |  |a|     |   -4 |
| 7   39   233 |  |b|  =  |  -28 |
| 39  233  1311|  |c|     | -156 |

a = 0.3436
b = 0.1237
c = -0.1512
 
  • #3
I think you may want to calculate those 'normal-equation' elements again. Can you show more detail?
 
  • #4
| n    Ʃx    Ʃx²  |  |a|     | Ʃy   |
| Ʃx   Ʃx²   Ʃx³  |  |b|  =  | Ʃxy  |
| Ʃx²  Ʃx³   Ʃx⁴  |  |c|     | Ʃx²y |

This was the base matrix I used, derived from the partial derivatives of the squared errors.
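For completeness, the first row of that system comes from setting the partial derivative of the summed squared errors with respect to a equal to zero; the other rows follow the same way from the partials with respect to b and c:

[tex]\frac{\partial}{\partial a}\sum_{i=1}^{n}\bigl(y_i - (a + b x_i + c x_i^2)\bigr)^2 = -2\sum_{i=1}^{n}\bigl(y_i - a - b x_i - c x_i^2\bigr) = 0 \;\Rightarrow\; na + b\,\Sigma x + c\,\Sigma x^2 = \Sigma y[/tex]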
 
  • #5
Your structure is correct, but I don't get the same matrix elements you have for the given data points. I do agree with your calculator's results.

Data points:
(-1,-1)
(2,-1)
(6,-2)

[tex]n = 3 \quad \Sigma \,x = 7 \quad \Sigma \,x^2 = 41[/tex]
[tex]\quad \Sigma \, x^3 = 223 \quad \Sigma \, x^4 = 1313[/tex]

etc. Recompute your elements (both the left- and right-hand sides) and solve the system again.
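For reference, filling in those sums (together with the right-hand-side sums, which work out from the same data to Ʃy = -4, Ʃxy = -13, Ʃx²y = -77) gives the system

[tex]\begin{pmatrix} 3 & 7 & 41 \\ 7 & 41 & 223 \\ 41 & 223 & 1313 \end{pmatrix}\begin{pmatrix} a \\ b \\ c \end{pmatrix} = \begin{pmatrix} -4 \\ -13 \\ -77 \end{pmatrix} \quad\Rightarrow\quad a = -\tfrac{13}{14},\ b = \tfrac{1}{28},\ c = -\tfrac{1}{28}[/tex]

which reproduces the calculator's fit.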
 
  • #6
Thanks.

I did discover that my x sums were wrong. However, I noticed that the matrix solution and the calculator values differed slightly in the later decimal places. Should I assume the matrix solution is more accurate because the calculator uses some other algorithm? Or did your values match exactly?
 
  • #7
Well, I did my calculation via Excel to get a quick check on your results - that is, I actually formed the normal equations (A'A c = A'y) and solved by computing the inverse (yech!) of A'A.

I would not usually do this, as the normal equations can tend to be ill-conditioned, but this system is small enough that the results shouldn't be impacted too much. Even so, a Cholesky decomposition would typically be preferred for these types of problems instead of computing and applying the inverse. For larger problems, more stable methods are usually employed.

The values I get for the coefficients are as follows:
c0 = -0.928571429
c1 = 0.035714286
c2 = -0.035714286

These results also agree (to the decimal places listed) with Excel's 'Regression' solver (from its 'Data Analysis' package).

I'm not sure what method your calculator is using. Older calculators would typically use a technique very similar to computing the inverse; newer, more advanced calculators may perform a Cholesky decomposition. Also, there may be a significant difference in arithmetic precision between a calculator and, say, a computer (e.g., 10 decimal digits on a calculator vs. about 16 on a computer). It all depends on the calculator and the method.

Try displaying additional decimal places on the calculator.

Lastly, note that you are fitting a quadratic, which has three coefficients, to exactly three points. Hence you're really computing an interpolation in the end, and there are more direct methods to come up with these coefficients in that case.
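If it helps, here is a minimal sketch of that normal-equations computation in Python/NumPy; it only illustrates the A'A c = A'y approach described above, not the algorithm your calculator or Excel actually uses:

[code]
# Sketch: 2nd-degree least-squares fit via the normal equations (A'A) c = A'y.
import numpy as np

x = np.array([-1.0, 2.0, 6.0])
y = np.array([-1.0, -1.0, -2.0])

# Design matrix with columns [1, x, x^2], so the solution is [c0, c1, c2].
A = np.column_stack([np.ones_like(x), x, x**2])

# Form and solve the normal equations (fine here, but prone to
# ill-conditioning for higher-degree fits).
coeffs = np.linalg.solve(A.T @ A, A.T @ y)
print(coeffs)  # approx [-0.92857143, 0.03571429, -0.03571429]
[/code]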
 
  • #8
I know this is going to get a little beyond my grasp, but what causes the matrix to be ill-conditioned? I understand that in the end you are still trying to fit something to something else and there is no exact fit.

What are these more direct methods for finding coefficients for a quadratic? You're saying that with a small number of points a different method might be easier?
 
  • #9
A system can become ill-conditioned if the columns start to become linearly dependent. That is, the solution process can't distinguish, within the available numerical precision, the difference between two columns. As a consequence, a small change in the data (the matrix elements or the right-hand side) can yield very large changes in the solution. See for example http://engrwww.usask.ca/classes/EE/840/notes/ILL_Conditioned%20Systems.pdf

In the case of least-squares solutions, using the monomials [itex]x, x^2, x^3, x^4 , \dots[/itex] as a basis for the regression can lead to ill-conditioned systems that are hard to solve with any meaningful (accurate) results. One way to visualize this trouble is to consider the case for [itex]x \in (0,1)[/itex] and how much the monomials look like each other as [itex]x\rightarrow1[/itex].

Your particular case is small enough to reasonably avoid these problems. However, if you had additional data and tried higher-degree fits, say degree 6 or more, you may find trouble numerically solving via the normal equations.
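To see this numerically, here is a small sketch in Python/NumPy showing how the condition number of the normal-equation matrix grows with the degree when the raw monomials are used on points in (0, 1); the exact numbers depend on the sample points chosen:

[code]
# Condition number of A'A for monomial bases of increasing degree on (0, 1).
import numpy as np

x = np.linspace(0.01, 0.99, 50)                    # 50 sample points in (0, 1)
for degree in range(1, 11):
    A = np.vander(x, degree + 1, increasing=True)  # columns: 1, x, ..., x^degree
    print(degree, np.linalg.cond(A.T @ A))         # grows by orders of magnitude
[/code]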

I said more direct methods are available for your particular problem with three points since you have three coefficients to solve for and three data points. For example, just write out the quadratic expression for each of the data points in terms of the unknown coefficients. This yields three linear equations in three unknowns (the coefficients). Solve the system for the coefficients. You should obtain the following:
c0 = -13/14
c1 = 1/28
c2 = -c1 = -1/28
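
For reference, written out for these three data points, that direct system is

[tex]\begin{aligned} c_0 - c_1 + c_2 &= -1 \\ c_0 + 2c_1 + 4c_2 &= -1 \\ c_0 + 6c_1 + 36c_2 &= -2 \end{aligned}[/tex]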

Search the topic "interpolation" for more info/methods.
 
  • #10
What if I have a large number of data points, but they are all roughly in the shape of a parabola? Will I have the same problem with conditioning / dependency?
 
  • #11
Not necessarily.

Normal equations can blow up, but sometimes they can be made to work, e.g., by scaling the data.

If you don't want to take the chance of having problems with the normal equations, polynomial regression can be carried out by building the rectangular design matrix directly from the original data and the desired polynomial model, and then computing the regression coefficients via a QR factorization of that matrix.
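As a rough sketch of that approach in Python/NumPy (same three points as above; with more data the design matrix simply gets more rows):

[code]
# Least-squares fit via QR factorization of the rectangular design matrix,
# avoiding an explicitly formed A'A.
import numpy as np

x = np.array([-1.0, 2.0, 6.0])
y = np.array([-1.0, -1.0, -2.0])
A = np.column_stack([np.ones_like(x), x, x**2])  # columns: 1, x, x^2

Q, R = np.linalg.qr(A)                 # A = Q R, Q has orthonormal columns
coeffs = np.linalg.solve(R, Q.T @ y)   # solve the triangular system R c = Q^T y
print(coeffs)                          # approx [-0.9286, 0.0357, -0.0357]
[/code]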
 

1. What is a polynomial regression matrix?

Polynomial regression is a mathematical technique for fitting a polynomial curve to a set of data points; the regression matrix organizes this computation. It involves finding the coefficients of a polynomial function that best fit the data by minimizing the sum of squared errors between the actual data points and the values predicted by the polynomial curve.
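As a quick illustration, the thread's data can be fit in one line with a library routine (a sketch using NumPy's polyfit, which returns the coefficients from the highest power down):

[code]
import numpy as np

x = [-1, 2, 6]
y = [-1, -1, -2]
coeffs = np.polyfit(x, y, 2)  # degree-2 fit; highest power first: [c2, c1, c0]
print(coeffs)                 # approx [-0.0357, 0.0357, -0.9286]
[/code]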

2. How is polynomial regression different from linear regression?

Polynomial regression differs from linear regression in that it allows a more flexible curve to be fitted to the data, as opposed to a straight line. This can be helpful in cases where the relationship between the variables is not linear but instead follows a polynomial trend.

3. What is the degree of the polynomial in polynomial regression?

The degree of the polynomial refers to the highest power of the independent variable in the polynomial function. For example, a polynomial of degree 3 has the form ax^3 + bx^2 + cx + d.

4. How do you determine the degree of the polynomial to use?

The degree of the polynomial is determined by analyzing the relationship between the variables in the data set. This can be done by visual inspection of a scatter plot, or by using statistical tests to find the best-fitting degree for the polynomial curve.

5. What are some applications of polynomial regression?

Polynomial regression has various applications in fields such as economics, finance, and engineering. It can be used to analyze and predict trends in data, as well as to model complex relationships between variables. It is also commonly used in machine learning and data analysis.
