Least-squares estimation of linear regression coefficients


Discussion Overview

The discussion revolves around the least-squares estimation of coefficients in a linear regression model, specifically focusing on the function y = a sin(x) + b cos(x). Participants explore various approaches to derive the coefficients a and b, including the formulation of normal equations and alternative methods for optimization.

Discussion Character

  • Technical explanation
  • Mathematical reasoning
  • Debate/contested

Main Points Raised

  • One participant presents a method for deriving the least-squares coefficients using partial derivatives of a loss function, but receives feedback that the variable notation may be incorrect.
  • Another participant identifies the formulation of normal equations and suggests that computational methods like QR or SVD may be preferable to directly using the normal equations.
  • Some participants propose rewriting the regression equation in terms of a single variable c and a phase shift d, suggesting an iterative approach to minimize the residual sum of squares (RSS).
  • A later reply questions how to vary the parameter d effectively to ensure convergence towards a minimum RSS, expressing uncertainty about the choice of increment.
  • Concerns are raised regarding the independence of the columns in the matrix used for regression, with one participant asserting that linear independence is necessary for a unique solution.
  • Another participant challenges the assertion of independence when sample values may be identical, leading to a discussion about the implications of such cases on the solution's validity.

Areas of Agreement / Disagreement

Participants express differing views on the formulation of the regression model and the methods for estimating coefficients. There is no consensus on the best approach, and several competing ideas are presented regarding the treatment of the variables and the optimization process.

Contextual Notes

Participants note potential issues with the independence of the matrix columns and the implications for the uniqueness of the least-squares solution. The discussion includes various assumptions about the behavior of the residuals and the choice of parameters in the optimization process.

DMTN
AFAIK, there are two basic types of linear regression:
y = ax + b and y = ax^2 + bx + c
But I have to do the same with the function y = asin(x)+bcos(x).
Here is what I have done:

We minimize the loss
[tex]L = \sum\limits_{i = 1}^n {\left[ {f_i - \left( {a\sin (\frac{{\pi x}}{2}) + b\cos (\frac{{\pi x}}{2})} \right)} \right]^2 }[/tex]
by setting
[tex]\frac{{\partial L}}{{\partial a}} = 0, \qquad \frac{{\partial L}}{{\partial b}} = 0[/tex]
Continuing:
[tex]\begin{array}{l}
\frac{{\partial L}}{{\partial a}} = \sum\limits_{i = 1}^n {2\left[ {f_i - \left( {a\sin (\frac{{\pi x}}{2}) + b\cos (\frac{{\pi x}}{2})} \right)} \right]\left( { - \sin (\frac{{\pi x}}{2})} \right)} \\
\frac{{\partial L}}{{\partial b}} = \sum\limits_{i = 1}^n {2\left[ {f_i - \left( {a\sin (\frac{{\pi x}}{2}) + b\cos (\frac{{\pi x}}{2})} \right)} \right]\left( { - \cos (\frac{{\pi x}}{2})} \right)}
\end{array}[/tex]

At last, I have:

[tex]\left( {\begin{array}{*{20}c}
{\sin ^2 \left( {\frac{{\pi x}}{2}} \right)} & {\sin \left( {\frac{{\pi x}}{2}} \right)\cos \left( {\frac{{\pi x}}{2}} \right)} \\
{\sin \left( {\frac{{\pi x}}{2}} \right)\cos \left( {\frac{{\pi x}}{2}} \right)} & {\cos ^2 \left( {\frac{{\pi x}}{2}} \right)} \\
\end{array}} \right)\left( \begin{array}{l}
a \\
b \\
\end{array} \right) = \left( \begin{array}{l}
f_i \sin \left( {\frac{{\pi x}}{2}} \right) \\
f_i \cos \left( {\frac{{\pi x}}{2}} \right) \\
\end{array} \right)[/tex]

What do I have to do now? Please advise me on this situation.
 
It doesn't look right at all. For starters, you should have xi as the argument for each i, not x. Then the known quantities in the matrix and the r.h.s. vector will all have summations over i.
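To make the corrected form concrete, here is a minimal NumPy sketch. The sample data are hypothetical (chosen with true coefficients a = 2, b = -1 so the result can be checked); the point is that every entry of the matrix and right-hand side is a summation over the samples i:

```python
import numpy as np

# Hypothetical noiseless samples with assumed true coefficients a = 2, b = -1.
x = np.linspace(0.1, 3.0, 50)
f = 2.0 * np.sin(np.pi * x / 2) - 1.0 * np.cos(np.pi * x / 2)

s = np.sin(np.pi * x / 2)   # sin(pi*x_i/2) for every sample i
c = np.cos(np.pi * x / 2)   # cos(pi*x_i/2) for every sample i

# Normal equations, with each entry summed over the samples i:
#   [ sum s_i^2    sum s_i*c_i ] [a]   [ sum f_i*s_i ]
#   [ sum s_i*c_i  sum c_i^2   ] [b] = [ sum f_i*c_i ]
M = np.array([[np.sum(s * s), np.sum(s * c)],
              [np.sum(s * c), np.sum(c * c)]])
rhs = np.array([np.sum(f * s), np.sum(f * c)])
a, b = np.linalg.solve(M, rhs)
```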
 
Looks like you are trying to develop what are called the 'normal equations':

[tex]A^TAc = A^Ty[/tex]

Check out the 1st attachment in the following thread:

https://www.physicsforums.com/showthread.php?t=97391

The normal equations are fine from a mathematical standpoint, but in computational practice it is usually not a good idea to use them. It's better to factor A using QR or SVD. Example using QR:

[tex]Rc = Q^Ty[/tex]

http://www.alkires.com/teaching/ee103/Rec8_LLSAndQRFactorization.htm
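For instance, a small NumPy sketch of the QR route for this thread's model (the data are invented, with a = 2, b = -1 assumed true for checking):

```python
import numpy as np

# Hypothetical samples; assumed true coefficients are a = 2, b = -1.
x = np.linspace(0.1, 3.0, 50)
y = 2.0 * np.sin(x) - 1.0 * np.cos(x)

# Design matrix A: one row [sin(x_i), cos(x_i)] per sample.
A = np.column_stack([np.sin(x), np.cos(x)])

# Thin QR factorization, then solve the triangular system R c = Q^T y.
# This avoids forming A^T A, whose condition number is squared.
Q, R = np.linalg.qr(A)
a_hat, b_hat = np.linalg.solve(R, Q.T @ y)
```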
 
hotvette said:
Looks like you are trying to develop what are called the 'normal equations'... It's better to factor A using QR or SVD.

Great tutorials on the least squares method.
To the OP:

It is very simple: you can write the equation as
[tex]\left[\begin{array}{cc}
\sin(x) & \cos(x)\end{array}\right]\left[\begin{array}{c}
a\\
b\end{array}\right]=y[/tex]

and for each x_i and y_i you get the equation
[tex]\left[\begin{array}{cc}
\sin(x_{i}) & \cos(x_{i})\end{array}\right]\left[\begin{array}{c}
a\\
b\end{array}\right]=y_{i}[/tex]

Stacking these equations together gives [tex]Ac=Y[/tex]
where [tex]c=\left[\begin{array}{c}
a\\
b\end{array}\right][/tex]

Then you can simply solve it to get the best coefficients a and b.
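A sketch of this stacked formulation in NumPy (the sample data are invented, with assumed true coefficients a = 1.5, b = 0.5; `np.linalg.lstsq` solves Ac = y in the least-squares sense):

```python
import numpy as np

# Hypothetical noisy samples; assumed true coefficients a = 1.5, b = 0.5.
rng = np.random.default_rng(1)
x = np.linspace(0.0, 6.0, 200)
y = 1.5 * np.sin(x) + 0.5 * np.cos(x) + 0.01 * rng.standard_normal(x.size)

# Stack one equation [sin(x_i), cos(x_i)] [a, b]^T = y_i per sample
# into A c = y, then solve in the least-squares sense.
A = np.column_stack([np.sin(x), np.cos(x)])
(a, b), *_ = np.linalg.lstsq(A, y, rcond=None)
```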
 
zyh said:
it is very simple that you can write the equation like below ... then, simply you can solve it to get the best coefficients a and b.

There is a problem in your approach. sin(x) and cos(x) are not uncorrelated. The matrix A'A may be singular, depending on the sample values.
 
The problem may be tackled in the following way:
Write a·sin(x) + b·cos(x) = c·sin(d + x), with c = sqrt(a^2 + b^2) and sin(d) = b/c.
Start with any arbitrary value of d. Regress to find c in the usual way and compute the residual sum of squares (RSS). Now vary d and repeat the procedure, comparing each new RSS with the previous one to see how the RSS changes as d varies. Keep repeating until the RSS no longer decreases (or you are satisfied with a very small RSS value). Choose this pair of c and d, then solve for a and b.
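One way to sketch this procedure is to do the variation of d as a coarse scan over one period; the data, grid, and tolerances below are my own assumptions, not from the post:

```python
import numpy as np

# Hypothetical data built from assumed true coefficients a = 2, b = -1,
# so c = sqrt(5) and d = atan2(b, a) are known for checking.
x = np.linspace(0.0, 6.0, 100)
y = 2.0 * np.sin(x) - 1.0 * np.cos(x)

def fit_c_and_rss(d):
    # With d fixed, y = c*sin(d + x) is linear in the single unknown c,
    # so the one-variable least-squares solution is a ratio of sums.
    s = np.sin(d + x)
    c = np.sum(y * s) / np.sum(s * s)
    return c, np.sum((y - c * s) ** 2)

# Scan d over one period and keep the value with the smallest RSS.
ds = np.linspace(0.0, 2 * np.pi, 720, endpoint=False)
rss = [fit_c_and_rss(d)[1] for d in ds]
d_best = ds[int(np.argmin(rss))]
c_best, rss_best = fit_c_and_rss(d_best)

# Recover a and b from c*sin(d + x) = (c*cos d)*sin x + (c*sin d)*cos x.
a_hat = c_best * np.cos(d_best)
b_hat = c_best * np.sin(d_best)
```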
 
Great, ssd, this is a wonderful algorithm.
Let me explain it in more detail.
We can rewrite the equation as:
[tex]y=a\sin(x)+b\cos(x)=\sqrt{a^{2}+b^{2}}\left(\frac{a}{\sqrt{a^{2}+b^{2}}}\sin(x)+\frac{b}{\sqrt{a^{2}+b^{2}}}\cos(x)\right)[/tex]
So I can simply define variables c and d as follows:
define c: [tex]c=\sqrt{a^{2}+b^{2}}[/tex]

define d: [tex]\sin(d)=\frac{b}{c},\quad\cos(d)=\frac{a}{c}[/tex]

This gives [tex]y=c\sin(d+x)[/tex]

As you said, "start with any arbitrary value of d". It is then very simple to find c in the usual way, because there is only one unknown variable c in
[tex]y=c\sin(d+x)[/tex]
so it's easy to get c, and from it the RSS.

But my question is: how should d vary? Should the next d value be bigger than the previous one, or smaller? Is there a convergent way to make the RSS smaller?

Thank you!
 
zyh said:
But my question is how does the "d" vary? Which I mean I should get another d value which is bigger than the previous? or smaller? Are there a convergence way to let the RSS smaller..?

Thank you!
Thanks for your comments.

We started with an arbitrary value of d. Next, change d to d+10, say. We could change d to d-10 instead, or by any other arbitrary amount. Now check whether the RSS increases or decreases. If it increases, change d in the other direction. In brief, we vary d so that, at the termination point of the algorithm, the RSS would increase if d were changed in either direction. That is, we choose d so that the RSS has at least a local minimum at that value.
 
ssd said:
If increases then we have to change d in the other direction. In brief, we vary d in a way that at the termination point of the algorithm, RSS shall increase if d is changed (in whichever way).

Hi, I'm glad to discuss this topic with you, but I think it's still numerically difficult to specify the algorithm like this, because I don't know whether to use d = d + 10, d = d + 100, or some other value. It seems too arbitrary.
Let me take some time to analyze this idea.
 
  • #10
zyh said:
Hi, I'm grad to discuss with this topic with you, but I think it's still numberically difficulty to give the algorithm like this. Because I don't know whether d = d + 10? or d = d + 100? or other value. I't seems too arbitrary:frown:.
Let me take sometime to analysis this idea.

Basically, the first change in d has to be arbitrary unless we have other information. Generally it does not make a big difference whether the increment is 10 or 100, since we started with an arbitrary d. What matters is detecting the direction of further changes. In practice, a suitable computer program detects the minimum of the RSS almost instantly through this method.
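The scheme just described can be sketched as a direction-then-step-halving search. Everything below, including the data, the starting point, and the initial step, is an assumed illustration:

```python
import numpy as np

# Hypothetical data; the true phase is d* = atan2(-1, 2), from a = 2, b = -1.
x = np.linspace(0.0, 6.0, 100)
y = 2.0 * np.sin(x) - 1.0 * np.cos(x)

def rss(d):
    # Fit the single coefficient c for this d; return the residual sum of squares.
    s = np.sin(d + x)
    c = np.sum(y * s) / np.sum(s * s)
    return np.sum((y - c * s) ** 2)

# Start with an arbitrary d and step. Move d in whichever direction lowers
# the RSS; when neither direction helps, halve the step. Stop when the step
# is negligibly small, i.e. the RSS would rise if d were changed either way.
d, step = 0.0, 0.5
best = rss(d)
while step > 1e-10:
    for cand in (d + step, d - step):
        r = rss(cand)
        if r < best:
            d, best = cand, r
            break
    else:
        step /= 2.0
```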

Looking forward to your thoughts.
 
  • #11
Hi, ssd, I think you've made a mistake about the linear least squares method.
Look here:
http://en.wikipedia.org/wiki/Linear_least_squares#The_general_problem
"The linear least squares problem has a unique solution, provided that the n columns of the matrix X are linearly independent. The solution is obtained by solving the normal equations."
So, for the equation Ac = Y,
where [tex]A=\left[\begin{array}{cc}
\sin(x_{1}) & \cos(x_{1})\\
\sin(x_{2}) & \cos(x_{2})\\
\sin(x_{3}) & \cos(x_{3})\\
\cdots & \cdots\end{array}\right][/tex]

even if x1 = x2, the columns of A can still be linearly independent,
so I think the regular algorithm still applies.
 
  • #12
I don't understand what you mean by 'even x1 = x2'. Do you mean two columns of A are identical? Then of course the columns of A are not all independent. If you mean the first two rows are identical, then that is not really relevant here.
Now look at my first post. I said A'A may be singular depending on the sample values. That is, we cannot eliminate the chance of singularity (I stated this generally for correlated columns, and in our particular problem the columns are correlated).
With my approach you get the same answer when there is no singularity, and when there is one, the right answer is still obtained.
Furthermore, for equations of the form y = a + b·sin(x) + c·d^x, the method I stated remains a handy approach.
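As a sketch of how the same idea extends to that model: fix the exponential base (called g below to avoid clashing with the phase d used earlier), regress the linear coefficients (a, b, c) as usual, and scan g for the smallest RSS. The data, the true values, and the scan range are all invented for illustration:

```python
import numpy as np

# Hypothetical data from y = a + b*sin(x) + c*g^x with assumed true values
# a = 1, b = 2, c = 0.5, g = 1.3.
x = np.linspace(0.0, 3.0, 120)
y = 1.0 + 2.0 * np.sin(x) + 0.5 * (1.3 ** x)

def fit_for(g):
    # With g fixed, the model is linear in (a, b, c): regress as usual.
    A = np.column_stack([np.ones_like(x), np.sin(x), g ** x])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return np.sum((y - A @ coef) ** 2), coef

# Scan g over a plausible range and keep the value with the smallest RSS.
gs = np.linspace(1.05, 1.6, 111)
g_best = gs[int(np.argmin([fit_for(g)[0] for g in gs]))]
_, (a, b, c) = fit_for(g_best)
```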
 
  • #13
Hi, let me clarify my thoughts.

I was referring to cases such as x1 == x2 (x1, x2, x3, ... are all sampled values of x). The first two rows of A are then identical, but that does not affect the independence of the columns, because normally there are many xi values.

You said that A'A may be singular. I agree! Not only for the problem y = a·sin(x) + b·cos(x): this singular condition may arise in any least-squares problem.

Consider:
[tex]AC=Y[/tex]
(http://en.wikipedia.org/wiki/Linear_least_squares#The_general_problem)

If rank(A) < rank([A, Y]), these equations have no exact solution,
so least squares can be applied.

Consider the augmented equation:
[tex]A^{T}AC=A^{T}Y[/tex]
Because [tex]rank(A^{T}A)=rank\left(\left[A^{T}A,A^{T}Y\right]\right)[/tex] always holds, the augmented equations do have solutions.
There are two cases:
  1. singular: if A'A is singular, we have infinitely many solutions.
  2. non-singular: we have exactly one solution.

So, if we check that rank(A) equals the dimension of C, we can always get the one solution. Otherwise, I don't think there is a fixed handy approach.

Thanks for reading.
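As a concrete check of the singular case, here is a deliberately degenerate example (my own construction): when every x_i satisfies sin(x_i) = cos(x_i), the two columns of A coincide, A'A is singular, and the pseudoinverse picks one of the infinitely many least-squares solutions (the minimum-norm one):

```python
import numpy as np

# Degenerate hypothetical sample: every x_i = pi/4, so sin(x_i) = cos(x_i)
# and the two columns of A are identical, making A'A singular.
x = np.full(10, np.pi / 4)
y = 3.0 * np.sin(x) + 1.0 * np.cos(x)

A = np.column_stack([np.sin(x), np.cos(x)])
rank_A = np.linalg.matrix_rank(A)  # 1, not 2: the columns are dependent

# np.linalg.pinv still yields a least-squares solution; among the
# infinitely many minimizers it returns the minimum-norm one, so the
# original (a, b) = (3, 1) is not recovered -- only a + b is determined.
a, b = np.linalg.pinv(A) @ y
```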
 
