Least-squares estimation of linear regression coefficients


Discussion Overview

The discussion revolves around the least-squares estimation of coefficients in a linear regression model, specifically focusing on the function y = a sin(x) + b cos(x). Participants explore various approaches to derive the coefficients a and b, including the formulation of normal equations and alternative methods for optimization.

Discussion Character

  • Technical explanation
  • Mathematical reasoning
  • Debate/contested

Main Points Raised

  • One participant presents a method for deriving the least-squares coefficients using partial derivatives of a loss function, but receives feedback that the variable notation may be incorrect.
  • Another participant identifies the formulation of normal equations and suggests that computational methods like QR or SVD may be preferable to directly using the normal equations.
  • Some participants propose rewriting the regression equation in terms of a single variable c and a phase shift d, suggesting an iterative approach to minimize the residual sum of squares (RSS).
  • A later reply questions how to vary the parameter d effectively to ensure convergence towards a minimum RSS, expressing uncertainty about the choice of increment.
  • Concerns are raised regarding the independence of the columns in the matrix used for regression, with one participant asserting that linear independence is necessary for a unique solution.
  • Another participant challenges the assertion of independence when sample values may be identical, leading to a discussion about the implications of such cases on the solution's validity.

Areas of Agreement / Disagreement

Participants express differing views on the formulation of the regression model and the methods for estimating coefficients. There is no consensus on the best approach, and several competing ideas are presented regarding the treatment of the variables and the optimization process.

Contextual Notes

Participants note potential issues with the independence of the matrix columns and the implications for the uniqueness of the least-squares solution. The discussion includes various assumptions about the behavior of the residuals and the choice of parameters in the optimization process.

DMTN
AFAIK, there are two basic types of linear regression:
y = ax + b and y = ax^2 + bx + c
But I have to do the same with the function y = asin(x)+bcos(x).
Here is what I have done:

We minimize the loss
[tex]L = \sum\limits_{i = 1}^n {\left[ {f_i - \left( {a\sin (\frac{{\pi x}}{2}) + b\cos (\frac{{\pi x}}{2})} \right)} \right]^2 }[/tex]
by setting
[tex]\frac{{\partial L}}{{\partial a}} = 0, \qquad \frac{{\partial L}}{{\partial b}} = 0[/tex]
Continuing:
[tex]\begin{array}{l}
\frac{{\partial L}}{{\partial a}} = \sum\limits_{i = 1}^n {2\left[ {f_i - \left( {a\sin (\frac{{\pi x}}{2}) + b\cos (\frac{{\pi x}}{2})} \right)} \right]\left( { - \sin (\frac{{\pi x}}{2})} \right)} \\
\frac{{\partial L}}{{\partial b}} = \sum\limits_{i = 1}^n {2\left[ {f_i - \left( {a\sin (\frac{{\pi x}}{2}) + b\cos (\frac{{\pi x}}{2})} \right)} \right]\left( { - \cos (\frac{{\pi x}}{2})} \right)}
\end{array}[/tex]

At last, I have:

[tex]\left( {\begin{array}{*{20}c}
{\sin ^2 \left( {\frac{{\pi x}}{2}} \right)} & {\sin \left( {\frac{{\pi x}}{2}} \right)\cos \left( {\frac{{\pi x}}{2}} \right)} \\
{\sin \left( {\frac{{\pi x}}{2}} \right)\cos \left( {\frac{{\pi x}}{2}} \right)} & {\cos ^2 \left( {\frac{{\pi x}}{2}} \right)} \\
\end{array}} \right)\left( \begin{array}{l}
a \\
b \\
\end{array} \right) = \left( \begin{array}{l}
f_i \sin \left( {\frac{{\pi x}}{2}} \right) \\
f_i \cos \left( {\frac{{\pi x}}{2}} \right) \\
\end{array} \right)[/tex]

What do I have to do now? Please advise me on this situation.
 
It doesn't look right at all. For starters, you should have xi as the argument for each i, not x. Then the known quantities in the matrix and the r.h.s. vector will all have summations over i.
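To make the corrected form concrete, here is a minimal NumPy sketch. The sample data are hypothetical (chosen with true coefficients a = 2, b = -1 so the result can be checked); the point is that every entry of the matrix and right-hand side is a summation over the samples i:

```python
import numpy as np

# Hypothetical noiseless samples with assumed true coefficients a = 2, b = -1.
x = np.linspace(0.1, 3.0, 50)
f = 2.0 * np.sin(np.pi * x / 2) - 1.0 * np.cos(np.pi * x / 2)

s = np.sin(np.pi * x / 2)   # sin(pi*x_i/2) for every sample i
c = np.cos(np.pi * x / 2)   # cos(pi*x_i/2) for every sample i

# Normal equations, with each entry summed over the samples i:
#   [ sum s_i^2    sum s_i*c_i ] [a]   [ sum f_i*s_i ]
#   [ sum s_i*c_i  sum c_i^2   ] [b] = [ sum f_i*c_i ]
M = np.array([[np.sum(s * s), np.sum(s * c)],
              [np.sum(s * c), np.sum(c * c)]])
rhs = np.array([np.sum(f * s), np.sum(f * c)])
a, b = np.linalg.solve(M, rhs)
```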
 
Looks like you are trying to develop what are called the 'normal equations':

[tex]A^TAc = A^Ty[/tex]

Check out the 1st attachment in the following thread:

https://www.physicsforums.com/showthread.php?t=97391

The normal equations are fine from a mathematical standpoint, but in computational practice it is usually not a good idea to use them. It's better to factor A using QR or SVD. Example using QR:

[tex]Rc = Q^Ty[/tex]

http://www.alkires.com/teaching/ee103/Rec8_LLSAndQRFactorization.htm
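For instance, a small NumPy sketch of the QR route for this thread's model (the data are invented, with a = 2, b = -1 assumed true for checking):

```python
import numpy as np

# Hypothetical samples; assumed true coefficients are a = 2, b = -1.
x = np.linspace(0.1, 3.0, 50)
y = 2.0 * np.sin(x) - 1.0 * np.cos(x)

# Design matrix A: one row [sin(x_i), cos(x_i)] per sample.
A = np.column_stack([np.sin(x), np.cos(x)])

# Thin QR factorization, then solve the triangular system R c = Q^T y.
# This avoids forming A^T A, whose condition number is squared.
Q, R = np.linalg.qr(A)
a_hat, b_hat = np.linalg.solve(R, Q.T @ y)
```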
 
hotvette said:
Looks like you are trying to develop what are called the 'normal equations'... It's better to factor A using QR or SVD.

Great tutorials on the least squares method.
To the OP:

It is very simple: you can write the equation as
[tex]\left[\begin{array}{cc}
\sin(x) & \cos(x)\end{array}\right]\left[\begin{array}{c}
a\\
b\end{array}\right]=y[/tex]

and for each x_i and y_i you get the equation
[tex]\left[\begin{array}{cc}
\sin(x_{i}) & \cos(x_{i})\end{array}\right]\left[\begin{array}{c}
a\\
b\end{array}\right]=y_{i}[/tex]

Stacking these equations together gives [tex]Ac=Y[/tex]
where [tex]c=\left[\begin{array}{c}
a\\
b\end{array}\right][/tex]

Then you can simply solve it to get the best coefficients a and b.
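A sketch of this stacked formulation in NumPy (the sample data are invented, with assumed true coefficients a = 1.5, b = 0.5; `np.linalg.lstsq` solves Ac = y in the least-squares sense):

```python
import numpy as np

# Hypothetical noisy samples; assumed true coefficients a = 1.5, b = 0.5.
rng = np.random.default_rng(1)
x = np.linspace(0.0, 6.0, 200)
y = 1.5 * np.sin(x) + 0.5 * np.cos(x) + 0.01 * rng.standard_normal(x.size)

# Stack one equation [sin(x_i), cos(x_i)] [a, b]^T = y_i per sample
# into A c = y, then solve in the least-squares sense.
A = np.column_stack([np.sin(x), np.cos(x)])
(a, b), *_ = np.linalg.lstsq(A, y, rcond=None)
```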
 
zyh said:
it is very simple that you can write the equation like below ... then, simply you can solve it to get the best coefficients a and b.

There is a problem in your approach. sin(x) and cos(x) are not uncorrelated. The matrix A'A may be singular, depending on the sample values.
 
The problem may be tackled in the following way:
Write a·sin(x) + b·cos(x) = c·sin(d + x), with c = sqrt(a^2 + b^2) and sin(d) = b/c.
Start with any arbitrary value of d. Regress to find c in the usual way and compute the residual sum of squares (RSS). Now vary d and repeat the procedure, comparing each new RSS with the previous one to see how the RSS changes as d varies. Keep repeating until the RSS no longer decreases (or you are satisfied with a very small RSS value). Choose this pair of c and d, then solve for a and b.
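One way to sketch this procedure is to do the variation of d as a coarse scan over one period; the data, grid, and tolerances below are my own assumptions, not from the post:

```python
import numpy as np

# Hypothetical data built from assumed true coefficients a = 2, b = -1,
# so c = sqrt(5) and d = atan2(b, a) are known for checking.
x = np.linspace(0.0, 6.0, 100)
y = 2.0 * np.sin(x) - 1.0 * np.cos(x)

def fit_c_and_rss(d):
    # With d fixed, y = c*sin(d + x) is linear in the single unknown c,
    # so the one-variable least-squares solution is a ratio of sums.
    s = np.sin(d + x)
    c = np.sum(y * s) / np.sum(s * s)
    return c, np.sum((y - c * s) ** 2)

# Scan d over one period and keep the value with the smallest RSS.
ds = np.linspace(0.0, 2 * np.pi, 720, endpoint=False)
rss = [fit_c_and_rss(d)[1] for d in ds]
d_best = ds[int(np.argmin(rss))]
c_best, rss_best = fit_c_and_rss(d_best)

# Recover a and b from c*sin(d + x) = (c*cos d)*sin x + (c*sin d)*cos x.
a_hat = c_best * np.cos(d_best)
b_hat = c_best * np.sin(d_best)
```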
 
Great, ssd, this is a wonderful algorithm.
Let me explain it in more detail.
We can rewrite the equation as:
[tex]y=a\sin(x)+b\cos(x)=\sqrt{a^{2}+b^{2}}\left(\frac{a}{\sqrt{a^{2}+b^{2}}}\sin(x)+\frac{b}{\sqrt{a^{2}+b^{2}}}\cos(x)\right)[/tex]
So I can simply define variables c and d as follows:
define c: [tex]c=\sqrt{a^{2}+b^{2}}[/tex]

define d: [tex]\sin(d)=\frac{b}{c},\quad\cos(d)=\frac{a}{c}[/tex]

This gives [tex]y=c\sin(d+x)[/tex]

As you said, "start with any arbitrary value of d". It is then very simple to find c in the usual way, because there is only one unknown variable c in
[tex]y=c\sin(d+x)[/tex]
so it's easy to get c, and from it the RSS.

But my question is: how should d vary? Should the next d value be bigger than the previous one, or smaller? Is there a convergent way to make the RSS smaller?

Thank you!
 
zyh said:
But my question is how does the "d" vary? Which I mean I should get another d value which is bigger than the previous? or smaller? Are there a convergence way to let the RSS smaller..?

Thank you!
Thanks for your comments.

We started with an arbitrary value of d. Next, change d to d+10, say. We could change d to d-10 instead, or by any other arbitrary amount. Now check whether the RSS increases or decreases. If it increases, change d in the other direction. In brief, we vary d so that, at the termination point of the algorithm, the RSS would increase if d were changed in either direction. That is, we choose d so that the RSS has at least a local minimum at that value.
 
ssd said:
If increases then we have to change d in the other direction. In brief, we vary d in a way that at the termination point of the algorithm, RSS shall increase if d is changed (in whichever way).

Hi, I'm glad to discuss this topic with you, but I think it's still numerically difficult to specify the algorithm like this, because I don't know whether to use d = d + 10, d = d + 100, or some other value. It seems too arbitrary.
Let me take some time to analyze this idea.
 
  • #10
zyh said:
Hi, I'm grad to discuss with this topic with you, but I think it's still numberically difficulty to give the algorithm like this. Because I don't know whether d = d + 10? or d = d + 100? or other value. I't seems too arbitrary:frown:.
Let me take sometime to analysis this idea.

Basically, the first change in d has to be arbitrary unless we have other information. Generally it does not make a big difference whether the increment is 10 or 100, since we started with an arbitrary d. What matters is detecting the direction of further changes. In practice, a suitable computer program detects the minimum of the RSS almost instantly through this method.
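The scheme just described can be sketched as a direction-then-step-halving search. Everything below, including the data, the starting point, and the initial step, is an assumed illustration:

```python
import numpy as np

# Hypothetical data; the true phase is d* = atan2(-1, 2), from a = 2, b = -1.
x = np.linspace(0.0, 6.0, 100)
y = 2.0 * np.sin(x) - 1.0 * np.cos(x)

def rss(d):
    # Fit the single coefficient c for this d; return the residual sum of squares.
    s = np.sin(d + x)
    c = np.sum(y * s) / np.sum(s * s)
    return np.sum((y - c * s) ** 2)

# Start with an arbitrary d and step. Move d in whichever direction lowers
# the RSS; when neither direction helps, halve the step. Stop when the step
# is negligibly small, i.e. the RSS would rise if d were changed either way.
d, step = 0.0, 0.5
best = rss(d)
while step > 1e-10:
    for cand in (d + step, d - step):
        r = rss(cand)
        if r < best:
            d, best = cand, r
            break
    else:
        step /= 2.0
```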

Looking forward to your thoughts.
 
  • #11
Hi, ssd, I think you've made a mistake about the linear least squares method.
Look here:
http://en.wikipedia.org/wiki/Linear_least_squares#The_general_problem
"The linear least squares problem has a unique solution, provided that the n columns of the matrix X are linearly independent. The solution is obtained by solving the normal equations."
So, for the equation Ac = Y,
where [tex]A=\left[\begin{array}{cc}
\sin(x_{1}) & \cos(x_{1})\\
\sin(x_{2}) & \cos(x_{2})\\
\sin(x_{3}) & \cos(x_{3})\\
\cdots & \cdots\end{array}\right][/tex]

even if x1 = x2, the columns of A can still be linearly independent,
so I think the regular algorithm still applies.
 
  • #12
I don't understand what you mean by 'even x1 = x2'. Do you mean two columns of A are identical? Then of course the columns of A are not all independent. If you mean the first two rows are identical, then that is not really relevant here.
Now look at my first post. I said A'A may be singular depending on the sample values. That is, we cannot eliminate the chance of singularity (I stated this generally for correlated columns, and in our particular problem the columns are correlated).
With my approach you get the same answer when there is no singularity, and when there is one, the right answer is still obtained.
Furthermore, for equations of the form y = a + b·sin(x) + c·d^x, the method I stated remains a handy approach.
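As a sketch of how the same idea extends to that model: fix the exponential base (called g below to avoid clashing with the phase d used earlier), regress the linear coefficients (a, b, c) as usual, and scan g for the smallest RSS. The data, the true values, and the scan range are all invented for illustration:

```python
import numpy as np

# Hypothetical data from y = a + b*sin(x) + c*g^x with assumed true values
# a = 1, b = 2, c = 0.5, g = 1.3.
x = np.linspace(0.0, 3.0, 120)
y = 1.0 + 2.0 * np.sin(x) + 0.5 * (1.3 ** x)

def fit_for(g):
    # With g fixed, the model is linear in (a, b, c): regress as usual.
    A = np.column_stack([np.ones_like(x), np.sin(x), g ** x])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return np.sum((y - A @ coef) ** 2), coef

# Scan g over a plausible range and keep the value with the smallest RSS.
gs = np.linspace(1.05, 1.6, 111)
g_best = gs[int(np.argmin([fit_for(g)[0] for g in gs]))]
_, (a, b, c) = fit_for(g_best)
```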
 
  • #13
Hi, let me clarify my thoughts.

I was referring to cases such as x1 == x2 (x1, x2, x3, ... are all sampled values of x). The first two rows of A are then identical, but that does not affect the independence of the columns, because normally there are many xi values.

You said that A'A may be singular. I agree! Not only for the problem y = a·sin(x) + b·cos(x): this singular condition may arise in any least-squares problem.

Consider:
[tex]AC=Y[/tex]
(http://en.wikipedia.org/wiki/Linear_least_squares#The_general_problem)

If rank(A) < rank([A, Y]), these equations have no exact solution,
so least squares can be applied.

Consider the augmented equation:
[tex]A^{T}AC=A^{T}Y[/tex]
Because [tex]rank(A^{T}A)=rank\left(\left[A^{T}A,A^{T}Y\right]\right)[/tex] always holds, the augmented equations do have solutions.
There are two cases:
  1. singular: if A'A is singular, we have infinitely many solutions.
  2. non-singular: we have exactly one solution.

So, if we check that rank(A) equals the dimension of C, we can always get the one solution. Otherwise, I don't think there is a fixed handy approach.

Thanks for reading.
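As a concrete check of the singular case, here is a deliberately degenerate example (my own construction): when every x_i satisfies sin(x_i) = cos(x_i), the two columns of A coincide, A'A is singular, and the pseudoinverse picks one of the infinitely many least-squares solutions (the minimum-norm one):

```python
import numpy as np

# Degenerate hypothetical sample: every x_i = pi/4, so sin(x_i) = cos(x_i)
# and the two columns of A are identical, making A'A singular.
x = np.full(10, np.pi / 4)
y = 3.0 * np.sin(x) + 1.0 * np.cos(x)

A = np.column_stack([np.sin(x), np.cos(x)])
rank_A = np.linalg.matrix_rank(A)  # 1, not 2: the columns are dependent

# np.linalg.pinv still yields a least-squares solution; among the
# infinitely many minimizers it returns the minimum-norm one, so the
# original (a, b) = (3, 1) is not recovered -- only a + b is determined.
a, b = np.linalg.pinv(A) @ y
```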
 
