Least-squares estimation of linear regression coefficients

In summary, there are several ways to perform linear regression, including the two basic types y = ax + b and y = ax^2 + bx + c. However, fitting the function y = a sin(x) + b cos(x) requires a different approach. One method is to form the normal equations A^TAc = A^Ty and solve for the coefficients a and b. Another method is a variable substitution: defining c = sqrt(a^2 + b^2) and d = sin^-1(b/c) simplifies the equation to y = c sin(d + x). In this approach, d is varied to minimize the residual sum of squares, after which the best-fit values of a and b are recovered from c and d.
  • #1
DMTN
AFAIK, there are two basic types of linear regression:
y = ax + b and y = ax^2 + bx + c
But I have to do the same with the function y = a sin(x) + b cos(x).
Here is what I have done:

We have:
[tex]
\begin{array}{l}
\frac{\partial L}{\partial a} = 0 \\
\frac{\partial L}{\partial b} = 0
\end{array}
[/tex]

Continue:
[tex]
\begin{array}{l}
\frac{\partial L}{\partial a} = \sum\limits_{i = 1}^n 2\left[ f_i - \left( a\sin\left(\frac{\pi x}{2}\right) + b\cos\left(\frac{\pi x}{2}\right) \right) \right]\left( -\sin\left(\frac{\pi x}{2}\right) \right) \\
\frac{\partial L}{\partial b} = \sum\limits_{i = 1}^n 2\left[ f_i - \left( a\sin\left(\frac{\pi x}{2}\right) + b\cos\left(\frac{\pi x}{2}\right) \right) \right]\left( \cos\left(\frac{\pi x}{2}\right) \right)
\end{array}
[/tex]

At last, I have:

[tex]
\left( {\begin{array}{*{20}c}
{\sin ^2 \left( {\frac{{\pi x}}{2}} \right)} & {\sin \left( {\frac{{\pi x}}{2}} \right)\cos \left( {\frac{{\pi x}}{2}} \right)} \\
{\sin \left( {\frac{{\pi x}}{2}} \right)\cos \left( {\frac{{\pi x}}{2}} \right)} & {\cos ^2 \left( {\frac{{\pi x}}{2}} \right)} \\
\end{array}} \right)\left( \begin{array}{l}
a \\
b \\
\end{array} \right) = \left( \begin{array}{l}
f_i \sin \left( {\frac{{\pi x}}{2}} \right) \\
f_i \cos \left( {\frac{{\pi x}}{2}} \right) \\
\end{array} \right)
[/tex]

What do I have to do now? Please advise me on this situation.
 
  • #2
It doesn't look right at all. For starters, you should have x_i as the argument in the i-th term, not x. Then the known quantities in the matrix and the r.h.s. vector will all have summations over i.
 
  • #3
Looks like you are trying to develop what are called the 'normal equations':

[tex]A^TAc = A^Ty[/tex]

Check out the 1st attachment in the following thread:

https://www.physicsforums.com/showthread.php?t=97391

The normal equations are fine from a mathematical standpoint, but in computational practice it is usually not a good idea to use them. It's better to factor A using QR or SVD. Example using QR:

[tex]Rc = Q^Ty[/tex]

http://www.alkires.com/teaching/ee103/Rec8_LLSAndQRFactorization.htm
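For concreteness, here is a minimal sketch (in Python with NumPy; the data and all names are illustrative, not from the thread) of fitting y = a sin(x) + b cos(x) by QR factorization, as suggested above:

[code]
import numpy as np

# Illustrative data: y = 2*sin(x) + 0.5*cos(x) plus a little noise
rng = np.random.default_rng(0)
x = np.linspace(0.0, 2.0 * np.pi, 50)
y = 2.0 * np.sin(x) + 0.5 * np.cos(x) + 0.1 * rng.standard_normal(x.size)

# Design matrix A: one row [sin(x_i), cos(x_i)] per observation
A = np.column_stack([np.sin(x), np.cos(x)])

# Factor A = QR, then solve the triangular system R c = Q^T y
Q, R = np.linalg.qr(A)
c = np.linalg.solve(R, Q.T @ y)
print(c)  # approximately [2.0, 0.5]
[/code]

Solving via QR avoids forming A^TA explicitly, whose condition number is the square of that of A.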
 
  • #4
hotvette said:
Looks like you are trying to develop what are called the 'normal equations':

[tex]A^TAc = A^Ty[/tex]

Check out the 1st attachment in the following thread:

https://www.physicsforums.com/showthread.php?t=97391

The normal equations are fine from a mathematical standpoint, but in computational practice it is usually not a good idea to use them. It's better to factor A using QR or SVD. Example using QR:

[tex]Rc = Q^Ty[/tex]

http://www.alkires.com/teaching/ee103/Rec8_LLSAndQRFactorization.htm



Great tutorials on the least squares method.
To the OP:

It is very simple: you can write the equation as below
[tex]\left[\begin{array}{cc}
\sin(x) & \cos(x)\end{array}\right]\left[\begin{array}{c}
a\\
b\end{array}\right]=y[/tex]

and for each x_i and y_i, you get the equation
[tex]\left[\begin{array}{cc}
\sin(x_{i}) & \cos(x_{i})\end{array}\right]\left[\begin{array}{c}
a\\
b\end{array}\right]=y_{i}[/tex]

then stack these equations together and you get [tex]Ac=Y[/tex]
where [tex]c=\left[\begin{array}{c}
a\\
b\end{array}\right][/tex]

then you can simply solve it to get the best coefficients a and b.
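To make this concrete, here is a minimal sketch (Python/NumPy; the sample values are made up for illustration) of stacking the equations and solving the resulting normal equations:

[code]
import numpy as np

# Illustrative data points (x_i, y_i)
x = np.array([0.3, 0.9, 1.7, 2.5, 3.1])
y = np.array([1.1, 1.9, 1.4, 0.1, -0.8])

# Stack one row [sin(x_i), cos(x_i)] per data point: A c = Y
A = np.column_stack([np.sin(x), np.cos(x)])

# Solve the normal equations A^T A c = A^T Y
# (fine for this small 2x2 system; QR/SVD is preferred in general)
a, b = np.linalg.solve(A.T @ A, A.T @ y)
print(a, b)
[/code]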
 
  • #5
zyh said:
Great tutorials on the least squares method.
To the OP:

It is very simple: you can write the equation as below
[tex]\left[\begin{array}{cc}
\sin(x) & \cos(x)\end{array}\right]\left[\begin{array}{c}
a\\
b\end{array}\right]=y[/tex]

and for each x_i and y_i, you get the equation
[tex]\left[\begin{array}{cc}
\sin(x_{i}) & \cos(x_{i})\end{array}\right]\left[\begin{array}{c}
a\\
b\end{array}\right]=y_{i}[/tex]

then stack these equations together and you get [tex]Ac=Y[/tex]
where [tex]c=\left[\begin{array}{c}
a\\
b\end{array}\right][/tex]

then you can simply solve it to get the best coefficients a and b.

There is a problem in your approach: sin(x) and cos(x) are correlated with each other, so the matrix A'A may be singular depending on the sample values.
 
  • #6
The problem may be tackled in the following way:
Write a sin(x) + b cos(x) = c sin(d + x), where c = sqrt(a^2 + b^2) and sin(d) = b/c.
Start with any arbitrary value of d. Regress to find c in the usual way, and find the residual sum of squares (RSS). Now vary d and repeat the previous procedure to find the RSS again. Compare this value of the RSS with the previous one, and check how the RSS decreases as d varies. Keep repeating the procedure until the RSS no longer decreases (or you are satisfied with a very small value of the RSS). Choose this pair of c and d, then solve to find a and b.
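A minimal sketch of this procedure (Python/NumPy; grid-scanning d is one simple way to realize "vary d", not necessarily ssd's exact scheme, and the data are made up):

[code]
import numpy as np

def fit_c_and_rss(x, y, d):
    """For a fixed phase d, regress y on sin(d + x) to get the
    amplitude c, and return c together with the RSS."""
    s = np.sin(d + x)
    c = (s @ y) / (s @ s)            # one-variable least squares
    rss = np.sum((y - c * s) ** 2)   # residual sum of squares
    return c, rss

# Illustrative data generated from a = 2, b = 0.5
x = np.linspace(0.0, 2.0 * np.pi, 40)
y = 2.0 * np.sin(x) + 0.5 * np.cos(x)

# Try many values of d and keep the one with the smallest RSS
# (d in [0, pi) suffices because c is allowed to be negative)
best_d, best_c, best_rss = None, None, np.inf
for d in np.linspace(0.0, np.pi, 1000):
    c, rss = fit_c_and_rss(x, y, d)
    if rss < best_rss:
        best_d, best_c, best_rss = d, c, rss

# Recover a and b from c sin(d + x) = (c cos d) sin x + (c sin d) cos x
a, b = best_c * np.cos(best_d), best_c * np.sin(best_d)
print(a, b)  # approximately 2.0 and 0.5
[/code]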
 
  • #7
Great, ssd, this is a wonderful algorithm.
Let me explain it in more detail.
We can rewrite the equation as below:
[tex]y=a\sin(x)+b\cos(x)=\sqrt{a^{2}+b^{2}}\left(\frac{a}{\sqrt{a^{2}+b^{2}}}\sin(x)+\frac{b}{\sqrt{a^{2}+b^{2}}}\cos(x)\right)[/tex]
so I can simply define variables c and d as below:
define variable c: [tex]c=\sqrt{a^{2}+b^{2}}[/tex]

define variable d: [tex]\sin(d)=\frac{b}{c},\quad\cos(d)=\frac{a}{c}[/tex]

so we can get [tex]y=c\sin(d+x)[/tex]

As you said, "start with any arbitrary value of d". It is very simple to find c in the usual way, because there is only one unknown variable, c, in
[tex]y=c\sin(d+x)[/tex]
so it's easy to get c, and from it the RSS.

But my question is: how should d vary? I mean, should the next value of d be bigger than the previous one, or smaller? Is there a convergent way to make the RSS smaller?

Thank you!
 
  • #8
zyh said:
But my question is: how should d vary? I mean, should the next value of d be bigger than the previous one, or smaller? Is there a convergent way to make the RSS smaller?

Thank you!
Thanks for your comments.

We started with an arbitrary value of d. Next change d to d + 10, say. We can change d to d - 10 as well, or by any arbitrary amount. Now we have to check whether the RSS increases or decreases. If it increases, then we have to change d in the other direction. In brief, we vary d so that at the termination point of the algorithm, the RSS would increase if d were changed in either direction. That is, we choose d so that the RSS has at least a local minimum at that value of d.
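One systematic way to realize this "change d, check the RSS, reverse direction if it increases" idea is a bracketing one-dimensional search such as golden-section (my suggestion, not ssd's exact procedure). A self-contained sketch with made-up data:

[code]
import numpy as np

def rss_of_d(d, x, y):
    """RSS after regressing the amplitude c for a fixed phase d."""
    s = np.sin(d + x)
    c = (s @ y) / (s @ s)
    return np.sum((y - c * s) ** 2)

def golden_section(f, lo, hi, tol=1e-10):
    """Minimize a unimodal function f on [lo, hi]."""
    g = (np.sqrt(5.0) - 1.0) / 2.0
    c1, c2 = hi - g * (hi - lo), lo + g * (hi - lo)
    while hi - lo > tol:
        if f(c1) < f(c2):
            hi, c2 = c2, c1
            c1 = hi - g * (hi - lo)
        else:
            lo, c1 = c1, c2
            c2 = lo + g * (hi - lo)
    return 0.5 * (lo + hi)

# Illustrative data generated from a = 2, b = 0.5
x = np.linspace(0.0, 2.0 * np.pi, 40)
y = 2.0 * np.sin(x) + 0.5 * np.cos(x)

# RSS(d) is periodic and not guaranteed unimodal over a whole period,
# so in practice bracket the minimum with a coarse scan first;
# the interval [0, pi] happens to work for this data
d = golden_section(lambda d: rss_of_d(d, x, y), 0.0, np.pi)
print(d)  # approximately arctan(0.5 / 2.0) ~ 0.245
[/code]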
 
  • #9
If it increases, then we have to change d in the other direction. In brief, we vary d so that at the termination point of the algorithm, the RSS would increase if d were changed in either direction.
Hi, I'm glad to discuss this topic with you, but I think it's still numerically difficult to specify the algorithm this way, because I don't know whether to use d = d + 10, d = d + 100, or some other value. It seems too arbitrary. :frown:
Let me take some time to analyze this idea.
 
  • #10
zyh said:
Hi, I'm glad to discuss this topic with you, but I think it's still numerically difficult to specify the algorithm this way, because I don't know whether to use d = d + 10, d = d + 100, or some other value. It seems too arbitrary. :frown:
Let me take some time to analyze this idea.

Basically the first change in d has to be arbitrary unless we have other information. Generally it does not make a big difference whether the increment is 10 or 100, since we started with an arbitrary d. What matters is detecting the direction of further changes. In reality, a suitable computer program detects the minimum of the RSS almost instantly with this method.

Looking forward to your thoughts.
 
  • #11
Hi ssd, I think you've made a mistake about the linear least squares method.
Look here:
http://en.wikipedia.org/wiki/Linear_least_squares#The_general_problem
"The linear least squares problem has a unique solution, provided that the n columns of the matrix X are linearly independent. The solution is obtained by solving the normal equations."
So, for the equation Ac = Y,
where [tex]A=\left[\begin{array}{cc}
\sin(x_{1}) & \cos(x_{1})\\
\sin(x_{2}) & \cos(x_{2})\\
\sin(x_{3}) & \cos(x_{3})\\
\cdots & \cdots\end{array}\right][/tex]

even if x1 = x2, the columns of A can still be linearly independent,
so I think the regular algorithm still applies.
 
  • #12
I don't understand what you mean by 'even x1 = x2'. Do you mean two columns of A are identical? Then of course the columns of A are not all independent. If you mean the first two rows being identical, then it's not really relevant here.
Now look at my first post. I said A'A may be singular depending on the sample values. That is, we cannot eliminate the chance of singularity (I stated this generally for correlated columns, and in our particular problem the columns are correlated).
With my approach you get the same answer when there is no singularity, and when there is, the right answer is still obtained.
Furthermore, if one has equations of the form y = a + b sin(x) + c d^x, the method I stated still remains a handy approach.
 
  • #13
Hi, let me clarify my thoughts.

I meant cases where A'A might be singular, such as x1 == x2 (x1, x2, x3, ... are all sampled values of x).
The first two rows of A are then identical, but they don't affect the independence of the columns, because normally there are many values of x_i.

You said that A'A may be singular. I do agree! But not only in the problem y = a sin(x) + b cos(x); this singular condition may exist in every least squares problem.

Consider:
[tex]Ac=Y[/tex]
(http://en.wikipedia.org/wiki/Linear_least_squares#The_general_problem)

If rank(A) < rank([A, Y]), these equations have no exact solution,
so the least squares method can be applied.

Let's consider the normal equations:
[tex]A^{T}Ac=A^{T}Y[/tex]
Because [tex]rank(A^{T}A)=rank\left(\left[A^{T}A,A^{T}Y\right]\right)[/tex] always holds, the normal equations always have solutions.
This divides into two cases:
  1. singular: if A'A is singular, then we have infinitely many solutions.
  2. non-singular: we have only ONE solution.

So, if we check that rank(A) equals the dimension of c, we can always get the ONE solution. Otherwise, I don't think there is a fixed handy approach. :rolleyes:

Thanks for reading.
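A quick numerical illustration of both situations (Python/NumPy, made-up sample points): duplicate x values do not by themselves break the column independence, but unlucky samples can make A'A singular:

[code]
import numpy as np

# Duplicate sample points (x1 == x2): columns stay independent
x = np.array([0.5, 0.5, 1.2, 2.0])
A = np.column_stack([np.sin(x), np.cos(x)])
print(np.linalg.matrix_rank(A))      # 2: full column rank, A'A invertible

# Unlucky samples: with every x_i = pi/4, sin(x_i) == cos(x_i),
# so the two columns coincide and A'A is singular
x_bad = np.full(4, np.pi / 4)
A_bad = np.column_stack([np.sin(x_bad), np.cos(x_bad)])
print(np.linalg.matrix_rank(A_bad))  # 1: A'A is singular
[/code]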
 

1. What is least-squares estimation of linear regression coefficients?

Least-squares estimation is a statistical method used to determine the best-fitting line or curve for a set of data points in a linear regression model. It involves finding the regression coefficients that minimize the sum of the squared differences between the observed data and the predicted values from the model.

2. How is least-squares estimation used in linear regression?

In linear regression, least-squares estimation is used to calculate the slope and intercept of the regression line. These coefficients are then used to predict the value of the dependent variable based on the value of the independent variable.

3. What is the purpose of least-squares estimation?

The purpose of least-squares estimation is to find the line or curve that best represents the relationship between two variables in a linear regression model. It allows us to make predictions and draw conclusions about the relationship between the variables based on the data.

4. What is the difference between simple and multiple linear regression?

Simple linear regression involves one independent variable and one dependent variable, while multiple linear regression involves two or more independent variables and one dependent variable. Least-squares estimation is used in both types of regression to find the best-fit line or curve for the data.

5. How do you interpret the results of least-squares estimation?

The results of least-squares estimation include the regression coefficients, which represent the slope and intercept of the regression line, as well as the coefficient of determination (R-squared) which indicates the proportion of variation in the dependent variable that can be explained by the independent variable(s). These results can be used to make predictions, assess the significance of the relationship between the variables, and evaluate the overall fit of the model to the data.
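As an illustration of answers 2 and 5, here is a minimal sketch (Python/NumPy, made-up data) that fits a simple linear regression and computes R-squared:

[code]
import numpy as np

# Made-up data roughly following y = 2x
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# Least-squares slope and intercept of the regression line
slope, intercept = np.polyfit(x, y, 1)

# Coefficient of determination R^2 = 1 - SS_res / SS_tot
y_hat = slope * x + intercept
ss_res = np.sum((y - y_hat) ** 2)       # residual sum of squares
ss_tot = np.sum((y - y.mean()) ** 2)    # total sum of squares
r_squared = 1.0 - ss_res / ss_tot
print(slope, intercept, r_squared)
[/code]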
