Nonlinear Least Squares or OLS for Nonlinear Models?

  • Context: Undergrad
  • Thread starter: fog37
  • Tags: OLS

Discussion Overview

The discussion revolves around the use of ordinary least squares (OLS) versus nonlinear least squares for estimating coefficients in statistical models. Participants explore the conditions under which OLS is applicable, particularly in the context of nonlinear models, and the implications of model assumptions on statistical inference.

Discussion Character

  • Debate/contested
  • Technical explanation
  • Conceptual clarification

Main Points Raised

  • Some participants explain that OLS minimizes the sum of squared errors between observed and estimated values, questioning why it is often stated that OLS is only for linear models.
  • Others argue that while OLS can be applied to nonlinear models, it may not yield the best estimates and that nonlinear least squares is a more appropriate method for such cases.
  • A participant suggests that the choice of estimation method should depend on the purpose of the model and the assumptions that can be met.
  • There is a discussion about the validity of statistical results, such as confidence intervals and p-values, when model assumptions are not satisfied, particularly in relation to the distribution of the response variable and residuals.
  • Some participants clarify that the normality of residuals does not necessarily imply that the observed response variable is normally distributed, emphasizing the importance of understanding the role of the random term in the model.
  • One participant presents examples of models to illustrate the relationship between the response variable, predictors, and the random term.
  • There is a debate about the interpretation of residuals and errors, with some suggesting that they should not be viewed simply as normal random variables.

Areas of Agreement / Disagreement

Participants express differing views on the applicability of OLS to nonlinear models and the implications of model assumptions on statistical inference. No consensus is reached regarding the best approach for nonlinear models or the interpretation of residuals.

Contextual Notes

Participants highlight the importance of model assumptions and their impact on the validity of statistical results, noting that certain assumptions may not be met in practice, which can affect the reliability of confidence intervals and hypothesis testing.

fog37
TL;DR
Difference between nonlinear least squares and ordinary least squares
Hello,

I understand that the method of ordinary least squares (OLS) is about finding the coefficients that minimize the sum ##\Sigma (y_{observed} - g(X))^2##, where ##g(X)## is the statistical model chosen to fit the data. Besides OLS, there are clearly other coefficient estimation methods (MLE, etc.).

In general, OLS is fair game when the model ##g(X)## is "linear with respect to the parameters" (linear regression, polynomial regression, etc.): any model that is a sum of terms, each term being the product of an estimated coefficient and some function of the variables, ##g(X) = \Sigma_j \beta_j f_j(X)##, where the ##f_j(X)## act like basis functions. For example, ##g(X) = \beta_0 + \beta_1 X + \beta_2 X^2## is linear in the parameters, and the basis functions are the three functions ##1, X, X^2##...

Of course, the OLS approach is valid as long as specific assumptions on the residuals are met. Additionally, after taking the first derivatives and setting them to zero, we arrive at nice analytical formulas for the coefficients.
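As a concrete illustration of that closed form (a minimal sketch, not from the thread; numpy and toy data assumed), the basis functions ##1, X, X^2## are stacked into a design matrix ##A## and the normal equations ##(A^T A)\hat\beta = A^T y## are solved directly:

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0.0, 2.0, 50)
y = 1.0 + 2.0 * x - 0.5 * x**2 + rng.normal(scale=0.1, size=x.size)  # toy data

# Design matrix whose columns are the basis functions 1, X, X^2.
A = np.column_stack([np.ones_like(x), x, x**2])

# Closed-form OLS: solve the normal equations (A^T A) beta = A^T y.
beta_hat = np.linalg.solve(A.T @ A, A.T @ y)
print(beta_hat)  # estimates of beta_0, beta_1, beta_2
```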

That said, what is the issue with using OLS when ##g(X)## is a nonlinear model? I know that sometimes we "convert" a nonlinear model so that it assumes the form of a linear model. That strategy then allows us to use OLS on the new model based on the transformed variables... That is a useful hack.
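A sketch of that hack (toy data and numpy assumed): an exponential model ##y = a e^{bX}## becomes linear in the parameters after taking logs, ##\ln y = \ln a + bX##, so OLS can be applied to the transformed data:

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(0.0, 3.0, 40)
# Toy data from y = a * exp(b*x) with a multiplicative noise factor (an assumption made here
# so that the log transform produces additive noise).
y = 2.0 * np.exp(0.7 * x) * np.exp(rng.normal(scale=0.05, size=x.size))

# Transformed model: ln y = ln a + b*x, linear in the parameters (ln a, b).
A = np.column_stack([np.ones_like(x), x])
ln_a_hat, b_hat = np.linalg.lstsq(A, np.log(y), rcond=None)[0]
print(np.exp(ln_a_hat), b_hat)  # back-transform ln a to recover a
```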

But I have been reading about "nonlinear least squares". Isn't it the same approach as OLS, just with a nonlinear model, where we directly plug the nonlinear ##g(X)## into ##\Sigma (y_{observed} - g(X))^2##? We may not end up with analytical estimators and may have to solve for the coefficients using some numerical method... But I don't see an issue with applying least squares to nonlinear models...
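For reference, a sketch of nonlinear least squares done directly (toy data assumed; scipy's curve_fit is just one common choice of iterative solver):

```python
import numpy as np
from scipy.optimize import curve_fit

def g(x, a, b):
    # Nonlinear model plugged directly into the sum of squares.
    return a * np.exp(b * x)

rng = np.random.default_rng(2)
x = np.linspace(0.0, 3.0, 40)
y = 2.0 * np.exp(0.7 * x) + rng.normal(scale=0.2, size=x.size)  # additive noise this time

# curve_fit iteratively minimizes sum((y_observed - g(x, a, b))^2) from a starting guess p0.
popt, pcov = curve_fit(g, x, y, p0=(1.0, 0.5))
print(popt)  # estimates of a, b
```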

Thank you.
 
OLS minimizes the sum of squared errors of the actual samples versus the estimated values. If that is your goal, then that is the thing to do.
 
FactChecker said:
OLS minimizes the sum of squared errors of the actual samples versus the estimated values. If that is your goal, then that is the thing to do.
That is the goal, but many resources I read state that OLS is only for linear models, and that puzzled me... Is it because the estimates resulting from applying least squares to a nonlinear model are not as good as they could be?
 
I should probably not have called it OLS. If your goal is to minimize the sum-squared-errors, then do that, whether it requires OLS or a numerical technique.
These problems do not exist in a vacuum. You should have a reason for the model you propose and have something that you want to use the results for. That should determine what approach you can use. What you need to be aware of is that the statistical results like confidence intervals of the parameters may not be valid if certain assumptions are not met.
 
FactChecker said:
I should probably not have called it OLS. If your goal is to minimize the sum-squared-errors, then do that, whether it requires OLS or a numerical technique.
These problems do not exist in a vacuum. You should have a reason for the model you propose and have something that you want to use the results for. That should determine what approach you can use. What you need to be aware of is that the statistical results like confidence intervals of the parameters may not be valid if certain assumptions are not met.
I see.

Inferential statistics is either about estimation, hypothesis testing, or both. Estimation is really just about coming up with a reasonably good numerical estimate of the parameter: unbiased, consistent, and with low variance.

Hypothesis testing focuses on a different task: it hypothesizes a value for the unknown population parameter and uses the limited sample data to check whether that hypothesis (H0) is valid or not. Confidence intervals, standard errors, and p-values result from hypothesis testing, not from estimation, correct?

If the required assumptions are not met by the chosen model, estimation may still work just fine...but confidence intervals, standard errors, p-values, etc. will not be reliable, statistically speaking.

For example, in linear regression, the response variable ##Y## does not have to be normally distributed for the model to be sound and for us to get good estimates of the slope and intercept. The Gauss-Markov assumptions don't force ##Y## or the residuals to have a normal distribution at all... But confidence intervals, standard errors, p-values, the output of hypothesis testing, will not be good if ##Y## is not normal, which implies that the residuals will also not be normally distributed...

Am I thinking correctly here?
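A small simulation sketch of that distinction (toy data and numpy assumed): with strongly skewed, non-normal errors, the OLS slope estimate is still close to the true value on average, even though normality-based intervals and p-values would be questionable:

```python
import numpy as np

rng = np.random.default_rng(3)
x = np.linspace(0.0, 5.0, 100)
A = np.column_stack([np.ones_like(x), x])

slopes = []
for _ in range(2000):
    eps = rng.exponential(scale=1.0, size=x.size) - 1.0  # mean zero, but strongly skewed (non-normal)
    y = 1.0 + 2.0 * x + eps
    slopes.append(np.linalg.lstsq(A, y, rcond=None)[0][1])

print(np.mean(slopes))  # close to the true slope 2.0 despite the non-normal errors
```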
 
fog37 said:
For example, in linear regression, the response variable ##Y## does not have to be normally distributed for the model to be sound and for us to get good estimates of the slope and intercept. The Gauss-Markov assumptions don't force ##Y## or the residuals to have a normal distribution at all... But confidence intervals, standard errors, p-values, the output of hypothesis testing, will not be good if ##Y## is not normal, which implies that the residuals will also not be normally distributed...

Am I thinking correctly here?
When you talk about a normal distribution, you should be talking about the random term, ##\epsilon##, not about ##Y##. There can be many ways that random behavior influences ##Y##. I have not seen you mention that yet. You need to pay special attention to how the random term enters into the equation. Without that, your model is incomplete.
Some example models are:
##Y = a_0 + a_1 X_1 + a_2 X_2 + \epsilon##
or
##Y = \epsilon \cdot e^{a_0 + a_1 X_1 + a_2 X_2}##
or
##Y = g( X + \epsilon)##
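To make the difference concrete, a small sketch (illustrative only; numpy assumed, and the multiplicative random factor is taken as ##1 + \epsilon## here) simulating the first two forms and comparing how the deviation from the deterministic part behaves:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 1000
x1, x2 = rng.uniform(size=n), rng.uniform(size=n)
mean_part = 0.5 + 1.0 * x1 - 2.0 * x2               # a0 + a1*X1 + a2*X2
eps = rng.normal(scale=0.3, size=n)

y_additive = mean_part + eps                         # Y = a0 + a1*X1 + a2*X2 + eps
y_multiplicative = (1.0 + eps) * np.exp(mean_part)   # Y = eps' * exp(a0 + a1*X1 + a2*X2), eps' = 1 + eps here

# Additive case: the deviation from the deterministic part is exactly eps.
# Multiplicative case: the deviation is eps * exp(mean_part), so its spread depends on the X's.
print(np.std(y_additive - mean_part))
print(np.std(y_multiplicative - np.exp(mean_part)))
```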
 
I see. Your point is that the residuals can be normally distributed (and have equal variance) at each ##X## value... But that does not automatically imply that the observed response variable ##Y## also has normally distributed values...

However, I have always thought that if the error is normal, then ##Y## is also normally distributed...
 
fog37 said:
I see. Your point is that the residuals can be normally distributed (and have equal variance) at each ##X## value... But that does not automatically imply that the observed response variable ##Y## also has normally distributed values...

However, I have always thought that if the error is normal, then ##Y## is also normally distributed...
IMO, we shouldn't talk about "residuals" and "errors" as though they are simply a normal random variable with a mean of 0. They are the errors of an estimated model versus the true model and can be changed by other errors in the estimated model.
Suppose we have an actual physical relationship ##Y = a_0 + a_1 X + \epsilon##, where ##\epsilon## is a normal variable with a mean of zero, and estimate it with a linear equation ##\hat Y = \hat {a_0} + \hat {a_1} X##.
Then the errors or residuals are ##\hat {\epsilon_i} = y_i - \hat {y_i} = (a_0 - \hat {a_0}) + (a_1 - \hat {a_1})x_i +\epsilon_i##
##\hat {\epsilon_i}## is different from the true term ##\epsilon_i##: it includes a term that depends on ##x_i##.
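A quick numerical check of that identity (toy data and numpy assumed): the fitted residuals ##\hat\epsilon_i## differ from the true ##\epsilon_i## by exactly ##(a_0 - \hat a_0) + (a_1 - \hat a_1) x_i##, which varies with ##x_i##:

```python
import numpy as np

rng = np.random.default_rng(5)
a0, a1 = 1.0, 2.0
x = np.linspace(0.0, 10.0, 30)
eps = rng.normal(scale=1.0, size=x.size)
y = a0 + a1 * x + eps                          # true model: Y = a0 + a1*X + eps

A = np.column_stack([np.ones_like(x), x])
a0_hat, a1_hat = np.linalg.lstsq(A, y, rcond=None)[0]
resid = y - (a0_hat + a1_hat * x)              # fitted residuals eps_hat_i

# eps_hat_i - eps_i equals (a0 - a0_hat) + (a1 - a1_hat)*x_i, a quantity that depends on x_i.
print(np.max(np.abs((resid - eps) - ((a0 - a0_hat) + (a1 - a1_hat) * x))))  # ~ 0 up to rounding
```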
 
