Nonlinear Least Squares or OLS for Nonlinear Models?

  • Context: Undergrad
  • Thread starter: fog37
  • Tags: OLS

Discussion Overview

The discussion revolves around the use of ordinary least squares (OLS) versus nonlinear least squares for estimating coefficients in statistical models. Participants explore the conditions under which OLS is applicable, particularly in the context of nonlinear models, and the implications of model assumptions on statistical inference.

Discussion Character

  • Debate/contested
  • Technical explanation
  • Conceptual clarification

Main Points Raised

  • Some participants explain that OLS minimizes the sum of squared errors between observed and estimated values, questioning why it is often stated that OLS is only for linear models.
  • Others argue that while OLS can be applied to nonlinear models, it may not yield the best estimates and that nonlinear least squares is a more appropriate method for such cases.
  • A participant suggests that the choice of estimation method should depend on the purpose of the model and the assumptions that can be met.
  • There is a discussion about the validity of statistical results, such as confidence intervals and p-values, when model assumptions are not satisfied, particularly in relation to the distribution of the response variable and residuals.
  • Some participants clarify that the normality of residuals does not necessarily imply that the observed response variable is normally distributed, emphasizing the importance of understanding the role of the random term in the model.
  • One participant presents examples of models to illustrate the relationship between the response variable, predictors, and the random term.
  • There is a debate about the interpretation of residuals and errors, with some suggesting that they should not be viewed simply as normal random variables.

Areas of Agreement / Disagreement

Participants express differing views on the applicability of OLS to nonlinear models and the implications of model assumptions on statistical inference. No consensus is reached regarding the best approach for nonlinear models or the interpretation of residuals.

Contextual Notes

Participants highlight the importance of model assumptions and their impact on the validity of statistical results, noting that certain assumptions may not be met in practice, which can affect the reliability of confidence intervals and hypothesis testing.

fog37
TL;DR
Difference between nonlinear least squares and ordinary least squares
Hello,

I understand that the method of ordinary least squares (OLS) is about finding the coefficients that minimize the sum ##\Sigma (y_{observed} - g(X))^2##, where ##g(X)## is the statistical model chosen to fit the data. Besides OLS, there are clearly other coefficient estimation methods (MLE, etc.).

In general, OLS is fair game when the model ##g(X)## is "linear with respect to the parameters" (linear regression, polynomial regression, etc.): any model that is a sum of terms, each term being the product of an estimated coefficient and some function of the variables, ##g(X) = \Sigma_j \beta_j f_j(X)##, where the ##f_j(X)## act like basis functions. For example, ##g(X) = \beta_0 + \beta_1 X + \beta_2 X^2## is linear in the parameters, and the basis functions are the three functions ##1, X, X^2##...

Of course, the OLS approach is valid as long as specific assumptions on the residuals are met. Additionally, after taking the first derivatives and setting them to zero, we arrive at nice analytical formulas for the coefficients.
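As a concrete illustration of that closed form (a minimal sketch, not from the thread; numpy and toy data assumed), the basis functions ##1, X, X^2## are stacked into a design matrix ##A## and the normal equations ##(A^T A)\hat\beta = A^T y## are solved directly:

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0.0, 2.0, 50)
y = 1.0 + 2.0 * x - 0.5 * x**2 + rng.normal(scale=0.1, size=x.size)  # toy data

# Design matrix whose columns are the basis functions 1, X, X^2.
A = np.column_stack([np.ones_like(x), x, x**2])

# Closed-form OLS: solve the normal equations (A^T A) beta = A^T y.
beta_hat = np.linalg.solve(A.T @ A, A.T @ y)
print(beta_hat)  # estimates of beta_0, beta_1, beta_2
```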

That said, what is the issue with using OLS when ##g(X)## is a nonlinear model? I know that sometimes we "convert" a nonlinear model so that it assumes the form of a linear model. That strategy then allows us to use OLS on the new model based on the transformed variables... That is a useful hack.
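A sketch of that hack (toy data and numpy assumed): an exponential model ##y = a e^{bX}## becomes linear in the parameters after taking logs, ##\ln y = \ln a + bX##, so OLS can be applied to the transformed data:

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(0.0, 3.0, 40)
# Toy data from y = a * exp(b*x) with a multiplicative noise factor (an assumption made here
# so that the log transform produces additive noise).
y = 2.0 * np.exp(0.7 * x) * np.exp(rng.normal(scale=0.05, size=x.size))

# Transformed model: ln y = ln a + b*x, linear in the parameters (ln a, b).
A = np.column_stack([np.ones_like(x), x])
ln_a_hat, b_hat = np.linalg.lstsq(A, np.log(y), rcond=None)[0]
print(np.exp(ln_a_hat), b_hat)  # back-transform ln a to recover a
```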

But I have been reading about "nonlinear least squares". Isn't it the same approach as OLS, just with a nonlinear model, where we directly plug the nonlinear ##g(X)## into ##\Sigma (y_{observed} - g(X))^2##? We may not end up with analytical estimators and may have to solve for the coefficients using some numerical method... But I don't see an issue with applying least squares to nonlinear models...
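For reference, a sketch of nonlinear least squares done directly (toy data assumed; scipy's curve_fit is just one common choice of iterative solver):

```python
import numpy as np
from scipy.optimize import curve_fit

def g(x, a, b):
    # Nonlinear model plugged directly into the sum of squares.
    return a * np.exp(b * x)

rng = np.random.default_rng(2)
x = np.linspace(0.0, 3.0, 40)
y = 2.0 * np.exp(0.7 * x) + rng.normal(scale=0.2, size=x.size)  # additive noise this time

# curve_fit iteratively minimizes sum((y_observed - g(x, a, b))^2) from a starting guess p0.
popt, pcov = curve_fit(g, x, y, p0=(1.0, 0.5))
print(popt)  # estimates of a, b
```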

Thank you.
 
OLS minimizes the sum of squared errors of the actual samples versus the estimated values. If that is your goal, then that is the thing to do.
 
FactChecker said:
OLS minimizes the sum of squared errors of the actual samples versus the estimated values. If that is your goal, then that is the thing to do.
That is the goal, but many resources I read state that OLS is only for linear models, and that puzzled me... Is it because the estimates resulting from applying least squares to a nonlinear model are not as good as they could be?
 
I should probably not have called it OLS. If your goal is to minimize the sum-squared-errors, then do that, whether it requires OLS or a numerical technique.
These problems do not exist in a vacuum. You should have a reason for the model you propose and have something that you want to use the results for. That should determine what approach you can use. What you need to be aware of is that the statistical results like confidence intervals of the parameters may not be valid if certain assumptions are not met.
 
FactChecker said:
I should probably not have called it OLS. If your goal is to minimize the sum-squared-errors, then do that, whether it requires OLS or a numerical technique.
These problems do not exist in a vacuum. You should have a reason for the model you propose and have something that you want to use the results for. That should determine what approach you can use. What you need to be aware of is that the statistical results like confidence intervals of the parameters may not be valid if certain assumptions are not met.
I see.

Inferential statistics is either about estimation, hypothesis testing, or both. Estimation is really just about coming up with a reasonably good numerical estimate of the parameter: unbiased, consistent, and with low variance.

Hypothesis testing focuses on a different task: it hypothesizes a value for the unknown population parameter and uses the limited sample data to check whether that hypothesis (H0) is valid or not. Confidence intervals, standard errors, and p-values result from hypothesis testing, not from estimation, correct?

If the required assumptions are not met by the chosen model, estimation may still work just fine...but confidence intervals, standard errors, p-values, etc. will not be reliable, statistically speaking.

For example, in linear regression, the response variable ##Y## does not have to be normally distributed for the model to be sound and for us to get good estimates of the slope and intercept. The Gauss-Markov assumptions don't force ##Y## or the residuals to have a normal distribution at all... But confidence intervals, standard errors, p-values, the output of hypothesis testing, will not be good if ##Y## is not normal, which implies that the residuals will also not be normally distributed...

Am I thinking correctly here?
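A small simulation sketch of that distinction (toy data and numpy assumed): with strongly skewed, non-normal errors, the OLS slope estimate is still close to the true value on average, even though normality-based intervals and p-values would be questionable:

```python
import numpy as np

rng = np.random.default_rng(3)
x = np.linspace(0.0, 5.0, 100)
A = np.column_stack([np.ones_like(x), x])

slopes = []
for _ in range(2000):
    eps = rng.exponential(scale=1.0, size=x.size) - 1.0  # mean zero, but strongly skewed (non-normal)
    y = 1.0 + 2.0 * x + eps
    slopes.append(np.linalg.lstsq(A, y, rcond=None)[0][1])

print(np.mean(slopes))  # close to the true slope 2.0 despite the non-normal errors
```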
 
fog37 said:
For example, in linear regression, the response variable ##Y## does not have to be normally distributed for the model to be sound and for us to get good estimates of the slope and intercept. The Gauss-Markov assumptions don't force ##Y## or the residuals to have a normal distribution at all... But confidence intervals, standard errors, p-values, the output of hypothesis testing, will not be good if ##Y## is not normal, which implies that the residuals will also not be normally distributed...

Am I thinking correctly here?
When you talk about a normal distribution, you should be talking about the random term, ##\epsilon##, not about ##Y##. There can be many ways that random behavior influences ##Y##. I have not seen you mention that yet. You need to pay special attention to how the random term enters into the equation. Without that, your model is incomplete.
Some example models are:
##Y = a_0 + a_1 X_1 + a_2 X_2 + \epsilon##
or
##Y = \epsilon \cdot e^{a_0 + a_1 X_1 + a_2 X_2}##
or
##Y = g( X + \epsilon)##
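To make the difference concrete, a small sketch (illustrative only; numpy assumed, and the multiplicative random factor is taken as ##1 + \epsilon## here) simulating the first two forms and comparing how the deviation from the deterministic part behaves:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 1000
x1, x2 = rng.uniform(size=n), rng.uniform(size=n)
mean_part = 0.5 + 1.0 * x1 - 2.0 * x2               # a0 + a1*X1 + a2*X2
eps = rng.normal(scale=0.3, size=n)

y_additive = mean_part + eps                         # Y = a0 + a1*X1 + a2*X2 + eps
y_multiplicative = (1.0 + eps) * np.exp(mean_part)   # Y = eps' * exp(a0 + a1*X1 + a2*X2), eps' = 1 + eps here

# Additive case: the deviation from the deterministic part is exactly eps.
# Multiplicative case: the deviation is eps * exp(mean_part), so its spread depends on the X's.
print(np.std(y_additive - mean_part))
print(np.std(y_multiplicative - np.exp(mean_part)))
```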
 
I see. Your point is that the residuals can be normally distributed (and have equal variance) at each ##X## value... But that does not automatically imply that the observed response variable ##Y## also has normally distributed values...

However, I have always thought that if the error is normal, then ##Y## is also normally distributed...
 
fog37 said:
I see. Your point is that the residuals can be normally distributed (and have equal variance) at each ##X## value... But that does not automatically imply that the observed response variable ##Y## also has normally distributed values...

However, I have always thought that if the error is normal, then ##Y## is also normally distributed...
IMO, we shouldn't talk about "residuals" and "errors" as though they are simply a normal random variable with a mean of 0. They are the errors of an estimated model versus the true model and can be changed by other errors in the estimated model.
Suppose we have an actual physical relationship ##Y = a_0 + a_1 X + \epsilon##, where ##\epsilon## is a normal variable with a mean of zero, and estimate it with a linear equation ##\hat Y = \hat {a_0} + \hat {a_1} X##.
Then the errors or residuals are ##\hat {\epsilon_i} = y_i - \hat {y_i} = (a_0 - \hat {a_0}) + (a_1 - \hat {a_1})x_i +\epsilon_i##
##\hat {\epsilon_i}## is different from the true term ##\epsilon_i##: it includes a term that depends on ##x_i##.
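A quick numerical check of that identity (toy data and numpy assumed): the fitted residuals ##\hat\epsilon_i## differ from the true ##\epsilon_i## by exactly ##(a_0 - \hat a_0) + (a_1 - \hat a_1) x_i##, which varies with ##x_i##:

```python
import numpy as np

rng = np.random.default_rng(5)
a0, a1 = 1.0, 2.0
x = np.linspace(0.0, 10.0, 30)
eps = rng.normal(scale=1.0, size=x.size)
y = a0 + a1 * x + eps                          # true model: Y = a0 + a1*X + eps

A = np.column_stack([np.ones_like(x), x])
a0_hat, a1_hat = np.linalg.lstsq(A, y, rcond=None)[0]
resid = y - (a0_hat + a1_hat * x)              # fitted residuals eps_hat_i

# eps_hat_i - eps_i equals (a0 - a0_hat) + (a1 - a1_hat)*x_i, a quantity that depends on x_i.
print(np.max(np.abs((resid - eps) - ((a0 - a0_hat) + (a1 - a1_hat) * x))))  # ~ 0 up to rounding
```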
 
