Nonlinear regression which can be partially reduced to linear regression

  • #1
DrDu
Science Advisor
I encountered the following problem several times: Say I have a variable y depending in a nonlinear way on m parameters ##\{x_i\}##, with ##i \in \{1,\dots,m\}##. However, there is a linear relation between ##n > m## functions ##f_j(\{x_i\})##, i.e., ##y=\sum_j z_j f_j##, where the ##z_j## are known. So I can get a solution of my problem by first determining the coefficients ##f_j## by linear regression of y on the ##z_j##, and then solving m of the n ##f_j## for the m ##x_i##.
Clearly, in the regression, I am not using some of the information about the correlation of the ##f_j## which I have in principle. Can the additional variance introduced by this procedure be estimated? Maybe you know about some paper on that topic?
 
  • #2
I tried to find a minimal example. I think my notation was somewhat confusing, as x is usually used for independent variables rather than for parameters.
So, say my dependent variable is z and my independent variables are x and y.
I can write an expression
z=ax+by.
Estimates for the parameters a and b can be obtained by least squares regression.
Now I learn that a=cos(q) and b=sin(q).
I could set up a nonlinear regression to find q.
But I can also determine it as ##\hat{q}=\arccos(\hat{a})## or ##\arcsin(\hat{b})## or ##\arctan(\hat{b}/\hat{a})## using the estimates from linear regression.
Clearly, I need one more measurement with the linear regression approach, and there may be problems when a or b is near 1 or -1.
What else can be said?
I suppose there is a preferred estimator for q which is unbiased.
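To make this concrete, here is a minimal sketch of the two-step route in Python (a sketch only: the noise level, sample size and variable names are made up, and additive Gaussian noise is assumed):

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate data from the "true" model z = cos(q)*x + sin(q)*y + noise
q_true = 0.7
x = rng.uniform(-1.0, 1.0, 200)
y = rng.uniform(-1.0, 1.0, 200)
z = np.cos(q_true) * x + np.sin(q_true) * y + rng.normal(0.0, 0.1, 200)

# Step 1: unconstrained linear least squares for a and b
X = np.column_stack([x, y])
(a_hat, b_hat), *_ = np.linalg.lstsq(X, z, rcond=None)

# Step 2: recover q from the linear estimates; arctan2 avoids the
# ill-conditioning of arccos/arcsin when a or b is near +-1
q_hat = np.arctan2(b_hat, a_hat)
print(q_hat, q_true)
```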
 
  • #3
DrDu said:
So, say my dependent variable is z and my independent variables are x and y.
I can write an expression
z=ax+by.

To analyze the problem, we would have to specify how the "errors" or "deviations" arise - i.e., we need a model to explain why the data ##(z_i, x_i, y_i)## doesn't exactly satisfy ##z_i = ax_i + by_i##.

A typical assumption for linear regression would be ##z_i = ax_i + by_i + \eta_i##, where ##\eta_i## is a realization from a mean zero normally distributed random variable. However, if there are also "errors" in measuring ##x_i, y_i##, we would have to account for those as well. (Incorporating random variables may put additional unknown parameters into the model - e.g. the standard deviation of ##\eta##.)

DrDu said:
Estimates for the parameters a and b can be obtained by least squares regression.
Now I learn that a=cos(q) and b=sin(q).
I could set up a nonlinear regression to find q.
But I can also determine it as ##\hat{q}=\arccos(\hat{a})## or ##\arcsin(\hat{b})## or ##\arctan(\hat{b}/\hat{a})## using the estimates from linear regression.

I suppose there is a preferred estimator for q which is unbiased.

Bottom-line questions might concern the behavior of an estimate of what the model predicts instead of the behavior of estimates for the parameters used in making the prediction. For example, our goal might be an unbiased estimate of the "true" value of ##z## (i.e., the "noise free" value ##z = a x + b y = \cos(q)\, x + \sin(q)\, y##). Of course, saying precisely what we mean by an "unbiased estimator of ##z##" would require clarification - do we take it to mean that ##\hat{z} = \cos(\hat{q})\, x + \sin(\hat{q})\, y## is an unbiased estimate of ##z## for each possible ##(x,y)## over the whole range of possible values for ##(x,y)##?

A well known and inconvenient fact about expected values is that ##E(f(\hat{w}))## need not equal ##f(E(\hat{w}))##. An unbiased estimator ##\hat{w}## for ##w## doesn't necessarily imply that ##f(\hat{w})## is an unbiased estimator for ##f(w)##. So if we rewrite the model ##z_i = \cos(q)\, x_i + \sin(q)\, y_i + \eta_i## as ##z_i = \cos(f(w))\, x_i + \sin(f(w))\, y_i + \eta_i##, then is it better to focus our attention on finding an unbiased estimator of ##w## or on finding an unbiased estimator of ##f(w)##? Questions like that suggest (to me) that it is clearer to evaluate estimation techniques by their effect on estimating ##z## rather than by their effect on estimating the parameters of the model.

If we do want to focus upon the behavior of parameter estimators, I think the general pattern for analyzing the statistics ( mean, variance, etc.) of a parameter estimation algorithm is the following:

1) Try to express the result of the algorithm as an equation that gives the estimate for the parameter as an explicit function of the data - e.g. ##\hat{a} = f(x_1,y_1,z_1,\, x_2,y_2,z_2,\, \dots,\, x_n,y_n,z_n)##.

We hope this function can be expressed as a function of only certain summary statistics of the data - e.g. ##\hat{a} = f(\sum x_i, \sum y_i, \sum z_i, \sum x_i^2, \sum y_i^2, \sum z_i^2, \sum x_i y_i, \dots)##.

2) Use the result of step 1) to express the estimated parameter as a function of the known sample data together with the (unknown) "true" values of the parameters and the realizations of the random variables in the model - e.g. ##\hat{a} = f(x_1, y_1, a x_1 + b y_1 + \eta_1,\, x_2, y_2, a x_2 + b y_2 + \eta_2, \dots)##. The result of this step expresses ##\hat{a}## as a random variable because it is a function of other random variables.

3) Analyze the mean and variance of the estimator by using techniques that deduce the properties of "functions of random variables" from the properties of the random variables in the arguments of the function.

I think many software packages that purport to output the variance or confidence intervals for the parameters used in a curve fit resort to linear approximations to do step 3. If we approximate ##\hat{a} = f(...)## as a function that is linear in the random variables that occur in its arguments then the variance of ##\hat{a} ## can be approximated from the variance of the random variables in the arguments. To get a numerical answer, we may also need to assume that the numerical value of ##\hat{a_0}## for our particular data is close enough to the unknown true value ##a## so that a linear approximation ##f(...a..) = g(...a,..) + k(...a,...) \eta ## can be approximated by ## g(...\hat{a_0},...) + k(...\hat{a_0},...) \eta##.

The jargon for those assumptions is "asymptotic linear approximation". The word "asymptotic" alludes to assuming we can use ##\hat{a_0}## in place of ##a## and ##\hat{b_0}## in place of ##b## etc.
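For the example in this thread, with ##\hat{q}=\arctan(\hat{b}/\hat{a})##, such an asymptotic linear ("delta method") approximation might look like the following sketch (assuming the usual OLS covariance estimate for ##(\hat{a},\hat{b})##; nothing here is specific to any particular software package):

```python
import numpy as np

def ols_fit(x, y, z):
    """Unconstrained least squares for z = a*x + b*y + noise.
    Returns the estimates (a_hat, b_hat) and their estimated covariance."""
    X = np.column_stack([x, y])
    beta, *_ = np.linalg.lstsq(X, z, rcond=None)
    sigma2 = np.sum((z - X @ beta) ** 2) / (len(z) - 2)   # residual variance
    cov = sigma2 * np.linalg.inv(X.T @ X)                  # covariance of (a_hat, b_hat)
    return beta, cov

def delta_var_q(beta, cov):
    """Approximate Var(q_hat) for q_hat = arctan2(b_hat, a_hat) by
    linearizing around the estimated (a_hat, b_hat)."""
    a, b = beta
    grad = np.array([-b, a]) / (a**2 + b**2)   # gradient of arctan2(b, a) w.r.t. (a, b)
    return grad @ cov @ grad
```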

Another way to get approximate statistics is to keep the "asymptotic" assumption (that the estimated values of the parameters from our particular data can be used in place of the true values of the parameters) and use Monte-Carlo simulation to estimate the behavior of ##f(...a.., \eta)##. Documentation for some software packages mentions using Monte Carlo simulations in connection with computing the statistics of parameter estimators. I think that's what they do.
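A sketch of that Monte-Carlo idea for the same example (treating the fitted values as if they were the true ones; the function name and defaults are illustrative):

```python
import numpy as np

def monte_carlo_q(x, y, a_hat, b_hat, sigma_hat, n_sims=5000, seed=1):
    """Parametric Monte Carlo: resimulate z under the fitted model,
    refit by least squares each time, and return the simulated q_hat values."""
    rng = np.random.default_rng(seed)
    X = np.column_stack([x, y])
    q_sims = np.empty(n_sims)
    for k in range(n_sims):
        z_sim = a_hat * x + b_hat * y + rng.normal(0.0, sigma_hat, len(x))
        beta, *_ = np.linalg.lstsq(X, z_sim, rcond=None)
        q_sims[k] = np.arctan2(beta[1], beta[0])
    return q_sims  # np.mean(q_sims) and np.var(q_sims) approximate bias and variance
```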
 
  • #4
Dear Stephen,
thank you for your reply. I think what you are proposing in step 1 is called "Method of Moments".
I now rather tend to think of it the following way: If I take the likelihood to depend on the linear parameters a and b, then the equations a = cos(q) and b = sin(q) are equivalent to the additional constraint ##a^2+b^2=1##, which I can take care of, e.g., using Lagrange multipliers. So I could take the unconstrained solution as a starting point for further optimization.
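As a concrete sketch of that idea (assuming Gaussian errors, so that maximizing the likelihood amounts to constrained least squares, and assuming SciPy is available), a solver can enforce ##a^2+b^2=1## directly, starting from the unconstrained solution:

```python
import numpy as np
from scipy.optimize import minimize

def constrained_fit(x, y, z, a0, b0):
    """Least squares for z = a*x + b*y subject to a**2 + b**2 == 1,
    starting from the unconstrained estimates (a0, b0)."""
    def sse(p):
        a, b = p
        return np.sum((z - a * x - b * y) ** 2)

    constraint = {"type": "eq", "fun": lambda p: p[0] ** 2 + p[1] ** 2 - 1.0}
    start = np.array([a0, b0]) / np.hypot(a0, b0)   # project the start onto the circle
    result = minimize(sse, start, method="SLSQP", constraints=[constraint])
    a, b = result.x
    return np.arctan2(b, a)   # q recovered from the constrained estimates
```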
 
  • #5
Ok, maybe this works: I can extend the parametrisation of my model, e.g., a = r cos(q), b = r sin(q). Then, if I use linear regression, I am basically estimating additional parameters which I already know, i.e., I am wasting some degrees of freedom. Asymptotically, this becomes irrelevant.
 
  • #6
DrDu said:
I think what you are proposing in step 1 is called "Method of Moments".

I don't think of it that way. For example, in a typical least squares linear regression to fit y = Ax + B to some data, we have:

##\hat{A} = \dfrac{\frac{1}{n}\sum x_i y_i - \left(\frac{1}{n}\sum x_i\right)\left(\frac{1}{n}\sum y_i\right)}{\frac{1}{n}\sum x_i^2 - \left(\frac{1}{n}\sum x_i\right)^2}##

That expression involves quantities that could be interpreted as moments of distributions, but I don't think of the parameter ##A## as being a moment of a distribution.
DrDu said:
I now rather tend to think of it the following way: If I take the likelihood to depend on the linear parameters a and b, then the equations a = cos(q) and b = sin(q) are equivalent to the additional constraint ##a^2+b^2=1##, which I can take care of, e.g., using Lagrange multipliers. So I could take the unconstrained solution as a starting point for further optimization.

You formulate the problem as finding parameters that maximize the likelihood of the data. That's one way to fit a model to data. It doesn't directly address the question of whether the associated estimators are unbiased or whether they are "best" from the point of view of minimizing least squares error. Of course, asymptotically, we hope all the different criteria for the "best" parameters will imply approximately the same answer.

DrDu said:
Ok, maybe this works: I can extend the parametrisation of my model, e.g., a = r cos(q), b = r sin(q). Then, if I use linear regression, I am basically estimating additional parameters which I already know, i.e., I am wasting some degrees of freedom. Asymptotically, this becomes irrelevant.

To analyze the model, you need to explicitly represent the random variables that account for the "errors" or "deviations" of the data from the deterministic part of the model (e.g., the distinction between a model that calls for ordinary least squares regression and one that calls for total least squares regression: https://en.wikipedia.org/wiki/Total_least_squares ).
 
  • #7
Hey DrDu.

If you are restricting the range so that the random variable you are estimating [the dependent variable] is not defined on the whole real line, then you will need to use GLMs (generalized linear models) to estimate the parameters.

If you can put things into an exponential family of distributions then the standard GLM techniques will suffice.

If they don't, the only other option involves the more general Expectation-Maximization (EM) techniques.

You can find this stuff in an applied regression analysis textbook for graduate statistics.
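For what it's worth, a minimal GLM sketch along those lines (assuming the statsmodels package; with the Gaussian family and its default identity link this reproduces ordinary least squares, while other families handle dependent variables restricted to part of the real line):

```python
import numpy as np
import statsmodels.api as sm

def glm_fit(x, y, z):
    """Fit z = a*x + b*y (no intercept) as a GLM; illustrative names."""
    X = np.column_stack([x, y])
    result = sm.GLM(z, X, family=sm.families.Gaussian()).fit()
    a_hat, b_hat = result.params
    return a_hat, b_hat, result.bse   # estimates and their standard errors
```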
 
  • #8
Thank you both, these are very good suggestions to think about, which is what I'll do over the holidays!
Have a nice Christmas season!
 

FAQ: Nonlinear regression which can be partially reduced to linear regression

1. What is nonlinear regression?

Nonlinear regression is a statistical method used to model and analyze data that does not follow a linear relationship. It involves fitting a nonlinear function to the data in order to make predictions or draw conclusions.

2. How is nonlinear regression different from linear regression?

Linear regression assumes a linear relationship between the independent and dependent variables, while nonlinear regression allows for more complex relationships that may not be linear. Nonlinear regression also typically requires iterative numerical methods to estimate the parameters of the nonlinear function, whereas linear regression has a closed-form least squares solution.

3. Can nonlinear regression be partially reduced to linear regression?

Yes, in some cases, a nonlinear regression problem can be simplified and partially reduced to a linear regression problem. This can be achieved by transforming the data or the model in some way, such as taking the logarithm of the data or fitting a linear model to a transformed version of the data.
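For example (a minimal sketch with made-up numbers): an exponential model y = A·exp(k·x) with multiplicative noise becomes linear after taking logarithms, so an ordinary straight-line fit recovers k and log A:

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0.1, 5.0, 50)
A_true, k_true = 2.0, 0.8
y = A_true * np.exp(k_true * x) * rng.lognormal(0.0, 0.05, x.size)  # multiplicative noise

# log y = log A + k*x, so a straight-line fit in (x, log y) is linear regression
k_hat, logA_hat = np.polyfit(x, np.log(y), 1)
print(k_hat, np.exp(logA_hat))
```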

4. What are the advantages of using nonlinear regression?

Nonlinear regression allows for a more flexible and accurate representation of complex relationships in the data. It can also handle a wider range of data types and can be used, with appropriate caution, to extrapolate beyond the range of the observed data.

5. What are the limitations of using nonlinear regression?

Nonlinear regression can be more computationally intensive and may require more data points compared to linear regression. It also relies on the choice of an appropriate nonlinear function, which can be challenging and may result in biased or unreliable results if not chosen carefully.
