# Nonlinear regression which can be partially reduced to linear regression

1. Dec 11, 2016

### DrDu

I have encountered the following problem several times: Say I have a variable y that depends in a nonlinear way on m parameters $\{x_i\}$, with $i \in \{1,\dots,m\}$. However, there is a linear relation between y and n>m functions $f_j(\{x_i\})$ of these parameters, i.e., $y=\sum_j z_j f_j$. So I can get a solution of my problem by first determining the coefficients $f_j$ by linear regression of y on the $z_j$, and then solving m of the n $f_j$ for the m $x_i$.
Clearly, in the regression, I am not using some of the information about the correlation of the $f_j$ which I have in principle. Can the additional variance introduced by this procedure be estimated? Maybe you know about some paper on that topic?

Last edited: Dec 11, 2016
2. Dec 12, 2016

### DrDu

I tried to find a minimal example. I think my notation was somewhat confusing, as x is usually used for independent variables rather than for parameters.
So, say my dependent variable is z and my independent variables are x and y.
I can write an expression
z=ax+by.
Estimates for the parameters a and b can be obtained by least squares regression.
Now I learn that a=cos(q) and b=sin(q).
I could set up a nonlinear regression to find q.
But I can also determine it as $\hat{q}=\arccos(\hat{a})$ or $\arcsin(\hat{b})$ or $\arctan(\hat{b}/\hat{a})$ using the estimates from linear regression.
Clearly, I need one measurement more with the linear regression approach, and there may be problems when a or b are near 1 or -1.
What else can be said?
I suppose there is a preferred estimator for q which is unbiased.
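
To make the two routes concrete, here is a minimal Python sketch on synthetic data. Everything numerical in it (the true angle `q_true`, the sample size, Gaussian noise of standard deviation 0.1 on z only) is an assumption made for illustration, and the Gauss-Newton loop is just one simple way to do the nonlinear fit, not a recommendation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical synthetic data: z = cos(q) x + sin(q) y + noise on z only
q_true = 0.7
n = 200
x = rng.normal(size=n)
y = rng.normal(size=n)
z = np.cos(q_true) * x + np.sin(q_true) * y + 0.1 * rng.normal(size=n)

# Route 1: unconstrained linear least squares for (a, b), then map to an angle
X = np.column_stack([x, y])
(a_hat, b_hat), *_ = np.linalg.lstsq(X, z, rcond=None)
q_lin = np.arctan2(b_hat, a_hat)

# Route 2: fit q directly by Gauss-Newton, started from the linear estimate
def fit_q(x, y, z, q0, iters=20):
    q = q0
    for _ in range(iters):
        resid = z - (np.cos(q) * x + np.sin(q) * y)
        jac = -np.sin(q) * x + np.cos(q) * y  # derivative of the model w.r.t. q
        q = q + (jac @ resid) / (jac @ jac)   # one-parameter Gauss-Newton step
    return q

q_nl = fit_q(x, y, z, q_lin)
print("true:", q_true, "linear then arctan:", q_lin, "nonlinear:", q_nl)
```

Using `arctan2` on both estimates avoids the trouble with `arccos` or `arcsin` when a or b is near 1 or -1, since it does not require the estimates to lie in [-1, 1].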

3. Dec 18, 2016

### Stephen Tashi

To analyze the problem, we would have to specify how the "errors" or "deviations" arise - i.e. we need a model to explain why the data $(z_i, x_i, y_i)$ doesn't exactly satisfy $z_i = ax_i + by_i$.

A typical assumption for linear regression would be $z_i = ax_i + by_i + \eta_i$ , where $\eta_i$ is a realization from a mean zero normally distributed random variable. However, if there are also "errors" in measuring $x_i, y_i$ we would have to account for them also. ( Incorporating random variables may put additional unknown parameters into the model - e.g. the standard deviation of $\eta$.)

Bottom line questions might concern the behavior of an estimate of what the model predicts instead of the behavior of estimates for the parameters used in making the prediction. For example, our goal might be an unbiased estimate of the "true" values of $z$ (i.e. the "noise free" value $z = a x + b y = \sin(q) x + \cos(q) y$). Of course, saying precisely what we mean by an "unbiased estimator of $z$" would require clarification - do we take it to mean that $\hat{z} = \sin(\hat{q}) x + \cos(\hat{q}) y$ is an unbiased estimate for $z$ for each possible $(x,y)$ over the whole range of possible values for $(x,y)$?

A well known and inconvenient fact about expected values is that $E(f(\hat{w}))$ need not be $f(E(\hat{w}))$. An unbiased estimator $\hat{w}$ for $w$ doesn't necessarily imply that $f(\hat{w})$ is an unbiased estimator for $f(w)$. So if we rewrite the model $z_i = \sin(q) x_i + \cos(q) y_i + \eta_i$ as $z_i = \sin( f(w)) x_i + \cos(f(w)) y_i + \eta_i$ then is it better to focus our attention on finding an unbiased estimator of $w$ or on finding an unbiased estimator of $f(w)$? Questions like that suggest (to me) that it is clearer to evaluate estimation techniques by their effect on estimating $z$ rather than their effect on estimating the parameters of the model.
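
A small simulation can illustrate the point that $E(f(\hat{w}))$ need not equal $f(E(\hat{w}))$. The sketch below assumes, purely for illustration, that $\hat{w}$ is the true value 0.5 plus zero-mean Gaussian noise and that $f$ is `arccos` (as one might use to recover an angle from a cosine); these choices are mine, not part of the problem statement.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical setup: an unbiased estimator of w (truth plus zero-mean noise)
w_true = 0.5
sigma = 0.05
w_hat = w_true + sigma * rng.normal(size=1_000_000)

f = np.arccos  # nonlinear transformation of the estimator

bias_w = w_hat.mean() - w_true        # Monte Carlo estimate of E(w_hat) - w
bias_f = f(w_hat).mean() - f(w_true)  # Monte Carlo estimate of E(f(w_hat)) - f(w)

print("bias of w_hat:   ", bias_w)
print("bias of f(w_hat):", bias_f)
```

With these numbers a second-order Taylor expansion suggests a bias of roughly $\tfrac{1}{2} f''(w)\sigma^2 \approx -10^{-3}$ for $f(\hat{w})$, while the bias of $\hat{w}$ itself should be indistinguishable from zero.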

If we do want to focus upon the behavior of parameter estimators, I think the general pattern for analyzing the statistics ( mean, variance, etc.) of a parameter estimation algorithm is the following:

1) Try to express the result of the algorithm as an equation that gives the estimate for the parameter as an explicit function of the data - e.g. $\hat{a} = f( x_1,y_1,z_1, x_2,y_2,z_2,.... x_n,y_n,z_n)$

We hope this function can be expressed as a function of only certain summary statistics of the data - e.g. $\hat{a} = f(\sum{x_i}, \sum{y_i},\sum{z_i}, \sum{x_i^2}, \sum {y_i^2}, \sum {z_i^2}, \sum {x_i y_i}, \ldots)$.

2) Use the result of step 1) to express the estimated parameter as a function of the values of known sample data together with the (unknown) "true" values of the parameters and the realizations of the random variables in the model - e.g. $\hat{a} = f(x_1, y_1, a x_1+ b y_1 + \eta_1, x_2, y_2, a x_2+ b y_2+\eta_2, \ldots)$. The result of this step expresses $\hat{a}$ as a random variable because it is a function of other random variables.

3) Analyze the mean and variance of the estimator by using techniques that deduce the properties of "functions of random variables" from the properties of the random variables in the arguments of the function.

I think many software packages that purport to output the variance or confidence intervals for the parameters used in a curve fit resort to linear approximations to do step 3. If we approximate $\hat{a} = f(...)$ as a function that is linear in the random variables that occur in its arguments then the variance of $\hat{a}$ can be approximated from the variance of the random variables in the arguments. To get a numerical answer, we may also need to assume that the numerical value of $\hat{a_0}$ for our particular data is close enough to the unknown true value $a$ so that a linear approximation $f(...a..) = g(...a,..) + k(...a,...) \eta$ can be approximated by $g(...\hat{a_0},...) + k(...\hat{a_0},...) \eta$.

The jargon for those assumptions is "asymptotic linear approximation". The word "asymptotic" alludes to assuming we can use $\hat{a_0}$ in place of $a$ and $\hat{b_0}$ in place of $b$ etc.
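
As a sketch of what such a linear approximation could look like for the angle, the Python snippet below propagates the usual least-squares covariance of $(\hat{a}, \hat{b})$ through the gradient of $q = \operatorname{atan2}(b, a)$ (often called the delta method). It assumes the convention a = cos(q), b = sin(q), noise on z only, and made-up synthetic data; I am not claiming this is what any particular software package does.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical synthetic data, convention a = cos(q), b = sin(q)
q_true, sigma, n = 0.7, 0.1, 200
x = rng.normal(size=n)
y = rng.normal(size=n)
X = np.column_stack([x, y])
z = X @ np.array([np.cos(q_true), np.sin(q_true)]) + sigma * rng.normal(size=n)

# Linear least squares and the standard covariance estimate s^2 (X^T X)^{-1}
beta_hat, *_ = np.linalg.lstsq(X, z, rcond=None)
a_hat, b_hat = beta_hat
resid = z - X @ beta_hat
s2 = resid @ resid / (n - 2)
cov_beta = s2 * np.linalg.inv(X.T @ X)

# Gradient of q = atan2(b, a) with respect to (a, b), evaluated at the
# estimates in place of the unknown true values (the "asymptotic" step)
r2 = a_hat**2 + b_hat**2
grad = np.array([-b_hat, a_hat]) / r2

q_hat = np.arctan2(b_hat, a_hat)
var_q = grad @ cov_beta @ grad  # first-order approximation of Var(q_hat)
print("q_hat:", q_hat, "approx. std of q_hat:", np.sqrt(var_q))
```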

Another way to get approximate statistics is to keep the "asymptotic" assumption (that the estimated values of the parameters from our particular data can be used in place of the true values of the parameters) and use Monte-Carlo simulation to estimate the behavior of $f(...a.., \eta)$. Documentation for some software packages mentions using Monte Carlo simulations in connection with computing the statistics of parameter estimators. I think that's what they do.
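
A rough Python sketch of that Monte Carlo idea follows. It treats the fitted parameters and the estimated noise level as if they were the truth, simulates new z values from them, and re-estimates q each time. The data, the Gaussian noise model and the number of simulations are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical observed data, convention a = cos(q), b = sin(q)
q_true, sigma, n = 0.7, 0.1, 200
X = np.column_stack([rng.normal(size=n), rng.normal(size=n)])
z = X @ np.array([np.cos(q_true), np.sin(q_true)]) + sigma * rng.normal(size=n)

def estimate_q(X, z):
    """Linear least squares for (a, b), then q = atan2(b, a)."""
    (a, b), *_ = np.linalg.lstsq(X, z, rcond=None)
    return np.arctan2(b, a)

# Fit once to the observed data
beta_hat, *_ = np.linalg.lstsq(X, z, rcond=None)
resid = z - X @ beta_hat
s = np.sqrt(resid @ resid / (n - 2))
q_hat = np.arctan2(beta_hat[1], beta_hat[0])

# Simulate with the estimates plugged in for the unknown true values
n_sim = 2000
q_sims = np.empty(n_sim)
for k in range(n_sim):
    z_sim = X @ beta_hat + s * rng.normal(size=n)
    q_sims[k] = estimate_q(X, z_sim)

mc_std = q_sims.std(ddof=1)
mc_bias = q_sims.mean() - q_hat
print("Monte Carlo std of q_hat:", mc_std, "Monte Carlo bias:", mc_bias)
```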

4. Dec 19, 2016

### DrDu

Dear Stephen,
thank you for your reply. I think what you are proposing in step 1 is called "Method of Moments".
I now rather tend to think of it the following way: If I take the likelihood to depend on the linear parameters a and b, then the equations a=sin q and b=cos q are equivalent to the additional constraint $a^2+b^2=1$, which I can take care of, e.g., using Lagrange multipliers. So I could take the unconstrained solution as a starting point for further optimization.
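
A minimal sketch of the Lagrange multiplier idea in Python, assuming Gaussian noise on z so that maximizing the likelihood is the same as least squares. Stationarity of $\|z - X\beta\|^2 + \lambda(\beta^T\beta - 1)$ gives $(X^TX + \lambda I)\beta = X^Tz$, and $\lambda$ is then chosen so that $\|\beta\| = 1$. The function name `unit_norm_lstsq`, the bisection and the synthetic data are my own choices for illustration. The code picks the root with $\lambda$ above minus the smallest eigenvalue of $X^TX$, which as far as I understand is the one belonging to the constrained minimum in the generic case.

```python
import numpy as np

def unit_norm_lstsq(X, z, iters=200):
    """Minimize ||z - X beta||^2 subject to ||beta|| = 1 (generic case)."""
    d, V = np.linalg.eigh(X.T @ X)  # X^T X = V diag(d) V^T, d ascending
    c = V.T @ (X.T @ z)

    def norm2(lam):
        # ||beta(lam)||^2 for beta(lam) = (X^T X + lam I)^{-1} X^T z
        return np.sum((c / (d + lam)) ** 2)

    # norm2 decreases monotonically for lam > -d[0]; bracket the root of norm2 = 1
    lo = -d[0] * (1.0 - 1e-9)
    hi = np.linalg.norm(c)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if norm2(mid) > 1.0:
            lo = mid
        else:
            hi = mid
    lam = 0.5 * (lo + hi)
    return V @ (c / (d + lam)), lam

# Hypothetical synthetic data; here beta = (cos q, sin q), swap for the other convention
rng = np.random.default_rng(4)
q_true, n = 0.7, 200
X = np.column_stack([rng.normal(size=n), rng.normal(size=n)])
z = X @ np.array([np.cos(q_true), np.sin(q_true)]) + 0.1 * rng.normal(size=n)

beta_con, lam = unit_norm_lstsq(X, z)
q_con = np.arctan2(beta_con[1], beta_con[0])
print("constrained (a, b):", beta_con, "multiplier:", lam, "q:", q_con)
```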

5. Dec 20, 2016

### DrDu

Ok, maybe this works: I can extend the parametrisation of my model, e.g. a=r sin q, b=r cos q. Then, if I use linear regression, I am basically estimating additional parameters which I already know. I.e., I am wasting some degrees of freedom. Asymptotically, this becomes irrelevant.

6. Dec 20, 2016

### Stephen Tashi

I don't think of it that way. For example, in a typical least squares linear regression to fit y = Ax + B to some data, we have:

$\hat{A} = \frac{ (1/n) \sum (x_i y_i) - (1/n)(\sum x_i) (1/n)( \sum y_i)} { (1/n)\sum x_i^2 - (1/n)(\sum x_i) (1/n) (\sum x_i)}$

That expression involves quantities that could be interpreted as moments of distributions, but I don't think of the parameter $A$ as being a moment of a distribution.
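
As a quick numerical check of the slope formula above, the Python sketch below evaluates it from the sums and compares it with `numpy.polyfit`. The data are synthetic, and the intercept formula $\hat{B} = \bar{y} - \hat{A}\bar{x}$ is the standard companion result, added here only so the comparison is complete.

```python
import numpy as np

rng = np.random.default_rng(5)

# Hypothetical data scattered around y = 2 x + 1
x = rng.uniform(0, 10, size=50)
y = 2.0 * x + 1.0 + rng.normal(scale=0.5, size=50)

n = len(x)
# Slope written in terms of summary statistics, as in the formula above
num = np.sum(x * y) / n - (np.sum(x) / n) * (np.sum(y) / n)
den = np.sum(x**2) / n - (np.sum(x) / n) * (np.sum(x) / n)
A_hat = num / den
B_hat = np.mean(y) - A_hat * np.mean(x)  # standard intercept formula

# Reference fit from NumPy's polynomial least squares (degree 1)
A_ref, B_ref = np.polyfit(x, y, 1)
print(A_hat, A_ref, B_hat, B_ref)
```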

You formulate the problem as finding parameters that maximize the likelihood of the data. That's one way to fit a model to data. It doesn't directly address the question of whether the associated estimators are unbiased or whether they are "best" from the point of view of minimizing least squares error. Of course, asymptotically, we hope all the different criteria for the "best" parameters will imply approximately the same answer.

To analyze the model, you need to explicitly represent the random variables that account for the "errors" or "deviations" of the data from the deterministic part of the model. (e.g. the distinction between a model that needs "least squares regression" vs the model that needs "total least squares regression" https://en.wikipedia.org/wiki/Total_least_squares )
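
To illustrate why that distinction matters, here is a sketch in Python comparing ordinary least squares with a basic total least squares fit when x, y and z are all measured with noise. It assumes equal noise variance on all three variables and no intercept, and it uses the textbook SVD construction (smallest right singular vector of the augmented matrix); the data and noise level are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(6)

# Hypothetical truth, convention a = cos(q), b = sin(q)
q_true, n, noise = 0.7, 5000, 0.3
beta_true = np.array([np.cos(q_true), np.sin(q_true)])
x_true = rng.normal(size=n)
y_true = rng.normal(size=n)
z_true = beta_true[0] * x_true + beta_true[1] * y_true

# All three variables are observed with noise of the same standard deviation
x = x_true + noise * rng.normal(size=n)
y = y_true + noise * rng.normal(size=n)
z = z_true + noise * rng.normal(size=n)
X = np.column_stack([x, y])

# Ordinary least squares: treats x and y as exact
beta_ols, *_ = np.linalg.lstsq(X, z, rcond=None)

# Total least squares: [X z] @ [beta, -1] should be close to zero, so take the
# right singular vector belonging to the smallest singular value
_, _, Vt = np.linalg.svd(np.column_stack([X, z]), full_matrices=False)
v = Vt[-1]
beta_tls = -v[:2] / v[2]

print("true:", beta_true)
print("OLS: ", beta_ols, "norm", np.linalg.norm(beta_ols))
print("TLS: ", beta_tls, "norm", np.linalg.norm(beta_tls))
```

In this setup the ordinary least squares coefficients are expected to be shrunk towards zero by the noise in x and y, so the constraint $a^2+b^2=1$ would be visibly violated, while the total least squares estimate should stay close to the unit circle.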

7. Dec 21, 2016

### chiro

Hey DrDu.

If you are restricting the range so that the random variable you are estimating [the dependent variable] is not on the whole real line, then you will need to use GLMs (generalized linear models) to estimate the parameters.

If you can put things into an exponential family of distributions then the standard GLM techniques will suffice.

The only other thing will involve Expectation Maximization techniques [the general one] if they don't.

You can find this stuff in an applied regression analysis textbook for graduate statistics.

8. Dec 22, 2016

### DrDu

Thank you both, these are very good suggestions to think about, which is what I'll do over the holidays!
Have a nice Christmas season!