DrDu said:
So, say my dependent variable is z and my independent variables are x and y.
I can write an expression
z=ax+by.
To analyze the problem, we would have to specify how the "errors" or "deviations" arise - i.e., we need a model to explain why the data ##(z_i, x_i, y_i)## doesn't exactly satisfy ##z_i = ax_i + by_i##.
A typical assumption for linear regression would be ##z_i = ax_i + by_i + \eta_i## , where ##\eta_i## is a realization from a mean zero normally distributed random variable. However, if there are also "errors" in measuring ##x_i, y_i## we would have to account for them also. ( Incorporating random variables may put additional unknown parameters into the model - e.g. the standard deviation of ##\eta##.)
Estimates for the parameters a and b can be obtained by least squares regression.
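To make that concrete, here is a minimal numpy sketch (the "true" values of ##a##, ##b## and the noise level below are made up for illustration) that simulates data from ##z_i = a x_i + b y_i + \eta_i## and recovers ##\hat{a}, \hat{b}## by least squares:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical "true" parameters (chosen so a^2 + b^2 = 1) and noise level.
a_true, b_true, sigma = 0.8, 0.6, 0.1
n = 200

# Simulated data from the model z_i = a*x_i + b*y_i + eta_i.
x = rng.uniform(-1.0, 1.0, n)
y = rng.uniform(-1.0, 1.0, n)
z = a_true * x + b_true * y + rng.normal(0.0, sigma, n)

# Least-squares fit of z on (x, y), no intercept.
X = np.column_stack([x, y])
(a_hat, b_hat), *_ = np.linalg.lstsq(X, z, rcond=None)
print(a_hat, b_hat)
```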
Now I learn that a=cos(q) and b=sin(q).
I could set up a nonlinear regression to find q.
But I can also determine it as ##\hat{q}=\arccos(\hat{a})## or ##\arcsin(\hat{b})## or ##\arctan(\hat{b}/\hat{a})## using the estimates from linear regression.
I suppose there is a preferred estimator for q which is unbiased.
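Here is the corresponding sketch of those back-transformations, using hypothetical least-squares estimates ##\hat{a}, \hat{b}##. ##\operatorname{arctan2}(\hat{b},\hat{a})## is usually the safer choice: it uses both estimates and gets the quadrant right, while the inverse-trig forms need ##|\hat{a}| \le 1## or ##|\hat{b}| \le 1##, which an unconstrained linear fit does not guarantee:

```python
import numpy as np

# Hypothetical least-squares estimates of a = cos(q) and b = sin(q).
a_hat, b_hat = 0.79, 0.62

q_from_a  = np.arccos(np.clip(a_hat, -1.0, 1.0))  # only uses a_hat; needs |a_hat| <= 1
q_from_b  = np.arcsin(np.clip(b_hat, -1.0, 1.0))  # only uses b_hat; needs |b_hat| <= 1
q_from_ab = np.arctan2(b_hat, a_hat)              # uses both, handles every quadrant
print(q_from_a, q_from_b, q_from_ab)
```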
Bottom line questions might concern the behavior of an estimate of what the model predicts instead of the behavior of estimates for the parameters used in making the prediction. For example, our goal might be an unbiased estimate of the "true" values of ##z## (i.e. the "noise free" value ##z = ax + by = \cos(q)\,x + \sin(q)\,y##). Of course, saying precisely what we mean by an "unbiased estimator of ##z##" would require clarification - do we take it to mean that ##\hat{z} = \cos(\hat{q})\,x + \sin(\hat{q})\,y## is an unbiased estimate of ##z## for each possible ##(x,y)## over the whole range of possible values for ##(x,y)##?
A well known and inconvenient fact about expected values is that ##E(f(\hat{w}))## need not equal ##f(E(\hat{w}))##: an unbiased estimator ##\hat{w}## for ##w## does not necessarily make ##f(\hat{w})## an unbiased estimator for ##f(w)##. So if we treat ##q## as a function ##q = f(w)## of some other parameter ##w## and rewrite the model ##z_i = \cos(q)\, x_i + \sin(q)\, y_i + \eta_i## as ##z_i = \cos(f(w))\, x_i + \sin(f(w))\, y_i + \eta_i##, is it better to focus our attention on finding an unbiased estimator of ##w## or on finding an unbiased estimator of ##f(w)##? Questions like that suggest (to me) that it is clearer to evaluate estimation techniques by their effect on estimating ##z## rather than by their effect on estimating the parameters of the model.
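A quick numerical illustration of that fact, with made-up numbers and ##f = \arccos## (the back-transformation from this thread): even when ##\hat{a}## is an unbiased, normally distributed estimate of ##a##, the average of ##\arccos(\hat{a})## over many realizations differs from ##\arccos(a)##:

```python
import numpy as np

rng = np.random.default_rng(1)

# Suppose a_hat is unbiased and normally distributed about the true a (values made up).
a_true, sd = 0.8, 0.05
a_hat = rng.normal(a_true, sd, 1_000_000)

# Compare E[f(a_hat)] with f(E[a_hat]) for the nonlinear f = arccos.
mean_of_f = np.arccos(np.clip(a_hat, -1.0, 1.0)).mean()
f_of_mean = np.arccos(a_true)
print(mean_of_f, f_of_mean)   # the two values differ
```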
If we do want to focus upon the behavior of parameter estimators, I think the general pattern for analyzing the statistics ( mean, variance, etc.) of a parameter estimation algorithm is the following:
1) Try to express the result of the algorithm as an equation that gives the estimate for the parameter as an explicit function of the data - e.g. ##\hat{a} = f( x_1,y_1,z_1, x_2,y_2,z_2,... x_n,y_n,z_n) ##
We hope this function can be expressed as a function of only certain summary statistics of the data - e.g. ##\hat{a} = f(\sum{x_i}, \sum{y_i}, \sum{z_i}, \sum{x_i^2}, \sum{y_i^2}, \sum{z_i^2}, \sum{x_i y_i}, \dots)##. (For this model, steps 1 and 2 are worked out explicitly just after the list.)
2) Use the result of step 1) to express the estimated parameter as a function of the known sample data together with the (unknown) "true" values of the parameters and the realizations of the random variables in the model - e.g. ##\hat{a} = f(x_1, y_1, a x_1 + b y_1 + \eta_1, x_2, y_2, a x_2 + b y_2 + \eta_2, \dots)##. The result of this step expresses ##\hat{a}## as a random variable because it is a function of other random variables.
3) Analyze the mean and variance of the estimator by using techniques that deduce the properties of "functions of random variables" from the properties of the random variables in the arguments of the function.
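As an illustration of steps 1) and 2) for this particular model (this is just standard least squares with two coefficients and no intercept): minimizing ##\sum_i (z_i - \hat{a} x_i - \hat{b} y_i)^2## gives the normal equations ##\hat{a}\sum x_i^2 + \hat{b}\sum x_i y_i = \sum x_i z_i## and ##\hat{a}\sum x_i y_i + \hat{b}\sum y_i^2 = \sum y_i z_i##, whose solution is ##\hat{a} = \dfrac{\sum x_i z_i \sum y_i^2 - \sum y_i z_i \sum x_i y_i}{\sum x_i^2 \sum y_i^2 - \left(\sum x_i y_i\right)^2}## (and symmetrically for ##\hat{b}##). That completes step 1): ##\hat{a}## is an explicit function of the summary sums. Substituting ##z_i = a x_i + b y_i + \eta_i## into those sums (step 2) shows that ##\hat{a}## equals ##a## plus a linear combination of the ##\eta_i##, which is exactly the kind of expression step 3) analyzes.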
I think many software packages that purport to output the variance or confidence intervals for the parameters used in a curve fit resort to linear approximations to do step 3. If we approximate ##\hat{a} = f(...)## by a function that is linear in the random variables that occur in its arguments, then the variance of ##\hat{a}## can be approximated from the variances of those random variables. To get a numerical answer, we may also need to assume that the numerical value ##\hat{a_0}## computed from our particular data is close enough to the unknown true value ##a## that the linear approximation ##f(\dots a \dots) \approx g(\dots a \dots) + k(\dots a \dots)\,\eta## can itself be approximated by ##g(\dots \hat{a_0} \dots) + k(\dots \hat{a_0} \dots)\,\eta##.
The jargon for those assumptions is "asymptotic linear approximation". The word "asymptotic" alludes to assuming we can use ##\hat{a_0}## in place of ##a## and ##\hat{b_0}## in place of ##b## etc.
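Here is a sketch of what such a linear (first-order, "delta method") approximation looks like for ##\hat{q} = \arctan(\hat{b}/\hat{a})##, assuming the linear fit has already supplied estimates and an estimated covariance matrix for ##(\hat{a}, \hat{b})## (the numbers below are hypothetical):

```python
import numpy as np

# Hypothetical estimates and estimated covariance of (a_hat, b_hat) from the linear fit.
a_hat, b_hat = 0.79, 0.62
cov_ab = np.array([[4.0e-4, -1.0e-4],
                   [-1.0e-4, 5.0e-4]])

# Gradient of q = arctan2(b, a) with respect to (a, b), evaluated at the estimates.
r2 = a_hat**2 + b_hat**2
grad = np.array([-b_hat / r2, a_hat / r2])

# First-order variance approximation: grad^T * Cov * grad.
var_q = grad @ cov_ab @ grad
print(np.arctan2(b_hat, a_hat), np.sqrt(var_q))   # q_hat and its approximate standard error
```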
Another way to get approximate statistics is to keep the "asymptotic" assumption (that the estimated values of the parameters from our particular data can be used in place of the true values of the parameters) and use Monte Carlo simulation to estimate the behavior of ##f(\dots a \dots, \eta)##. Documentation for some software packages mentions using Monte Carlo simulations in connection with computing the statistics of parameter estimators. I think that's what they do.
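A minimal sketch of that kind of Monte Carlo calculation (a parametric bootstrap), assuming we keep the ##x_i, y_i## fixed, plug fitted values ##\hat{a_0}, \hat{b_0}, \hat{\sigma}## in place of the unknown true parameters, and look at the spread of the re-estimated ##\hat{q}## (all numbers below are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(2)

# Fixed design and fitted values standing in for the unknown truth (all hypothetical).
n = 200
x = rng.uniform(-1.0, 1.0, n)
y = rng.uniform(-1.0, 1.0, n)
a0, b0, sigma0 = 0.79, 0.62, 0.1

X = np.column_stack([x, y])
q_sims = []
for _ in range(5000):
    # Simulate data as if (a0, b0, sigma0) were the true parameters, then refit.
    z_sim = a0 * x + b0 * y + rng.normal(0.0, sigma0, n)
    (a_hat, b_hat), *_ = np.linalg.lstsq(X, z_sim, rcond=None)
    q_sims.append(np.arctan2(b_hat, a_hat))

q_sims = np.array(q_sims)
print(q_sims.mean(), q_sims.std())   # Monte Carlo mean and spread of q_hat
```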