Jeffack said:
I'm going to simplify everything here by using an additive error term (rather than multiplying by ##e^{\eta}##). For the simple two-stores-in-a-zip example, the actual data-generating process is:
##R_{1}=\eta_{1}+r_{1}##
##R_{2}=\eta_{2}+r_{2}##
I understand that model as the two equations:
##R_{1}=\eta_{1}+r_{1}##
##R_{2}=\eta_{2}+r_{2}##
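As a side note, a minimal simulation sketch of that data-generating process might look like the following; the values of ##r_1, r_2## and the error s.d. are made up purely for illustration:

```python
import numpy as np

# Simulate the stated model R_i = eta_i + r_i for two stores in one zip.
# r1, r2 and sigma are hypothetical values chosen only for illustration.
rng = np.random.default_rng(0)
r1, r2 = 100.0, 150.0        # true (unobserved) store revenues
sigma = 10.0                 # s.d. of the additive errors eta_i

eta1, eta2 = rng.normal(0.0, sigma, size=2)
R1, R2 = r1 + eta1, r2 + eta2

# In the actual problem only the zip-level total is observed:
S = R1 + R2
print(S, eta1 + eta2)        # S - (r1 + r2) equals eta1 + eta2
```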
Jeffack said:
If I observed both stores' revenues, I would be looking to minimize
##\eta_{1}^{2}+\eta_{2}^{2}=(R_{1}-r_{1})^{2}+(R_{2}-r_{2})^{2}##
That's not good terminology, because it doesn't mention a sum of squares or say anything about data.
We need to use terminology that distinguishes among random variables, the expectations of random variables, samples of random variables, and means of samples of random variables. We also need to distinguish between population parameters (e.g. expected values) and estimators of population parameters (e.g. the formula for the sample mean is an estimator of the population mean).
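A concrete (hypothetical) illustration of that distinction in code: the population mean below is a parameter, the sample-mean formula is an estimator of it, and the number the formula returns for one sample is an estimate. All numbers are made up:

```python
import numpy as np

# mu is a population parameter (normally unknown); np.mean applied to a
# sample is an estimator of mu; its value on one sample is an estimate.
rng = np.random.default_rng(1)
mu, sigma = 5.0, 2.0

sample = rng.normal(mu, sigma, size=50)   # a sample of the random variable
sample_mean = sample.mean()               # value of the estimator on this sample

print(mu, sample_mean)   # the estimate varies from sample to sample; mu does not
```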
Jeffack said:
However, I don't observe both revenues. Instead, my goal is to minimize the expectation of these, when all I observe is
##(\eta_{1}+\eta_{2})=R_{1}+R_{2}-r_{1}-r_{2}##
That's the abstract goal of least squares fitting, but least squares fitting is usually done to data, not to theoretical distributions. If you knew both the theoretical distribution of the actual phenomena and your theoretical family of models, you would know the distribution of the squares of the errors. Then it would make sense to say that your goal is to pick the member of the family of models that minimizes the expected value of the squares of the errors.
The usual situation is that we have data not known to be from a particular theoretical distribution. We assume it is from some family of distributions, but we still don't know exactly which member of the family it is. We don't know the distribution of the squares of the errors, so we can't find the expected value of that distribution.
What we do is pick a formula for an estimator of the mean square error that is a function both of the data (which are known values) and of the parameters that define a member of our family. Then we find the values of the parameters that minimize the estimator.
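A sketch of that procedure for a hypothetical linear family ##y = ax + b##, fit to made-up synthetic data (the family and the numbers are illustrative, not from the original problem):

```python
import numpy as np
from scipy.optimize import minimize

# The sum of squared residuals is a function of both the data (known values)
# and the parameters (a, b); we minimize it over the parameters.
rng = np.random.default_rng(2)
x = np.linspace(0.0, 1.0, 30)
y = 2.0 * x + 1.0 + rng.normal(0.0, 0.1, size=x.size)  # synthetic data

def sum_sq(params, x, y):
    a, b = params
    return np.sum((y - (a * x + b)) ** 2)

fit = minimize(sum_sq, x0=[0.0, 0.0], args=(x, y))
print(fit.x)   # parameter values minimizing the objective, close to (2, 1)
```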
Jeffack said:
So, I want to compute the following conditional expectation:
##\mathrm{E}[\eta_{1}^{2}+\eta_{2}^{2}\mid(\eta_{1}+\eta_{2})^{2}]##
You want to "estimate" it.
Jeffack said:
It seems as though it should be possible to solve the above if I assume that the ##\eta##'s are distributed normally with mean zero and a constant s.d.
You have an interesting idea. It needs to be put in the correct language.
A given population quantity can have several different estimators, each of which is "good" in a different way. Without going into the technicalities of what "good" shall mean in this particular problem, your idea is to find a good estimator for the (population) mean of ##\eta^2_1 + \eta^2_2 ## as a function of the variables ##\sigma_1, \sigma_2## and the given data, which are observations of ##(\eta_1 + \eta_2)##.
Formulas for good estimators are often (but not always) suggested by looking at formulas for the population parameters that they estimate. For example, the formula for the expected value of a random variable involves an integral. For an estimator of the population mean, we use the formula for the sample mean, which "imitates" the integral by using a finite sum. So I agree that it makes sense to look at the theoretical calculation for ##E( \eta^2_1 + \eta^2_2 \mid (\eta_1 + \eta_2) )##.
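For what it's worth, under the normality assumption in the last quote (##\eta_1, \eta_2## i.i.d. ##N(0, \sigma^2)##, so ##\sigma_1 = \sigma_2 = \sigma##), that theoretical calculation works out in closed form. A sketch, writing ##S = \eta_1 + \eta_2##: since ##(\eta_1, S)## is bivariate normal,
$$\eta_1 \mid S = s \;\sim\; N\!\left(\tfrac{s}{2},\, \tfrac{\sigma^2}{2}\right),$$
and likewise for ##\eta_2##, so
$$E(\eta_i^2 \mid S = s) = \operatorname{Var}(\eta_i \mid S = s) + \big(E(\eta_i \mid S = s)\big)^2 = \frac{\sigma^2}{2} + \frac{s^2}{4},$$
giving
$$E(\eta_1^2 + \eta_2^2 \mid S = s) = \sigma^2 + \frac{s^2}{2}.$$
Conditioning on ##S^2## instead, as in the earlier quote, gives the same expression, since the result depends on ##s## only through ##s^2##.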