# Sum of random variables, given sum of observed variables

I have a model in which, for each store, predicted revenues are perturbed by a multiplicative shock:

$R = e^\eta r$

where $r$ is predicted and $R$ is observed. $\eta$ is mean zero.

I can find $\eta$ as follows: $\ln(R) - \ln(r) = \eta$. I'm summing the squares of the $\eta$'s.
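In code, recovering each shock and accumulating the squared errors is direct (a minimal sketch; the revenue figures are made up purely for illustration):

```python
import math

# Made-up (predicted r, observed R) revenue pairs, purely for illustration
stores = [(100.0, 110.0), (250.0, 230.0), (80.0, 85.0)]

# Since R = e^eta * r, each shock is recovered as eta = ln(R) - ln(r)
etas = [math.log(R) - math.log(r) for (r, R) in stores]

# Objective contribution: the sum of squared shocks
sum_sq = sum(e * e for e in etas)
print(sum_sq)
```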

However, there are some markets where I only observe the sum of sales of multiple stores. The simplest example would be for two stores:
$R_{1}=e^{\eta_{1}}r_{1}$
$R_{2}=e^{\eta_{2}}r_{2}$
where I only observe $R_1 + R_2$. I would like to calculate $\eta_1^2 + \eta_2^2$ (or at least its expected value). Is there any way I can do this? Is there anything at all that I can say about $\eta_1^2 + \eta_2^2$ or $\eta_1 + \eta_2$ ? I'm OK with making assumptions about the distribution of $\eta$ if that's necessary.

Thanks!

jim mcnamara
Mentor
Can you discern from the raw data which category the observables fall into? (summed or not summed)

Yes. If they are summed, I know how many stores are included in the group.

Stephen Tashi
I would like to calculate $\eta_1^2 + \eta_2^2$ (or at least its expected value).
The technically correct language is probably that you would like to "estimate" it, not "calculate" it - unless you are asking how to compute each sampled value of $\eta_1^2 + \eta_2^2$ from the corresponding sampled value of $\eta_1 + \eta_2$. (The importance of this technicality is that if you are doing web searches on the problem, it is a problem of "statistical estimation" or finding a good "estimator" etc.)

I'm OK with making assumptions about the distribution of $\eta$ if that's necessary.

First, let's see if I visualize your data correctly. I visualize ##N## different "revenue producing areas".
The data for the ##n##_th revenue producing area consists of ##M## samples of the total revenue for stores in that area: ##S_{n,i},\ n = 1,2,...,N,\ i = 1,2,...,M##.

For each ##k##_th store in ##n##_th revenue producing area, the model for the store's revenue is ## R_{n,k} = e^{\eta_{n,k}} r_k ##.

The random variables involved are the ##\eta_{n,k}##.

Is the "predicted" ##r_k## a constant value for each sample? Or do you have data with records like:

area = 3, sample = 5, store r_1 = \$634.30, store r_2 = \$532.23, observed sum of revenues S = \$12432.60
area = 3, sample = 6, store r_1 = \$236.00, store r_2 = \$530.00, observed sum of revenues S = \$655.92
.....

?

You're correct about the "estimation" vs "calculation" comment.

Regarding your question at the end, I am assuming that $r$ is the same for each store within an area (I use zip codes as my area). In other words, my model spits out a single predicted revenue number for each zip code.

Example:
Data Generating Processes for zip code A, which has 3 stores, and zip code B, which has 2 stores:
$$R_{A}=e^{\eta_{1,A}}r_{A}+e^{\eta_{2,A}}r_{A}+e^{\eta_{3,A}}r_{A}$$
$$R_{B}=e^{\eta_{1,B}}r_{B}+e^{\eta_{2,B}}r_{B}$$
What I observe for A is the number of stores (3) and the total revenue for those three stores, $R_A$. What I observe for B is the number of stores (2) and the total revenue for those stores, $R_B$.

Stephen Tashi
Example:
Data Generating Processes for zip code A, which has 3 stores, and zip code B, which has 2 stores:
$$R_{A}=e^{\eta_{1,A}}r_{A}+e^{\eta_{2,A}}r_{A}+e^{\eta_{3,A}}r_{A}$$
$$R_{B}=e^{\eta_{1,B}}r_{B}+e^{\eta_{2,B}}r_{B}$$
What I observe for A is the number of stores (3) and the total revenue for those three stores, $R_A$. What I observe for B is the number of stores (2) and the total revenue for those stores, $R_B$.

Ok, that's clear.

Now, as to what you wish to estimate: do you wish to estimate the distribution of the revenue from each store?

Is that the eventual purpose of estimating the distribution of ## \eta_{1,B} +\eta_{2,B} ##? I don't see the direct connection between estimating the distribution of that sum (or its expectation) and the problem of estimating the distribution of revenue for each individual store.

Thanks for the help thus far.

The short answer is that, given observed $R$ and predicted $r$ for each location, I would like to estimate $\eta_{1}^{2}+\eta_{2}^{2}+...+\eta_{N}^{2}$, where $N$ is the number of stores in that location.

The more explanatory answer is that I am trying to construct a model for predicting revenues. I am doing this by minimizing the sum of squared residuals (so, a nonlinear least squares estimation). If I observed each store's revenue, then it would just be a matter of minimizing the sum of $(\ln(R_s )-\ln(r_s))^{2}$, where $s$ indexes stores. However, I'm concerned that using this method on the data I have (where I have location data but not individual store data) gives each location the same weight, even though locations have different numbers of stores (and, therefore, different numbers of realizations of $\eta$). I've tried things like multiplying each location's squared residual by the number of stores in that location, but I wanted to see if there was any way to show mathematically that this is a reasonable thing to do (or to show why the "basic" way is fine).

To get technical... the basic estimation is like this:
$$\hat{\theta}=\underset{\theta}{argmin}\sum_{s}(\ln(R_{s})-\ln(r_{s}))^{2}$$
where $r_{s}=f(\theta)$.
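A sketch of that estimation, under made-up assumptions: a one-parameter model $f(\theta)=\theta$ (a single predicted revenue level), synthetic lognormal shocks, and a crude grid search standing in for a real numerical optimizer:

```python
import math
import random

random.seed(0)
theta_true = 50.0  # hypothetical "true" revenue level, used to fake data

# Synthetic observed revenues: R_s = exp(eta_s) * f(theta_true), eta ~ N(0, 0.2^2)
R = [math.exp(random.gauss(0.0, 0.2)) * theta_true for _ in range(500)]

def objective(theta):
    # Sum of squared log residuals: sum_s (ln R_s - ln f(theta))^2, f(theta) = theta
    return sum((math.log(Rs) - math.log(theta)) ** 2 for Rs in R)

# Crude grid search over theta in [30, 70) in place of a numerical optimizer
grid = [30.0 + 0.1 * i for i in range(400)]
theta_hat = min(grid, key=objective)
print(theta_hat)
```

With mean-zero shocks the minimizer is the geometric mean of the observed revenues, so the estimate lands close to the level used to generate the data.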

Stephen Tashi
The short answer is that, given observed $R$ and predicted $r$ for each location, I would like to estimate $\eta_{1}^{2}+\eta_{2}^{2}+...+\eta_{N}^{2}$, where $N$ is the number of stores in that location.

The ##\eta_i## are random variables, so their sum is a random variable. For a given sales area (zip code) you presumably have several realizations of the total revenue ( daily sales figures or something like that). Are you seeking to estimate the value of ##\eta^2_{1,k} + \eta^2_{2,k} + ... + \eta^2_{N,k} ## for each day k = 1,2,....M ?

The more explanatory answer is that I am trying to construct a model for predicting revenues. I am doing this by minimizing the sum of squared residuals (so, a nonlinear least squares estimation). If I observed each store's revenue, then it would just be a matter of minimizing the sum of $(\ln(R_s )-\ln(r_s))^{2}$, where $s$ indexes stores.

If you knew each store's revenue, why wouldn't you fit your model by minimizing the sum of ##(R_{s,k} - r_{s,k})^2## ?

Certain transformations of data make least squares fitting easier, but the least squares fit for the transformed data isn't necessarily the least squares fit for the untransformed data. For example, the "error" between \$100 and \$10 is 1 in terms of ##\log_{10}##, as is the error between \$1000 and \$100.

For a given sales area (zip code) you presumably have several realizations of the total revenue ( daily sales figures or something like that)
Yes, I have panel data, so I have data for multiple zip codes over multiple months.
Are you seeking to estimate the value of ##\eta^2_{1,k} + \eta^2_{2,k} + ... + \eta^2_{N,k}## for each day k = 1,2,...,M ?
Yes, that's what I'm trying to do. I'm summing the error over each month and each location.
If you knew each store's revenue, why wouldn't you fit your model by minimizing the sum of ##(R_{s,k} - r_{s,k})^2## ?
If I knew each store's revenue (which I don't), that's what I would do.

Certain transformations of data make least squares fitting easier, but the least squares fit for the transformed data isn't necessarily the least squares fit for the untransformed data.
Right. I'm currently using the multiplicative error term for two reasons. First, that's what the paper I'm basing my work off of uses. Second, using additive error makes some of the errors for large zip codes gigantic, and they essentially swamp the small errors that exist in small zip codes. I'm also running some models that use additive error.

I'm starting to concede that there's not much I can do here, simply because I don't have enough information. As an example, suppose I have a zip code with two stores and predicted revenue of \$5 per store: $$R=5e^{\eta_{1}}+5e^{\eta_{2}}=5(e^{\eta_{1}}+e^{\eta_{2}})$$ If each $\eta$ were equal to its expectation (which is zero), then actual revenue would equal predicted revenue, \$10. Suppose, instead, that observed revenue is \$100. There are many different values of $\eta_{1}$ and $\eta_{2}$ that would give this. For example:

$\eta_{1}=0$ and $\eta_{2}\thickapprox2.94$
or
$\eta_{1}=\eta_{2}\thickapprox2.3$
or many other combinations.

These will lead to different values of $\eta_{1}^{2}+\eta_{2}^{2}$ (in this example the sum of squares is about 8.67 in the first case and 10.60 in the second). I don't know what the actual realizations of $\eta$ are. I could assume that $\eta_{1}=\eta_{2}$, but I'm not sure I can do that, since that would mean the two error terms are perfectly correlated, and everything I've done with least squares assumes i.i.d. error terms. (Thinking about it another way, the zip code would then really have just a single realization of $\eta$, used for both $\eta_{1}$ and $\eta_{2}$.)
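The non-identifiability is easy to confirm numerically, using the two-store, $r=5$, $R=100$ example above:

```python
import math

r, R = 5.0, 100.0  # predicted revenue per store, observed total revenue

# Case 1: eta_1 = 0, so 5*e^0 + 5*e^{eta_2} = 100  =>  eta_2 = ln(19)
eta2_case1 = math.log((R - r) / r)
ss_case1 = 0.0 ** 2 + eta2_case1 ** 2

# Case 2: eta_1 = eta_2, so 10*e^eta = 100  =>  eta = ln(10)
eta_case2 = math.log(R / (2 * r))
ss_case2 = 2 * eta_case2 ** 2

# Both cases reproduce the same observed total revenue...
total1 = r * (math.exp(0.0) + math.exp(eta2_case1))
total2 = 2 * r * math.exp(eta_case2)
# ...but yield different sums of squared shocks (~8.67 vs ~10.60)
print(total1, total2, ss_case1, ss_case2)
```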

Stephen Tashi
There are many different values of $\eta_{1}$ and $\eta_{2}$ that would give this. For example:

$\eta_{1}=0$ and $\eta_{2}\thickapprox2.94$
or
$\eta_{1}=\eta_{2}\thickapprox2.3$
or many other combinations.

These will lead to different values of $\eta_{1}^{2}+\eta_{2}^{2}$ (in this example the sum of squares is about 8.67 in the first case and 10.60 in the second).

I agree with the impossibility of deducing individual values for the realizations of $\eta_1, \eta_2$.

I'm trying to understand your focus on the random variable ##\eta^2_1 + \eta^2_2##. Is the idea that observed values of ##\eta^2_1 + \eta^2_2## would allow you to compare the predicted variance of total sales ##R## with observed variance of total sales ? (You mention least squares, but you haven't given the definition of the predicted variable. You don't have a model predicting the standard deviations for the distributions of the ##\eta_i## do you ? )

I agree with your pessimism about deducing the values of individual realizations of the random variables ##\eta_1, \eta_2## from a datum that only gives their sum. However, the seemingly more ambitious problem of estimating the distributions of random variables from data that only gives their sum is, as the saying goes, a "well known" problem. It's known as a problem of "deconvolution".

When a given distribution can be expressed as the sum of two random variables in more than one way, a deconvolution problem is usually solved by adding some condition on the solution that makes it unique.

I did a brief web search on "deconvolution, lognormals" and found some highly specialized articles where people claim to have deconvolved data into the sum of lognormally distributed random variables, but the specialized content makes the mathematics obscure - and I'm not sure the mathematics is good. A complication in searching is that many people don't make a distinction between "decomposing" a distribution as a "mixture" of distributions and "deconvolving" a distribution into a "convolution" of distributions. The latter task is what you have.

I'm optimistic that we can solve your deconvolution problem. It will require some thinking. If you are writing up your work for an article, you'll have to decide whether the audience will accept some mathematics that is not routinely seen in introductory statistics courses.

I'm trying to understand your focus on the random variable ##\eta^2_1 + \eta^2_2##.

I'm attempting to minimize the sum of squares. So, if there were 3 stores and I observed them all, I would minimize $\eta_{1}^{2}+\eta_{2}^{2}+\eta_{3}^{2}$. If, instead, stores 2 and 3 are aggregated together, I think that I need to minimize $\eta_{1}^{2}+\{\mathrm{some\ approximation\ of\ }\eta_{2}^{2}+\eta_{3}^{2}\}$. The trick, of course, is figuring out how to approximate $\eta_{2}^{2}+\eta_{3}^{2}$.

- I'm not married to the idea of the multiplicative error term; I would switch to a simple linear error term if that makes things easier. I'm using the ##e^{\eta}## error term because revenues are always positive.

- I had never heard of deconvolution, but I'm reading up on the subject now.

- This is my job market paper in economics. The more obscure math it contains the better :)

I had an idea. This could be way off base, but here goes...
I'm going to simplify everything here by using an additive error term (rather than multiplying by $e^\eta$). For the simple two-stores-in-a-zip example, the actual data-generating-process is:
$$R_{1}=\eta_{1}+r_{1}$$
$$R_{2}=\eta_{2}+r_{2}$$
If I observed both store's revenues, I would be looking to minimize
$$\eta_{1}^{2}+\eta_{2}^{2}=(R_{1}-r_{1})^{2}+(R_{2}-r_{2})^{2}$$

However, I don't observe both revenues. Instead, my goal is to minimize the expectation of this sum, when all I observe is $(\eta_{1}+\eta_{2})=R_{1}+R_{2}-r_{1}-r_{2}$.
So, I want to compute the following conditional expectation:
$$\mathrm{E}[\eta_{1}^{2}+\eta_{2}^{2}|(\eta_{1}+\eta_{2})^{2}]$$
It seems as though it should be possible to solve the above if I assume that the $\eta$'s are distributed normally with mean zero and a constant s.d.
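Sketching that idea (under the assumption, which the data has not confirmed, that the $\eta$'s are i.i.d. $N(0,\sigma^{2})$): conditional on the sum $\eta_{1}+\eta_{2}=s$, each shock satisfies $\eta_{i}\mid s\sim N(s/2,\,\sigma^{2}/2)$, so
$$\mathrm{E}[\eta_{1}^{2}+\eta_{2}^{2}\mid\eta_{1}+\eta_{2}=s]=2\left(\frac{\sigma^{2}}{2}+\frac{s^{2}}{4}\right)=\sigma^{2}+\frac{s^{2}}{2}$$
and, for a zip code with $N$ i.i.d. shocks, the same argument gives $\mathrm{E}[\sum_{i}\eta_{i}^{2}\mid\sum_{i}\eta_{i}=s]=(N-1)\sigma^{2}+s^{2}/N$.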

Stephen Tashi
I'm attempting to minimize the sum of squares. So, if there were 3 stores and I observed them all, I would minimize $\eta_{1}^{2}+\eta_{2}^{2}+\eta_{3}^{2}$.

I don't understand that terminology. The ##\eta_i## are random variables. So what would it mean to "minimize" them? You could minimize some parameter of their distribution. For example, you could set their standard deviations equal to zero.

I think you are talking about observed values instead of random variables. The usual scenario for least squares is that we pick parameters for our prediction model (or distribution or whatever) that minimize the least squares error between a predicted value and an observed value. I don't see that a particular k_th observation ##\eta^2_{1,k} + \eta^2_{2,k} + \eta^2_{3,k}## is a difference between a predicted value of something and the observed value of the thing.

As I understand your model, the least squares estimate of the observed value of the random variable ##\eta_1 + \eta_2 + \eta_3## would be zero, because the ##\eta_i## each have mean zero. The mean value of a random variable is the best least squares predictor of its realized values. Are you trying to find the best least squares estimator of ##\eta^2_{1,k} + \eta^2_{2,k} + \eta^2_{3,k}##? I don't see a direct relation between that estimator and finding the best fit of a revenue model to revenue data.

Stephen Tashi
I'm going to simplify everything here by using an additive error term (rather than multiplying by $e^\eta$). For the simple two-stores-in-a-zip example, the actual data-generating-process is:
$$R_{1}=\eta_{1}+r_{1}$$
$$R_{2}=\eta_{2}+r_{2}$$

I understand that model as the two equations:
##R_{1}=\eta_{1}+r_{1}##
##R_{2}=\eta_{2}+r_{2}##

If I observed both store's revenues, I would be looking to minimize
$$\eta_{1}^{2}+\eta_{2}^{2}=(R_{1}-r_{1})^{2}+(R_{2}-r_{2})^{2}$$

That's not good terminology because it doesn't mention anything about a sum of squares or anything about data.

We need to use terminology that distinguishes among random variables, the expectations of random variables, samples of random variables, and means of samples of random variables. You also need to distinguish between population parameters (e.g. expected values) and estimators of population parameters (e.g. the formula for finding a sample mean is an estimator of the population mean).

However, I don't observe both revenues. Instead, my goal is to minimize the expectation of this sum, when all I observe is $(\eta_{1}+\eta_{2})=R_{1}+R_{2}-r_{1}-r_{2}$.

That's the abstract goal of least squares fitting, but least squares fitting is usually done to data, not to theoretical distributions. If you knew both the theoretical distribution of the actual phenomena and your theoretical family of models, you would know the distribution of the squares of the errors. Then it would make sense to say that your goal is to pick the member of the family of models that minimizes the expected value of the squares of the errors.

The usual situation is that we have data not known to be from a particular theoretical distribution. We assume it is from some family of distributions, but we still don't know exactly which member of the family it is. We don't know the distribution of the squares of the errors, so we can't find the expected value of that distribution.

What we do is pick a formula for an estimator for the mean square error that is a function both of the data (which are known values) and of the parameters that define a member of our family. Then we find the values of the parameters that minimize the estimator.

So, I want to compute the following conditional expectation:
$$\mathrm{E}[\eta_{1}^{2}+\eta_{2}^{2}|(\eta_{1}+\eta_{2})^{2}]$$
You want to "estimate" it.

It seems as though it should be possible to solve the above if I assume that the $\eta$'s are distributed normally with mean zero and a constant s.d.

You have an interesting idea. It needs to be put in the correct language.

A given population quantity can have several different estimators, each of which is "good" in a different way. Without going into the technicalities of what "good" shall mean in this particular problem, your idea is to find a good estimator for the (population) mean of ##\eta^2_1 + \eta^2_2 ## as a function of the variables ##\sigma_1, \sigma_2## and the given data, which are observations of ##(\eta_1 + \eta_2)##.

Formulas for good estimators are often (but not always) suggested by looking at formulas for the population parameters that they estimate. For example, the formula for the expected value of a random variable involves an integral. For an estimator of the population mean, we use the formula for the sample mean, which "imitates" the integral by using a finite sum. So I agree that it makes sense to look at the theoretical calculation for ##E( \eta^2_1 + \eta^2_2 | (\eta_1 + \eta_2) )##.
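As a quick numerical look at that theoretical calculation: if ##\eta_1, \eta_2## are assumed i.i.d. ##N(0, \sigma^2)##, then conditional on the sum ##s##, each ##\eta_i \sim N(s/2, \sigma^2/2)##, which gives ##E(\eta_1^2+\eta_2^2 \mid s) = \sigma^2 + s^2/2##. A Monte Carlo sketch of that (the parameter values are arbitrary):

```python
import random

random.seed(1)
sigma, s, band = 1.0, 1.0, 0.1   # shock s.d.; condition on eta_1 + eta_2 near s

kept = []
for _ in range(600_000):
    e1 = random.gauss(0.0, sigma)
    e2 = random.gauss(0.0, sigma)
    if abs(e1 + e2 - s) < band:  # keep draws whose sum lands near s
        kept.append(e1 * e1 + e2 * e2)

mc_estimate = sum(kept) / len(kept)
theory = sigma ** 2 + s ** 2 / 2  # closed form under the normal assumption (1.5 here)
print(mc_estimate, theory)
```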

EnumaElish
Homework Helper
If the ##\eta##'s are independent (and mean zero, so the cross terms vanish), ##E[\sum_i \eta_i^2] = E[(\sum_i \eta_i)^2]##.
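A quick simulation sketch of this identity, assuming three independent, mean-zero, standard normal shocks (parameter choices are arbitrary):

```python
import random

random.seed(2)
n_draws, n_vars = 200_000, 3   # three independent N(0,1) shocks per draw

mean_sum_sq = 0.0   # running total of  sum_i eta_i^2
mean_sq_sum = 0.0   # running total of (sum_i eta_i)^2
for _ in range(n_draws):
    etas = [random.gauss(0.0, 1.0) for _ in range(n_vars)]
    mean_sum_sq += sum(e * e for e in etas)
    mean_sq_sum += sum(etas) ** 2

mean_sum_sq /= n_draws
mean_sq_sum /= n_draws
# With independent mean-zero shocks both averages estimate the same quantity (3 here)
print(mean_sum_sq, mean_sq_sum)
```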

My guess is the answer is symmetric. You cannot distinguish between stores, so the argmin solution is to average out ##R## across stores in the area. Put differently, the sum of squared differences will be minimized when ##r_{A,1} = r_{A,2} = \dots##
