# A Sum of random variables, given sum of observed variables

1. Jun 24, 2016

### Jeffack

I have a model in which, for each store, predicted revenues are perturbed by a multiplicative shock:

$R = e^\eta r$

where $r$ is predicted and $R$ is observed. $\eta$ is mean zero.

I can find $\eta$ as follows: $\eta = \ln(R) - \ln(r)$. I'm summing the squares of the $\eta$'s.
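As a concrete sketch of the fully observed case (the numbers are hypothetical, purely for illustration), recovering each shock and summing the squares looks like:

```python
import numpy as np

# Hypothetical predicted (r) and observed (R) revenues for three stores.
r = np.array([100.0, 250.0, 80.0])
R = np.array([110.0, 230.0, 90.0])

# From R = e^eta * r, the realized shock is eta = ln(R) - ln(r).
eta = np.log(R) - np.log(r)

# Objective contribution: the sum of squared shocks.
sum_sq = float(np.sum(eta ** 2))
```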

However, there are some markets where I only observe the sum of sales of multiple stores. The simplest example would be for two stores:
$R_{1}=e^{\eta_{1}}r_{1}$
$R_{2}=e^{\eta_{2}}r_{2}$
where I only observe $R_1 + R_2$. I would like to calculate $\eta_1^2 + \eta_2^2$ (or at least its expected value). Is there any way I can do this? Is there anything at all that I can say about $\eta_1^2 + \eta_2^2$ or $\eta_1 + \eta_2$ ? I'm OK with making assumptions about the distribution of $\eta$ if that's necessary.

Thanks!

2. Jun 24, 2016

### Staff: Mentor

Can you discern from the raw data which category the observables fall into? (summed or not summed)

3. Jun 24, 2016

### Jeffack

Yes. If they are summed, I know how many stores are included in the group.

4. Jun 26, 2016

### Stephen Tashi

The technically correct language is probably that you would like to "estimate" it, not "calculate" it - unless you are asking how to compute each sampled value of $\eta_1^2 + \eta_2^2$ from the corresponding sampled value of $\eta_1 + \eta_2$. (The importance of this technicality is that if you are doing web searches on the problem, it is a problem of "statistical estimation" or finding a good "estimator", etc.)

First, let's see if I visualize your data correctly. I visualize $N$ different "revenue producing areas". The data for the $n$-th revenue producing area consists of $M$ samples of the total revenue for stores in that area: $S_{n,i},\ n = 1,2,3,\dots,N,\ i = 1,2,3,\dots,M$.

For the $k$-th store in the $n$-th revenue producing area, the model for the store's revenue is $R_{n,k} = e^{\eta_{n,k}} r_k$.

The random variables involved are the $\eta_{n,k}$.

Is the "predicted" $r_k$ a constant value for each sample? Or do you have data with records like:

area = 3, sample = 5, store $r_1$ = \$634.30, store $r_2$ = \$532.23, observed sum of revenues $S$ = \$12432.60
area = 3, sample = 6, store $r_1$ = \$236.00, store $r_2$ = \$530.00, observed sum of revenues $S$ = \$655.92
.....

?

5. Jun 27, 2016

### Jeffack

You're correct about the "estimation" vs "calculation" comment.

Regarding your question at the end, I am assuming that $r$ is the same for each store within an area (I use zip codes as my area). In other words, my model spits out a single predicted revenue number for each zip code.

Example:
Data generating processes for zip code A, which has 3 stores, and zip code B, which has 2 stores:
$$R_{A}=e^{\eta_{1,A}}r_{A}+e^{\eta_{2,A}}r_{A}+e^{\eta_{3,A}}r_{A}$$
$$R_{B}=e^{\eta_{1,B}}r_{B}+e^{\eta_{2,B}}r_{B}$$
What I observe for A is the number of stores (3) and the total revenue for those three stores, $R_A$. What I observe for B is the number of stores (2) and the total revenue for those stores, $R_B$.

6. Jun 27, 2016

### Stephen Tashi

Ok, that's clear.

Now, as to what you wish to estimate - do you wish to estimate the distribution of the revenue from each store?

Is that the eventual purpose of estimating the distribution of $\eta_{1,B} + \eta_{2,B}$? I don't see the direct connection between estimating the distribution of that sum (or its expectation) and the problem of estimating the distribution of revenue for each individual store.

7. Jun 27, 2016

### Jeffack

Thanks for the help thus far.

The short answer is that, given observed $R$ and predicted $r$ for each location, I would like to estimate $\eta_{1}^{2}+\eta_{2}^{2}+...+\eta_{N}^{2}$, where $N$ is the number of stores in that location.

The more explanatory answer is that I am trying to construct a model for predicting revenues. I am doing this by minimizing the sum of squared residuals (so, a nonlinear least squares estimation). If I observed each store's revenue, then it would just be a matter of minimizing the sum of $(\ln(R_s)-\ln(r_s))^{2}$, where $s$ indexes the stores. However, I'm concerned that using this method on the data I have (where I have location data but not individual store data) gives each location the same weight, even though locations have different numbers of stores (and, therefore, different numbers of realizations of $\eta$). I've tried things like multiplying each location's squared residual by the number of stores in that location, but I wanted to see if there is any way to show mathematically that this is a reasonable thing to do (or to show why the "basic" way is fine).

To get technical... the basic estimation is like this:
$$\hat{\theta}=\underset{\theta}{\operatorname{argmin}}\sum_{s}(\ln(R_{s})-\ln(r_{s}))^{2}$$
where $r_{s}=f(\theta)$.
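As a sketch of that estimation: the thread never specifies $f(\theta)$, so the one-parameter model $r_s(\theta)=\theta x_s$ below (and every number in it) is hypothetical, just to show the log-residual objective being minimized.

```python
import numpy as np

# Hypothetical one-parameter revenue model r_s(theta) = theta * x_s,
# where x_s is some observed location characteristic.
rng = np.random.default_rng(0)
x = rng.uniform(50.0, 200.0, size=40)       # location characteristics
theta_true = 1.5
eta = rng.normal(0.0, 0.2, size=40)         # multiplicative shocks
R = np.exp(eta) * theta_true * x            # observed revenues

def ssr(theta):
    """Sum of squared log residuals: sum_s (ln R_s - ln r_s(theta))^2."""
    return np.sum((np.log(R) - np.log(theta * x)) ** 2)

# Crude grid search standing in for a real optimizer (e.g. scipy.optimize).
grid = np.linspace(0.5, 3.0, 2001)
theta_hat = grid[np.argmin([ssr(t) for t in grid])]
```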

8. Jun 27, 2016

### Stephen Tashi

The $\eta_i$ are random variables, so their sum is a random variable. For a given sales area (zip code) you presumably have several realizations of the total revenue (daily sales figures or something like that). Are you seeking to estimate the value of $\eta^2_{1,k} + \eta^2_{2,k} + ... + \eta^2_{N,k}$ for each day $k = 1,2,\dots,M$?

If you knew each store's revenue, why wouldn't you fit your model by minimizing the sum of $(R_{s,k} - r_{s,k})^2$?

Certain transformations of data make least squares fitting easier, but the least squares fit for the transformed data isn't necessarily the least squares fit for the untransformed data. For example, the "error" between \$100 and \$10 is 1 in terms of $\log_{10}$, as is the error between \$1000 and \$100.

9. Jun 28, 2016

### Jeffack

Thanks again for all your help on this. To answer your questions...

Yes, I have panel data, so I have data for multiple zip codes over multiple months.
Yes, that's what I'm trying to do. I'm summing the error over each month and each location.
If I knew each store's revenue (which I don't), that's what I would do.

Right. I'm currently using the multiplicative error term for two reasons. First, that's what the paper I'm basing my work on uses. Second, using additive error makes some of the errors for large zip codes gigantic, so they essentially swamp the small errors that exist in small zip codes. I'm also running some models that use additive error.

I'm starting to concede that there's not much I can do here, simply because I don't have enough information. As an example, suppose I have a zip code with two stores and predicted revenue of \$5 per store: $$R=5e^{\eta_{1}}+5e^{\eta_{2}}=5(e^{\eta_{1}}+e^{\eta_{2}})$$ If each $\eta$ were equal to its expectation (which is zero), then my actual revenue would equal expected revenue, which is \$10. Suppose, instead, that observed revenue is \$100. There are many different values of $\eta_{1}$ and $\eta_{2}$ that would give this. For example:

$\eta_{1}=0$ and $\eta_{2}\thickapprox2.94$
or
$\eta_{1}=\eta_{2}\thickapprox2.3$
or many other combinations.

These will lead to different values of $\eta_{1}^{2}+\eta_{2}^{2}$ (in this example the sum of squares is about 8.67 in the first instance and 10.60 in the second). I don't know what the actual realizations of $\eta$ will be. I could assume that $\eta_{1}=\eta_{2}$, but I'm not sure that I can do that (since that would mean that the two error terms are perfectly correlated, and everything I've done with least squares assumes i.i.d. error terms; thinking about this another way, the zip code would really just have a single realization of $\eta$, with that term being used for both $\eta_{1}$ and $\eta_{2}$).
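A quick check of those two decompositions (same hypothetical \$5-per-store, \$100-observed numbers as above):

```python
import math

r, R_obs = 5.0, 100.0   # predicted $5 per store, observed total $100, two stores

# Case 1: eta_1 = 0, so 5*(1 + e^{eta_2}) = 100  =>  e^{eta_2} = 19.
eta2 = math.log(19.0)
ss_case1 = 0.0 ** 2 + eta2 ** 2          # about 8.67

# Case 2: eta_1 = eta_2 = eta, so 10*e^{eta} = 100  =>  eta = ln(10).
eta_eq = math.log(10.0)
ss_case2 = 2.0 * eta_eq ** 2             # about 10.60

# Both cases reproduce the observed total exactly...
assert abs(r * (1.0 + math.exp(eta2)) - R_obs) < 1e-9
assert abs(2.0 * r * math.exp(eta_eq) - R_obs) < 1e-9
# ...yet give different sums of squares, so the total alone can't identify them.
```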

10. Jun 28, 2016

### Stephen Tashi

I agree with the impossibility of deducing individual values for the realizations of $\eta_1, \eta_2$.

I'm trying to understand your focus on the random variable $\eta^2_1 + \eta^2_2$. Is the idea that observed values of $\eta^2_1 + \eta^2_2$ would allow you to compare the predicted variance of total sales $R$ with the observed variance of total sales? (You mention least squares, but you haven't given the definition of the predicted variable. You don't have a model predicting the standard deviations of the distributions of the $\eta_i$, do you?)

I agree with your pessimism about deducing the values of individual realizations of the random variables $\eta_1, \eta_2$ from a datum that only gives their sum. However, the seemingly more ambitious problem of estimating the distributions of random variables from data that only gives their sum is, as the saying goes, a "well known" problem. It's known as a problem of "deconvolution".

When a given distribution can be expressed as the sum of two random variables in more than one way, a deconvolution problem is usually solved by adding some condition on the solution that makes it unique.

I did a brief web search on "deconvolution, lognormals" and found some highly specialized articles where people claim to have deconvolved data into a sum of lognormally distributed random variables, but the specialized content makes the mathematics obscure - and I'm not sure the mathematics is good. A complication in searching is that many people don't make a distinction between "decomposing" a distribution as a "mixture" of distributions and "deconvolving" a distribution into a "convolution" of distributions. The latter task is what you have.

I'm optimistic that we can solve your deconvolution problem. It will require some thinking. If you are writing up your work for an article, you'll have to decide whether the audience will accept some mathematics that is not routinely seen in introductory statistics courses.

11. Jun 29, 2016

### Jeffack

I'm attempting to minimize the sum of squares. So, if there were 3 stores and I observed them all, I would minimize $\eta_{1}^{2}+\eta_{2}^{2}+\eta_{3}^{2}$. If, instead, stores 2 and 3 are aggregated together, I think that I need to minimize $\eta_{1}^{2}+\{\mathrm{some\ approximation\ of\ }\eta_{2}^{2}+\eta_{3}^{2}\}$. The trick, of course, is figuring out how to approximate $\eta_{2}^{2}+\eta_{3}^{2}$.

- I'm not married to the idea of the multiplicative error term - I would switch to a simple linear error term if that makes things easier. I'm using the $e^{\eta}$ error term because revenues are always positive.

- I had never heard of deconvolution, but I'm reading up on the subject now.

- This is my job market paper in economics. The more obscure math it contains the better :)

12. Jun 29, 2016

### Jeffack

I had an idea. This could be way off base, but here goes...
I'm going to simplify everything here by using an additive error term (rather than multiplying by $e^\eta$). For the simple two-stores-in-a-zip example, the actual data-generating process is:
$$R_{1}=\eta_{1}+r_{1}$$
$$R_{2}=\eta_{2}+r_{2}$$
If I observed both stores' revenues, I would be looking to minimize
$$\eta_{1}^{2}+\eta_{2}^{2}=(R_{1}-r_{1})^{2}+(R_{2}-r_{2})^{2}$$

However, I don't observe both revenues. Instead, my goal is to minimize the expectation of these, when all I observe is $(\eta_{1}+\eta_{2})=R_{1}+R_{2}-r_{1}-r_{2}$
So, I want to compute the following conditional expectation:
$$\mathrm{E}[\eta_{1}^{2}+\eta_{2}^{2}\mid\eta_{1}+\eta_{2}]$$
It seems as though it should be possible to solve the above if I assume that the $\eta$'s are distributed normally with mean zero and a constant s.d.
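Under that normality assumption the conditional expectation does have a simple closed form, $\sigma^{2}+s^{2}/2$ when the observed sum is $s$. A Monte Carlo sketch (hypothetical $\sigma$ and $s$, not values from the thread) checking this:

```python
import numpy as np

# Assume eta_1, eta_2 iid N(0, sigma^2), as floated above.
# Claim to check: E[eta_1^2 + eta_2^2 | eta_1 + eta_2 = s] = sigma^2 + s^2/2.
rng = np.random.default_rng(42)
sigma, s = 1.0, 2.0

eta = rng.normal(0.0, sigma, size=(4_000_000, 2))
total = eta.sum(axis=1)

# Condition (approximately) on the observed sum by keeping draws near s.
near = np.abs(total - s) < 0.02
mc_estimate = float((eta[near] ** 2).sum(axis=1).mean())

theory = sigma ** 2 + s ** 2 / 2.0   # = 3.0 for these values
```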

13. Jun 29, 2016

### Stephen Tashi

I don't understand that terminology. The $\eta_i$ are random variables. So what would it mean to "minimize" them? You could minimize some parameter of their distribution. For example, you could set their standard deviations equal to zero.

I think you are talking about observed values instead of random variables. The usual scenario for least squares is that we pick parameters for our prediction model (or distribution, or whatever) that minimize the least squares error between a predicted value and an observed value. I don't see that a particular $k$-th observation $\eta^2_{1,k} + \eta^2_{2,k} + \eta^2_{3,k}$ is a difference between a predicted value of something and the observed value of that thing.

As I understand your model, the least squares estimate of the observed value of the random variable $\eta_1 + \eta_2 + \eta_3$ would be zero, because the $\eta_i$ each have mean zero. The mean value of a random variable is the best least squares predictor of its realized values. Are you trying to find the best least squares estimator of $\eta^2_{1,k} + \eta^2_{2,k} + \eta^2_{3,k}$? I don't see a direct relation between that estimator and finding the best fit of a revenue model to revenue data.

14. Jun 29, 2016

### Stephen Tashi

I understand that model as the two equations:
$R_{1}=\eta_{1}+r_{1}$
$R_{2}=\eta_{2}+r_{2}$

That's not good terminology because it doesn't mention anything about a sum of squares or anything about data.

We need to use terminology that distinguishes among random variables, the expectations of random variables, samples of random variables, and means of samples of random variables. You also need to distinguish between population parameters (e.g. expected values) and estimators of population parameters (e.g. the formula for finding a sample mean is an estimator of the population mean).

That's the abstract goal of least squares fitting, but least squares fitting is usually done to data, not to theoretical distributions. If you knew both the theoretical distribution of the actual phenomena and your theoretical family of models, you would know the distribution of the squared errors. Then it would make sense to say that your goal is to pick the member of the family of models that minimizes the expected value of the squared errors.

The usual situation is that we have data not known to be from a particular theoretical distribution. We assume it is from some family of distributions, but we still don't know exactly which member of the family it is. We don't know the distribution of the squares of the errors, so we can't find the expected value of that distribution.

What we do is pick a formula for an estimator for the mean square error that is a function both of the data (which are known values) and of the parameters that define a member of our family. Then we find the values of the parameters that minimize the estimator.

You want to "estimate" it.

You have an interesting idea. It needs to be put in the correct language.

A given population quantity can have several different estimators, each of which is "good" in a different way. Without going into the technicalities of what "good" shall mean in this particular problem, your idea is to find a good estimator for the (population) mean of $\eta^2_1 + \eta^2_2$ as a function of the variables $\sigma_1, \sigma_2$ and the given data, which are observations of $(\eta_1 + \eta_2)$.

Formulas for good estimators are often (but not always) suggested by looking at formulas for the population parameters that they estimate. For example, the formula for the expected value of a random variable involves an integral. For an estimator of the population mean, we use the formula for the sample mean, which "imitates" the integral by using a finite sum. So I agree that it makes sense to look at the theoretical calculation for $E(\eta^2_1 + \eta^2_2 \mid \eta_1 + \eta_2)$.
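To make that suggestion concrete under one possible assumption (the $\eta_i$ i.i.d. $N(0,\sigma^2)$, which the thread has not fixed), the calculation has a closed form. Writing $S = \eta_1 + \eta_2$, standard bivariate normal theory gives
$$\eta_{1}\mid S=s\ \sim\ N\!\left(\frac{s}{2},\ \frac{\sigma^{2}}{2}\right)$$
and the same for $\eta_2$ by symmetry, so
$$E[\eta_{i}^{2}\mid S=s]=\mathrm{Var}(\eta_{i}\mid S=s)+\left(E[\eta_{i}\mid S=s]\right)^{2}=\frac{\sigma^{2}}{2}+\frac{s^{2}}{4},$$
$$E[\eta_{1}^{2}+\eta_{2}^{2}\mid S=s]=\sigma^{2}+\frac{s^{2}}{2}.$$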

15. Jul 12, 2016

### EnumaElish

If the $\eta$'s are independent (and mean zero, as assumed here), then $E\left[\sum_i \eta_i^2\right] = E\left[\left(\sum_i \eta_i\right)^2\right]$, since the cross terms $E[\eta_i \eta_j]$ vanish.

My guess is the answer is symmetric. You cannot distinguish between stores, so the argmin solution is to average out $R$ across stores in the area. Put differently, the sum of squared differences will be minimized when $r_{A,1} = r_{A,2} = \dots$
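That first identity (for independent, mean-zero $\eta$'s) is easy to check numerically; a small sketch with hypothetical values:

```python
import numpy as np

# Check E[sum(eta_i^2)] = E[(sum eta_i)^2] for independent mean-zero shocks:
# the cross terms E[eta_i * eta_j] (i != j) vanish, so the two sides agree.
rng = np.random.default_rng(7)
sigma, n_stores = 1.3, 3
eta = rng.normal(0.0, sigma, size=(1_000_000, n_stores))

lhs = float((eta ** 2).sum(axis=1).mean())       # E[ sum eta_i^2 ]
rhs = float((eta.sum(axis=1) ** 2).mean())       # E[ (sum eta_i)^2 ]
# Both should be close to n_stores * sigma^2.
```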

16. Jul 13, 2016

### chiro

If you can't write the final function you need as a function of your inputs, then you won't be able to get estimates.

Have you tried solving for this function first? Once you have that, you can start looking at constructing test statistics - but until you have it, you can't do anything useful with the data.