Distribution of frequency question?

1. Aug 16, 2007

jimmy1

[SOLVED] Distribution of frequency question?

I have 2 normally distributed random variables X and Y. X describes the distribution of frequency of an element in a group A, and Y describes the distribution of frequency of the same element in another group B.
Now what I want to do is find the distribution of the frequency of this element in groups A and B together, that is, in (A U B). Is it possible to get this distribution from just the distributions of X and Y?

2. Aug 16, 2007

EnumaElish

You need to construct a new random variable that represents A U B. Questions to consider:

Are X and Y independent?

How are sets A and B related? Are they disjoint intervals on R?

Edit: the more I think about this, the more questions I have. The normal distribution is defined over the entire set of reals. So in fact A and B are identical; but X and Y represent two normal variates N($\mu_X, \sigma_X^2$) and N($\mu_Y, \sigma_Y^2$). Why isn't this the case?

Last edited: Aug 16, 2007
3. Aug 16, 2007

jimmy1

Sorry, I probably wasn't very clear in the first post. Let me try again:

I am doing an experiment and I have two completely unrelated groups, A and B. (There are only two unique elements in both A and B, say x and y). In each replicate of the experiment a certain number of copies of x and y appear in groups A and B. So the numbers of copies of x and y can be seen as random variables, and I know their distributions. So I have the distribution of the number of copies of x and y within each group, and I also have the distribution of the frequency of x and y within each group.

Now the next step is to get the distribution of frequency of x or y when I combine both groups. Unfortunately I cannot define a random variable for A U B directly, so my question is can I get the distribution of frequency of x or y when I combine the groups from the information above?

Note that the frequency within each group is normally distributed, A and B are completely independent, and elements x and y are also independent within and between groups.

4. Aug 16, 2007

EnumaElish

I am unclear about what each draw of x and y is. I am assuming that it is a real number. Is that correct?

Even better, can you post a concrete example, with as much research background as possible?

Last edited: Aug 16, 2007
5. Aug 16, 2007

jimmy1

I'm just referring to x and y as the two types of cells (elements) that can occupy each of group A and B.

The first random variable is the number of copies of each cell within each group, and I have the distribution for this. From it I can get the other random variable, which is the frequency of each cell within each group. So to answer your question: yes, both these random variables are real numbers.

EDIT: I'll try and give an example in the next post

Last edited: Aug 16, 2007
6. Aug 16, 2007

jimmy1

Example:

I have 2 sets, A and B. The two possible elements in each set are x and y. In each replicate of the experiment there are going to be a random number of x and y in each set. My aim is to get the distribution of frequency of x and y from the combined set A U B. For example, the following might be the output for 2 replicates of the experiment:

Replicate 1
Set A = {x, y, x, x, x, y} (Freq of x = 4/6, Freq of y = 2/6)
Set B = {y, x, x, x} (Freq of x = 3/4, Freq of y = 1/4)
Set (A U B) = {x, y, x, x, x, y, y, x, x, x} (Freq of x = 7/10, Freq of y = 3/10)

Replicate 2
Set A = {y, x, y, y, y} (Freq of x = 1/5, Freq of y = 4/5)
Set B = {y, y, x, x, x, y} (Freq of x = 3/6, Freq of y = 3/6)
Set (A U B) = {y, x, y, y, y, y, y, x, x, x, y} (Freq of x = 4/11, Freq of y = 7/11)

So basically from the above experiments I already know the following theoretical information:
1) The distribution of the number of copies of x and y within each set A and B
2) The distribution of the frequency of x and y within each set A and B

What I need to get, using the above two pieces of information, is the distribution of frequency of x and y in the set (A U B).
Hope this makes it clearer.

7. Aug 17, 2007

EnumaElish

What threw me off was your statement that each of X and Y is Normal.

Let f = Freq(X in A) and 1 - f = Freq(Y in A).
Let g = Freq(X in B) and 1 - g = Freq(Y in B).

Let C = AUB.

Then, with #C = #A + #B (every copy in A and B is kept in C), Freq(X in C) = (f * #A + g * #B)/#C and Freq(Y in C) = 1 - Freq(X in C).

Does this work?
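
As a quick check, the pooled formula can be run against the two replicates from the example earlier in the thread. This is only a minimal Python sketch: the function name `pooled_freq` is made up for illustration, and it assumes A and B are pooled without removing duplicates, so that #C = #A + #B.

```python
from fractions import Fraction

def pooled_freq(f, n_a, g, n_b):
    """Frequency of x in C = A U B, given frequencies f, g and sizes #A, #B.

    Assumes every copy in A and B is kept in C, so #C = #A + #B.
    """
    return (f * n_a + g * n_b) / (n_a + n_b)

# Replicate 1: f = 4/6, #A = 6, g = 3/4, #B = 4
r1 = pooled_freq(Fraction(4, 6), 6, Fraction(3, 4), 4)
# Replicate 2: f = 1/5, #A = 5, g = 3/6, #B = 6
r2 = pooled_freq(Fraction(1, 5), 5, Fraction(3, 6), 6)

print(r1, r2)  # 7/10 4/11
```

Both values agree with the frequencies of x listed for Set (A U B) in the two replicates.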

8. Aug 17, 2007

jimmy1

Yes, the above formula will probably work for any one replicate, but I'm looking for the distribution of "Freq(X in C)".

For instance, in your formula above, all the variables in the top line (i.e. f, #A, g, #B) are random variables, of which I know the mean and variance, and I also have an expression for the full distribution of all 4 variables. (In fact all 4 can probably be approximated by Normal distributions).

Anyway using the information available from these 4 random variables, can I get the distribution of "Freq(X in C)" ??

I don't think I can use your above formula? I may be able to use it to get the mean of "Freq(X in C)", but for example what would the variance of "Freq(X in C)" be?

9. Aug 17, 2007

EnumaElish

Last edited: Aug 17, 2007
10. Aug 18, 2007

jimmy1

f and g are random variables that describe the distribution of frequencies, so I'm not sure how you would define a positive random variable. But if it helps you can assume that their means and variances are always positive. (Out of curiosity, how would you define a positive random variable?)

I really have doubts whether I can use this. From what I gather from the link, you are suggesting that I can use the Central Limit Theorem for the product of random variables. There are 2 issues here:

1) For the central limit theorem to apply, you need a large number of independent random variables. In my situation I have f * #A/#C or g * #B/#C, which is only the product of 3 random variables. Surely the central limit theorem cannot apply to such a small number??

2) Even if the Central Limit Theorem would apply, I would need to get the product of either f * #A/#C or g * #B/#C. In both situations I have the random variable (#C)^-1, which causes lots of problems. I could get the distribution for the random variable #C (which would be a Normal distribution), but as far as I know from my limited probability knowledge, getting the distribution of the inverse (reciprocal) of a random variable is not a trivial matter? Perhaps I am wrong?

Last edited by a moderator: May 3, 2017
11. Aug 18, 2007

EnumaElish

Since f and g are frequencies, they are positive. Any random variable with a lower bound of zero would fit the bill. Three examples are the lognormal, positive uniform, and F distributions.

That's why I wrote "as an approximation."

You need to know or be able to derive the distributions of f, #A, and 1/#C.

An alternative is to assume each of f * #A/#C and g * #B/#C is normal. Then their sum is normal, provided the two terms are jointly normal (they are not independent, since both share #C).

If you are looking for moments only, and not a distribution function per se then you can use the approximation formulas for deriving the variance of products or ratios of random variables. See, e.g., Mood, Graybill, Boes, Introduction to the Theory of Statistics.
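
For reference, the usual first-order (delta-method) approximations for the moments of a ratio look like the following. The symbols U and V are introduced here for illustration only: U = f * #A + g * #B is the number of copies of x in C and V = #C, with means $\mu_U, \mu_V$ and variances $\sigma_U^2, \sigma_V^2$. These are approximations that assume V stays well away from zero, not exact results.

$$
E\!\left[\frac{U}{V}\right] \approx \frac{\mu_U}{\mu_V} - \frac{\operatorname{Cov}(U,V)}{\mu_V^2} + \frac{\mu_U\,\sigma_V^2}{\mu_V^3}
$$

$$
\operatorname{Var}\!\left[\frac{U}{V}\right] \approx \left(\frac{\mu_U}{\mu_V}\right)^{2}\left(\frac{\sigma_U^2}{\mu_U^2} + \frac{\sigma_V^2}{\mu_V^2} - \frac{2\operatorname{Cov}(U,V)}{\mu_U\,\mu_V}\right)
$$

Note that the covariance term does not vanish here: V contains U, so if the copy numbers of x and y are independent, Cov(U, V) = Var(U).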

Another alternative is to simulate.
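
A simulation along those lines might look like the sketch below. Everything specific in it is an assumption made for illustration: the copy numbers are drawn from normals rounded to integers and truncated at zero, the (mean, sd) values are made up, and the helper names `draw_count` and `simulate_pooled_freq` are hypothetical. Swap in the actual count distributions from the experiment.

```python
import random
import statistics

def draw_count(rng, mean, sd):
    """Draw a non-negative integer count from a rounded, truncated normal (an assumption)."""
    return max(0, round(rng.gauss(mean, sd)))

def simulate_pooled_freq(n_reps, params, seed=0):
    """Simulate Freq(x in C) over n_reps replicates.

    params maps (element, group) to a hypothetical (mean, sd) of the copy number.
    """
    rng = random.Random(seed)
    freqs = []
    for _ in range(n_reps):
        counts = {key: draw_count(rng, m, s) for key, (m, s) in params.items()}
        n_x = counts[("x", "A")] + counts[("x", "B")]  # copies of x in C
        n_c = sum(counts.values())                     # #C = #A + #B
        if n_c > 0:  # skip replicates where the pooled set is empty
            freqs.append(n_x / n_c)
    return freqs

# Made-up parameters, for illustration only.
params = {
    ("x", "A"): (40, 6), ("y", "A"): (20, 4),
    ("x", "B"): (30, 5), ("y", "B"): (30, 5),
}
freqs = simulate_pooled_freq(10_000, params)
print("mean:", statistics.mean(freqs))
print("variance:", statistics.variance(freqs))
```

A histogram of `freqs` then gives an empirical picture of the distribution of Freq(x in C), and the sample mean and variance can be compared against any analytical approximation.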

Last edited: Aug 18, 2007
12. Aug 19, 2007

jimmy1

Ok, I'll have a look through those references and see what I can do. Thanks a lot for the help, it's been really good!!

13. Aug 20, 2007