Distribution of frequency question?

jimmy1 · Aug 16, 2007

[SOLVED] Distribution of frequency question?

I have 2 normally distributed random variables X and Y. X describes the distribution of frequency of an element in a group A, and Y describes the distribution of frequency of the same element in another group B.
Now what I want to do is find the distribution of frequency of this element in groups A and B together, that is, (A U B). Is it possible to get this distribution from just the distributions of X and Y?

EnumaElish · Aug 16, 2007

You need to construct a new random variable that represents A U B. Questions to consider:

Are X and Y independent?

How are sets A and B related? Are they disjoint intervals on R?

Edit: the more I think about this, the more questions I have. Normal distribution is defined over the entire set of reals. So in fact A and B are identical; but X and Y represent two normal variates N([itex]\mu_X, \sigma_X^2[/itex]) and N([itex]\mu_Y, \sigma_Y^2[/itex]). Why isn't this the case?

jimmy1 · Aug 16, 2007

Sorry, I probably wasn't very clear in the first post. Let me try again:

I am doing an experiment and I have two completely unrelated groups, A and B. (There are only two unique elements in both A and B, say x and y.) In each replicate of the experiment a certain number of copies of x and y appear in groups A and B. So the numbers of copies of x and y can be seen as random variables, and I know their distributions. So I have the distribution of the number of copies of x and y within each group, and I also have the distribution of the frequency of x and y within each group.

Now the next step is to get the distribution of frequency of x or y when I combine both groups. Unfortunately I cannot define a random variable for A U B directly, so my question is can I get the distribution of frequency of x or y when I combine the groups from the information above?

Note that the distribution of frequency within each group is normally distributed, A and B are completely independent, elements x and y are also independent within and between groups.

EnumaElish · Aug 16, 2007

I am unclear about what each draw of x and y is. I am assuming that it is a real number. Is that correct?

Even better, can you post a concrete example, with as much research background as possible?

jimmy1 · Aug 16, 2007

I'm just referring to x and y as the two types of cells (elements) that can occupy each of group A and B.

The first random variable is the number of copies of each cell within each group, and I have the distribution for this, and from this I can get the other random variable which is the distribution of frequency of each cell within each group, and so to answer your question, yes both these random variables are real numbers.

EDIT: I'll try and give an example in the next post

jimmy1 · Aug 16, 2007

Example:

I have 2 sets, A and B. The two possible elements in each set are x and y. In each replicate of the experiment there are going to be a random number of x and y in each set. My aim is to get the distribution of frequency of x and y from the combined set A U B. For example, the following might be the output for 2 replicates of the experiment:

Replicate 1

Set A = {x, y, x, x, x, y} (Freq of x = 4/6 , Freq of y = 2/6)

Set B = {y, x, x, x} (Freq of x = 3/4 , Freq of y = 1/4)

Set (A U B) = {x, y, x, x, x, y, y, x, x, x} (Freq of x = 7/10 , Freq of y = 3/10)

Replicate 2

Set A = {y, x, y, y, y} (Freq of x = 1/5 , Freq of y = 4/5)

Set B = {y, y, x, x, x, y} (Freq of x = 3/6 , Freq of y = 3/6)

Set (A U B) = {y, x, y, y, y, y, y, x, x, x, y} (Freq of x = 4/11 , Freq of y = 7/11)

So basically from the above experiments I already know the following theoretical information:
1) The distribution of the number of copies of x and y within each set A and B
2) The distribution of the frequency of x and y within each set A and B

What I need to get, using the above two pieces of information, is the distribution of frequency of x and y in the set (A U B).
Hope this makes it clearer.

EnumaElish · Aug 17, 2007

What threw me off was your statement that each of X and Y is Normal.

Let f = Freq(X in A) and 1 - f = Freq(Y in A).
Let g = Freq(X in B) and 1 - g = Freq(Y in B).

Let C = AUB.

Then Freq(X in C) = (f * #A + g * #B)/#C and Freq(Y in C) = 1 - Freq(X in C).

Does this work?
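As a quick check, the formula can be tried on the two replicates posted above. This is only a sketch: the helper names `freq` and `pooled_freq` are made up for illustration, and it assumes A and B are combined as multisets, so that #C = #A + #B.

```python
from fractions import Fraction

def freq(group, element):
    """Frequency of `element` within one group, as an exact fraction."""
    return Fraction(group.count(element), len(group))

def pooled_freq(a, b, element):
    """Freq(element in C) = (f * #A + g * #B) / #C, assuming #C = #A + #B."""
    f, g = freq(a, element), freq(b, element)
    return (f * len(a) + g * len(b)) / (len(a) + len(b))

# The two replicates from the example post: (Set A, Set B)
replicates = [
    (list("xyxxxy"), list("yxxx")),
    (list("yxyyy"), list("yyxxxy")),
]

for a, b in replicates:
    direct = freq(a + b, "x")             # count x directly in the combined set
    assert pooled_freq(a, b, "x") == direct
    print(direct)                         # prints 7/10, then 4/11
```

Since f * #A is just the number of copies of x in A, the numerator is the total number of copies of x in C.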

jimmy1 · Aug 17, 2007

EnumaElish said:

Then Freq(X in C) = (f * #A + g * #B)/#C and Freq(Y in C) = 1 - Freq(X in C).

Does this work?

Yes, the above formula will probably work for any one replicate, but I'm looking for the distribution of "Freq(X in C)".

For instance, in your formula above, all the variables in the top line (ie. f ,#A, g, #B) are random variables, of which I know the mean and variance and also have an expression for the full distribution of all 4 variables. (In fact all 4 can probably be approximated by Normal distributions).

Anyway using the information available from these 4 random variables, can I get the distribution of "Freq(X in C)" ??

I don't think I can use your above formula? I may be able to use it to get the mean of "Freq(X in C)", but, for example, what would the variance of "Freq(X in C)" be??

EnumaElish · Aug 17, 2007

I assume each of f and g is positive.

The following applies as an approximation for each of f * #A/#C and g * #B/#C:
http://en.wikipedia.org/wiki/Central_limit_theorem#Products_of_positive_random_variables

The Fenton-Wilkinson result would apply to their sum = Freq(X in C):
http://en.wikipedia.org/wiki/Lognormal#Related_distributions

jimmy1 · Aug 18, 2007

EnumaElish said:

I assume each of f and g is positive.

f and g are random variables that describe the distribution of frequencies, so I'm not sure how you would define a positive random variable. But if it helps you can assume that their means and variances are always positive. (Out of curiosity, how would you define a positive random variable?)

EnumaElish said:

The following applies as an approximation for each of f * #A/#C and g * #B/#C:
http://en.wikipedia.org/wiki/Central...om_ variables

I really have doubts whether I can use this. From what I gather from the link, you are suggesting that I can use the Central Limit Theorem for the product of random variables. There are 2 issues here:

1) For the central limit theorem to apply, you need a large number of independent random variables. In my situation I have f * #A/#C or g * #B/#C, which is only the product of 3 random variables. Surely the central limit theorem cannot apply to such a small number??

2) Even if the Central Limit Theorem would apply I would need to get the product of either f * #A/#C or g * #B/#C. In both situations I have the random variable (#C)^-1, which causes lots of problems. I could get the distribution for the random variable #C (which would be a Normal distribution), but as far as I know from my limited probability knowledge getting the inverse of a random variable is not a trivial matter?? Perhaps I am wrong??

EnumaElish · Aug 18, 2007

jimmy1 said:

f and g are random variables that describe the distribution of frequencies, so I'm not sure how you would define a positive random variable. But if it helps you can assume that their means and variances are always positive. (Out of curiosity, how would you define a positive random variable?)

Since f and g are frequencies, they are positive. Any random variable with a lower bound of zero would fit the bill. Three examples are the lognormal, positive uniform, and F distributions.

1) For the central limit theorem to apply, you need a large number of independent random variables. In my situation I have f * #A/#C or g * #B/#C, which is only the product of 3 random variables. Surely the central limit theorem cannot apply to such a small number??

That's why I wrote "as an approximation."

2) Even if the Central Limit Theorem would apply I would need to get the product of either f * #A/#C or g * #B/#C. In both situations I have the random variable (#C)^-1, which causes lots of problems. I could get the distribution for the random variable #C (which would be a Normal distribution), but as far as I know from my limited probability knowledge getting the inverse of a random variable is not a trivial matter?? Perhaps I am wrong??

You need to know or be able to derive the distributions of f, #A, and 1/#C.

An alternative is to assume each of f * #A/#C and g * #B/#C is normal. Then their sum is normal.

If you are looking for moments only, and not a distribution function per se, then you can use the approximation formulas for deriving the variance of products or ratios of random variables. See, e.g., Mood, Graybill, Boes, Introduction to the Theory of Statistics.
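A rough sketch of what such an approximation could look like here, using a first-order Taylor expansion (delta method). The symbols U and V are introduced only for this illustration: U is the total number of copies of x in C and V is the total number of copies of y in C, so that Freq(X in C) = U/(U + V). It assumes U and V are independent, as stated earlier in the thread, with means [itex]\mu_U, \mu_V[/itex] and variances [itex]\sigma_U^2, \sigma_V^2[/itex] (each obtained by adding the corresponding means and variances of the counts in A and B).

[tex]
E\left[\frac{U}{U+V}\right] \approx \frac{\mu_U}{\mu_U + \mu_V}
[/tex]

[tex]
\mathrm{Var}\left[\frac{U}{U+V}\right] \approx \frac{\mu_V^2 \, \sigma_U^2 + \mu_U^2 \, \sigma_V^2}{(\mu_U + \mu_V)^4}
[/tex]

The variance follows from the partial derivatives [itex]\partial R/\partial U = V/(U+V)^2[/itex] and [itex]\partial R/\partial V = -U/(U+V)^2[/itex] of R = U/(U + V), evaluated at the means. The approximation should only be trusted when U + V stays well away from zero.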

Another alternative is to simulate.
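A minimal simulation sketch, under loudly stated assumptions: the four counts (x in A, y in A, x in B, y in B) are drawn independently from normal distributions, rounded and clipped at zero, and the means and standard deviations below are made-up illustrative numbers, not values from the experiment. The function names are hypothetical as well.

```python
import random
import statistics

def draw_count(mean, sd, rng):
    """Hypothetical count model: a normal draw rounded to a non-negative integer."""
    return max(0, round(rng.gauss(mean, sd)))

def simulate_pooled_freq(params, n_reps=20_000, seed=0):
    """Simulate Freq(x in A U B) over many replicates.

    params maps each count name to an assumed (mean, sd) pair.
    """
    rng = random.Random(seed)
    freqs = []
    for _ in range(n_reps):
        counts = {k: draw_count(m, s, rng) for k, (m, s) in params.items()}
        n_x = counts["x_A"] + counts["x_B"]
        total = n_x + counts["y_A"] + counts["y_B"]
        if total > 0:                     # skip the (rare) empty replicate
            freqs.append(n_x / total)
    return freqs

# Illustrative parameters only: (mean, sd) of each count
params = {"x_A": (40, 5), "y_A": (20, 4), "x_B": (30, 5), "y_B": (10, 3)}
freqs = simulate_pooled_freq(params)

print("mean:", statistics.mean(freqs))
print("variance:", statistics.variance(freqs))
```

A histogram of `freqs` then gives an empirical picture of the distribution of Freq(X in C), which can be compared against any normal or lognormal approximation.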

jimmy1 · Aug 19, 2007

Ok, I'll have a look through those references and see what I can do. Thanks a lot for the help, it's been really good!

EnumaElish · Aug 20, 2007

See https://www.physicsforums.com/showthread.php?p=1406179#post1406179
