Distribution of frequency question?

  • Context: Undergrad 
  • Thread starter: jimmy1
  • Tags: Distribution, Frequency

Discussion Overview

The discussion revolves around the challenge of determining the distribution of frequency of elements in two independent groups, A and B, based on their individual distributions. Participants explore whether it is possible to derive the combined distribution from the known distributions of two normally distributed random variables representing frequencies in each group.

Discussion Character

  • Exploratory
  • Technical explanation
  • Mathematical reasoning
  • Debate/contested

Main Points Raised

  • One participant asks if it is possible to find the distribution of frequency for the union of two groups A and B from the distributions of two independent random variables, X and Y.
  • Another participant suggests constructing a new random variable for A U B and raises questions about the independence of X and Y and the relationship between sets A and B.
  • A participant clarifies that A and B are independent groups with known distributions for the number of copies of elements x and y, and seeks to find the combined frequency distribution.
  • Participants discuss the implications of the normal distribution being defined over the entire set of reals and question the assumptions made about the distributions of X and Y.
  • One participant proposes a formula for calculating the frequency of X in the combined set but questions whether it can be used to derive the distribution of that frequency.
  • Another participant points to the Central Limit Theorem for products of random variables as an approximation, while the original poster doubts that its conditions are met.
  • There is a discussion about defining positive random variables and the implications for the analysis of frequencies.
  • Concerns are raised about the challenges of working with the inverse of random variables and the limitations of applying the Central Limit Theorem to a small number of variables.

Areas of Agreement / Disagreement

Participants express differing views on the feasibility of deriving the combined frequency distribution from the individual distributions. There is no consensus on the applicability of the Central Limit Theorem or the methods proposed for calculating the distribution.

Contextual Notes

Participants note the complexity of defining random variables and the challenges associated with deriving distributions from products and inverses of random variables. The discussion highlights the need for careful consideration of assumptions and the limitations of the mathematical approaches discussed.

jimmy1
[SOLVED] Distribution of frequency question?

I have 2 normally distributed random variables X and Y. X describes the distribution of frequency of an element in a group A, and Y describes the distribution of frequency of the same element in another group B.
Now what I want to do is find the distribution of frequency of this element in group A and B together, that is (A U B). Is it possible to get this distribution from just the distribution of X and Y??
 
You need to construct a new random variable that represents A U B. Questions to consider:

Are X and Y independent?

How are sets A and B related? Are they disjoint intervals on R?

Edit: the more I think about this, the more questions I have. The normal distribution is defined over the entire set of reals, so in fact A and B are identical; but X and Y represent two normal variates $N(\mu_X, \sigma_X^2)$ and $N(\mu_Y, \sigma_Y^2)$. Why isn't this the case?
 
Sorry, I probably wasn't very clear in the first post. Let me try again:

I am doing an experiment and I have two completely unrelated groups, A and B. (There are only two unique elements in both A and B, say x and y.) In each replicate of the experiment a certain number of copies of x and y appear in groups A and B. So the numbers of copies of x and y can be seen as random variables, and I know their distributions. So I have the distribution of the number of copies of x and y within each group, and also the distribution of the frequency of x and y within each group.

Now the next step is to get the distribution of frequency of x or y when I combine both groups. Unfortunately I cannot define a random variable for A U B directly, so my question is can I get the distribution of frequency of x or y when I combine the groups from the information above?

Note that the frequency within each group is normally distributed, A and B are completely independent, and elements x and y are also independent within and between groups.
 
I am unclear about what each draw of x and y is. I am assuming that it is a real number. Is that correct?

Even better, can you post a concrete example, with as much research background as possible?
 
I'm just referring to x and y as the two types of cells (elements) that can occupy each of group A and B.

The first random variable is the number of copies of each cell within each group, and I have its distribution. From this I can get the other random variable, the frequency of each cell within each group. So, to answer your question: yes, both of these random variables are real numbers.

EDIT: I'll try and give an example in the next post
 
Example:

I have 2 sets, A and B. The two possible elements in each set are x and y. In each replicate of the experiment there will be a random number of x and y in each set. My aim is to get the distribution of frequency of x and y in the combined set A U B. For example, the following might be the output for 2 replicates of the experiment:

Replicate 1
Set A = {x, y, x, x, x, y} (Freq of x = 4/6, Freq of y = 2/6)
Set B = {y, x, x, x} (Freq of x = 3/4, Freq of y = 1/4)
Set (A U B) = {x, y, x, x, x, y, y, x, x, x} (Freq of x = 7/10, Freq of y = 3/10)

Replicate 2
Set A = {y, x, y, y, y} (Freq of x = 1/5, Freq of y = 4/5)
Set B = {y, y, x, x, x, y} (Freq of x = 3/6, Freq of y = 3/6)
Set (A U B) = {y, x, y, y, y, y, y, x, x, x, y} (Freq of x = 4/11, Freq of y = 7/11)
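The per-set and pooled frequencies in the two replicates above can be recomputed by plain counting. A minimal Python sketch, assuming each "set" is really a multiset (a list), since the same element repeats, and that "A U B" means pooling all copies from both lists; the helper name `freq` is hypothetical:

```python
from fractions import Fraction

def freq(elements, target):
    """Frequency of `target` in a list of elements, as an exact fraction."""
    return Fraction(elements.count(target), len(elements))

# Replicates from the example: (set A, set B), written as lists of copies
replicates = {
    1: (list("xyxxxy"), list("yxxx")),
    2: (list("yxyyy"), list("yyxxxy")),
}

for rep, (a, b) in replicates.items():
    pooled = a + b  # "A U B" here = pooling every copy (a multiset sum)
    print(rep, freq(a, "x"), freq(b, "x"), freq(pooled, "x"))
```

Note that `Fraction` reduces automatically, so 4/6 is shown as 2/3.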


So basically from the above experiments I already know the following theoretical information:
1) The distribution of the number of copies of x and y within each set A and B
2) The distribution of the frequency of x and y within each set A and B

What I need to get using the above two pieces of information is the distribution of frequency of x and y in the set (A U B)??
Hope this makes it clearer
 
What threw me off was your statement that each of X and Y is Normal.

Let f = Freq(X in A) and 1 - f = Freq(Y in A).
Let g = Freq(X in B) and 1 - g = Freq(Y in B).

Let C = AUB.

Then Freq(X in C) = (f * #A + g * #B)/#C and Freq(Y in C) = 1 - Freq(X in C).

Does this work?
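Since #C = #A + #B, the numerator f * #A + g * #B is just the total number of copies of X in C, so the formula is a size-weighted average of f and g. A minimal Python check against the two replicates in the example, assuming every copy in A and in B is counted in C; the function name `pooled_freq` is hypothetical:

```python
from fractions import Fraction

def pooled_freq(f, size_a, g, size_b):
    """Freq(X in C) = (f * #A + g * #B) / #C, assuming #C = #A + #B."""
    size_c = size_a + size_b
    return (f * size_a + g * size_b) / size_c

# Replicate 1: f = 4/6 with #A = 6, g = 3/4 with #B = 4
print(pooled_freq(Fraction(4, 6), 6, Fraction(3, 4), 4))  # 7/10
# Replicate 2: f = 1/5 with #A = 5, g = 3/6 with #B = 6
print(pooled_freq(Fraction(1, 5), 5, Fraction(3, 6), 6))  # 4/11
```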
 
EnumaElish said:
Then Freq(X in C) = (f * #A + g * #B)/#C and Freq(Y in C) = 1 - Freq(X in C).

Does this work?

Yes, the above formula will probably work for any one replicate, but I'm looking for the distribution of "Freq(X in C)".

For instance, in your formula above, all the variables in the numerator (i.e. f, #A, g, #B) are random variables, for which I know the mean and variance and also have an expression for the full distribution. (In fact all 4 can probably be approximated by Normal distributions.)

Anyway using the information available from these 4 random variables, can I get the distribution of "Freq(X in C)" ??

I don't think I can use your above formula? I may be able to use it to get the mean of "Freq(X in C)", but what, for example, would the variance of "Freq(X in C)" be??
 
  • #10
EnumaElish said:
I assume each of f and g is positive.

f and g are random variables that describe the distribution of frequencies, so I'm not sure how you would define a positive random variable. But if it helps you can assume that their means and variances are always positive. (Out of curiosity, how would you define a positive random variable?)

EnumaElish said:
The following applies as an approximation for each of f * #A/#C and g * #B/#C:
http://en.wikipedia.org/wiki/Central...om_ variables
I really have doubts whether I can use this. From what I gather from the link, you are suggesting that I can use the Central Limit Theorem for the product of random variables. There are 2 issues here:

1) For the central limit theorem to apply, you need a large number of independent random variables. In my situation I have f * #A/#C or g * #B/#C, which is only the product of 3 random variables. Surely the central limit theorem cannot apply to such a small number??

2) Even if the Central Limit Theorem applied, I would need the distribution of the product f * #A/#C or g * #B/#C. In both cases I have the random variable (#C)^-1, which causes lots of problems. I could get the distribution for the random variable #C (which would be a Normal distribution), but as far as I know from my limited probability knowledge, getting the distribution of the inverse of a random variable is not a trivial matter?? Perhaps I am wrong??
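A small exact illustration of why 1/#C cannot be handled by simply inverting the mean. The sketch assumes a hypothetical #C that is equally likely to be 8, 10 or 12. For a discrete count, the distribution of 1/#C comes from relabelling, P(1/#C = 1/c) = P(#C = c), but its mean is not 1/E[#C]; by Jensen's inequality it is larger for a positive #C. (With an exactly Normal model for #C the situation is worse: the density is positive around 0, so E[1/#C] does not exist.)

```python
from fractions import Fraction

# Hypothetical distribution of #C: uniform on {8, 10, 12}
dist_c = {8: Fraction(1, 3), 10: Fraction(1, 3), 12: Fraction(1, 3)}

# Distribution of 1/#C: same probabilities, relabelled values
dist_inv_c = {Fraction(1, c): p for c, p in dist_c.items()}

mean_c = sum(c * p for c, p in dist_c.items())
mean_inv_c = sum(v * p for v, p in dist_inv_c.items())

print(mean_c)      # 10
print(1 / mean_c)  # 1/10
print(mean_inv_c)  # 37/360, slightly larger than 1/10 = 36/360
```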
 
  • #11
jimmy1 said:
f and g are random variables that describe the distribution of frequencies, so I'm not sure how you would define a positive random variable. But if it helps you can assume that their means and variances are always positive. (Out of curiosity, how would you define a positive random variable?)
Since f and g are frequencies, they are positive. Any random variable with lower bound of zero would fit the bill. 3 examples are lognormal, positive uniform, and F distributions.
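By contrast, a Normal model for a frequency always assigns some probability to values below 0 (and above 1). A minimal sketch using the standard library's `statistics.NormalDist`, with hypothetical parameters (mean 0.7, standard deviation 0.2) chosen only for illustration:

```python
from statistics import NormalDist

def mass_outside_unit_interval(mu, sigma):
    """Probability that N(mu, sigma^2) falls below 0 or above 1."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(0.0) + (1.0 - dist.cdf(1.0))

# Hypothetical frequency model: mean 0.7, standard deviation 0.2
print(round(mass_outside_unit_interval(0.7, 0.2), 4))
```

The leak is small when the mean is far from 0 and 1 relative to the standard deviation, which is when a Normal approximation is most defensible.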

1) For the central limit theorem to apply, you need a large number of independent random variables. In my situation I have f * #A/#C or g * #B/#C, which is only the product of 3 random variables. Surely the central limit theorem cannot apply to such a small number??
That's why I wrote "as an approximation."

2) Even if the Central Limit Theorem applied, I would need the distribution of the product f * #A/#C or g * #B/#C. In both cases I have the random variable (#C)^-1, which causes lots of problems. I could get the distribution for the random variable #C (which would be a Normal distribution), but as far as I know from my limited probability knowledge, getting the distribution of the inverse of a random variable is not a trivial matter?? Perhaps I am wrong??
You need to know or be able to derive the distributions of f, #A, and 1/#C.

An alternative is to assume each of f * #A/#C and g * #B/#C is normal. Then, provided they are jointly normal, their sum is normal.
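For reference, the standard result behind this step, written for two hypothetical labels U = f * #A/#C and V = g * #B/#C: if U and V are jointly normal, then

```latex
U + V \sim N\!\left(\mu_U + \mu_V,\; \sigma_U^2 + \sigma_V^2 + 2\,\operatorname{Cov}(U, V)\right)
```

Both terms share the denominator #C (and #C contains #A and #B), so U and V are generally not independent and the covariance term should not be dropped without checking.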

If you are looking for moments only, and not a distribution function per se, then you can use the approximation formulas for deriving the variance of products or ratios of random variables. See, e.g., Mood, Graybill, Boes, Introduction to the Theory of Statistics.
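One way to apply such formulas here is to write Freq(X in C) as a ratio R = U/V, with U the total number of copies of x in C and V = #C. The standard first-order (delta-method) approximations are $E[R] \approx \mu_U/\mu_V$ and $\mathrm{Var}(R) \approx (\mu_U/\mu_V)^2 \left[\sigma_U^2/\mu_U^2 + \sigma_V^2/\mu_V^2 - 2\,\mathrm{Cov}(U,V)/(\mu_U \mu_V)\right]$. A minimal Python sketch; the function name and all numbers are hypothetical, and the approximation is only reasonable when V is unlikely to be near 0:

```python
def ratio_moments(mu_u, var_u, mu_v, var_v, cov_uv):
    """First-order (delta-method) approximations to E[U/V] and Var(U/V)."""
    r = mu_u / mu_v
    rel_var = var_u / mu_u**2 + var_v / mu_v**2 - 2 * cov_uv / (mu_u * mu_v)
    return r, r**2 * rel_var

# Hypothetical independent counts, each with variance 25:
# x in A (mean 40), x in B (mean 30), y in A (mean 20), y in B (mean 30).
mu_x, var_x = 40 + 30, 25 + 25        # U = total copies of x in C
mu_c, var_c = mu_x + 20 + 30, 4 * 25  # V = #C = all copies in C
cov_xc = var_x  # Cov(U, U + Y) = Var(U) when x- and y-counts are independent

mean_r, var_r = ratio_moments(mu_x, var_x, mu_c, var_c, cov_xc)
print(mean_r, var_r)
```

Writing the frequency as counts over #C, rather than as f * #A/#C, avoids having to model f and #A separately.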

Another alternative is to simulate.
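A minimal Monte Carlo sketch of the simulation route. It assumes, hypothetically, that the four counts (x in A, y in A, x in B, y in B) are independent and can each be drawn as a rounded Normal clipped at 0; in practice the known count distributions would be substituted. The function name and parameters are illustrative only:

```python
import random
import statistics

def simulate_pooled_freq(n_reps, means, sd, seed=0):
    """Monte Carlo samples of Freq(x in A U B).

    means: hypothetical mean counts (x in A, y in A, x in B, y in B).
    Each count is drawn independently as a rounded Normal, clipped at 0.
    """
    rng = random.Random(seed)
    samples = []
    for _ in range(n_reps):
        xa, ya, xb, yb = (max(0, round(rng.gauss(m, sd))) for m in means)
        total = xa + ya + xb + yb
        if total > 0:  # frequency is undefined for an empty pooled set
            samples.append((xa + xb) / total)
    return samples

samples = simulate_pooled_freq(20_000, means=(40, 20, 30, 30), sd=5)
print(statistics.mean(samples), statistics.variance(samples))
```

The simulated samples approximate the whole distribution of Freq(x in C), not just its mean and variance, so a histogram of `samples` can be compared with any proposed approximation.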
 
  • #12
Ok, I'll have a look through those references and see what I can do. Thanks a lot for the help, it's been really good!
 
