Distribution of frequency question?

In summary: The researcher is trying to find the distribution of the frequency of an element across two unrelated groups combined.
  • #1
jimmy1
[SOLVED] Distribution of frequency question?

I have 2 normally distributed random variables X and Y. X describes the distribution of frequency of an element in a group A, and Y describes the distribution of frequency of the same element in another group B.
Now what I want to do is find the distribution of frequency of this element in groups A and B together, that is, (A U B). Is it possible to get this distribution from just the distributions of X and Y?
 
  • #2
You need to construct a new random variable that represents A U B. Questions to consider:

Are X and Y independent?

How are sets A and B related? Are they disjoint intervals on R?

Edit: the more I think about this, the more questions I have. Normal distribution is defined over the entire set of reals. So in fact A and B are identical; but X and Y represent two normal variates N([itex]\mu_X, \sigma_X^2[/itex]) and N([itex]\mu_Y, \sigma_Y^2[/itex]). Why isn't this the case?
 
  • #3
Sorry, I probably wasn't very clear in the first post. Let me try again:

I am doing an experiment and I have two completely unrelated groups, A and B. (There are only two unique elements in both A and B, say x and y.) In each replicate of the experiment a certain number of copies of x and y appear in groups A and B. So the numbers of copies of x and y can be seen as random variables, and I know their distributions. So I have the distribution of the number of copies of x and y within each group, and I also have the distribution of the frequency of x and y within each group.

Now the next step is to get the distribution of frequency of x or y when I combine both groups. Unfortunately I cannot define a random variable for A U B directly, so my question is can I get the distribution of frequency of x or y when I combine the groups from the information above?

Note that the distribution of frequency within each group is normally distributed, A and B are completely independent, elements x and y are also independent within and between groups.
 
  • #4
I am unclear about what each draw of x and y is. I am assuming that it is a real number. Is that correct?

Even better, can you post a concrete example, with as much research background as possible?
 
  • #5
I'm just referring to x and y as the two types of cells (elements) that can occupy each of group A and B.

The first random variable is the number of copies of each cell within each group, and I have the distribution for this. From it I can get the other random variable, which is the frequency of each cell within each group. So to answer your question: yes, both these random variables are real numbers.

EDIT: I'll try and give an example in the next post
 
  • #6
Example:

I have 2 sets, A and B. The two possible elements in each set are x and y. In each replicate of the experiment there are going to be a random number of x and y in each set. My aim is to get the distribution of frequency of x and y from the combined set A U B. For example, the following might be the output for 2 replicates of the experiment:

Replicate 1
Set A = {x, y, x, x, x, y} (Freq of x = 4/6, Freq of y = 2/6)
Set B = {y, x, x, x} (Freq of x = 3/4, Freq of y = 1/4)
Set (A U B) = {x, y, x, x, x, y, y, x, x, x} (Freq of x = 7/10, Freq of y = 3/10)

Replicate 2
Set A = {y, x, y, y, y} (Freq of x = 1/5, Freq of y = 4/5)
Set B = {y, y, x, x, x, y} (Freq of x = 3/6, Freq of y = 3/6)
Set (A U B) = {y, x, y, y, y, y, y, x, x, x, y} (Freq of x = 4/11, Freq of y = 7/11)


So basically from the above experiments I already know the following theoretical information:
1) The distribution of the number of copies of x and y within each set A and B
2) The distribution of the frequency of x and y within each set A and B

What I need to get, using the above two pieces of information, is the distribution of frequency of x and y in the set (A U B).
Hope this makes it clearer
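
To make the bookkeeping in the example concrete, here is a minimal Python sketch that recomputes the Replicate 1 frequencies. The helper name `freq` and the list representation are assumptions made for illustration. Note that "A U B" in this example keeps duplicates, so it is treated as a concatenation of the two groups rather than a set union.

```python
from collections import Counter
from fractions import Fraction

def freq(cells, cell_type):
    """Exact frequency of cell_type in a list of cells."""
    return Fraction(Counter(cells)[cell_type], len(cells))

# Replicate 1 from the example above.
set_a = ["x", "y", "x", "x", "x", "y"]
set_b = ["y", "x", "x", "x"]
pooled = set_a + set_b  # duplicates are kept, so this is a concatenation

print(freq(set_a, "x"), freq(set_b, "x"), freq(pooled, "x"))
```

`Fraction` reduces results automatically, so 4/6 is reported as 2/3.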
 
  • #7
What threw me off was your statement that each of X and Y is Normal.

Let f = Freq(X in A) and 1 - f = Freq(Y in A).
Let g = Freq(X in B) and 1 - g = Freq(Y in B).

Let C = AUB.

Then Freq(X in C) = (f * #A + g * #B)/#C and Freq(Y in C) = 1 - Freq(X in C).

Does this work?
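
As a quick check, the formula can be evaluated on the two replicates from post #6, assuming #C = #A + #B (the groups are pooled with duplicates kept, as in that example):

```latex
\text{Replicate 1: } \frac{\tfrac{4}{6}\cdot 6 + \tfrac{3}{4}\cdot 4}{10} = \frac{4 + 3}{10} = \frac{7}{10},
\qquad
\text{Replicate 2: } \frac{\tfrac{1}{5}\cdot 5 + \tfrac{3}{6}\cdot 6}{11} = \frac{1 + 3}{11} = \frac{4}{11}
```

Both values match the pooled frequencies of x listed in the example.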
 
  • #8
EnumaElish said:
Then Freq(X in C) = (f * #A + g * #B)/#C and Freq(Y in C) = 1 - Freq(X in C).

Does this work?

Yes, the above formula will probably work for any one replicate, but I'm looking for the distribution of "Freq(X in C)".

For instance, in your formula above, all the variables in the top line (i.e. f, #A, g, #B) are random variables, of which I know the mean and variance, and I also have an expression for the full distribution of all 4 variables. (In fact all 4 can probably be approximated by Normal distributions.)

Anyway, using the information available from these 4 random variables, can I get the distribution of "Freq(X in C)"?

I don't think I can use your formula above. I may be able to use it to get the mean of "Freq(X in C)", but what, for example, would the variance of "Freq(X in C)" be?
 
  • #10
EnumaElish said:
I assume each of f and g is positive.

f and g are random variables that describe the distribution of frequencies, so I'm not sure how you would define a positive random variable. But if it helps, you can assume that their means and variances are always positive. (Out of curiosity, how would you define a positive random variable?)

EnumaElish said:
The following applies as an approximation for each of f * #A/#C and g * #B/#C:
http://en.wikipedia.org/wiki/Central...om_ [Broken] variables
I really have doubts whether I can use this. From what I gather from the link, you are suggesting that I can use the Central Limit Theorem for the product of random variables. There are 2 issues here:

1) For the central limit theorem to apply, you need a large number of independent random variables. In my situation I have f * #A/#C or g * #B/#C, which is only the product of 3 random variables. Surely the central limit theorem cannot apply to such a small number?

2) Even if the Central Limit Theorem did apply, I would need to get the product of either f * #A/#C or g * #B/#C. In both situations I have the random variable (#C)^-1, which causes lots of problems. I could get the distribution for the random variable #C (which would be a Normal distribution), but as far as I know from my limited probability knowledge, getting the inverse of a random variable is not a trivial matter. Perhaps I am wrong?
 
  • #11
jimmy1 said:
f and g are random variables that describe the distribution of frequencies, so I'm not sure how you would define a positive random variable. But if it helps, you can assume that their means and variances are always positive. (Out of curiosity, how would you define a positive random variable?)
Since f and g are frequencies, they are positive. Any random variable with lower bound of zero would fit the bill. 3 examples are lognormal, positive uniform, and F distributions.

1) For the central limit theorem to apply, you need a large number of independent random variables. In my situation I have f * #A/#C or g * #B/#C, which is only the product of 3 random variables. Surely the central limit theorem cannot apply to such a small number?
That's why I wrote "as an approximation."

2) Even if the Central Limit Theorem did apply, I would need to get the product of either f * #A/#C or g * #B/#C. In both situations I have the random variable (#C)^-1, which causes lots of problems. I could get the distribution for the random variable #C (which would be a Normal distribution), but as far as I know from my limited probability knowledge, getting the inverse of a random variable is not a trivial matter. Perhaps I am wrong?
You need to know or be able to derive the distributions of f, #A, and 1/#C.

An alternative is to assume each of f * #A/#C and g * #B/#C is normal. Then their sum is normal.

If you are looking for moments only, and not a distribution function per se, then you can use the approximation formulas for deriving the variance of products or ratios of random variables. See, e.g., Mood, Graybill, Boes, Introduction to the Theory of Statistics.
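
For reference, one common first-order (delta-method) form of such a ratio approximation can be sketched as follows. The symbols n_x and n_y (total copies of x and of y in C), with means μ_x, μ_y and variances σ_x², σ_y², are notation introduced here and do not appear earlier in the thread. The sketch assumes n_x and n_y are independent and that n_x + n_y stays well away from zero.

```latex
R = \frac{n_x}{n_x + n_y}, \qquad
\mathrm{E}[R] \approx \frac{\mu_x}{\mu_x + \mu_y}, \qquad
\mathrm{Var}(R) \approx \frac{\mu_y^2\,\sigma_x^2 + \mu_x^2\,\sigma_y^2}{(\mu_x + \mu_y)^4}
```

If the groups are independent, the moments of the pooled counts follow from the per-group ones: μ_x = μ_{x,A} + μ_{x,B} and σ_x² = σ_{x,A}² + σ_{x,B}², and likewise for y.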

Another alternative is to simulate.
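
A minimal Monte Carlo sketch of the simulation route is given below. It assumes the four copy numbers are independent and roughly normal, as described earlier in the thread. The means and standard deviations, the rounding and clipping of counts at zero, and the helper names `draw_count` and `simulate_pooled_freq` are all hypothetical choices made only for illustration.

```python
import random
import statistics

def draw_count(rng, mean, sd):
    """Draw a nonnegative integer count from a rounded, clipped normal."""
    return max(0, round(rng.gauss(mean, sd)))

def simulate_pooled_freq(params, n_reps=20_000, seed=0):
    """Simulate Freq(x in C) for the pooled group C = A U B.

    params maps (group, cell) -> (mean, sd) of the copy number; the four
    counts are drawn independently in every replicate.
    """
    rng = random.Random(seed)
    freqs = []
    for _ in range(n_reps):
        x_a = draw_count(rng, *params[("A", "x")])
        y_a = draw_count(rng, *params[("A", "y")])
        x_b = draw_count(rng, *params[("B", "x")])
        y_b = draw_count(rng, *params[("B", "y")])
        total = x_a + y_a + x_b + y_b
        if total == 0:
            continue  # frequency is undefined for an empty pooled group
        freqs.append((x_a + x_b) / total)
    return freqs

# Hypothetical means and standard deviations, for illustration only.
params = {
    ("A", "x"): (40.0, 5.0),
    ("A", "y"): (20.0, 4.0),
    ("B", "x"): (30.0, 6.0),
    ("B", "y"): (10.0, 3.0),
}
freqs = simulate_pooled_freq(params)
print("mean of Freq(x in C):", statistics.fmean(freqs))
print("variance of Freq(x in C):", statistics.variance(freqs))
```

The list `freqs` is an empirical sample of Freq(x in C), so a histogram or empirical quantiles of it give the whole distribution and not just the first two moments.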
 
  • #12
Ok, I'll have a look through those references and see what I can do. Thanks a lot for the help, it's been really good!
 

