Comparing normal distribution divided by normal distribution

In summary, the person is a 2nd year biology grad student trying to compare two distributions (X and Y) in terms of statistical significance. X and Y are ratio distributions and the person is wondering if there is a standard way to test for significance in this case. They have searched on Google and found that X and Y are Cauchy distributions. The person wants to test the hypothesis that the parameters of the two distributions are equal and is considering using a non-parametric test like the Wilcoxon. They also mention that the variables involved in X may not be independent and question the assumption of normality. They provide an example of their problem involving cell proliferation and factor C.
  • #1
leothelion
Hi everyone!

I have a question on how to compare two distributions. I'm currently a 2nd year biology grad student, and I'm trying to compare a parameter that evaluates the efficacy of a cell type. The math problem is this:

Let [itex]X[/itex] and [itex]Y[/itex] each be an average of three variables divided by an average of another three variables, e.g. [itex]X = \frac{x_1 + x_2 + x_3}{x_4 + x_5 + x_6}[/itex]. Assume that each [itex]x_i[/itex] is normally distributed.

How can I compare [itex]X[/itex] and [itex]Y[/itex] in terms of statistical significance? I am not sure if [itex]X[/itex] and [itex]Y[/itex] are normal distributions themselves. My guess is that they are not, since the product of two normal distributions results in a non-normal distribution?

edit: I just did some Google searching, and the term that describes X and Y is "ratio distributions". Is there a standard way to test for significance in this case?

Any help or suggestions would be much appreciated. Thanks!
 
  • #2
If, in your definition, X and Y are independent random variables, you could consider their difference. Many distributions of the type X - Y show signs of normality. In this case, if X has mean [itex]\bar{x}[/itex] and variance [itex]\sigma^{2}_{x}[/itex], and Y has mean [itex]\bar{y}[/itex] and variance [itex]\sigma^{2}_{y}[/itex], then you may find [itex]X - Y \sim N(\bar{x} - \bar{y}, \sigma^{2}_{x} + \sigma^{2}_{y})[/itex].
 
  • #3
leothelion,

You aren't expressing what you want to do clearly. I think what you are trying to say is that "I want to test the hypothesis that the parameters of the two distributions are equal at a given level of statistical significance".

You could test the hypothesis that the two distributions are the same using a non-parametric test like the Wilcoxon if you think a difference in the distributions implies one tends to produce larger ratios than the other. (It isn't clear that it does from what you said.)

You didn't say whether the random variables involved in X are independent of each other. You didn't say what parameter or parameters are involved in the distribution of X.

It isn't advisable to assume a random variable is normal when this assumption makes the mathematics of a problem harder. Many variables in scientific experiments are obviously not normally distributed (for example, you may know that they cannot take on negative values or you may know that there is some reasonable finite upper bound for them).

If you make enough assumptions that (by change of variables) your ratio distribution becomes a Cauchy distribution, its mean and variance won't exist. I haven't looked, but I'm sure there is ample literature on doing statistical tests for the Cauchy. However, if you are dealing with something like cell counts, the assumptions that lead to a Cauchy distribution don't seem plausible.
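As a rough illustration of the point about the Cauchy distribution, the sketch below simulates two kinds of ratios using only the Python standard library. The means, standard deviations, sample size and seed are illustrative assumptions, not values from this thread. The ratio of two independent zero-mean normals is standard Cauchy, so its extreme values are enormous, while a ratio whose denominator stays well away from zero remains tightly concentrated.

```python
import random
import statistics

def simulate_ratios(mu_num, mu_den, sd, n, seed=0):
    """Draw n ratios of two independent normal variables (illustrative only)."""
    rng = random.Random(seed)
    return [rng.gauss(mu_num, sd) / rng.gauss(mu_den, sd) for _ in range(n)]

# Zero-mean numerator and denominator: the ratio is standard Cauchy.
cauchy_like = simulate_ratios(0.0, 0.0, 1.0, 100_000)
# Hypothetical means that sit far from zero relative to the spread.
well_behaved = simulate_ratios(4.5, 1.5, 0.1, 100_000)

for name, sample in [("zero-mean ratio", cauchy_like), ("shifted ratio", well_behaved)]:
    abs_vals = [abs(v) for v in sample]
    print(name,
          "median |ratio| =", round(statistics.median(abs_vals), 3),
          "max |ratio| =", round(max(abs_vals), 1))
```

The medians of both samples are stable, but only the shifted ratio has a maximum of the same order as its median, which is why the plausibility of the zero-mean assumption matters here.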
 
  • #4
Stephen Tashi;

Should your reply be directed to LeotheLion?

I agree with your comment either way.
 
  • #5
kdbnlin78 said:
Stephen Tashi;

Should your reply be directed to LeotheLion?

I agree with your comment either way.

You're right. I'll edit it.
 
  • #6
Thanks for your replies!

To clarify (please excuse my sloppy notation):

If, in your definition, X and Y are independent random variables, you could consider their difference. Many distributions of the type X - Y show signs of normality. In this case, if X has mean [itex]\bar{x}[/itex] and variance [itex]\sigma^{2}_{x}[/itex], and Y has mean [itex]\bar{y}[/itex] and variance [itex]\sigma^{2}_{y}[/itex], then you may find [itex]X - Y \sim N(\bar{x} - \bar{y}, \sigma^{2}_{x} + \sigma^{2}_{y})[/itex].

This is true. However, I do not think X-Y is normal. After some more searching on Google, it appears that X and Y follow Cauchy distributions. I was wondering if there is a test (along the lines of a t-test or F-test) for this specific Cauchy distribution.

leothelion,

You aren't expressing what you want to do clearly. I think what you are trying to say is that "I want to test the hypothesis that the parameters of the two distributions are equal at a given level of statistical significance".

You could test the hypothesis that the two distributions are the same using a non-parametric test like the Wilcoxon if you think a difference in the distributions implies one tends to produce larger ratios than the other. (It isn't clear that it does from what you said.)

You didn't say whether the random variables involved in X are independent of each other. You didn't say what parameter or parameters are involved in the distribution of X.

It isn't advisable to assume a random variable is normal when this assumption makes the mathematics of a problem harder. Many variables in scientific experiments are obviously not normally distributed (for example, you may know that they cannot take on negative values or you may know that there is some reasonable finite upper bound for them).

If you make enough assumptions that (by change of variables) your ratio distribution becomes a Cauchy distribution, its mean and variance won't exist. I haven't looked, but I'm sure there is ample literature on doing statistical tests for the Cauchy. However, if you are dealing with something like cell counts, the assumptions that lead to a Cauchy distribution don't seem plausible.

I am measuring the proliferation of cell population A in two cases: cell A alone, or cell A with the addition of suppressor cell B. To measure the effectiveness of the suppressor cell, what I am doing is dividing the amount of proliferation of cell A alone by the amount of proliferation of cell A + cell B.

Therefore,

cell A: x_1, x_2, x_3
cell A + B: x_4, x_5, x_6

I want to compare this to the proliferation of A or A + B, but with the addition of another factor C, i.e. A + C versus A + B + C.

cell A + factor C: y_1, y_2, y_3
cell A + B + factor C: y_4, y_5, y_6

I am assuming that x_1, x_2, x_3 are normally distributed to each other with mean xbar and variance sigma^2. They may be related to (x_4, x_5, x_6); (y_1, y_2, y_3); (y_4, y_5, y_6), e.g. x_4, x_5, x_6 will be less than x_1, x_2, x_3.

Hypotheses:
H0: X and Y are not different
Ha: X and Y are different


I would use a nonparametric test like Wilcoxon's, but the experiments are not cheap to perform, so it would be hard to get the number of trials required for significance. The data look convincing, so I am wondering if there is an established test of significance for these particular Cauchy distributions.


I will try to do some literature searching as you have suggested. Thanks!
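The concern about the number of trials can be made concrete. With three values per group, an exact rank-sum (Wilcoxon-Mann-Whitney) test has only 20 possible ways to split the six ranks, so its two-sided p-value cannot fall below 2/20 = 0.1, however cleanly the groups separate. The sketch below enumerates the splits directly. The ratio values are made up for illustration, the helper name `exact_rank_sum_p` is hypothetical, and the code assumes there are no tied values.

```python
from itertools import combinations

def exact_rank_sum_p(a, b):
    """Two-sided exact rank-sum p-value for two small samples without ties."""
    pooled = sorted(a + b)
    rank = {v: i + 1 for i, v in enumerate(pooled)}  # assumes no tied values
    n, total = len(a), len(pooled)
    expected = n * (total + 1) / 2                   # mean rank sum under H0
    observed = abs(sum(rank[v] for v in a) - expected)
    splits = list(combinations(range(1, total + 1), n))
    extreme = sum(1 for s in splits if abs(sum(s) - expected) >= observed)
    return extreme / len(splits)

# Hypothetical suppression ratios, completely separated between conditions.
without_c = [0.66, 0.70, 0.68]
with_c = [0.39, 0.41, 0.35]
print(exact_rank_sum_p(without_c, with_c))  # 0.1, the smallest value possible for 3 vs 3
```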
 
  • #7
leothelion said:
Therefore,

cell A: x_1, x_2, x_3
cell A + B: x_4, x_5, x_6

I don't know why you say "Therefore". It isn't at all clear what x_1, x_2 and x_3 are. Are these counts of cell A from 3 different trials?

Likewise it isn't clear why there are 3 groups of the other variables.

It would help if you defined "proliferation". Why are you measuring it as a ratio? Is it defined as a ratio? The ordinary use of "proliferation" would suggest a spreading or increase of something. I don't see why an increase would be measured as a ratio. Is it supposed to mean something like a percentage increase?

I am assuming that x_1, x_2, x_3 are normally distributed to each other with mean xbar and variance sigma^2.

What does it mean for random variables to be "distributed to each other"? As I said before, if the x's represent counts or some quantity that cannot be negative, it's clear that they aren't really normally distributed.
 
  • #8
Stephen Tashi said:
I don't know why you say "Therefore". It isn't at all clear what x_1, x_2 and x_3 are. Are these counts of cell A from 3 different trials?

Likewise it isn't clear why there are 3 groups of the other variables.

It would help if you defined "proliferation". Why are you measuring it as a ratio? Is it defined as a ratio? The ordinary use of "proliferation" would suggest a spreading or increase of something. I don't see why an increase would be measured as a ratio. Is it supposed to mean something like a percentage increase?

Instead of "therefore", take it as "In summary,"

Let me help clarify the rest:

I am measuring proliferation as the average number of times that the cells divide. We track cell division by labeling the cells with a dye called CFSE. Parent cells will have 100% CFSE levels, 1st generation 50%, 2nd generation 25%, etc. I can backcalculate using these values to determine the average number of times the cells have divided.
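The back-calculation described above amounts to taking a base-2 logarithm, since each division halves the dye. A minimal sketch, assuming idealized halving with no background fluorescence (the function name is illustrative):

```python
import math

def divisions_from_cfse(percent_of_parent):
    """Number of divisions implied by CFSE intensity as a percent of the parent level."""
    # Each division halves the dye, so intensity = 100 * 2**(-divisions).
    return math.log2(100.0 / percent_of_parent)

for level in (100.0, 50.0, 25.0, 12.5):
    print(level, "->", divisions_from_cfse(level))
```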

The reason why we have 4 groups of variables is because they are four different treatment groups. Let me redefine them below:

Group 1: starting population of 1 million type A cells
Group 2: starting population of 1 million type A cells + 1 million type B cells

Type B cells are known to be suppressor cells--that is, if you culture A together with B, then A doesn't proliferate as much (fewer average number of divisions).

We're interested to see if adding factor C reduces the suppressive ability of B. That's why we're testing Group 3 and Group 4.

Group 3: 1 million type A cells + factor C
Group 4: 1 million A + 1 million B + factor C

Notice that Group 3 and 4 are related to Group 1 and 2, but have an additional factor C.

We have already performed the experiment three times, giving three samples for each of the groups.

What does it mean for random variables to be "distributed to each other"? As I said before, if the x's represent counts or some quantity that cannot be negative, it's clear that they aren't really normally distributed.

I meant: x_1, x_2, x_3 are samples from the normal distribution X. Perhaps they are not normally distributed, as the average number of times a cell divides is at least 0. However, I was thinking back to an example from my intro to statistics class about bottles of coke. That is, they measured the amount of coke in each bottle and determined a mean and variance that was normally distributed. If we are doing measurements, would the central limit theorem apply and make the data normally distributed?
 
  • #9
You haven't stated precisely what the variables are yet. If you start with one million cells of type A on trial 1 and got "the answer" x_1 for that trial, what is x_1? If it is a "number of divisions", is it exactly the same number for each of the million cells? Or are you computing x_1 as an average of all the numbers of divisions for all the cells that are alive at the end of the trial? Or is x_1 a measure of some chemical concentration that is assumed to indicate the average number of divisions for all the cells that are alive?

If x_1 is a sample of some sort of average quantity and it is the result of many small independent events adding up, then the Central Limit Theorem would imply it was drawn from an approximately normal distribution. If x_1 is not an average, then the Central Limit Theorem doesn't help you. For example, if the random variable V has a triangle-shaped probability density and you draw many samples from it and plot the histogram, the histogram will probably be triangle-shaped, not shaped like a normal distribution.

I still don't see why you want to compute the ratio [itex] X = \frac {x_1 + x_2 + x_3}{x_4 + x_5 + x_6} [/itex]. Why is that a good idea?
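The Central Limit Theorem point in this post can be checked numerically. The sketch below draws from a symmetric triangular density and compares the excess kurtosis of the raw draws with that of averages of 30 draws. Excess kurtosis is about 0 for normal data and -0.6 for any triangular density. The interval [0, 1], the group size of 30, the sample counts and the seed are all arbitrary choices for illustration.

```python
import random
import statistics

def excess_kurtosis(values):
    """Sample excess kurtosis: about 0 for normal data, about -0.6 for a triangle."""
    mean = statistics.fmean(values)
    var = statistics.fmean((v - mean) ** 2 for v in values)
    fourth = statistics.fmean((v - mean) ** 4 for v in values)
    return fourth / var ** 2 - 3.0

rng = random.Random(1)
# Raw draws from a triangular density on [0, 1] with its peak at 0.5.
raw = [rng.triangular(0.0, 1.0, 0.5) for _ in range(20_000)]
# Each value here is the average of 30 such draws.
averages = [statistics.fmean(rng.triangular(0.0, 1.0, 0.5) for _ in range(30))
            for _ in range(20_000)]

print("raw draws:", round(excess_kurtosis(raw), 2))
print("averages :", round(excess_kurtosis(averages), 2))
```

Collecting more raw draws does not make them look normal; only the averaging step does.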
 
  • #10
Stephen Tashi said:
You haven't stated precisely what the variables are yet. If you start with one million cells of type A on trial 1 and got "the answer" x_1 for that trial, what is x_1? If it is a "number of divisions", is it exactly the same number for each of the million cells? Or are you computing x_1 as an average of all the numbers of divisions for all the cells that are alive at the end of the trial? Or is x_1 a measure of some chemical concentration that is assumed to indicate the average number of divisions for all the cells that are alive?

Let me give you an example data set.

Cell A only
G1 (parent): 8.89%
G2: 8.47%
G3: 14.94%
G4: 23.85%
G5: 24.27%
G6: 15.66%
G7: 4.12%

The proliferation index (average number of divisions) is calculated as sum i*Gi / 100% = 4.57. So if we started with 1 million cells, the final population is about 4.57 million. But to make things more consistent, we just use the proliferation index 4.57.

Cell A + B
G1 (Parent): 51.47% of population
G2: 19.37%
G3: 15.15%
G4: 9.88%
G5: 4.12%

The proliferation index (average number of divisions) is sum i*Gi / 100 = 1.51.
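For readers who want to check the arithmetic, here is a literal reading of the formula sum i*Gi / 100 applied to the percentages listed above, taking i = 1 for the parent peak as the labels G1, G2, ... suggest. That indexing is an assumption. Read this way, the weighted sums come to roughly 4.10 and 1.96 rather than the quoted 4.57 and 1.51, so the software that produced the quoted indices presumably uses a convention that this thread does not spell out.

```python
def weighted_generation_index(percentages, first_index=1):
    """Sum of i * G_i / 100, with i starting at first_index (an assumed convention)."""
    return sum(i * g for i, g in enumerate(percentages, start=first_index)) / 100.0

# Percentages copied from the example data sets in this post.
cell_a_only = [8.89, 8.47, 14.94, 23.85, 24.27, 15.66, 4.12]
cell_a_plus_b = [51.47, 19.37, 15.15, 9.88, 4.12]

print(round(weighted_generation_index(cell_a_only), 2))
print(round(weighted_generation_index(cell_a_plus_b), 2))
```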

If x_1 is a sample of some sort of average quantity and it is the result of many small independent events adding up, then the Central Limit Theorem would imply it was drawn from an approximately normal distribution. If x_1 is not an average, then the Central Limit Theorem doesn't help you. For example, if the random variable V has a triangle-shaped probability density and you draw many samples from it and plot the histogram, the histogram will probably be triangle-shaped, not shaped like a normal distribution.

I still don't see why you want to compute the ratio [itex] X = \frac {x_1 + x_2 + x_3}{x_4 + x_5 + x_6} [/itex]. Why is that a good idea?

The reason why we calculate the ratio is to evaluate the effectiveness of suppressor cell type B. For example, consider the following data:

Group 1 (A only): 4.57
Group 2 (A + B): 1.51

Group 3 (A + factor C): 5.82
Group 4 (A + B + factor C): 3.56

If we compare groups 1 and 2, B is able to reduce proliferation by (4.57 - 1.51) / 4.57 = 67.0%.

If we compare groups 3 and 4, B is able to reduce proliferation by (5.82 - 3.56) / 5.82 = 38.8%.

Thus, although addition of factor C can increase the proliferation of cells A overall, it also has the effect of reducing the effect of B. Upon the addition of factor C, B is only 38.8% / 67.0%, or about 58%, as effective as without factor C. One possible suggestion is that it may be easier to subtract the two instead, e.g.

Groups 1 and 2: 4.57 - 1.51 = 3.06

so B is able to reduce proliferation by about 3.06 divisions per cell

whereas Groups 3 and 4: 5.82 - 3.56 = 2.26

so B is only able to reduce proliferation by 2.26 divisions per cell when factor C is also present in the culture.

However, we have found that the ratios (e.g. (4.57 - 1.51) / 4.57 = 0.670) calculated above are more consistent across trials.
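The comparison in this post can be written out as a short calculation. The sketch below uses the four group values given above; the function and variable names are illustrative only.

```python
def fractional_reduction(without_b, with_b):
    """Fraction by which adding B reduces the proliferation index."""
    return (without_b - with_b) / without_b

group1, group2 = 4.57, 1.51   # A only, A + B
group3, group4 = 5.82, 3.56   # A + factor C, A + B + factor C

reduction_no_c = fractional_reduction(group1, group2)
reduction_with_c = fractional_reduction(group3, group4)
# How effective B remains once factor C is present, relative to B without C.
relative_effectiveness = reduction_with_c / reduction_no_c

print(f"reduction without C: {reduction_no_c:.1%}")
print(f"reduction with C:    {reduction_with_c:.1%}")
print(f"relative effectiveness of B with C: {relative_effectiveness:.1%}")
```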
 
  • #11
You explained a lot but you didn't manage to explain what x_1 is. So I have to guess about the x's. Ok, I'll guess.

Let's say there are 3 trials and the proliferation indices for "A only" on the 3 trials are respectively [itex] x_1, x_2, x_3 [/itex]. Let's say the proliferation indices for "A with B" on the 3 trials are respectively [itex] x_4,x_5,x_6 [/itex].

The average of the ratios of the proliferation indices would be [itex] (\frac{x_1}{x_4} + \frac{x_2}{x_5} + \frac{x_3}{x_6})/3 [/itex].

What is the quantity [itex] X = \frac{x_1+x_2+x_3}{x_4+x_5+x_6} [/itex] supposed to be?

I gather the "G's" are "generations". Are the members of G1 the cells that never divided? Are the members of G2 the two cells that came from a cell that divided once? What is the best description of the mechanism by which the various factors slow down cell division? Do they absolutely prevent cell division in a certain fraction of the cells? Or do they slow down the physical process of the division itself?
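The distinction drawn in this post between the average of the per-trial ratios and the ratio of the averages is easy to see numerically. In the sketch below only the first value in each list comes from the thread; the other two trials are hypothetical numbers invented for illustration.

```python
# Proliferation indices per trial; trials 2 and 3 are made-up values.
a_only = [4.57, 4.20, 4.90]      # x_1, x_2, x_3
a_with_b = [1.51, 1.80, 1.40]    # x_4, x_5, x_6

# Average of the per-trial ratios (requires the trials to be paired).
mean_of_ratios = sum(x / y for x, y in zip(a_only, a_with_b)) / len(a_only)
# Ratio of the sums, which equals the ratio of the averages.
ratio_of_means = sum(a_only) / sum(a_with_b)

print(round(mean_of_ratios, 3), round(ratio_of_means, 3))
```

The two statistics generally differ, and only the first one depends on which "A only" trial is matched with which "A with B" trial.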
 
  • #12
We take the average of cells with A only: (x1 + x2 + x3) / 3

Then we take the average of cells with A + B: (x4 + x5 + x6) / 3

So the ratio (A+B) / A = (x4 + x5 + x6) / (x1 + x2 + x3)

1 - (x4 + x5 + x6) / (x1 + x2 + x3) = the reduction in proliferation.

Yes, G's are generations.

G1 is the parent population that never divided. Let's say we count 50 cells in G1.
G2 is the population that divided once. If we count 50 cells that divided once, then G2 = 25 because they came from 25 parent cells.
G3 is the population that divided twice. If we count 60 cells that divided twice, then G3 = 60 / 4 = 15.

The mechanism of preventing division is not completely understood. I would guess that it would be something like this: Let's say there are 100 A cells. Under normal conditions, 60 will enter the cell cycle and undergo division. With B suppressor cells, only 20 type A cells will enter the cell cycle. Letting the cell culture go on for a longer period of time will not change the number of type A cells that proliferate, so the time-kinetics is not a major issue.
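The normalization of counts to parent cells described above can be sketched as follows, using the counts from this post. The function name is illustrative.

```python
def precursors(counts):
    """Convert cells counted in each generation to the number of original parents."""
    # counts[0] is the undivided parent peak (G1); a cell in the k-th peak
    # has k - 1 divisions behind it, so 2**(k - 1) cells share one parent.
    return [count / 2 ** divisions for divisions, count in enumerate(counts)]

print(precursors([50, 50, 60]))  # [50.0, 25.0, 15.0]
```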
 
  • #13
leothelion said:
We take the average of cells with A only: (x1 + x2 + x3) / 3

Then we take the average of cells with A + B: (x4 + x5 + x6) / 3

So the ratio (A+B) / A = (x4 + x5 + x6) / (x1 + x2 + x3)

OK, I understand what you compute. However, if you seriously believe that the numerator and denominator of this expression are each normal random variables, then you have computed a statistic that does not have a finite variance. So I don't understand the remark that this statistic is "stable" over trials.

I did a search on the words: cell proliferation index statistical tests. I found many papers, but most of them required a journal subscription to read. A quick scan of those that I could read turned up examples where people used the Wilcoxon or Student T tests to test the equality of cell proliferation indices among populations. As far as I could determine, they did not use the ratio you have defined. Do you know of any papers that do use the ratio?

(You haven't said whether the 3 trials are "paired" in any way. Was trial 1 for "A" conducted on the same day as the first trial for "A+B"? Is there any common influence in the two trial 1's? )

The mechanism of preventing division is not completely understood. I would guess that it would be something like this: Let's say there are 100 A cells. Under normal conditions, 60 will enter the cell cycle and undergo division. With B suppressor cells, only 20 type A cells will enter the cell cycle. Letting the cell culture go on for longer period of time will not change the number of type A cells that proliferate, so the time-kinetics is not a major issue.

To me (not from reading any papers), this suggests the following approach. Assume that when a cell is put in an environment (such as "A"), it has a probability q of never dividing. If a cell does divide into two daughter cells, then assume each daughter cell has, independently of the other, a probability q of never dividing. This makes the problem one of estimating the parameters p = 1 - q and q of a binomial distribution. In a single trial, each pair of consecutive generations gives data for estimating the parameters, since I think you can deduce what fraction of a generation fails to divide.
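One way to read this suggestion is sketched below. It is only an interpretation under strong assumptions: no cell death, no division beyond the last observed generation, and the same q in every generation. Working backwards from the last peak, the cells that reached generation k either stayed there or split into two cells of generation k + 1, which gives the number "at risk" in each generation. The counts reuse the example of 50, 50 and 60 cells from earlier in the thread, and the function name is hypothetical.

```python
def estimate_non_dividing_fraction(observed):
    """observed[k]: cells found in generation k at the end (k = 0 is the parent peak)."""
    reached = [0.0] * len(observed)
    reached[-1] = float(observed[-1])  # assumption: nothing divided beyond the last peak
    for k in range(len(observed) - 2, -1, -1):
        # Cells that reached generation k either stayed or became two cells of k + 1.
        reached[k] = observed[k] + reached[k + 1] / 2.0
    # Pool every generation except the last, which carries no information about q.
    stayed = sum(observed[:-1])
    at_risk = sum(reached[:-1])
    return reached, stayed / at_risk

reached, q_hat = estimate_non_dividing_fraction([50, 50, 60])
print(reached, round(q_hat, 3))  # [90.0, 80.0, 60.0] 0.588
```

Comparing the per-generation fractions (50/90 and 50/80 here) is a quick check on whether a single q is plausible.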
 

1. What is a normal distribution?

A normal distribution is a probability distribution that is symmetric and bell-shaped. It is a continuous distribution that is often used to model real-world phenomena, such as heights or IQ scores, as it is a good approximation of many natural phenomena.

2. How is a normal distribution divided by another normal distribution?

The ratio of two normal random variables is not obtained by dividing the means and the standard deviations. The quotient follows a ratio distribution, which is generally not normal. When both variables are independent with zero mean, the ratio follows a Cauchy distribution, whose mean and variance do not exist.

3. What is the purpose of comparing normal distribution divided by normal distribution?

Comparing normal distribution divided by normal distribution allows us to analyze the relationship between two sets of data and determine if they have similar or different distributions. It can also help us identify any outliers or unusual patterns in the data.

4. How is the comparison of normal distribution divided by normal distribution useful in research?

In research, comparing normal distribution divided by normal distribution can help us identify any significant differences between two groups or variables. It can also help us determine if the data follows a normal distribution, which is important in statistical analysis and hypothesis testing.

5. What are some limitations of comparing normal distribution divided by normal distribution?

One limitation is that it assumes both distributions are normally distributed, which may not always be the case. It also does not take into account any other factors that may affect the data, such as sample size or outliers. Additionally, it may not be appropriate for comparing distributions with different shapes or variances.
