Jarfi
- TL;DR Summary
- How can I compare samples where the lowest possible value is 0, the highest is unbounded, and the SD becomes very small close to zero, so that the p-values are not biased toward low-expression sets, e.g. (0.1, 2, 0.4, 5, 2) vs (59, 79, 80, 81, 55, 82, 64)?
Obviously the first set has more proportional error, but it will ALWAYS come out as more significant (larger |t|, smaller p-value) when compared against almost any mean above 10, because its SD is much lower even when the absolute difference is the same.
Excuse me, I am not too well versed in statistics. I am in engineering.
Let's say I have an expected measurement (a grand mean of many values), and then I measure two different samples. Each sample is measured a few times to get its own mean and standard deviation.
I want to compare the result to the grand mean.
I do a Welch's test to get a t-value. Assume the grand mean is based on n1 = 60 measurements, and each individual sample mean (Mean_sample) is based on n2 = 6 measurements:

t = (Mean_all − Mean_sample) / √(s²_all/n1 + s²_sample/n2)

where the denominator is the combined (accumulated) standard error of the two means.
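In code, the statistic above can be computed from the summary statistics alone. This is a minimal stdlib sketch (the hypothetical function name `welch_t` is mine; `scipy.stats.ttest_ind_from_stats(..., equal_var=False)` does the same job and also returns a p-value):

```python
import math

def welch_t(m1, s1, n1, m2, s2, n2):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom,
    computed from summary statistics (means, SDs, sample sizes)."""
    v1 = s1 ** 2 / n1            # squared standard error of group 1
    v2 = s2 ** 2 / n2            # squared standard error of group 2
    t = (m1 - m2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df
```

Note that only the absolute SDs enter the denominator; the relative (percent) error never appears, which is exactly the issue described below.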
The problem is that I am getting much larger |t| values, and therefore much smaller p-values, for my samples with low concentration/expression.
Let's assume the grand mean of ALL the samples in existence is around 350.
Say one sample (SAMPLE1) has expression 700, with an SD of around 50.
The other sample (SAMPLE2) has expression level 3, almost nothing, and its SD will be around 2. That SD is much lower in absolute terms, yet the PERCENT ERROR is much higher, about 67%.
But the Welch test simply does not see this percent error, and it does not account for the error bias at low values.
SAMPLE1 (high expression):
difference = 700 − 350 = 350
s²_sample/n2 = 50²/6 ≈ 417

SAMPLE2 (low expression):
difference = 3 − 350 = −347
s²_sample/n2 = 2²/6 ≈ 0.67

s²_all/n1 stays the same in both cases, since it comes from the grand mean...
As can be seen, the absolute difference is nearly the same in both cases. SAMPLE2 has almost no expression and a very large proportional error (natural for low expression values), but because its absolute SD is tiny, it ends up with a massive negative t-value.
For SAMPLE1, we have a high SD but a modest percent error; one would expect it to come out at least as significant as SAMPLE2. But really, it is extremely difficult to say, because sometimes low expression simply reflects the detection limits of the measurement technology.
At low expression levels, close to zero, the SD becomes very low (1-3) but the percent error becomes very high. At high expression levels, the SD becomes modestly high but the percent error becomes low.
I want to make both cases comparable to a grand mean, or to each other. But I don't want low expression to hand me huge |t| values (tiny p-values), because that would cause a bias where all my "inferences" would simply be cases where almost no expression was found.
I don't need a PERFECT solution, just something that accounts for this. Is the binomial distribution a good choice? I'm not sure how exactly to attack this issue; is there some sort of standard score that uses proportional error rather than SD to get a p-value?