# A low-value standard deviation bias

#### Jarfi

Summary: How can I compare different samples where the lowest possible value is 0 and there is no upper bound, given that the SD becomes very small near zero, so that I get p-values that are not biased towards low-expression sets, f.ex. (0.1, 2, 0.4, 5, 2) vs (59, 79, 80, 81, 55, 82, 64)?

Obviously the first has more proportional error, but it will ALWAYS come out as more significant (a more extreme t, hence a lower p-value) when compared to almost any other mean that is higher than 10, because its SD is much, much lower while the difference in means may be equal.
Excuse me, I am not too well versed in statistics. I am in engineering.

Let's say I have an expected measurement (the grand mean of all values), and then I measure two different samples. Each sample is measured a few times to get its own standard deviation and expected value.

I want to compare the result to the grand mean.

I do a Welch's test to get a t-value, so basically it goes something like this:

WELCH T TEST: we assume the grand mean is based on 60 samples, and each of our individual sample means is made out of 6 measurements.

$t = \dfrac{\bar{X}_{\text{all}} - \bar{X}_{\text{sample}}}{\sqrt{s_{\text{all}}^2/n_1 + s_{\text{sample}}^2/n_2}}$
The problem here is that I am getting WAY WAY higher t-values, and thus lower p-values, for my samples with low concentration/expression.

Let's assume that the grand mean of ALL the samples in existence is around 350.

Say one sample (SAMPLE1) has an expression of 700; the SD may be something like 50.

The other sample (SAMPLE2) has an expression level of 3, almost nothing, and the SD is going to be around 2. That is much lower in absolute terms, but the PERCENT ERROR is much, much higher, at 67%.

But the Welch test simply does not detect this percent error; it does not account for the low-value error bias.

SAMPLE1:
difference = 700 - 350 = 350
s2sample/n2 = 50*50/6 ≈ 417

SAMPLE2:
difference = 3 - 350 = -347
s2sample/n2 = 2*2/6 ≈ 0.67

s2all/n1 remains the same for both cases, as it comes from the grand mean...
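
To put numbers on this, here is a minimal Python sketch of both comparisons. The grand SD of 150 and grand n of 60 are assumptions for illustration, since only the grand mean of 350 is given above.

```python
import math

def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """Welch's t: difference of means over the combined
    standard error of the two sample means."""
    return (mean1 - mean2) / math.sqrt(sd1**2 / n1 + sd2**2 / n2)

GRAND_MEAN, GRAND_SD, GRAND_N = 350, 150, 60  # grand SD and n assumed

# SAMPLE1: high expression (700), large SD (50), small percent error.
t_high = welch_t(700, 50, 6, GRAND_MEAN, GRAND_SD, GRAND_N)

# SAMPLE2: near-zero expression (3), tiny SD (2), ~67% percent error.
t_low = welch_t(3, 2, 6, GRAND_MEAN, GRAND_SD, GRAND_N)

print(t_high)  # roughly 12.4
print(t_low)   # roughly -17.9
```

Despite its huge proportional error, SAMPLE2 gets the more extreme statistic, which is exactly the bias being described.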

As can be seen, the magnitude of the difference is about the same. SAMPLE2 has almost no expression but a very large proportional error (natural for low expression values), yet due to this phenomenon it ends up with a massive negative t-value.
For SAMPLE1 we have a high SD but not that high a percent error; one would expect SAMPLE2 to get a similar p-value to SAMPLE1, if not a larger one. But really it is extremely difficult to say, because sometimes low expression is due to limits of the measurement technology.

At low levels of expression, close to zero, the SD becomes very low (1-3), but the percent error becomes very high. At high levels of expression, the SD becomes modestly high, but the percent error becomes low.

I want to make both cases comparable to a grand mean, or to each other, you get the point. But I don't want low expression to give me tiny p-values, because that would cause a bias where all my "inferences" would simply be cases where almost no expression was found.

I don't need the PERFECT solution, just something that accounts for this. Is the binomial distribution a good choice? I'm not sure how exactly to attack this issue; is there some sort of standard score that uses proportional error rather than SD to get a p-value?


#### Stephen Tashi

> ...p-values that are not biased towards low expression sets, f.ex. (0.1, 2, 0.4, 5, 2) vs (59, 79, 80, 81, 55, 82, 64).
What do you mean by the term "expression"? You used this word several times, but I don't recognize it as a standard term in statistics.

Does "low expression set" refer to a small sample size?

#### Jarfi

> What do you mean by the term "expression"? You used this word several times, but I don't recognize it as a standard term in statistics.
>
> Does "low expression set" refer to a small sample size?
It refers to a set/group of samples with low values. All the groups have the same sample size.

#### Stephen Tashi

As I understand the situation, you feel that a sample with low values has more chance of being drawn from a population than the distribution of the Welch statistic indicates. To a casual reader, not knowing the physics of your problem, it isn't surprising that a sample of low values is improbable if we are selecting from a population with a large mean. You need to explain more of the physical situation to convey your reasoning.

Your use of the term "grand mean" suggests that you are comparing individual samples to a "grand sample" consisting of all the individual samples combined. This violates the assumption that the grand sample and the individual samples are independently selected. I think the Welch test assumes the two samples are independent.

In statistics, a "statistic" is an algorithm that is applied to data from a sample. A "statistical test" involves a particular statistic and a particular distribution for that statistic, considered as a random variable. If the assumptions for the test are violated, the distribution of the statistic may not be what the test assumes. This does not imply that the statistic is useless. It only implies that you must determine what distribution the statistic has under the conditions you have. If you are a skilled programmer and have a specific probability model, you can determine the distribution by Monte Carlo simulation.
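
As a concrete sketch of that Monte Carlo route: the lognormal null model and the sample sizes of 60 and 6 below are placeholders, purely for illustration, to be replaced by whatever probability model actually fits the data.

```python
import math
import random

def welch_t(xs, ys):
    """Welch's t statistic computed from two raw samples."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    vx = sum((x - mx) ** 2 for x in xs) / (len(xs) - 1)
    vy = sum((y - my) ** 2 for y in ys) / (len(ys) - 1)
    return (mx - my) / math.sqrt(vx / len(xs) + vy / len(ys))

random.seed(0)

def draw(n):
    # Null model (an assumption): both samples come from the
    # same lognormal population.
    return [random.lognormvariate(3.0, 1.0) for _ in range(n)]

# Empirical null distribution of the statistic under this model.
null_ts = [welch_t(draw(60), draw(6)) for _ in range(5_000)]

# Empirical two-sided p-value for some observed statistic.
t_obs = 2.5
p = sum(abs(t) >= abs(t_obs) for t in null_ts) / len(null_ts)
print(p)
```

The point is that the reference distribution comes from the simulation rather than from t tables, so violated t-test assumptions stop mattering.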

Even if you don't intend to do a simulation, a specific answer to a question in statistics depends on having a specific probability model for how the data is generated.

One model ("null hypothesis") is that all the samples are drawn from the same normal distribution. The fact that samples have different sample means and standard deviations is due to the randomness of the draws. This hypothesis differs from a model where the individual samples are drawn from different random variables having a common mean but different standard deviations.

What is a probability model that implements your comment that "sometimes low expression is due to limits of the measurement technology"?

#### Jarfi

> One model ("null hypothesis") is that all the samples are drawn from the same normal distribution. ...
>
> What is a probability model that implements your comment that "sometimes low expression is due to limits of the measurement technology"?
Actually, the samples do not come from the same distribution; they always have different variances, so a standard t-test is not ideal. I am not sure what the underlying distribution is; I think Poisson or negative binomial. In that light even the t-test and Welch's test are questionable, because they rely on a normal approximation.

I have many groups, and I don't want to compare only two at a time; that's why I was trying to compare each one to the grand mean, similar to z-value normalization (as in heat maps). I know about ANOVA, but that does not allow me to see the relative power of each group compared to the rest of the groups. It is not powerful enough.

The probability model is that when we measure 0 or 1, that does not imply 0 or 1 units of concentration inside the sample. It just means there is a threshold, a cutoff of minimum concentration, for the sensor to get a read. The reads are quantized into bits 1, 2, 3, ... up to infinity, but 0 is anything below the lowest MEASURABLE concentration, not 0 presence of the substance (say we are measuring nucleotides in mg/L). In reality, a read of 0 could correspond to 15 mg/L or to 5 mg/L, but a read of 1 would strictly mean 20-30 mg/L, assuming a scale of 10 mg/L per bit.
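
A toy version of that detection-threshold model in Python; the 20 mg/L cutoff and 10 mg/L-per-bit scale are taken from the example above and are otherwise arbitrary.

```python
def sensor_read(conc_mg_per_l, threshold=20.0, step=10.0):
    """Quantize a true concentration into a sensor read.
    Anything below the detection threshold reads 0; above it,
    each `step` mg/L adds one bit (so 1 covers 20-30 mg/L)."""
    if conc_mg_per_l < threshold:
        return 0
    return 1 + int((conc_mg_per_l - threshold) // step)

print(sensor_read(5.0))   # 0: below the detection limit
print(sensor_read(15.0))  # 0: also below the limit, information is lost
print(sensor_read(25.0))  # 1: within 20-30 mg/L
print(sensor_read(47.0))  # 3: within 40-50 mg/L
```

Reads of 0 are censored values rather than true zeros, which is why treating them as exact measurements understates the error at the low end.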

And sorry, I was wrong: the groups do NOT all have equal sample sizes; some groups have 12 samples. This complicates the analysis even more, because obviously they will have a lower standard error, and so doing a z-score against the grand mean might not be ideal.

#### Stephen Tashi

> ...but that does not allow me to see the relative power of each group compared to the rest of the groups.
The "power" of a statistical hypothesis test has a technical definition. I assume that's not what you are talking about.

The two major branches of statistics are hypothesis testing and estimation. Is your goal to make a yes-or-no decision about some question, or are you trying to quantify how different several populations are from each other?

Seeking mathematical solutions to statistical questions is interesting, but we must acknowledge other considerations: tradition and bureaucracy. Statistics is subjective. If you intend to publish your work, there may be traditional ways of analyzing data that a journal expects. In that case, see what other published authors have done in similar situations. ... Now, back to what is interesting:

> The probability model is that when we measure 0 or 1, that does not imply 0 or 1 units of concentration inside the sample. It just means there is a threshold, a cutoff of minimum concentration, for the sensor to get a read. ...
>
> And sorry, I was wrong: the groups do NOT all have equal sample sizes; some groups have 12 samples. ...
That's useful information. The first question is whether the concentration of nucleotides is the main topic of research, or whether some other property inferred from the concentrations is actually the main topic.

> I am not sure what the underlying distribution is,
In any real-life problem there will be several different distributions involved. For example, the distribution of individual measurements in a sample is not the same as the distribution of the sample mean. "Underlying distribution" is a good phrase. The problem will be clearer if you avoid thinking of it as something that involves only one distribution.

For example, under given conditions the concentration might be modeled by a continuous random variable $X_1$. Truncating the measurement produces a discrete random variable $X_2$. Taking 12 samples from $X_2$ produces a 12-component random vector $X_3$. Computing the sample variance of those 12 components produces another random variable $X_4$, etc.
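
That chain of random variables can be written out directly. A sketch, with an exponential distribution standing in for $X_1$ purely as a placeholder:

```python
import random
import statistics

random.seed(1)

def x1():
    # X1: continuous model of the true concentration (assumed
    # exponential with mean 100 here, purely as a stand-in).
    return random.expovariate(1 / 100)

def x2():
    # X2: the truncated/quantized measurement derived from X1.
    return int(x1() // 10)

# X3: a 12-component vector of draws from X2.
x3 = [x2() for _ in range(12)]

# X4: the sample variance of those 12 components, itself a random variable.
x4 = statistics.variance(x3)
print(x3, x4)
```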

What is a family of probability distributions that are reasonable models for $X_1$? If we are willing to use Monte Carlo simulation, we need not restrict ourselves to "nice" families. For example, suppose physics says that concentrations must be between 0 and some maximum amount, such as 500 mg/L. Then we could pick a family consisting of lognormal distributions truncated to [0, 500] and normalized accordingly. However, if you want to publish this analysis, you need to investigate whether such a procedure would raise eyebrows.
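
Sampling from such a truncated, renormalized lognormal is easy by rejection, which is one reason the Monte Carlo route need not be restricted to "nice" families. The $\mu$ and $\sigma$ values below are illustrative assumptions:

```python
import random

def truncated_lognormal(mu, sigma, lo=0.0, hi=500.0):
    """Draw from a lognormal truncated to [lo, hi] via rejection
    sampling (fine when most of the mass lies inside the bounds)."""
    while True:
        x = random.lognormvariate(mu, sigma)
        if lo <= x <= hi:
            return x

random.seed(2)
draws = [truncated_lognormal(4.0, 0.8) for _ in range(1000)]
print(min(draws), max(draws))  # everything lands in [0, 500]
```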

#### FactChecker

> Summary: How can I compare different samples where the lowest possible value is 0, the highest is infinite, but the SD becomes very low close to zero, so that I get p-values that are not biased towards low-expression sets, f.ex. (0.1, 2, 0.4, 5, 2) vs (59, 79, 80, 81, 55, 82, 64)?
>
> Obviously the first has more proportional error, but it will ALWAYS come out as more significant when compared to almost any other mean that is higher than 10, because the SD is much, much lower and the difference may be equal.
Suppose that your interest is in the proportion of error. That means the errors act as a multiplying factor on the data. You should begin by taking the logarithm of the data; then the errors become an added term, which is the standard situation. You should be able to compare the data sets on a better basis that way.
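
A quick check of this suggestion on the two sets from the original post: on the raw scale the low set looks far less variable, but after the log transform its larger proportional spread is what the SD actually measures.

```python
import math
import statistics

low  = [0.1, 2, 0.4, 5, 2]
high = [59, 79, 80, 81, 55, 82, 64]

# Raw scale: the low-expression set has by far the smaller SD.
print(statistics.stdev(low), statistics.stdev(high))    # ~1.9 vs ~11.7

# Log scale: multiplicative (proportional) error becomes additive,
# and the low set now shows the larger spread.
log_low  = [math.log(x) for x in low]
log_high = [math.log(x) for x in high]
print(statistics.stdev(log_low), statistics.stdev(log_high))  # ~1.6 vs ~0.17
```

One caveat, consistent with the censoring discussion earlier in the thread: the log of a censored 0 read is undefined, so exact zeros need special handling before this transform.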