Standard deviation for a biological assay

SUMMARY

The discussion centers on the calculation of standard deviation (SD) and standard error of the mean (s.e.m.) in biological assays involving multiple molecules. Participants emphasize the importance of using a comprehensive approach to estimate SD, particularly when molecules are tested with varying repetitions. The proposed formula for reporting results is P(Mi) = mean(Pi) +/- SD/sqrt(Ri), which ensures that even molecules tested once receive a standard error. The conversation highlights the need for accurate measurement error assessment and the potential pitfalls of lumping data from different molecules together.

PREREQUISITES
  • Understanding of standard deviation (SD) and standard error of the mean (s.e.m.)
  • Familiarity with biological assay methodologies
  • Basic statistical concepts, including variance calculation
  • Knowledge of data normalization techniques
NEXT STEPS
  • Research the calculation of variance in biological assays
  • Learn about the implications of measurement error in statistical analysis
  • Explore normalization techniques for error terms in assays
  • Study logistic regression and its application in biological data analysis
USEFUL FOR

Researchers, biostatisticians, and laboratory technicians involved in biological assays and statistical analysis of experimental data.

lavoisier
Hi everyone, I have a basic question on statistics.

Suppose you have a biological assay to test a given property of some molecules.
At some point in time you have tested N different molecules M1, M2, ..., Mi, ..., MN, and for each you have repeated the test a number of times Ri. The results are Pi.
Example. You have tested N=4 molecules doing Ri={3,2,1,1} repeats. The results are Pi={{5.5, 6.0, 5.7}, {7.9, 8.1}, {6.3}, {8.5}}.
Now, in practice the way results are reported (at least where I work) is: for each Mi, the mean and the standard error of the mean are calculated based only on that molecule's own repeats.
Which means that, for molecule 2 above, the standard error of the mean is quite small, for molecule 1 it's larger, and for the last two molecules it isn't calculated at all.
And even more worryingly, when by chance two repeats give exactly the same result, you find s.e.m.=0 !

I can't help thinking that this is wrong. But my knowledge of statistics is limited, that's why I'm asking for help here.

Doesn't a 'method', and therefore an assay, have an inherent standard deviation SD that is based on the total observed variance? Shouldn't we express all results based on this single, general SD?

So in practice, wouldn't it be more correct to express the results as:

P(Mi) = mean(Pi) +/- SD/sqrt(Ri)

This way, molecules that are tested only once would still get a standard error, and those for which repeats give close results wouldn't get an unreasonably small s.e.m.
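
For concreteness, here is a minimal Python sketch contrasting the two reporting styles on the toy numbers from the example. The dictionary layout is just illustrative, and the assay-level SD in the second part is a made-up placeholder, since how to estimate it is exactly the question:

```python
import statistics

# Toy data from the example above: 4 molecules, Ri = {3, 2, 1, 1} repeats.
results = {
    "M1": [5.5, 6.0, 5.7],
    "M2": [7.9, 8.1],
    "M3": [6.3],
    "M4": [8.5],
}

# Current practice: per-molecule s.e.m. from that molecule's own repeats.
for name, repeats in results.items():
    mean = statistics.fmean(repeats)
    if len(repeats) > 1:
        sem = statistics.stdev(repeats) / len(repeats) ** 0.5
        print(f"{name}: {mean:.2f} +/- {sem:.2f}")  # exactly 0 if the repeats coincide
    else:
        print(f"{name}: {mean:.2f} +/- ???")        # no s.e.m. possible from one repeat

# Proposed practice: one assay-level SD shared by all molecules.
ASSAY_SD = 0.25  # hypothetical value; estimating it is the point of the question
for name, repeats in results.items():
    mean = statistics.fmean(repeats)
    sem = ASSAY_SD / len(repeats) ** 0.5
    print(f"{name}: {mean:.2f} +/- {sem:.2f}")
```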

Or am I wrong? What do you think?

Thanks
L
 
Of course you want to find a better way to estimate the standard deviation of your method.
However, if you don't have a benchmark for your method, and the four molecules are genuinely different, it would be inappropriate to lump all the measurements together around a single mean to estimate the error.
If the results you have above are all you have, and the assumption is that all (or most) observed difference between measurements is due to measurement error, then you would estimate the total SD of measurement as:
##Var = \frac{\sum_{i=1}^N \sum_{j=1}^{R_i} (x_{ij} - \overline x_i ) ^2 }{\sum_{i=1}^N R_i - 1} ##
##SD = \sqrt{Var}##
If you could assume that the molecules are from the same population, and may have the same mean, then building an SD estimate using all the values around a single grand mean would work, but it might obscure the measurement error by mixing it with real differences between the molecules.
 
Thank you RUber! That's exactly what I needed to understand.
Indeed, I wasn't thinking of lumping all the molecules together into one mean. I probably misrepresented that with my notation.
What I didn't know was that I could calculate a variance by summing squared differences from different means. You clarified that.
As for the main source of error being the measurement itself rather than the molecule, that's a very good question. I would say it is in most cases. Only occasionally will there be molecules that interfere with the fundamental mechanism of the assay, generating systematic errors or larger variability.
FYI, in the past another PF user sent me a link to a very interesting article that dealt with the calculation of the error of a specific type of assay, where several data points are collected and some parameters are fitted to a logistic-like equation. In that case the assay result we get is the regression parameter(s), which should come with an error (in the same way as the slope and intercept of a least squares fit can be given with their own error). No such luck - those who run the assay aren't happy to provide that piece of information.
So maybe the next best thing we can do as end users is adopt the above approach.
In any case, if I see that the distribution of the differences (x - xmean) is not normal, I guess that may be an indication that the error is not random.
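
For what it's worth, one minimal way to eyeball this in Python (a sketch only; it pools the residuals x - xmean and skips molecules tested once, whose residual is zero by construction) is a Shapiro-Wilk test plus a normal Q-Q plot:

```python
import matplotlib.pyplot as plt
from scipy import stats

# Toy data from the first post; replace with the real repeats.
results = {
    "M1": [5.5, 6.0, 5.7],
    "M2": [7.9, 8.1],
    "M3": [6.3],
    "M4": [8.5],
}

residuals = []
for repeats in results.values():
    if len(repeats) < 2:
        continue                      # a single repeat gives a residual of exactly 0
    mean = sum(repeats) / len(repeats)
    residuals.extend(x - mean for x in repeats)

stat, p = stats.shapiro(residuals)    # a small p-value suggests non-normal residuals
print(f"Shapiro-Wilk: W = {stat:.3f}, p = {p:.3f}")

stats.probplot(residuals, dist="norm", plot=plt)  # points far off the line -> non-normal
plt.show()
```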
Thanks again.
L
 
Glad to help.
One other thought: if you think that the measurement error might be proportional to the size of the measured value, you could also normalize the error term by dividing by the mean measurement.
You would notice this if you plotted the (measured - mean) errors vs. the means. If the plot looks like a cone, you might assume that the error is not independent of the molecule size.
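
As a rough sketch of that diagnostic in Python (again on the toy data from the first post; the real repeats would go into the same structure):

```python
import matplotlib.pyplot as plt

# Toy data from the first post; replace with the real repeats.
results = {
    "M1": [5.5, 6.0, 5.7],
    "M2": [7.9, 8.1],
    "M3": [6.3],
    "M4": [8.5],
}

means, errors = [], []
for repeats in results.values():
    if len(repeats) < 2:
        continue                      # singletons carry no spread information
    m = sum(repeats) / len(repeats)
    for x in repeats:
        means.append(m)
        errors.append(x - m)          # (measured - mean) error for this repeat

plt.scatter(means, errors)
plt.axhline(0, linewidth=0.8)
plt.xlabel("per-molecule mean")
plt.ylabel("measured - mean")
plt.show()
# A cone shape (spread growing with the mean) would suggest dividing the
# errors by the mean and working with relative errors instead.
```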
 
I tried the method today. The differences (x-xmean) were normally distributed. I didn't try plotting them vs. the means; I only read this new post just now.
I adapted the formula a little; in particular, I subtracted from the denominator of Var the number of molecules that were tested only once.
If I left them in, the SD was way too low, because most molecules were indeed tested only once, so each of them contributed 0 to the numerator but 1 to the denominator.
Unless your formula was intended as:

##Var = \frac{\sum_{i=1}^N \sum_{j=1}^{R_i} (x_{ij} - \overline x_i )^2 }{\sum_{i=1}^N (R_i - 1)} = \frac{\sum_{i=1}^N \sum_{j=1}^{R_i} (x_{ij} - \overline x_i )^2 }{\sum_{i=1}^N R_i - N}##

I got SD = 0.22, which was not far from the mean of the SDs of the individual repeats.
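
As a sanity check, here is a small Python sketch of the formula with the parenthesized denominator, i.e. the pooled within-molecule sum of squares divided by the total number of measurements minus the number of molecules; applied to the toy data from the first post it also gives a pooled SD of about 0.22:

```python
# Toy data from the first post; molecules tested only once contribute nothing
# to either the numerator or the degrees of freedom.
results = {
    "M1": [5.5, 6.0, 5.7],
    "M2": [7.9, 8.1],
    "M3": [6.3],
    "M4": [8.5],
}

ss = 0.0   # pooled sum of squared deviations from each molecule's own mean
dof = 0    # pooled degrees of freedom, sum over molecules of (R_i - 1)
for repeats in results.values():
    mean = sum(repeats) / len(repeats)
    ss += sum((x - mean) ** 2 for x in repeats)
    dof += len(repeats) - 1

sd = (ss / dof) ** 0.5
print(f"pooled SD = {sd:.2f}")   # ~0.22 for the toy data above
```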
Great! Tx
 
Yes, sorry...I forgot the parentheses in the denominator. Your sharp analysis kept you from being led astray. Good work.
 
