Standard deviation for a biological assay

In summary, the conversation discusses the best method for estimating the standard deviation of a biological assay. The results are typically reported as the mean and standard error of the mean for each molecule, computed only from that molecule's own repeats, which can give misleadingly small or undefined errors. The suggested approach is to calculate a single pooled standard deviation from all the measurements, taking each deviation from the mean of its own molecule, and then to divide it by the square root of the number of repeats for each molecule. This gives a standard error even for molecules tested only once and avoids unreasonably small values when repeats happen to agree. Additionally, if the measurement error is proportional to the magnitude of the measured value, it can be normalized by dividing by the mean measurement.
  • #1
lavoisier
Hi everyone, I have a basic question on statistics.

Suppose you have a biological assay to test a given property of some molecules.
At some point in time you have tested N different molecules M1, M2, ..., Mi, ..., MN, and for each you have repeated the test a number of times Ri. The results are Pi.
Example. You have tested N=4 molecules doing Ri={3,2,1,1} repeats. The results are Pi={{5.5, 6.0, 5.7}, {7.9, 8.1}, {6.3}, {8.5}}.
Now, in practice the way results are reported (at least where I work) is: for each Mi, mean and standard error of the mean based only on the actual repeats for each molecule.
Which means that for molecule 2 above the standard error of the mean is quite small, for molecule 1 it's larger, and for the last 2 molecules it isn't calculated at all.
And even more worryingly, when by chance two repeats give exactly the same result, you find s.e.m.=0 !
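
To make the concern concrete, here is a minimal Python sketch that computes the per-molecule s.e.m. for the example data above. The function name `per_molecule_sem` is made up for illustration, and the pair `[7.0, 7.0]` is an invented case of two identical repeats.

```python
import math
import statistics

# Example data from this post: repeats for molecules M1..M4.
results = [[5.5, 6.0, 5.7], [7.9, 8.1], [6.3], [8.5]]

def per_molecule_sem(values):
    """Return the s.e.m. from one molecule's own repeats, or None if undefined."""
    if len(values) < 2:
        return None  # a sample SD needs at least two repeats
    return statistics.stdev(values) / math.sqrt(len(values))

sems = [per_molecule_sem(v) for v in results]
for i, (values, sem) in enumerate(zip(results, sems), start=1):
    label = "n/a" if sem is None else f"{sem:.3f}"
    print(f"M{i}: mean = {statistics.mean(values):.2f}, s.e.m. = {label}")

# Hypothetical case: two identical repeats give an s.e.m. of zero.
identical = per_molecule_sem([7.0, 7.0])
print(identical)
```

M2 gets a smaller s.e.m. than M1 simply because its two repeats happen to lie closer together, and M3 and M4 get none at all.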

I can't help thinking that this is wrong. But my knowledge of statistics is limited, that's why I'm asking for help here.

Doesn't a 'method', and therefore an assay, have an inherent standard deviation SD that is based on the total observed variance? Shouldn't we express all results based on this single, general SD?

So in practice, wouldn't it be more correct to express the results as:

P(Mi) = mean(Pi) +/- SD/sqrt(Ri)

This way, molecules that are tested only once would still get a standard error, and those for which repeats give close results wouldn't get an unreasonably small s.e.m.

Or am I wrong? What do you think?

Thanks
L
 
  • #2
Of course you want to find a better way to estimate the standard deviation of your method.
However, if you don't have a benchmark on your method, and the four molecules are different, it would be inappropriate to lump all the measurements together to estimate the error.
If the results you have above are all you have, and the assumption is that all (or most) observed difference between measurements is due to measurement error, then you would estimate the total SD of measurement as:
##Var = \frac{\sum_{i=1}^N \sum_{j=1}^{R_i} (x_{ij} - \overline x_i ) ^2 }{\sum_{i=1}^N R_i - 1} ##
## SD = \sqrt{Var} ##
If you could assume that the molecules are from the same population, and may have the same mean...then building a SD estimate using all the values would work, but might obscure the measurement error.
 
  • #3
Thank you RUber! That's exactly what I needed to understand.
Indeed, I wasn't thinking of lumping all the molecules together into one mean. I probably misrepresented that with my notation.
What I didn't know was that I could calculate a variance by summing squared differences from different means. You clarified that.
As for the main source of error being the measurement itself rather than the molecule, that's a very good question. I would say it is in most cases. Only occasionally will there be molecules that interfere with the fundamental mechanism of the assay, generating systematic errors or larger variability.
FYI, in the past another PF user sent me a link to a very interesting article that dealt with the calculation of the error of a specific type of assay, where several data points are collected and some parameters are fitted to a logistic-like equation. In that case the assay result we get is the regression parameter(s), which should come with an error (in the same way as the slope and intercept of a least squares fit can be given with their own error). No such luck - those who run the assay aren't happy to provide that piece of information.
So maybe the next best thing we can do as end users is adopt the above approach.
In any case, if I see that the distribution of the differences from the means is not normal, I guess that may be an indication that the error is not random.
Thanks again.
L
 
  • #4
Glad to help.
One other thought, if you think that the measurement error might be proportional to the size, you could also normalize the error term by dividing by the mean measurement.
You would notice this if you plotted the (measured - mean) errors vs. the means. If the plot looks like a cone, you might assume that the error is not independent of the molecule size.
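
A rough sketch of this check in Python, using the example data from post #1. The helper names `residuals_vs_means` and `pooled_relative_sd` are made up for illustration, and pooling the relative residuals with the same degrees of freedom as the absolute ones is an assumption, not something stated in the thread. The pairs are printed rather than plotted to keep the example dependency-free.

```python
import math

# Example data from post #1: repeats for molecules M1..M4.
results = [[5.5, 6.0, 5.7], [7.9, 8.1], [6.3], [8.5]]

def residuals_vs_means(groups):
    """Return (mean, measured - mean) pairs for molecules with two or more repeats."""
    pairs = []
    for values in groups:
        if len(values) < 2:
            continue  # a single repeat always has residual 0, so it is skipped
        mean = sum(values) / len(values)
        pairs.extend((mean, x - mean) for x in values)
    return pairs

def pooled_relative_sd(groups):
    """Pooled SD of residuals divided by their own molecule mean (assumes nonzero means)."""
    ss, dof = 0.0, 0
    for values in groups:
        mean = sum(values) / len(values)
        ss += sum(((x - mean) / mean) ** 2 for x in values)
        dof += len(values) - 1
    if dof == 0:
        raise ValueError("need at least one molecule with two or more repeats")
    return math.sqrt(ss / dof)

pairs = residuals_vs_means(results)
for mean, residual in pairs:
    print(f"mean = {mean:.2f}, residual = {residual:+.2f}")

relative_sd = pooled_relative_sd(results)
print(f"pooled relative SD = {relative_sd:.3f}")
```

Feeding `pairs` to any scatter plot gives the errors-vs-means picture; a spread that widens with the mean is the cone shape to look for. With only two molecules having repeats, the example data are far too sparse to show a trend either way.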
 
  • #5
I tried the method today. The differences (x-xmean) were normally distributed. I didn't try plotting them vs the means; I only read this new post just now.
I adapted the formula a little; in particular, I subtracted from the denominator of Var the number of molecules that were tested only once.
If I left them in, the SD was way too low, because most molecules were indeed tested only once, so each of them contributed 0 to the numerator but 1 to the denominator.
Unless your formula was intended as:

[itex]Var = \frac{\sum_{i=1}^N \sum_{j=1}^{R_i} (x_{ij} - \overline x_i ) ^2 }{\sum_{i=1}^N (R_i - 1)} = \frac{\sum_{i=1}^N \sum_{j=1}^{R_i} (x_{ij} - \overline x_i ) ^2 }{-N+\sum_{i=1}^N R_i }[/itex]

I got SD = 0.22, which was not far from the mean of the SD's of the individual repeats.
Great! Tx
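
For reference, a minimal Python sketch of the pooled calculation with the denominator written as the sum of (R_i - 1), followed by the mean +/- SD/sqrt(R_i) reporting proposed in post #1. It uses the example data from post #1, not the real assay data behind the SD = 0.22 quoted above, and the function name `pooled_sd` is made up for illustration.

```python
import math

# Example data from post #1 (NOT the real assay data behind SD = 0.22).
results = [[5.5, 6.0, 5.7], [7.9, 8.1], [6.3], [8.5]]

def pooled_sd(groups):
    """Pooled within-molecule SD; the denominator is sum(R_i - 1)."""
    ss = 0.0
    dof = 0
    for values in groups:
        mean = sum(values) / len(values)
        ss += sum((x - mean) ** 2 for x in values)
        dof += len(values) - 1  # a single repeat adds 0 to both sums
    if dof == 0:
        raise ValueError("need at least one molecule with two or more repeats")
    return math.sqrt(ss / dof)

sd = pooled_sd(results)
print(f"pooled SD = {sd:.3f}")

# Report every molecule with the same assay-level SD.
for i, values in enumerate(results, start=1):
    mean = sum(values) / len(values)
    sem = sd / math.sqrt(len(values))
    print(f"P(M{i}) = {mean:.2f} +/- {sem:.2f}")
```

Because molecules tested once add nothing to either the numerator or the denominator, they no longer drag the estimate down, yet they still receive a standard error of SD/sqrt(1). On this small example the pooled SD also happens to come out near 0.22, which is a coincidence and not a reproduction of the figure above.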
 
  • #6
Yes, sorry...I forgot the parentheses in the denominator. Your sharp analysis kept you from being led astray. Good work.
 

1. What is the purpose of calculating standard deviation for a biological assay?

The standard deviation for a biological assay is a measure of the variability or spread of the data. It helps to determine how close or far the individual data points are from the average or mean value. This information is important in assessing the reliability and precision of the assay results.

2. How is standard deviation calculated for a biological assay?

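To calculate the standard deviation for a biological assay, first find the mean of the data points. Then subtract the mean from each data point and square the differences. Next, sum the squared differences and divide by the number of data points minus one (dividing by the number of data points itself gives the population standard deviation, which underestimates the spread for small samples). Finally, take the square root of this value to get the standard deviation. When several molecules are each measured a few times, as in the thread above, the differences are taken from each molecule's own mean and the sum is divided by the total number of measurements minus the number of molecules.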
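
As a quick illustration, the sketch below applies both conventions (dividing by n and by n - 1) to the three repeats of molecule M1 from the thread's example, using Python's standard `statistics` module.

```python
import statistics

# Three repeats of molecule M1 from the thread's example.
repeats = [5.5, 6.0, 5.7]

population_sd = statistics.pstdev(repeats)  # divides the squared differences by n
sample_sd = statistics.stdev(repeats)       # divides the squared differences by n - 1

print(f"divide by n:     {population_sd:.3f}")
print(f"divide by n - 1: {sample_sd:.3f}")
```

With so few repeats the two values differ noticeably, which is why the choice of denominator matters for assays run in duplicate or triplicate.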

3. What factors can affect the standard deviation of a biological assay?

The standard deviation of a biological assay can be affected by various factors such as sample size, experimental conditions, and the precision of the measurement instrument. Additionally, biological variability and errors in experimental techniques can also contribute to the standard deviation.

4. How does standard deviation relate to the confidence interval of a biological assay?

The standard deviation is used to calculate the confidence interval of a biological assay. A smaller standard deviation gives a narrower confidence interval around the mean, indicating a more precise result. Conversely, a larger standard deviation gives a wider confidence interval, indicating less precision. The interval also narrows as the number of repeats grows, because it is built from the standard error, that is, the standard deviation divided by the square root of the number of repeats.

5. What is an acceptable range for standard deviation in a biological assay?

There is no specific acceptable range for standard deviation in a biological assay, as it varies with the type of assay and the objectives of the research. A lower standard deviation generally indicates higher precision, but it does not by itself indicate accuracy, because a systematic error can shift every repeat in the same direction. It is important to compare the standard deviation to the mean value and to consider the experimental conditions when judging whether the results are reliable and reproducible.
