# Standard deviation for a biological assay

1. Nov 19, 2015

### lavoisier

Hi everyone, I have a basic question on statistics.

Suppose you have a biological assay to test a given property of some molecules.
At some point in time you have tested N different molecules M1, M2, ..., Mi, ..., MN, and for each you have repeated the test a number of times Ri. The results are Pi.
Example. You have tested N=4 molecules doing Ri={3,2,1,1} repeats. The results are Pi={{5.5, 6.0, 5.7}, {7.9, 8.1}, {6.3}, {8.5}}.
In practice, results are reported (at least where I work) as follows: for each Mi, the mean and the standard error of the mean, computed only from that molecule's own repeats.
Which means that for molecule 2 above the standard error of the mean is quite small, for molecule 1 it's larger, and for the last 2 molecules it can't be calculated at all.
And even more worryingly, when two repeats happen to give exactly the same result, you get s.e.m. = 0!

I can't help thinking that this is wrong. But my knowledge of statistics is limited, that's why I'm asking for help here.

Doesn't a 'method', and therefore an assay, have an inherent standard deviation SD that is based on the total observed variance? Shouldn't we express all results based on this single, general SD?

So in practice, wouldn't it be more correct to express the results as:

$P(M_i) = \overline{P_i} \pm SD/\sqrt{R_i}$

This way, molecules that are tested only once would still get a standard error, and those for which repeats give close results wouldn't get an unreasonably small s.e.m.
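As a sketch of that reporting scheme (the data are just the hypothetical example numbers from this post, and the assay-wide SD value is an assumed placeholder, not a measured one):

```python
import math

# Hypothetical results from the example above: four molecules,
# tested 3, 2, 1 and 1 times respectively.
results = [[5.5, 6.0, 5.7], [7.9, 8.1], [6.3], [8.5]]

ASSAY_SD = 0.22  # assumed single, assay-wide SD (placeholder value)

def report(values, sd):
    """Mean of the repeats +/- sd / sqrt(number of repeats)."""
    mean = sum(values) / len(values)
    sem = sd / math.sqrt(len(values))
    return mean, sem

for vals in results:
    mean, sem = report(vals, ASSAY_SD)
    print(f"{mean:.2f} +/- {sem:.2f}")
```

Note that with a shared SD, a molecule tested once still gets a standard error (the full SD), and close repeats can no longer produce an implausible s.e.m. of zero.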

Or am I wrong? What do you think?

Thanks
L

2. Nov 19, 2015

### RUber

Of course you want to find a better way to estimate the standard deviation of your method.
However, if you don't have a benchmark on your method, and the four molecules are different, it would be inappropriate to lump all the measurements together to estimate the error.
If the results above are all you have, and the assumption is that all (or most) of the observed difference between measurements is due to measurement error, then you would estimate the total SD of measurement as:
$Var = \frac{\sum_{i=1}^N \sum_{j=1}^{R_i} (x_{ij} - \overline x_i ) ^2 }{\sum_{i=1}^N R_i - 1}$
$SD = \sqrt{Var}$
If you could assume that the molecules are from the same population, and may have the same mean...then building a SD estimate using all the values would work, but might obscure the measurement error.

3. Nov 19, 2015

### lavoisier

Thank you RUber! That's exactly what I needed to understand.
Indeed, I wasn't thinking of lumping all the molecules together into one mean. I probably misrepresented that with my notation.
What I didn't know was that I could calculate a variance by summing squared differences from different means. You clarified that.
As for the main source of error being the measurement itself rather than the molecule, that's a very good question. I would say it is in most cases. Only occasionally will there be molecules that interfere with the fundamental mechanism of the assay, generating systematic errors or larger variability.
FYI, in the past another PF user sent me a link to a very interesting article on calculating the error of a specific type of assay, where several data points are collected and some parameters are fitted to a logistic-like equation. In that case the assay result is the fitted regression parameter(s), which should come with an error of its own (just as the slope and intercept of a least squares fit can be given with their own errors). No such luck: those who run the assay aren't happy to provide that piece of information.
So maybe the next best thing we can do as end users is adopt the above approach.
In any case, if I see that the distribution of the differences (x - xmean) is not normal, I guess that may be an indication that the error is not random.
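One way to check the normality of those differences, as a rough sketch (the data are the hypothetical example numbers from post 1, and the Shapiro-Wilk test is just one possible choice of normality test):

```python
import numpy as np
from scipy import stats

# Hypothetical repeat data (the example numbers from post 1).
results = [[5.5, 6.0, 5.7], [7.9, 8.1], [6.3], [8.5]]

# Within-molecule differences (x - xmean); molecules tested only
# once carry no information about the error, so they are skipped.
residuals = np.concatenate(
    [np.asarray(vals) - np.mean(vals) for vals in results if len(vals) >= 2]
)

# Shapiro-Wilk test: a small p-value would suggest the differences,
# and hence perhaps the measurement error, are not normally distributed.
stat, p = stats.shapiro(residuals)
print(f"W = {stat:.3f}, p = {p:.3f}")
```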
Thanks again.
L

4. Nov 19, 2015

### RUber

One other thought: if you think the measurement error might be proportional to the size of the measurement, you could also normalize the error term by dividing by the mean measurement.
You would notice this if you plotted the (measured - mean) errors against the means. If the plot looks like a cone, you might assume that the error is not independent of the molecule's measured value.
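That diagnostic only takes a few lines to set up; a minimal sketch using the hypothetical example numbers from post 1 (the pairs are what one would feed to a scatter plot):

```python
import statistics

# Hypothetical repeat data (the example numbers from post 1).
results = [[5.5, 6.0, 5.7], [7.9, 8.1], [6.3], [8.5]]

# Build (mean, measured - mean) pairs; plotting residual vs. mean and
# seeing a cone shape would suggest the error grows with the mean.
pairs = []
for vals in results:
    m = statistics.fmean(vals)
    for x in vals:
        pairs.append((m, x - m))

for mean, resid in pairs:
    print(f"mean={mean:.2f}  residual={resid:+.2f}")
```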

5. Nov 20, 2015

### lavoisier

I tried the method today. The differences (x - xmean) were normally distributed. I didn't try plotting them against the means; I've only just read this new post.
I adapted the formula a little; in particular, I subtracted from the denominator of Var the number of molecules that were tested only once.
If I left them in, the SD was way too low, because most molecules were indeed tested only once, so each of them contributed 0 to the numerator but 1 to the denominator.
Unless your formula was intended as:

$Var = \frac{\sum_{i=1}^N \sum_{j=1}^{R_i} (x_{ij} - \overline x_i ) ^2 }{\sum_{i=1}^N (R_i - 1)} = \frac{\sum_{i=1}^N \sum_{j=1}^{R_i} (x_{ij} - \overline x_i ) ^2 }{-N+\sum_{i=1}^N R_i }$
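A minimal sketch of that corrected (pooled) formula in Python; the data are the hypothetical example numbers from post 1, which happen to give about 0.22 as well:

```python
import math

# Hypothetical repeat data (the example numbers from post 1).
results = [[5.5, 6.0, 5.7], [7.9, 8.1], [6.3], [8.5]]

def pooled_sd(groups):
    """Pooled SD: squared deviations from each group's own mean,
    summed over all groups, divided by sum(R_i - 1), i.e. by
    (total measurements - number of groups)."""
    ss = 0.0    # sum of squared deviations
    dof = 0     # degrees of freedom
    for vals in groups:
        mean = sum(vals) / len(vals)
        ss += sum((x - mean) ** 2 for x in vals)
        dof += len(vals) - 1  # single-repeat groups contribute 0
    return math.sqrt(ss / dof)

print(f"SD = {pooled_sd(results):.2f}")
```

Molecules tested only once contribute nothing to either the numerator or the denominator, which is exactly the adjustment described above.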

I got SD = 0.22, which was not far from the mean of the SD's of the individual repeats.
Great! Tx

6. Nov 20, 2015

### RUber

Yes, sorry...I forgot the parentheses in the denominator. Your sharp analysis kept you from being led astray. Good work.