Hi everyone, I have a question about the error on certain biological results.
Apologies for the long explanation, but it's not a trivial thing (at least, it isn't for me).
This is what happens:
1. biologists take N solutions of a molecule at different concentrations (say from 10^-10 to 10^-5 molar) and add them to 2N equal 'wells' (there is a duplicate for each concentration)
2. the biological 'response' R is measured in each well and normalised (e.g. %R = (R - Rmin)/(Rmax - Rmin)), and %R is plotted versus log10(concentration).
3. the data are fitted non-linearly using a variation of the Hill equation ( http://en.wikipedia.org/wiki/Hill_equation_(biochemistry) ); it's a 'sigmoidal' curve that seems quite closely related to the logistic curve. I think a least-squares fit is used, but I'm not 100% sure.
4. the parameters that are returned are the so-called 'potency', which mathematically corresponds to the abscissa of the inflection point (KA in the Wiki article), the Hill slope (n) and the maximal %R (because Rmax is often defined from a standard, independently of the individual molecule tested, the asymptotic %R at the right plateau of the curve can differ from 100%, in either direction).
Trouble is, these data are very often given without any indication of the standard error on them.
This may sound like a minor issue, but from my point of view (the chemist who made the molecules and wants to know how good they are) it's not.
For instance, one day I may get a very good potency for a certain molecule and make decisions on what to do next; then, when the molecule is retested for confirmation a few weeks later, the potency turns out to be much worse, and my plans were all wrong.
If I had been given an idea of the error, I could have decided what confidence to put on each result, and crucially, how 'different' two potencies actually were.
So, suppose you have the data.
Let x_i = log10(conc_i): you have 2N pairs [x_1, %R_1_1], [x_1, %R_1_2], [x_2, %R_2_1], etc.
You fit them non-linearly to an equation like: %R = %Rmax / (1 + 10^(K - x)).
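For concreteness, this is roughly how I imagine the fit being done for one molecule (a minimal sketch in Python using scipy.optimize.curve_fit; I don't know what software the biologists actually use, and the function name hill_response, the starting guesses and the numbers are just my own made-up choices):

```python
import numpy as np
from scipy.optimize import curve_fit

def hill_response(x, rmax, k):
    # Simplified Hill-type curve: %R as a function of x = log10(concentration),
    # with the slope fixed at 1 as in the equation above
    return rmax / (1.0 + 10.0 ** (k - x))

# 6 concentrations from 10^-10 to 10^-5 M, each in duplicate -> 2N = 12 points
x = np.repeat(np.linspace(-10.0, -5.0, 6), 2)
# made-up responses just so the example runs; in practice these are the measured %R values
rng = np.random.default_rng(0)
r = hill_response(x, 95.0, -7.5) + rng.normal(0.0, 3.0, x.size)

# Least-squares fit; p0 is an initial guess for [%Rmax, K]
popt, pcov = curve_fit(hill_response, x, r, p0=[100.0, -7.0])
rmax_fit, k_fit = popt
```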
First question. In one run of the assay M molecules are tested, generating 2MN data points. Is it possible to calculate the standard error on K and %Rmax for each tested molecule? And if so, should this be done based on the data for each individual molecule, or after pooling together the duplicate variability of all the M molecules?
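To make the first question concrete, this is the kind of per-molecule number I have in mind (continuing from the sketch above; I'm assuming the square roots of the diagonal of the covariance matrix returned by the fit can be read as standard errors, but that's exactly the kind of thing I'd like confirmed):

```python
# Standard errors on %Rmax and K for this one molecule, taken as the square roots
# of the diagonal of the covariance matrix returned by the least-squares fit
rmax_se, k_se = np.sqrt(np.diag(pcov))
print(f"%Rmax = {rmax_fit:.1f} +/- {rmax_se:.1f},  K = {k_fit:.2f} +/- {k_se:.2f}")
```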
Second (and last) question. The assay is often repeated multiple times on certain molecules. So in the end there will be many sets of [K, %Rmax] data for each molecule. Can the 'standard deviation' of the assay itself be calculated from these data, as if they were direct, repeated measurements?
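In other words, would a simple calculation like this one be legitimate (the K values below are hypothetical, just to show what I mean by treating the repeated fit results as repeated measurements)?

```python
# K values obtained for the same molecule in several independent runs of the assay
# (hypothetical numbers, only to illustrate the calculation)
k_runs = np.array([-7.6, -7.3, -7.8, -7.4])

k_mean = k_runs.mean()
k_sd = k_runs.std(ddof=1)             # sample standard deviation across runs
k_sem = k_sd / np.sqrt(k_runs.size)   # standard error of the mean
print(f"K = {k_mean:.2f} +/- {k_sem:.2f}  (SD over {k_runs.size} runs = {k_sd:.2f})")
```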
Again, sorry for the long post. Not a question that's easy to ask briefly.
Thank you!
L