geo101

In the data set we have multiple specimens, and each specimen yields multiple results. In terms of random selection: if randomly selecting one result from specimen #1 has a probability P1 of being accurate, and we instead randomly select one result from all specimens (with probabilities P1, P2, P3, … Pn for n specimens), what is the probability (Pf) of having an accurate result in this final data set?
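To make the setup concrete, here is a minimal sketch of how Pf could be computed if each result in the pooled data set is equally likely to be drawn. In that case Pf is the average of the specimen-level probabilities, weighted by how many results each specimen contributes. The function name, the per-specimen result counts and the numbers are all hypothetical and only for illustration.

```python
def pooled_accuracy(probabilities, result_counts):
    """Probability that one result drawn uniformly from the pooled data is accurate.

    probabilities[i]  : chance that a result from specimen i is accurate (P1 ... Pn)
    result_counts[i]  : number of results from specimen i (an assumed input,
                        not something given in the post)
    """
    if len(probabilities) != len(result_counts):
        raise ValueError("need one result count per specimen")
    total = sum(result_counts)
    if total == 0:
        raise ValueError("no results to draw from")
    # Law of total probability: weight each specimen by its share of the results.
    weighted = sum(p * k for p, k in zip(probabilities, result_counts))
    return weighted / total


# Hypothetical example: three specimens with 10, 5 and 5 results.
pf = pooled_accuracy([0.9, 0.6, 0.3], [10, 5, 5])
print(pf)
```

If every specimen contributes the same number of results, this reduces to the plain average of P1 … Pn.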

Now suppose we select the data in some way (that we hope/think/pray will reject inaccurate results). At the specimen level the probabilities become P1′, P2′, P3′, … Pm′, where m ≤ n. This means that some specimens yield no acceptable results. The probability of having an accurate result in this final data set is now Pf′.

How would we assess if our selection process is increasing our chances of obtaining an accurate result? What is the best way to compare Pf and Pf′ (or Pf′ and Pf′′ obtained from two different selection processes), and what factors should we consider?

This is where I get into a philosophical debate with one of my colleagues (neither of us is a statistician). His argument is that as long as Pf′ > Pf the data selection is an improvement. My argument is that the significance of the difference between Pf and Pf′ depends on m, and that smaller differences require a larger m before they can be considered meaningful.
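A rough way to illustrate my side of the argument: if Pf′ is estimated from m specimens, the estimate itself carries an uncertainty that shrinks as m grows. The sketch below uses the usual standard error of a proportion and compares it with a fixed difference Pf′ − Pf. It assumes each of the m specimens contributes one independent accurate/inaccurate outcome and that Pf is known exactly, both of which are simplifications; the values 0.70 and 0.75 are made up.

```python
import math


def proportion_standard_error(p_hat, count):
    """Standard error of a proportion p_hat estimated from `count` independent trials."""
    if count <= 0:
        raise ValueError("count must be positive")
    return math.sqrt(p_hat * (1.0 - p_hat) / count)


pf = 0.70           # hypothetical accuracy before selection
pf_selected = 0.75  # hypothetical accuracy after selection

for m in (10, 100, 1000):
    se = proportion_standard_error(pf_selected, m)
    # Express the improvement in units of its own uncertainty.
    ratio = (pf_selected - pf) / se
    print(f"m={m:5d}  standard error={se:.3f}  difference/standard error={ratio:.2f}")
```

With the same difference of 0.05, the ratio grows with the square root of m, which is the sense in which I think a small improvement needs a large m to be convincing.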

His view is that it doesn't matter what m is, as long as the final result is accurate. My opinion is that this is only possible if Pf′ = 1 (i.e., we can reject all inaccurate results), and even then only if this can be demonstrated to be universally true (I’m pretty sure that is impossible).

Also, I think more of a balance needs to be struck so as to avoid the situation whereby m is so small that the uncertainty of the final result (the average of the selected results) is so large that we cannot do anything meaningful with it.
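The trade-off I have in mind can be sketched with the standard error of the mean, s/√m, where s is the sample standard deviation of the selected results. The function name and the numbers below are hypothetical, and the sketch assumes the selected results are independent.

```python
import math
import statistics


def mean_and_standard_error(values):
    """Return the mean of the selected results and its standard error, s / sqrt(m)."""
    m = len(values)
    if m < 2:
        raise ValueError("need at least two results to estimate the scatter")
    mean = statistics.mean(values)
    standard_error = statistics.stdev(values) / math.sqrt(m)
    return mean, standard_error


# Hypothetical selected results: a strict selection keeps 3, a looser one keeps 12.
strict = [41.0, 44.5, 38.0]
loose = [41.0, 44.5, 38.0, 40.2, 43.1, 39.4, 42.7, 40.9, 41.8, 39.9, 43.6, 40.4]

print(mean_and_standard_error(strict))
print(mean_and_standard_error(loose))
```

For a similar scatter, keeping fewer results gives a larger standard error on the final average, so a stricter selection is only worth it if the gain in accuracy outweighs that loss.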

As I mentioned, neither of us are statisticians, so some help and advice would be very welcome.

Cheers,

geo101