Thank you, guys. I apparently don't know how to multi-quote automatically.
Holding a result back because it is unexpected, so that you can run additional tests to see whether it was an error or was correct, is completely normal.
Say you get an unexpected "point" on the graph of your data, you make additional measurements for that particular point, and the new data seems to fit the apparent curve better... How do you really know that your second measurement was indeed better? Just from the aesthetics of the plotted data? That's basically what the physicist did.
The little I know about statistics tells me that taking more measurements because the result of the original one doesn't match what I expected is bad and skews the result (http://www.evanmiller.org/how-not-to-run-an-ab-test.html). I am not sure whether this is applicable in the case of a physics experiment, hence my question here.
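To make that worry concrete, here is a toy Python simulation of the selective re-measurement protocol. All the numbers (the true slope, the noise level, the 1.5-sigma trigger for re-measuring) are invented for illustration; this is a sketch of the statistical effect, not of what the physicist actually did:

```python
import numpy as np

rng = np.random.default_rng(0)

# Invented setup: the true law is y = 2x and every measurement carries
# Gaussian noise with sigma = 1. The experimenter "expects" the true curve.
true_slope, sigma, n_points, n_trials = 2.0, 1.0, 20, 10_000
x = np.linspace(0.0, 10.0, n_points)

honest_rms, selective_rms = [], []
for _ in range(n_trials):
    y = true_slope * x + rng.normal(0.0, sigma, n_points)

    # Honest protocol: keep the first measurement, whatever it looks like.
    honest_rms.append(np.std(y - true_slope * x))

    # Selective protocol: re-measure any point more than 1.5 sigma off the
    # expected curve, then keep whichever of the two looks "better".
    off = np.abs(y - true_slope * x) > 1.5 * sigma
    y2 = true_slope * x + rng.normal(0.0, sigma, n_points)
    keep_new = off & (np.abs(y2 - true_slope * x) < np.abs(y - true_slope * x))
    y_sel = np.where(keep_new, y2, y)
    selective_rms.append(np.std(y_sel - true_slope * x))

print(f"honest scatter:    {np.mean(honest_rms):.3f}")    # close to sigma
print(f"selective scatter: {np.mean(selective_rms):.3f}")  # noticeably below sigma
```

The mean doesn't shift here because the noise is symmetric, but the scatter around the curve shrinks below the true noise level, so the published plot looks cleaner, and the error bars smaller, than the apparatus actually warrants.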
ZapperZ said:
Have you ASKED him why he used certain set of data and not others? That could have easily answered what you are asking here.
And no, as an experimentalist, I do not "thrash" out ANY data, even faulty ones, and even ones I do not use for some reason. That is a no-no. All data are kept and archived.
Zz.
Yes, I asked him. And it happened in two very different cases:
1) There's a device that automatically registers data every x seconds. You start the experiment and start the machine that makes the measurements. The system needs, say, 20 minutes to reach a temporary stability that lasts for around 10 minutes. In the end you have data from t=0 to t=40 minutes. You plot the data, you determine by eye that some stretch of the graph looks "linear" or "beautiful" or whatever criterion you assign, and you decide to trash all the data except the part from t=22 min to t=29 min, based on eyeballing alone.
Is that correct behavior? Shouldn't he at least perform a linear regression and set a threshold on the residuals, over all ranges of points? For example, make a linear regression for the data from t=19 min to t=28 min and calculate the residuals, do the same for t=20 min to t=28 min, etc., and pick the window with the smallest residuals. Of course these calculations would be automated by a program (I sketch what I mean below, after case 2). Wouldn't that be much better than eyeballing the data?
2) The experiment is so noisy that measuring twice in a row under the same conditions gives very different results, hence some averaging is done. In the end the plotted data is not supposed to have enormous error bars, at least not the data that gets published in serious journals. Now, when a "point" on the graph lies way above or below the trend of the expected curve, the physicist trashes it in the sense that he'd rather not publish that point in a journal. Of course he will reproduce the experiment (not trashing the ugly-looking data; he just will never publish it) until he gets a point that fits well into his graph, and he can repeat the experiment many times until that happens. So in the end he publishes the most beautiful data he measured in the laboratory, while under the rock he hides a pile of discarded data (the second toy simulation below shows why this worries me). And if he never manages to measure a beautiful value for that ugly point, he will simply discard the point rather than introduce the ugly one.
Edit: It's in this sense that I meant "trash", i.e. not publishing the ugly-looking data. Of course the data is not deleted from the computer or the sheet of paper.
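For case 1), here is roughly the kind of automated window selection I have in mind, as a Python sketch. The data are invented (an exponential settling, near-stability around t = 22 to 29 min, then a slow drift away), and a real version would need a smarter criterion than raw RMS, since shorter windows almost always fit better; the minimum window length below is a crude stand-in for that:

```python
import numpy as np

def best_linear_window(t, y, min_len=30):
    """Fit a line to every contiguous window of at least min_len samples
    and return the window whose fit has the smallest RMS residual per
    point. This is the brute-force search described above."""
    best = None
    n = len(t)
    for i in range(n - min_len + 1):
        for j in range(i + min_len, n + 1):
            slope, intercept = np.polyfit(t[i:j], y[i:j], 1)
            rms = np.sqrt(np.mean((y[i:j] - (slope * t[i:j] + intercept)) ** 2))
            if best is None or rms < best[0]:
                best = (rms, i, j)
    return best

# Invented data: exponential settling, a plateau from about t = 22 to
# t = 29 min, then a slow quadratic drift away, plus readout noise.
rng = np.random.default_rng(1)
t = np.arange(0.0, 40.0, 10.0 / 60.0)  # 40 minutes, one sample every 10 s
y = (5.0
     + 2.0 * np.exp(-t / 5.0)
     + 0.01 * np.clip(t - 29.0, 0.0, None) ** 2
     + rng.normal(0.0, 0.01, t.size))

rms, i, j = best_linear_window(t, y)
print(f"selected window: t = {t[i]:.1f} to {t[j-1]:.1f} min, residual RMS = {rms:.4f}")
```

The point is only that the selection criterion is written down and applied uniformly, so someone else could rerun it and get the same window, which is exactly what eyeballing can't guarantee.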
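And for case 2), here is a toy Python simulation of "repeat the measurement until the point lands near the expected curve, and publish that one". Again, every number is invented; the point is only that the published value gets pulled away from the true value and toward the expected trend:

```python
import numpy as np

rng = np.random.default_rng(2)

# Invented numbers: the true value at this x is 10.0, but the apparent
# trend of the neighbouring points makes the experimenter expect 9.0.
# Each repeat of the noisy measurement has sigma = 1.0.
true_value, expected, sigma = 10.0, 9.0, 1.0
n_simulations = 100_000

published = []
for _ in range(n_simulations):
    # Repeat the measurement up to 10 times and publish the first result
    # that lands within 0.5 of the expected trend; if none does, publish
    # the result that came closest to it.
    repeats = rng.normal(true_value, sigma, 10)
    close = repeats[np.abs(repeats - expected) < 0.5]
    published.append(close[0] if close.size
                     else repeats[np.abs(repeats - expected).argmin()])

print(f"true value:           {true_value}")
print(f"mean published value: {np.mean(published):.3f}")  # pulled toward 9.0
```

The published points cluster around the expected curve rather than around the true value, and nothing in the published plot reveals that the discarded repeats ever existed.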