Thank you, guys. I apparently don't know how to multi-quote automatically.
Holding a result back because it is unexpected, so that you can run additional tests to see whether it was an error or was correct, is completely normal.
Say you get an unexpected "point" on the graph of your data, you make additional measurements for that particular point, and the new data seems to fit the apparent curve better... How do you really know that your second measurement was indeed better? Just from the aesthetics of the plotted data? That's basically what the physicist did.
The little I know about statistics tells me that taking more measurements because the result of the original one doesn't match what I expected is bad and skews the result (http://www.evanmiller.org/how-not-to-run-an-ab-test.html). I am not sure whether this is applicable in the case of a physics experiment, hence my question here.
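To make that worry concrete, here is a toy Python simulation of the selective re-measurement protocol. All the numbers (the true slope, the noise level, the 1.5-sigma trigger for re-measuring) are invented for illustration; this is a sketch of the statistical effect, not of what the physicist actually did:

```python
import numpy as np

rng = np.random.default_rng(0)

# Invented setup: the true law is y = 2x and every measurement carries
# Gaussian noise with sigma = 1. The experimenter "expects" the true curve.
true_slope, sigma, n_points, n_trials = 2.0, 1.0, 20, 10_000
x = np.linspace(0.0, 10.0, n_points)

honest_rms, selective_rms = [], []
for _ in range(n_trials):
    y = true_slope * x + rng.normal(0.0, sigma, n_points)

    # Honest protocol: keep the first measurement, whatever it looks like.
    honest_rms.append(np.std(y - true_slope * x))

    # Selective protocol: re-measure any point more than 1.5 sigma off the
    # expected curve, then keep whichever of the two looks "better".
    off = np.abs(y - true_slope * x) > 1.5 * sigma
    y2 = true_slope * x + rng.normal(0.0, sigma, n_points)
    keep_new = off & (np.abs(y2 - true_slope * x) < np.abs(y - true_slope * x))
    y_sel = np.where(keep_new, y2, y)
    selective_rms.append(np.std(y_sel - true_slope * x))

print(f"honest scatter:    {np.mean(honest_rms):.3f}")    # close to sigma
print(f"selective scatter: {np.mean(selective_rms):.3f}")  # noticeably below sigma
```

The mean doesn't shift here because the noise is symmetric, but the scatter around the curve shrinks below the true noise level, so the published plot looks cleaner, and the error bars smaller, than the apparatus actually warrants.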
ZapperZ said:
Have you ASKED him why he used certain set of data and not others? That could have easily answered what you are asking here.
And no, as an experimentalist, I do not "thrash" out ANY data, even faulty ones, and even ones I do not use for some reason. That is a no-no. All data are kept and archived.
Zz.
Yes, I asked him. And it happened in two very different cases:
1) There's a device that automatically registers data every x seconds. You start the experiment and start the machine that makes the measurements. The system needs, say, 20 minutes to reach a temporary stability that lasts for around 10 minutes. In the end you have data from t=0 to t=40 minutes. You plot the data, you determine by eye that some stretch of the graph looks "linear" or "beautiful" or whatever criterion you assign, and you decide to trash all the data except the part from t=22 min to t=29 min, based on eyeballing alone.
Is that correct behavior? Shouldn't he at least perform a linear regression and set a threshold on the residuals, over all ranges of points? For example, make a linear regression for the data from t=19 min to t=28 min and calculate the residuals, do the same for t=20 min to t=28 min, etc., and pick the window with the smallest residuals. Of course these calculations would be automated by a program (I sketch what I mean below, after case 2). Wouldn't that be much better than eyeballing the data?
2) The experiment is so noisy that measuring twice in a row under the same conditions gives very different results, hence some averaging is done. In the end the plotted data is not supposed to have enormous error bars, at least not the data that gets published in serious journals. Now, when a "point" on the graph lies way above or below the trend of the expected curve, the physicist trashes it in the sense that he'd rather not publish that point in a journal. Of course he will reproduce the experiment (not trashing the ugly-looking data; he just will never publish it) until he gets a point that fits well into his graph, and he can repeat the experiment many times until that happens. So in the end he publishes the most beautiful data he measured in the laboratory, while under the rock he hides a pile of discarded data (the second toy simulation below shows why this worries me). And if he never manages to measure a beautiful value for that ugly point, he will simply discard the point rather than introduce the ugly one.
Edit: It's in this sense that I meant "trash", i.e. not publishing the ugly-looking data. Of course the data is not deleted from the computer or the sheet of paper.
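For case 1), here is roughly the kind of automated window selection I have in mind, as a Python sketch. The data are invented (an exponential settling, near-stability around t = 22 to 29 min, then a slow drift away), and a real version would need a smarter criterion than raw RMS, since shorter windows almost always fit better; the minimum window length below is a crude stand-in for that:

```python
import numpy as np

def best_linear_window(t, y, min_len=30):
    """Fit a line to every contiguous window of at least min_len samples
    and return the window whose fit has the smallest RMS residual per
    point. This is the brute-force search described above."""
    best = None
    n = len(t)
    for i in range(n - min_len + 1):
        for j in range(i + min_len, n + 1):
            slope, intercept = np.polyfit(t[i:j], y[i:j], 1)
            rms = np.sqrt(np.mean((y[i:j] - (slope * t[i:j] + intercept)) ** 2))
            if best is None or rms < best[0]:
                best = (rms, i, j)
    return best

# Invented data: exponential settling, a plateau from about t = 22 to
# t = 29 min, then a slow quadratic drift away, plus readout noise.
rng = np.random.default_rng(1)
t = np.arange(0.0, 40.0, 10.0 / 60.0)  # 40 minutes, one sample every 10 s
y = (5.0
     + 2.0 * np.exp(-t / 5.0)
     + 0.01 * np.clip(t - 29.0, 0.0, None) ** 2
     + rng.normal(0.0, 0.01, t.size))

rms, i, j = best_linear_window(t, y)
print(f"selected window: t = {t[i]:.1f} to {t[j-1]:.1f} min, residual RMS = {rms:.4f}")
```

The point is only that the selection criterion is written down and applied uniformly, so someone else could rerun it and get the same window, which is exactly what eyeballing can't guarantee.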
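And for case 2), here is a toy Python simulation of "repeat the measurement until the point lands near the expected curve, and publish that one". Again, every number is invented; the point is only that the published value gets pulled away from the true value and toward the expected trend:

```python
import numpy as np

rng = np.random.default_rng(2)

# Invented numbers: the true value at this x is 10.0, but the apparent
# trend of the neighbouring points makes the experimenter expect 9.0.
# Each repeat of the noisy measurement has sigma = 1.0.
true_value, expected, sigma = 10.0, 9.0, 1.0
n_simulations = 100_000

published = []
for _ in range(n_simulations):
    # Repeat the measurement up to 10 times and publish the first result
    # that lands within 0.5 of the expected trend; if none does, publish
    # the result that came closest to it.
    repeats = rng.normal(true_value, sigma, 10)
    close = repeats[np.abs(repeats - expected) < 0.5]
    published.append(close[0] if close.size
                     else repeats[np.abs(repeats - expected).argmin()])

print(f"true value:           {true_value}")
print(f"mean published value: {np.mean(published):.3f}")  # pulled toward 9.0
```

The published points cluster around the expected curve rather than around the true value, and nothing in the published plot reveals that the discarded repeats ever existed.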