Is Science Vulnerable to Software Bugs?

  • #1
anorlunda
I'm reposting this here because it raises a fascinating kind of vulnerability (new to me): any science that uses software for analysis is exposed to a common-mode error. The thought that 40,000 scientific teams were fooled is shocking. It is a good post, and it links sources both supporting and opposing the conclusions.

http://catless.ncl.ac.uk/Risks/29.60.html said:
Faulty image analysis software may invalidate 40,000 fMRI studies
Bruce Horrocks <bruce@scorecrow.com>, Thu, 7 Jul 2016 21:14:15 +0100
[Please read this to the end. PGN]

A new paper [1] suggests that as many as 40,000 scientific studies that used
Functional Magnetic Resonance Imaging (fMRI) to analyse human brain activity
may be invalid because of a software fault common to all three of the most
popular image analysis packages.

... From the paper's significance statement:

"Functional MRI (fMRI) is 25 years old, yet surprisingly its most common
statistical methods have not been validated using real data. Here, we used
resting-state fMRI data from 499 healthy controls to conduct 3 million task
group analyses. Using this null data with different experimental designs, we
estimate the incidence of significant results. In theory, we should find 5%
false positives (for a significance threshold of 5%), but instead we found
that the most common software packages for fMRI analysis (SPM, FSL, AFNI)
can result in false-positive rates of up to 70%. These results question the
validity of some 40,000 fMRI studies and may have a large impact on the
interpretation of neuroimaging results."

Two of the software-related risks:

a) It is common to assume that software that is widely used must be
reliable, yet 40,000 teams did not spot these flaws[2]. The authors
identified a bug in one package that had been present for 15 years.

b) Quoting from the paper: "It is not feasible to redo 40,000 fMRI studies,
and lamentable archiving and data-sharing practices mean most could not
be reanalyzed either."

[1] "Cluster failure: Why fMRI inferences for spatial extent have inflated
false-positive rates" by Anders Eklund, Thomas E. Nichols and Hans
Knutsson. <http://www.pnas.org/content/early/2016/06/27/1602413113.full>

[2] That's so many you begin to wonder if this paper might itself be wrong?
Expect to see a retraction in a future RISKS. ;-)

[Also noted by Lauren Weinstein in *The Register*:]
http://www.theregister.co.uk/2016/07/03/mri_software_bugs_could_upend_years_of_research/

[And then there is this counter-argument, noted by Mark Thorson:
http://blogs.discovermagazine.com/neuroskeptic/2016/07/07/false-positive-fmri-mainstream/

The author (Neuroskeptic) notes that Eklund et al. have discovered a
different kind of bug in AFNI, one that does not apply to FSL and SPM, and
that the finding does not "invalidate 15 years of brain research." PGN]

I would think that this issue supports mandating that the raw data of all scientific studies be openly licensed and archived publicly. That way, the data could be reprocessed in the future when improved (or corrected) tools become available, and published conclusions could be automatically updated or deprecated.
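To make the quoted 5%-versus-70% claim concrete, here is a toy null-data simulation in the same spirit. This is only a sketch, not the paper's pipeline or its cluster-based thresholding; the group sizes, voxel count, and independence assumption are invented for illustration. Testing one pre-chosen voxel on pure noise yields false positives at roughly the nominal 5%, while letting any of many uncorrected voxels count inflates the rate dramatically.

Python:
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_experiments = 2000   # simulated "studies", all run on pure noise
n_subjects = 20        # subjects per group (invented)
n_voxels = 50          # independent voxels (real fMRI voxels are spatially correlated)
alpha = 0.05

single_voxel_hits = 0  # false positives when testing one pre-chosen voxel
any_voxel_hits = 0     # false positives when any voxel is allowed to count

for _ in range(n_experiments):
    group_a = rng.standard_normal((n_subjects, n_voxels))
    group_b = rng.standard_normal((n_subjects, n_voxels))
    _, p = stats.ttest_ind(group_a, group_b, axis=0)
    single_voxel_hits += p[0] < alpha
    any_voxel_hits += (p < alpha).any()   # no multiple-comparison correction

print(f"single-voxel false-positive rate: {single_voxel_hits / n_experiments:.1%}")  # about 5%
print(f"uncorrected any-voxel rate:       {any_voxel_hits / n_experiments:.1%}")     # far above 5%

The point of the exercise is only that "in theory 5%" depends heavily on how the threshold is applied across many locations, which is exactly the kind of assumption the paper tested against real resting-state data.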
 
  • #2
Yes, this is what is done at some professional labs, especially where new analysis models are being developed and there is a need to compare the existing model with the newer, faster or better one.

However, bugs such as this are very difficult to uncover; the Intel Pentium bug is a notable example. A bug can be in the sensor electronics used to take a measurement, in the processor hardware, in faulty memory or storage, in firmware or a driver, or in library software or the application itself. At each level, testing is done with varying degrees of coverage, and the final application is usually the least tested.
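One application-level defence is the cross-checking mentioned above: validate the fast production routine against a slow but hand-auditable reference on randomized inputs, so a silent fault lower in the stack at least has a chance of surfacing. A minimal sketch, with the reference implementation and tolerance invented for illustration:

Python:
import numpy as np

def reference_variance(values):
    # Deliberately naive two-pass formula that is easy to audit by hand.
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / (len(values) - 1)

rng = np.random.default_rng(1)
for _ in range(1000):
    x = rng.normal(loc=rng.uniform(-100, 100), scale=rng.uniform(0.1, 10), size=100)
    fast = np.var(x, ddof=1)               # the routine under test
    slow = reference_variance(x.tolist())  # the transparent reference
    assert np.isclose(fast, slow, rtol=1e-9), (fast, slow)

print("1000 randomized cross-checks passed")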

It also brings back the notion that everything we do is essentially a house of cards, and I agree we need to prepare for the inevitable by backing up key data, or risk having to rerun an experiment.

More on the Pentium bug:

https://en.wikipedia.org/wiki/Pentium_FDIV_bug

https://www.cs.earlham.edu/~dusko/cs63/fdiv.html
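For anyone curious, the widely circulated spot check from that era is a one-liner; the operands below are the well-known test case, and on any non-defective FPU it simply prints the correct quotient:

Python:
# A defective Pentium returned roughly 1.333739068902 for this division,
# while the correct quotient is roughly 1.333820449136.
x, y = 4195835.0, 3145727.0
print(f"{x:.0f} / {y:.0f} = {x / y:.15f}")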
 
  • #3
It is easy to visualize (but perhaps hard to implement) a system in which all digitally published papers contain links pointing to their dependencies: citations of prior work, and links to hardware and software dependencies, for example. From there, there would be two ways to go.
  1. Using bidirectional links: when a depended-upon object changes, reverse links can be used to notify the authors of all dependent papers.
  2. Using unidirectional links: when a work is called up for viewing, its dependency links can be checked. If any link points to a retracted, revised, or deleted object, the viewer of the dependent work can be warned, and the links can be followed recursively all the way down (see the sketch below). The gold standard for a viewer would be to refuse to read or cite any paper with less-than-flawless dependencies; if that proves too bothersome, viewers could adopt a bronze or lead standard along a continuum of choices.
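A minimal sketch of how the unidirectional-link check in option 2 might work; the registry, record format, and status vocabulary here are hypothetical stand-ins for whatever metadata publishers or archives would actually expose:

Python:
from dataclasses import dataclass, field

@dataclass
class Record:
    status: str                                 # "ok", "revised", or "retracted"
    deps: list = field(default_factory=list)    # IDs of cited papers, software, data

def dependency_problems(registry, work_id, seen=None):
    """Recursively collect every dependency of work_id that is no longer 'ok'."""
    seen = set() if seen is None else seen
    problems = []
    for dep in registry[work_id].deps:
        if dep in seen:
            continue                            # don't re-walk shared or cyclic dependencies
        seen.add(dep)
        if registry[dep].status != "ok":
            problems.append(f"{dep}: {registry[dep].status}")
        problems.extend(dependency_problems(registry, dep, seen))
    return problems

# Example: paper C builds on paper B, which was produced with now-retracted software A.
registry = {
    "software-A": Record(status="retracted"),
    "paper-B":    Record(status="ok", deps=["software-A"]),
    "paper-C":    Record(status="ok", deps=["paper-B"]),
}
print(dependency_problems(registry, "paper-C"))  # ['software-A: retracted']

A viewer of paper-C would be warned even though none of its direct citations has changed, which is the recursive behaviour described above.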
 

1. What is "Science Vulnerability to Bugs"?

Science Vulnerability to Bugs refers to the potential for scientific research and experiments to be affected by bugs, errors, or flaws in the methods, tools, or data used. These bugs can lead to incorrect or unreliable results, which can impact the validity and credibility of scientific findings.

2. How do bugs affect scientific research?

Bugs can affect scientific research in various ways. For example, they can cause errors in data collection or analysis, leading to incorrect conclusions. Bugs can also compromise the reliability and replicability of experiments, making it difficult for other researchers to validate the results.

3. What are the main sources of bugs in science?

The main sources of bugs in science can include human error, faulty equipment or tools, flawed experimental designs, and incomplete or inaccurate data. Additionally, software bugs or coding errors can also impact scientific research, especially in fields that heavily rely on computer-based analysis.

4. How can scientists prevent or minimize bugs in their research?

Scientists can prevent or minimize bugs in their research by implementing rigorous quality control measures, carefully planning and designing experiments, using reliable and well-maintained equipment and tools, and thoroughly checking and validating data. Collaborating with other researchers and peer-reviewing studies can also help identify and address potential bugs before they impact the results.
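As one purely illustrative flavour of such quality control, automated sanity checks on incoming data can catch mundane errors before they ever reach the analysis stage; the field names and plausible ranges below are invented for this example:

Python:
def validate_measurements(rows):
    """Return a list of human-readable problems found in the raw records."""
    problems = []
    for i, row in enumerate(rows):
        if row["timestamp"] is None:
            problems.append(f"row {i}: missing timestamp")
        if not (-90.0 <= row["temperature_c"] <= 60.0):
            problems.append(f"row {i}: temperature {row['temperature_c']} out of plausible range")
    return problems

data = [
    {"timestamp": "2016-07-07T21:14:15", "temperature_c": 21.3},
    {"timestamp": None, "temperature_c": 300.0},   # a corrupt record the check should flag
]
for problem in validate_measurements(data):
    print(problem)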

5. What are the consequences of not addressing bugs in scientific research?

Not addressing bugs in scientific research can have serious consequences, including wasted time, resources, and funding, as well as misleading or false conclusions. If undetected, bugs can also lead to incorrect findings being published and potentially being used as a basis for further research or real-world applications, which can have far-reaching and potentially harmful consequences.
