Is Science Vulnerable to Software Bugs?

  • Thread starter: anorlunda
  • Tags: Science
AI Thread Summary
A recent discussion highlights a critical vulnerability in the use of software for analyzing Functional Magnetic Resonance Imaging (fMRI) data, suggesting that as many as 40,000 studies may be invalid due to a common-mode error in popular analysis packages like SPM, FSL, and AFNI. The paper referenced indicates that these software tools have not been adequately validated with real data, leading to false-positive rates as high as 70%, far exceeding the expected 5%. This raises serious concerns about the reliability of neuroimaging research, as many studies cannot be easily reanalyzed due to poor data-sharing practices. The conversation also emphasizes the need for open-source data archiving to allow for future reprocessing with improved tools, and it draws parallels to historical software bugs, illustrating the complexities of ensuring data integrity in scientific research. The discussion suggests that a robust system of dependency links for published papers could enhance accountability and transparency in scientific findings.
anorlunda
I'm reposting this here because it raises a fascinating kind of vulnerability (new to me): the vulnerability of any science that uses software for analysis to a common-mode error. The thought that 40,000 scientific teams were fooled is shocking. It is a good post; it links sources both supporting and opposing the conclusions.

http://catless.ncl.ac.uk/Risks/29.60.html said:
Faulty image analysis software may invalidate 40,000 fMRI studies
Bruce Horrocks <bruce@scorecrow.com>, Thu, 7 Jul 2016 21:14:15 +0100
[Please read this to the end. PGN]

A new paper [1] suggests that as many as 40,000 scientific studies that used
Functional Magnetic Resonance Imaging (fMRI) to analyse human brain activity
may be invalid because of a software fault common to all three of the most
popular image analysis packages.

... From the paper's significance statement:

"Functional MRI (fMRI) is 25 years old, yet surprisingly its most common
statistical methods have not been validated using real data. Here, we used
resting-state fMRI data from 499 healthy controls to conduct 3 million task
group analyses. Using this null data with different experimental designs, we
estimate the incidence of significant results. In theory, we should find 5%
false positives (for a significance threshold of 5%), but instead we found
that the most common software packages for fMRI analysis (SPM, FSL, AFNI)
can result in false-positive rates of up to 70%. These results question the
validity of some 40,000 fMRI studies and may have a large impact on the
interpretation of neuroimaging results."

Two of the software-related risks:

a) It is common to assume that software that is widely used must be
reliable, yet 40,000 teams did not spot these flaws[2]. The authors
identified a bug in one package that had been present for 15 years.

b) Quoting from the paper: "It is not feasible to redo 40,000 fMRI studies,
and lamentable archiving and data-sharing practices mean most could not
be reanalyzed either."

[1] "Cluster failure: Why fMRI inferences for spatial extent have inflated
false-positive rates" by Anders Eklund, Thomas E. Nichols and Hans
Knufsson. <http://www.pnas.org/content/early/2016/06/27/1602413113.full>

[2] That's so many you begin to wonder if this paper might itself be wrong?
Expect to see a retraction in a future RISKS. ;-)

[Also noted by Lauren Weinstein in *The Register*:]
http://www.theregister.co.uk/2016/07/03/mri_software_bugs_could_upend_years_of_research/

[And then there is this counter-argument, noted by Mark Thorson:
http://blogs.discovermagazine.com/neuroskeptic/2016/07/07/false-positive-fmri-mainstream/

The author (Neuroskeptic) notes that Eklund et al. have discovered a
different kind of bug in AFNI, but that it does not apply to FSL and SPM and
does not "invalidate 15 years of brain research." PGN]

I would think that this issue supports mandating that the raw data of all scientific studies be open-sourced and archived publicly. That way, the data could be reprocessed in the future when improved (or corrected) tools become available, and published conclusions could be automatically updated or automatically deprecated.
 
Yes, this is what is done at some professional labs, especially where new models of analysis are being developed and there is a need to compare the existing model with a newer, faster/better one.

However, bugs such as this are very difficult to uncover; the Intel Pentium FDIV bug is a notable example. A bug can be in the sensor electronics used to measure something, in the processor hardware, in faulty memory or storage, in firmware or a driver, in library software, or in the application itself. At each level, testing is done with varying degrees of coverage, with the final application typically being the least tested.

It also brings back the notion that everything we do is essentially a house of cards, and I agree we need to prepare for the inevitable with backups of key data, or risk having to rerun an experiment.

More on the Pentium bug:

https://en.wikipedia.org/wiki/Pentium_FDIV_bug

https://www.cs.earlham.edu/~dusko/cs63/fdiv.html
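
To show how small the footprint of such a hardware flaw can be, the widely cited sanity check for the FDIV flaw was a single division residual. A minimal Python rendering of it, using the figures as commonly reported (e.g. in the Wikipedia article linked above):

```python
# Widely reported sanity check for the Pentium FDIV flaw: on a correct FPU the
# residual below is exactly 0; the affected chips reportedly returned about 256,
# because the quotient 4195835/3145727 came back wrong in the decimal digits.
x, y = 4195835.0, 3145727.0
residual = x - (x / y) * y
print(residual)   # 0.0 on correct hardware
```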
 
It is easy to visualize (but perhaps hard to implement) a system in which all digitally published papers contain links pointing to their dependencies: citations of prior work, plus links to the hardware and software they depend on. Thereafter, there would be two ways to go.
  1. Using bidirectional links: when an object that is depended on changes, reverse links can notify the authors of all dependent papers.
  2. Using unidirectional links: when a work is called up for viewing, the work's dependency links can be checked. If any are found to point to a retracted, revised, or deleted object, the viewer of the dependent work can be warned. Checks can descend recursively to the bottom of the dependency graph (a minimal sketch of such a check follows below). The gold standard for a viewer would be to refuse to read or cite any paper with less than flawless dependencies. If that proves too bothersome, viewers could use a bronze or a lead standard along a continuum of choices.
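
To make option 2 concrete, here is a minimal sketch in Python. The `Work` data model and its statuses are hypothetical, invented purely for illustration; a real system would resolve DOIs, software version identifiers, and hardware part numbers rather than in-memory objects.

```python
# Minimal sketch of option 2: recursively walk a paper's dependency links and
# collect every transitive dependency that has been retracted, revised, or
# deleted, so the viewer can be warned before reading or citing the work.
from dataclasses import dataclass, field

OK, REVISED, RETRACTED, DELETED = "ok", "revised", "retracted", "deleted"

@dataclass
class Work:
    title: str
    status: str = OK
    dependencies: list = field(default_factory=list)  # prior work, software, hardware

def problem_dependencies(work, seen=None):
    """Return every transitive dependency whose status is not 'ok'."""
    seen = set() if seen is None else seen
    problems = []
    for dep in work.dependencies:
        if id(dep) in seen:              # guard against citation cycles
            continue
        seen.add(id(dep))
        if dep.status != OK:
            problems.append(dep)
        problems.extend(problem_dependencies(dep, seen))
    return problems

# Usage: a survey that cites a study that depended on a later-revised package.
package = Work("Analysis package, pre-fix release", status=REVISED)
study = Work("An fMRI study", dependencies=[package])
survey = Work("Survey citing the study", dependencies=[study])

for flagged in problem_dependencies(survey):
    print(f"warning: depends on '{flagged.title}' (status: {flagged.status})")
```

Bidirectional links (option 1) would invert this traversal: the changed object would push notifications back up the same graph to the authors of every dependent work.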
 