Shouldn't computer math black boxes be shunned?

In summary, the conversation discusses the potential risks and limitations of relying on computer-based mathematical methods in scientific research. These risks include the increasing complexity of math branches and the reliance on closed-source programs, which make it difficult to fully understand and verify the accuracy of the math being used. While top computer-program firms may have mathematicians testing and correcting their work, this is limited by trade secrecy and budget constraints, and does not provide the same level of peer review and transparency as traditional publication in refereed journals. The use of open-source software, which allows for thorough examination of the source code, is seen as a potential solution to these concerns. The conversation also raises the question of whether there could be hidden errors in computer math that have not been fully proven in public and, if so, whether relying on them is safe for the more momentous work in physics.
  • #1
Nick Levinson
Is it always safe to rely on math methods when the scientist cannot see exactly what steps constitute a given method in the implementation being used? The increasing complexity and specialization of math branches (even mathematicians do not always understand each other's work), the fact that a computer does not always do math the way a person was taught to do it by hand, and the reliance on computers and proprietary closed-source math programs in which it is difficult and often unlawful to examine the programming all suggest that an important part of refereed work must be inadequately checked.

Mathematicians would know how to check, but by now there is often too much material for them to devote the time to doing it. The scientist who could do it probably has other research to do, to which we look forward. So, basically, there's no one.

The pure math is not my concern, but computer-ready math is often different because of computer limitations. For example, a formula inevitably has a length limit inside a computer but need not have one outside of it; so, if a particular formula exceeds that limit, it must be replaced with multiple shorter formulae that are then combined.
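As a toy illustration (a minimal sketch in Python; the formula and all names are purely illustrative, not drawn from any particular package):

```python
import math

# One long expression versus the same computation split into named sub-formulae,
# the kind of restructuring a length (or readability) limit can force.
def amplitude_long(a, b, c, t):
    return math.exp(-c * t) * math.sqrt(a**2 + b**2) * math.cos(math.atan2(b, a) + c * t)

def amplitude_split(a, b, c, t):
    magnitude = math.sqrt(a**2 + b**2)   # first sub-formula
    phase = math.atan2(b, a) + c * t     # second sub-formula
    damping = math.exp(-c * t)           # third sub-formula
    return damping * magnitude * math.cos(phase)

# The two forms should agree; any disagreement would itself be a bug to explain.
assert math.isclose(amplitude_long(3, 4, 0.1, 2.0), amplitude_split(3, 4, 0.1, 2.0))
```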

I gather that some of the most-used computer programs for these purposes rely on closed source code for some math and thus are not completely transparent. Closed-source programs use black boxes: you can see your input and get the output, but exactly how the input is transformed into the output is hidden. You can check that individual functions produce correct outputs for specific example inputs, but I'm not sure you can test all of the functions using the methods required for proofs, i.e., methods in which examples are not probative enough and abstraction is required, or that you can test holistically rather than just reductively. That matters if an error still hidden despite the examples tested gets compounded with another error as multiple black boxes are applied to one problem. The interaction of a math program with the rest of its computing environment is such that an error might be introduced by the environment and have to be discovered, a fresh risk whenever hardware or software has a new version; there are usually many associated software and hardware components with separate versions and probably separate authorships, and usually some have bugs. Even if a high-end math program performs all of its math processing itself without handing any off to the operating system, thereby eliminating one set of inspection problems, other interactions remain.
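As a toy illustration of that compounding (a sketch; the functions and the error size are purely illustrative):

```python
# A tiny hidden relative error in one "black box" gets amplified when a second
# box is applied to its output; neither box reveals its internals to the user.
def box_a(x):
    return x * (1 + 1e-6)   # pretend box A carries a hidden relative error of 1e-6

def box_b(x):
    return x ** 10          # box B amplifies whatever error its input carries

x = 2.0
exact = x ** 10
chained = box_b(box_a(x))
print(f"relative error after chaining: {abs(chained - exact) / exact:.2e}")  # roughly 1e-5
```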

Doubtless the top computer-program firms have highly qualified mathematicians test and correct their work, but doubtless that is also limited by trade secrecy and budget, a model that falls far short of the peer-review model used for publication of original research in refereed journals, where, after publication, anyone can read the journals and report a problem they find, even if the reporter lacks qualifications and is unpaid. With proprietary closed-source software, and especially firmware, even a customer who paid for it is usually unable to examine it, because they usually don't know how to parse the code (especially code wired into a hardware chip) and perhaps (as with Windows) are legally barred from reverse-engineering, decompiling, or disassembling it. Some software licenses even prohibit benchmarking, although I don't know whether that applies to the software used in this field.

With open-source software (such as Linux or FreeBSD), the source code is available to, and thoroughly examinable by, anyone, and it can be compiled or interpreted with your own compiler or interpreter on your own computer into the object code that is the executable program; so you know that the source code you examined, as carefully as you like, is the source code for the program you use in your scientific investigations. Even the recent public debate over privacy due to revelations about the work of the National Security Agency (NSA) did not lead to much discussion that I could find on the security of SELinux, an NSA security-enhancement package offered for Linux for anyone who wants to turn it on. Because SELinux is offered under an open-source model, confidence is apparently maintained, even though SELinux alone reportedly comprises over 100,000 lines of code. Writing good open-source software for this kind of math is a huge project. Were I allocating resources, I would skimp on other features: for example, by writing it for only one common desktop platform, leaving most user-interface design to add-ons by other people, and releasing it under a license that allows it to be included in closed-source software that can have all the support and non-math features users may want (e.g., multiplatform compatibility, a good user interface, and many input/output interfaces), while the math components can still be checked character by character by anyone.

I'm not asking merely if physicists are careful (I'm sure they are), if their computers are good (ditto), if the scientists know their math (they doubtless do), if critical computer bugs are reported and patched (I'll assume they tend to be albeit perhaps late), or if journal editors are careful (doubtless they are). I'm not objecting to firms making profits and I'm not pushing a principle that all software should be free (instead, I'm arguing for quality). I'm not arguing either way on whether scientists should modify their software (although open-source licenses generally allow it). I'm not arguing about budget allocations. I'm not trying to be provocative; this grew out of my prior thread in which I stated what I thought was already accepted but am told is not, so I'm not repeating it, but the concern is nontrivial.

The key question: Could there be one or more black boxes in computer math that have not been fully proven with all relevant versions and computing environments in public and, if so, is that fundamentally safe for the more momentous work in physics, especially for research questions on which opportunities for cross-checking are more limited?
 
  • #2
Perhaps you should look into the Turing principle. Math software is often tested exhaustively, but even so, a problem can crop up in an unusual way. As I am somewhat of a calculator connoisseur, my experience is with calculators and CAS. Two companies, HP and TI, make high end calculators that often do not produce exactly the same answer when in their CAS mode. Both will often reduce their solution to the same numeric solution, but their actual algebraic answers are often different. This is simply a case of one calculator using one method to arrive at a solution and the other using another, equally valid, method. There are often several ways to arrive at the same answer. How your black box does so is often proprietary and not open for you to view or understand. If you don't like that, you must then develop your own software or acquire open source software that lets you peek into the box, so to speak.
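For instance (a minimal sketch using SymPy; the expressions are merely illustrative of two different-looking but compatible answers):

```python
import sympy as sp

x = sp.symbols('x')
# Two antiderivatives of sin(x)*cos(x) that different systems might return.
f1 = sp.sin(x)**2 / 2
f2 = -sp.cos(x)**2 / 2

print(sp.simplify(sp.diff(f1, x) - sp.diff(f2, x)))  # 0: both are valid antiderivatives
print(sp.simplify(f1 - f2))                          # 1/2: they differ only by a constant
```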
Not sure what you can do otherwise.
 
  • #3
Nick Levinson said:
instead, I'm arguing for quality
In the end, I think this is the key. What we want is for the software to be correct so that the scientific results can be trusted. The question is then whether open-source or commercial software has higher quality and trustworthiness. I am torn on that. Commercial software is driven to quality by competition; open-source software is driven to quality by transparency. Both are actually pretty good drivers.

By the way, this isn't unique to software. Commercial hardware is also often a black box. It can still be higher quality than anything that could be custom built, even with full transparency.
 
  • #4
@CalcNerd:

"Two . . . [competing] high end calculators . . . . often reduce their solution to the same numeric solution": "often"? Given the brands and their being high-end, depending on use, that the solutions are only "often" the same is somewhat alarming. If, in all of the important cases, the numeric solutions are the same, how they get there is not critical, but if they're offering algebraic answers that differ, and those are answers you're invited to use outside of your calculators, I hope both answers are (somehow) right.

"If you don't like that [it's "often proprietary and not open"], you must then develop your own software . . . .": If the problem you describe with calculators also applies to the highest-end math used in physics investigations, that suggests a need for more investment in open source math modules with licenses that permit proprietary systems to use them with transparency for those parts of the otherwise-proprietary code. I wouldn't mind if the proprietary packages forbade code modification to any parts of their programs (otherwise a modification could break the rest of the proprietary package) as long as the open source modules could be modified outside of the proprietary packages. The proprietors could then opt to accept the modified modules or not.

@Dale:

Agreed on the goal.

Competition has entered the open source field, too. I use several Linux distros and, briefly, tried a *BSD-with-GUI OS and, over the years, several major apps of the same kind. Their websites and feature differences show signs of competition (Ubuntu is friendlier but less secure, etc.). I've begun applying, as an occasional criterion, that an OS have a large user base, as that may (or may not) correlate with a large developer base, and the latter, perhaps, with a faster patching/updating frequency, which helps security (although I'm beginning to think that FreeBSD may have a relatively small developer base while being widely trusted by Web hosts, many of which are under attack, so perhaps many hosts fork and patch without reporting upstream).

But testing the math is a problem. Testing it in a spreadsheet or a calculator is easier: you find one or two trustworthy high-end systems and then compare, especially with borderline cases. And you can test high-end math systems for what they do with, say, the square root of two by using a kind of supermajority test: test lots of systems, including paper, and see whether any diverge; if any does, try to figure out why and report either one bug or many (maybe the exception was the only right implementation). But suppose you want to test a high-end math procedure that only a couple of high-end computer systems offer, and they're both closed source. I expect that could occur shortly after a new math procedure has been published in a peer-reviewed journal, when software producers try to offer it as soon as they can get it out the door (I'll assume they're not rash and won't release it until a well-qualified mathematician agrees it's okay). I wonder if that's when an open-source module is most needed. Or, better, open-source modules should be written as soon as a new procedure is published, so the commercial vendors can focus on vetting them and on improving the non-math features. Is there a trend in that direction?
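A rough sketch of that supermajority idea (illustrative only; the candidate implementations and the tolerance are arbitrary):

```python
import math
from decimal import Decimal, getcontext

getcontext().prec = 30

# Compute sqrt(2) several independent ways and flag any that diverge from the consensus.
candidates = {
    "math.sqrt": math.sqrt(2.0),
    "power operator": 2.0 ** 0.5,
    "exp/log identity": math.exp(0.5 * math.log(2.0)),
    "Decimal.sqrt (rounded to float)": float(Decimal(2).sqrt()),
}

consensus = sorted(candidates.values())[len(candidates) // 2]  # median as the "supermajority" value
for name, value in candidates.items():
    verdict = "agrees" if abs(value - consensus) <= 1e-12 else "DIVERGES"
    print(f"{verdict}: {name} -> {value!r}")
```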

Hardware is definitely a potential problem, especially since patching it can be a costly headache. My understanding is that the hardware does only simple arithmetic (mainly in the registers) but, because it does it very fast, it gets quite a lot of it to do after preprocessing by the operating system, which in turn gets it after preprocessing by an app, and that some high-end math software avoids even that use of hardware. If so, that's a solution for the hardware. I haven't looked up much about supercomputers, but I saw that a Cray looked like a batch of off-the-shelf regular computers hooked together with hardware for parallel processing and other systems that combine into a supercomputer. That would suggest that supercomputer hardware also does no math beyond low-end arithmetic, but I haven't verified that.
 
  • #5
Nick Levinson said:
Competition has entered the open source field, too. I use several Linux distros and, briefly, tried a *BSD-with-GUI OS and, over the years, several major apps of the same kind. Their websites and feature differences show signs of competition
I don't think that sort of competition has reached the algorithm level yet. (Not that competition is a cure-all)

R has a quite large and active community developing new algorithms. Somehow it has become the de facto standard of the statistics community. Even so, there is relatively little competition at the algorithm level, with rarely more than two packages offering the same functionality. (One exception is Bayesian methods)

The R development community values novel features and methods over higher quality code. Part of this is driven by the prevalence of packages with peer reviewed papers. Academic publications are still more driven by novelty than quality.
 
  • #6
I don't think this is a very serious problem for the more popular packages. Anyone doing a lot of numerical calculations using e.g. Matlab or Python gets into the habit of checking their programs using known inputs/outputs, simply because you also have to check your own code. The likelihood of a problem occurring because of a bug in the software is usually very small, especially compared to the probability of an error because the user made a mistake somewhere. That said, problems DO of course occur, but they are probably more likely in more "niche" software (such as the recently reported problems with some fMRI software) where the community is much, much smaller.
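For example, a spot check of the kind described above might look like this (a sketch in Python; the routine and the tolerance are illustrative):

```python
import numpy as np

# Check a numerical integration against a known analytic answer:
# the integral of sin(x) over [0, pi] is exactly 2.
x = np.linspace(0.0, np.pi, 10_001)
y = np.sin(x)
numeric = float(np.sum((y[:-1] + y[1:]) / 2 * np.diff(x)))  # trapezoid rule, written out by hand
exact = 2.0

assert abs(numeric - exact) < 1e-6, f"unexpected result: {numeric}"
print(f"numeric = {numeric:.8f}, exact = {exact}")
```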
Note also that many commercial packages do use free software "under the hood", Matlab uses e.g. Lapack and fftw.

Problems are, as was mentioned above, much more likely to occur because of problems with measurement hardware, especially with modern hardware, which does a lot of processing internally before sending the data to the user. That said, I consider it part of my job to find such issues, and we always do our best to run tests. Also, errors are, once again, much more likely to pop up due to user error (or "trivial" problems such as bad cables) than to actual "bugs" in the hardware.

The bottom line is that you can't trust anything 100%; the only thing you can do is to check and double-check everything.
 
  • #7
Problems from causes other than closed-source-software math errors (such as user error, hardware error, and network failure) are not disputed or undervalued; I agree they matter, but I'm focusing on this one type of cause. That you can do some checking of the software by checking inputs and outputs is acknowledged above but that's not as reliable as checking the entire logic of the software's math. That some proprietary software uses free software, assuming what's meant is specifically software that is free for a user to examine and modify at the source-code level (which is true for Lapack but not for all versions and licensings of fftw), is what I want to encourage; but I understand that many proprietary packages do not use open-source math routines for at least some of their routines, and that's where I think there would be a problem. If many routines are available only in less-closely-supported software packages in which errors lurk longer (according to Dale, supra, "rarely more than two packages offer . . . the same functionality ([o]ne exception . . . [being] Bayesian methods)"), and if that means the errors that remain longer are math errors, that should also be a concern, although that's perhaps equally a problem for closed- and open-source software, an issue separate from what I brought up but also concerning. And that "the only thing you can do is to check and double-check everything" is impossible if you're using closed-source software, which most users physically lack the means to check and which by law they are not allowed to check.
 
  • #8
Nick Levinson said:
That you can do some checking of the software by checking inputs and outputs is acknowledged above but that's not as reliable as checking the entire logic of the software's math
Hmm, I am not at all convinced that is true. If code inspection were reliable then there wouldn't be so many bugs in software. Code is always inspected by the author, and usually by others as well.

Automated black box tests are more reliable than inspection, IMO. That is not what open source code brings. What it brings is a larger number of developers to fix bugs, and a pool of users who can trace through and find the source of an error themselves without having to rely on tech support.
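For instance, an automated black-box check might look like this (a sketch; the solver is just numpy's and the tolerances are arbitrary):

```python
import numpy as np

# Treat the linear solver as a black box: feed it random well-conditioned systems
# and verify the residual of each returned solution, without inspecting its internals.
rng = np.random.default_rng(0)
for trial in range(100):
    A = rng.normal(size=(5, 5)) + 5.0 * np.eye(5)   # diagonally boosted test matrix
    b = rng.normal(size=5)
    x = np.linalg.solve(A, b)                        # the box under test
    assert np.allclose(A @ x, b, atol=1e-9), f"residual check failed on trial {trial}"
print("all 100 random systems passed the residual check")
```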
 
  • #9
I meant only that code inspection is more reliable than input-output comparison, but code inspection is not perfect, since it has to be done at least partly by a human, who is potentially fallible. Even better would be both code inspection and input-output comparison. I'm not clear how any test could be better with less knowledge than with more, except for cases of human intervention (the reason for double-blind testing in medicine) or cases of confusion from excess knowledge (a human error, perhaps in program-writing). So I assume that automated black-box tests are helpful but that combining them with logic inspection (turning a black box into a white box) would be better, albeit more expensive; some might wonder whether that's the best use of resources, a good question for which I have no specific answer.

In general, while someone may well inspect all of the code for all of the possible faults, even better is also a specialist inspecting the code for the potential faults within the specialist's expertise. Doubtless the better closed-source programs have mathematicians doing that, and maybe those firms can afford it so they do it more than open-source projects do; but open source being open to anyone who wants to look means a math professor can ask their class to examine it as part of a class exercise, a paper, or an exam, in addition to it being examined by anyone hired by the software firm and subject to a confidentiality agreement. A proprietary firm using open source can charge for the total software package and use some of the revenue for any checking that can be applied to wholly closed source, so that the benefits of both source models (checks and developer communities) can be combined. Open source for many (not all) programs includes public information about reported non-security bugs, whereas proprietary firms prefer to keep the public's reports about all kinds of bugs hidden from the public. The public reports on open source can be read by anyone concerned, whether a patch has been developed or not, although the list of alleged bugs may be huge and many are often wrong, old, unclear, etc.
 
  • #10
Nick Levinson said:
Even better would be both code inspection and input-output comparison.
I agree. That is how all commercial software is developed and most open source software. So I don't think that is a distinguishing feature between commercial and open source.

Nick Levinson said:
I'm not clear how any test could be better with less knowledge than with more
I don't have any non-anecdotal evidence, but my personal experience is that code review is a relatively ineffective way to test for errors.

Nick Levinson said:
In general, while someone may well inspect all of the code for all of the possible faults, even better is also a specialist inspecting the code for the potential faults within the specialist's expertise.
I am not convinced that this is true either. Specialists often write terrible code. In particular, specialists often focus on how code should work to the exclusion of how it could fail. That is the difference between high quality and low quality code.

It seems like you believe that human inspection of code is much more effective at improving code quality than I believe it is.
 
  • #11
An investigator deep in a project sometimes gets curious about a tool or an anomaly and wants to understand the innards; closed source instead requires trust in the proprietor, albeit usually well-placed trust. I just suspect that letting people do this will uncover some previously unnoticed problems. Sometime after Windows 98SE was mostly unsupported by Microsoft, I, as a user, found what was likely a new fault in its password system (in short, a deleted password sometimes worked when the new one stopped working, and I figured out why), wrote to Microsoft, got no answer, and posted the details to a public newsgroup. The response was basically that I should buy something newer, ignoring that millions of Win98SE copies were still in use, including in schools, and that the architecture might persist in newer Windows OSes. I never worked for Microsoft and have essentially no way to see whether anyone else reported that bug, what MS did about it in later OSes, or whether architectural changes made it irrelevant. Granted, most biologists don't detach lenses from excellent microscopes to check color fidelity, and most mathematicians probably don't know the C language, but those who do, or who have assistants who do, should have the chance.

I agree that code review must often fall short, mainly because there's so much code and so many possibilities to consider. I understand the final solver of Fermat's Last Theorem proposed a solution after 7 years but then someone found an error; he fixed that in an additional year; I think the error-finder was not in the solver's inner circle. Outsiders need access.

Deliberately triggering a common failure type in testing, to ensure it wouldn't be catastrophic, is smart, and you're right that many people don't do that. My point about specialists as reviewers is that they would be additional, not reviewing instead of generalists.

They could write bad code because they forget failure risks, but it's much more common to report an error than to write a patch. If they cite their software in a paper, I doubt an editor would accept their patching it without saying so in the paper, unless the patch first led to a new software version that got cited in the paper.

An overnight switch of source type could be bad. For some math, closed source may still be more accurate today, until open source is more developed.
 
  • #12
Nick Levinson said:
I agree that code review must often fall short, mainly because there's so much code and so many possibilities to consider. I understand the final solver of Fermat's Last Theorem proposed a solution after 7 years but then someone found an error; he fixed that in an additional year; I think the error-finder was not in the solver's inner circle. Outsiders need access.
Er... He found the "error" himself. I am not quite sure that another mathematician would have discovered the weak link in the (very long) proof.
 
  • #13
Nick Levinson said:
An investigator deep in a project sometimes gets curious about a tool or an anomaly and wants to understand the innards
Sure, but isn't the goal "quality" and not "satisfy curiosity"? I am not convinced that merely satisfying an investigator's curiosity drives an improvement in quality.

It may increase the number of developers touching a piece of code, but it dramatically reduces the economic incentives for any individual developer, and it relaxes all good manufacturing practice requirements, both effects reducing the quality of the work from the developers individually and collectively. Does the increase in the number of developers compensate for the decrease in quality and control? I don't think the answer is obvious either way.
 
  • #14
@Svein: "He [Andrew Wiles] found the 'error' himself." Maybe he also found it, but Nature said, "[h]e went on to make a historic announcement at a conference in his hometown of Cambridge, UK, in June 1993, only to hear from a colleague two months later that his proof contained a serious mistake. But after another frantic year of work . . ." (http://www.nature.com/news/fermat-s-last-theorem-earns-andrew-wiles-the-abel-prize-1.19552 as accessed Aug. 11, 2016). I think the N.Y. Times said something similar, years ago. "I am not quite sure that another mathematician would have discovered the weak link in the (very long) proof." We don't need a guarantee that someone else would have found it, just a chance to let them. If what you mean is that no one would have, especially because the proof was very long, we don't know that no one would have. Sometimes, hard questions are answered by people we don't expect would. When they're right, that's good for them and good for us.

@Dale:

Quality and satisfying curiosity: Yes, but we don't care why someone sets about improving quality, as long as they do. If someone solves a topology problem that makes it through peer review and is widely cited, but in their memoirs they say it was because they ate dry Cheerios wrapped in rancid ham while watching a Disney movie backwards, and no one contradicts the proof, we probably don't withdraw the article. Chances are curiosity will be whetted and satisfied in, oh, maybe a dozen or more cases before one person produces a useful improvement. We regret that not everyone contributes an improvement, but we're glad for those who do. I understand that in high-quality research and development a normal ratio is four or five bad inventions for every good one (even after the inventors have selected only the ones they thought best for managers above R&D to consider), with the good one being good enough to pay for the costs of the bad four or five. The one paying for the others is part of why we accept the ratio (the other part being that we likely can't get a better ratio). R&D is usually staffed by bright people, and should be, but letting others toss ideas into the pot can be productive.

I had a contact with Gimp's organization; it appeared one developer was the gatekeeper for what features got added, and that person said, to the general effect, that even something being a good idea is not grounds enough. That's fine, because Gimp is good, and I think there are several other open-source competitors (I still use Gimp as an amateur) and at least one closed-source competitor (Photoshop). Various development models are all supportable, depending on the details and the resulting product quality. Probably even identical models implemented by different individuals would yield different overall product quality. All of that applies to either kind of source.

I'm not sure adding developer inputs is likely to lower quality in the end, although, before vetting, the average input would be of lower quality. Along the way, what's likely is that loading up on bug reports and suggestions without an increase in developers and managers may result in proportionately more inputs being ignored at the first triage, and maybe often the wrong ones. But people looking to break into programming or related software areas, and willing to do free work in return for verifiable credit that is often visible for years, are probably driving a good deal of the development already underway; they don't even have to sit through an interview. Some development likely comes from paid developers; I think IBM has contributed to Linux, and the NSA has; likely that applies also to Google, Red Hat, and Micro Focus (Novell/Suse). I suspect, but haven't checked, that, for open source, an increase in the user base tends to lead to an increase in the number of developers, and that in turn tends to lead to increases in the numbers of quality-assurance testers, packagers, and so on. I, as a suggester or bug reporter, was invited to become a quality tester for one package, although I declined; an increase in any kind of participant makes recruiting for other kinds easier. And, at any rate, if an increase in outsiders' inputs risks lowering product quality, that's also true of closed source, and we'd hardly be better off discouraging inputs from people who find apparent bugs by making reporting harder, as closed source does. Even if proprietary program firms have customer-service offices that are fabulous at welcoming bug reports, they tend to keep bug lists invisible even to customers, which makes analysis and comparison nearly impossible for most outsiders.

If the additional overhead costs of letting just anyone contribute thoughts to an open source project were too high relative to return, most of the projects would have dropped the public participation. They could still be open source. But I don't know of any that ended inputs from people without known qualifications.

If the argument is that program development should be led by bright, trusted people, yes. But that's for leadership, and it needn't exclude the others. Exclusion would boost efficiency, but that boost would likely come eventually at the cost of quality, especially if some of the excluded people who wanted to weigh in on open-source advanced math/statistics programs were advanced mathematicians or statisticians themselves, just not employed or contracted by a computer firm.
 
  • #15
Nick Levinson said:
Quality and satisfying curiosity: Yes, but we don't care why someone sets about improving quality, as long as they do.
First, I think this is shortsighted. If you are relying on people to do an important task (like improving software quality), then you should understand their motivation. For commercial software the motivation is clear and strong, which allows the company to demand adherence to good manufacturing practices. For open source software the motivation is much weaker, so good manufacturing practices and process control are less clear.

Second, even if we ignore the motivation, I still haven't seen any solid evidence supporting the claim that they actually do set about improving quality. More developers is not the same as more quality. It might be an advantage, but it might cause more errors or simply drive more features rather than higher quality.

Nick Levinson said:
If the argument is that
Let me be clear, I am not making an argument one way or the other. I think very highly of both commercial and open source software. I see both as different ways of producing good scientific software.

I am simply objecting to your argument that open source software is inherently higher quality than commercial software. The arguments you have proposed consistently make unsubstantiated connections and claims. I don't know of any scientific evidence to support your assertions, and my personal anecdotal experience is rather contradictory.

Code review is not a strong guarantor of quality. Satisfying curiosity does not drive quality. A large number of developers might drive quality, but even that is not certain and I am not aware of scientific evidence comparing that driver of quality versus commercial drivers of quality.
 
  • #16
Exposing motivations tends to be popular among managers, and it can be helpful, but that reliance, and the control based on it, can also backfire, such as when a deadline is considered more important than getting the best quality, as with Windows (which, while superior at user-friendliness, is often buggy), because people relying on continuing paychecks are less likely to challenge what management has already accepted as good enough. I just patched an operating system when two managers believed the change was contrary to what most users needed; another firm mainly agreed with me, but I was told by this firm that if I wanted the change I'd have to do it myself; I volunteered over 12 hours on it; they accepted my patch but still didn't see the point; were I paid by them, they'd have assigned me elsewhere instead; their claim that the patch is useless is at least debatable (I'd say refuted, but they wouldn't concede that). All they know about my motivations is what they care to glean from the bug thread.

I'm not arguing that open source inherently produces higher quality than closed source, but that it can. I think the general acceptance of open-source statistics programs, to the point that many functions are available only in a couple of programs (regardless of source type), shows that the quality of the statistics is probably equivalent in general; otherwise the closed-source proprietors would have offered most of the same functions as open-source programs do and, by offering higher math quality, would have swallowed most of the market. I gather that hasn't happened, so advanced users probably implicitly accept both source models, probably with exceptions.

Some major open-source programs come two ways, one using a proprietary management system and the other more open to public participation. Red Hat does that with RHEL and indirectly CentOS on the one hand and Fedora on the other; Micro Focus does that with Suse and openSuse; Google, with the Chrome browser and Chromium. My $10 cell phone's browser, probably made under a proprietary management model, used free components from open-source *BSD distributions, but the resulting browser is so awful I don't try to use it even when I badly need one. I argue for more development of open-source high-quality components that proprietarily-managed packages can incorporate.

"More developers is not the same as more quality. It might be an advantage, but it might cause more errors or simply drive more features rather than higher quality." (@Dale, supra.) Agreed. Managing that result is up to the people in charge of herding the cats' output into a cohesive whole. Often, volunteers can't or don't offer the same number of hours per week as paid staff can on a per-person basis, but management systems can compensate for that, by increasing recruitment, easing post-recruitment participation methods, and increasing quality control. These steps increase costs but some of the participants being volunteers or being paid by other firms can offset the cost increases for the management firms.

Feature drive, sometimes bloat, is probably independent of whether source is open or closed and whether developers are many or few. It has to be balanced against a quality drive; where quality can be brought to a plateau, as with accuracy, then, at least in critical software, that plateau should be climbed and held before features are added. We likely agree on that.

I think we're actually closer to generally agreeing on parts than it seems in a single post. Probably we disagree mainly on whether doing what I suggest is important enough at this time, compared to other areas for effort or research of various kinds, and for that I don't have a strong case other than to raise the question about black boxes being insufficiently tested, which, I think, was answered among us. My interpretation is that the testing is inadequate in closed source but critical errors generally don't seem to be occurring.
 
  • #17
Nick Levinson said:
I'm not arguing that open source inherently produces higher quality than closed source, but that it can
I have no objection to that claim, which is far more modest than "shunning" commercial software.

Nick Levinson said:
Probably we disagree mainly on whether doing what I suggest is important enough at this time,
No. Where we mainly disagree is on the validity of your justifications. Even though your stated argument above is quite modest and reasonable, your justifications and reasoning come off as rather extreme.
 
  • #18
The shunning I asked about was of the black boxes themselves, not of a commercial software package as a whole, and not before a white-box alternative was available. Open source is a route to such an alternative, but it is a viable route only if it produces a white box in which the math is of equal or higher quality than in the matching black box.

If an open-source program and a closed-source program were equally good including being equally accurate in the math, we should prefer open source and therefore, by process of elimination, we should shun closed-source. But, I gather, we probably don't have two programs that are equally good for a given math function, one closed and one open. Without that, preferring and shunning on that ground are premature.

Perhaps unaddressed is another criticism of open source: a scientist may modify open-source code and, while meaning well, introduce a subtle math bug that adversely affects a refereed publication. I assume editorial rules preclude that for honest authors, i.e., most authors. If, hypothetically, you use FooBar version 0.61 to generate your publishable results, I think you would record the version number. If you improve the program, the appropriate thing to do is one of the following: submit the patch to the maintainers of FooBar and, if they accept it, download the new version from the maintainers' site, use it, and cite the new version, say 0.62; or create your own fork of FooBar, say FooBarFab 0.1, and use and cite that; or, better, await other users' favorable opinions and endorsements of FooBarFab and then use and cite it. I doubt an entomologist should use a homemade unbranded microscope to submit an article about gnats until that microscope has been recognized by peers as a trustworthy alternative to a known brand and model. Editors' standards should address this, either in what gets written into an article or, in the event of a post-publication challenge, in what is kept in lab notes and other documentation that should be available to concerned peers seeking verification of published claims.
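As a small aside on recording versions (a sketch; the listed package names are placeholders for whatever the analysis actually relied on):

```python
import importlib.metadata as md
import platform

# Record the exact versions behind a result, so a citation such as "FooBar 0.61"
# can be checked later. The package names below are only placeholders.
for pkg in ("numpy", "scipy"):
    try:
        print(pkg, md.version(pkg))
    except md.PackageNotFoundError:
        print(pkg, "not installed")
print("python", platform.python_version())
```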

I didn't base my justifications on controlled experiments or look for them because I was raising a question about the need for research on the quality of black boxes, and that usually precedes the design of a study protocol. There probably are some studies on somewhat related questions, but I didn't come across any, didn't do a systematic search in someplace like JStor, and didn't expect to find any, since the relevant variables are likely in the thousands, likely leading to an indeterminate answer for this question. Specific studies, however, could help focus the next steps.

Fora tend to prefer brevity, which is especially problematic with a wide-ranging question. I've read advice from multiple sources on how to frame questions for fora, but actual experience usually doesn't support all of that advice, especially on adding more detail; at least that has been my experience across many fora, mostly in computer topics but also evident on some science sites.
 

