Risk Index for Shared Components

anorlunda · Sep 14, 2016

In an earlier thread, Science Vulnerability to Bugs, I mentioned this case:

http://catless.ncl.ac.uk/Risks/29.60.html said:

Faulty image analysis software may invalidate 40,000 fMRI studies

Here is a similar case.

http://catless.ncl.ac.uk/Risks/29/59#subj8 said:

Severe flaws in widely used open source library put many projects at risk

In another recent case (can't find the link), an author decided to un-license his public domain contribution and withdrew it from publicly shared libraries, which broke very many products dependent on it.

I am not a computer scientist, but I had a very computer-science-like follow-up thought on this subject.

We can define Risk=Probability*Exposure, let's say R=P*E. When we apply that to a software component, P is the probability of a flaw in the software and E is the number of places where the component is used.

Our confidence in a component increases with time and with the diversity of use without flaws being reported; thus P is a function of time t and of E. E may also grow with time, so E is itself a function of t. The interesting question is which outraces the other. Most quality improvement programs focus exclusively on P.

Given those, we could compute R(t). We should then be able to use ##\frac{dR}{dt}## as an index of the acceptability of risk. ##\frac{dR}{dt}=0## is a likely choice for the boundary between acceptable and unacceptable risks. It also suggests that capping E is an alternative to lowering P as a remedy for excessive risk.
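To spell out what that boundary implies, the product rule can be applied to ##R=P\,E##. This is only a sketch: it treats P and E as smooth functions of time (E is really a count), and ##\frac{dP}{dt}## is the total derivative, including any dependence of P on E.

$$\frac{dR}{dt} = E\,\frac{dP}{dt} + P\,\frac{dE}{dt}$$

Requiring ##\frac{dR}{dt}\le 0## and dividing by ##P\,E>0## gives

$$\frac{1}{E}\frac{dE}{dt} \le -\frac{1}{P}\frac{dP}{dt}$$

In words: risk stops growing only while the relative growth rate of exposure does not exceed the relative rate at which the flaw probability is falling.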

This is the kind of question that I expect should have been explored when discussion of shared reusable software components first became popular in the 1980s.

My question is, has this subject been explored in computer science? If yes, are there links?

Grinkle · Sep 15, 2016

anorlunda said:

Most quality improvement programs focus exclusively on P.

Not sure if I understand your thinking well enough to say whether I agree or disagree, but software is definitely released in stages of increasing E.

For example -

Unit test
Integration test at multiple levels
Handoff to test group for independent testing
Beta user testing, often multiple rounds with increasing E in each round
Production release

Grinkle · Sep 15, 2016

This paper talks about Deployment metrics being an indicator of defect discovery rate, but it does not focus specifically on that aspect.

http://www.cs.cmu.edu/~paulluo/Papers/LiHerbslebShawISSRE.pdf

anorlunda · Sep 15, 2016

Grinkle said:

Not sure if I understand your thinking well enough to say whether I agree or disagree, but software is definitely released in stages of increasing E.

The link to 40000 ruined scientific studies is a better example of what I was thinking of. I think it was a flaw in a statistical library that was at fault.

I'm thinking of an open source stat function initially distributed to E=100 users that becomes popular and gains E=1000000 users. I see no sliding scale of quality that keeps P*E from growing as things go viral and E skyrockets. A function with ##10^6## times as many users should be able to afford a QA budget at least ##10^2## times larger.
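As a rough illustration of that viral scenario, here is a minimal Python sketch of R(t)=P(t)*E(t). Everything in it is an assumption chosen for illustration, not taken from any data: the logistic growth of E from 100 to 1000000 users, the starting flaw probability of 0.01, and its two-year half-life.

```python
import math


def exposure(t, e0=100.0, e_max=1_000_000.0, rate=1.0):
    """Hypothetical logistic growth of the user count E(t), with t in years."""
    return e_max / (1.0 + (e_max / e0 - 1.0) * math.exp(-rate * t))


def flaw_probability(t, p0=0.01, half_life=2.0):
    """Hypothetical flaw probability P(t) that halves every `half_life` years."""
    return p0 * 0.5 ** (t / half_life)


def risk(t):
    """R(t) = P(t) * E(t), as defined in the thread."""
    return flaw_probability(t) * exposure(t)


def risk_slope(t, dt=1e-4):
    """Central-difference estimate of dR/dt."""
    return (risk(t + dt) - risk(t - dt)) / (2 * dt)


for year in (0, 2, 5, 10, 15, 20):
    trend = "rising" if risk_slope(year) > 0 else "falling"
    print(f"t={year:>2}  E={exposure(year):>9.0f}  "
          f"P={flaw_probability(year):.2e}  R={risk(year):>8.1f}  {trend}")
```

Under these assumed curves R keeps rising for as long as E grows faster (in relative terms) than P falls, and it only turns downward once the user count starts to saturate. Different assumed curves would move that turning point, which is exactly why measured data on P(t) and E(t) would be needed.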

Not putting all our eggs in one basket is another way to express it. A stat library used by nearly everyone in science may be too risky. Just for the sake of diversity, there should be several competing packages from different origins that gain large shares of the users. I have heard of military applications where the government ordered instances of the same software written by different suppliers to reduce risk, but I never heard of this practice gaining wide acceptance.

Grinkle · Sep 15, 2016

I am not an academician, but fwiw, I offer my free-market perspective.

Standards are proposed and, if adopted, may arguably improve the greater good. Many more standards are proposed than adopted by industry because individuals want to win, and most proposed standards involve some amount of change from the status quo for many players, and some of the change is not perceived by some of the players as beneficial to them with respect to competitive advantage. It is often true that some players would be eliminated altogether if some standards were adopted, even though the standard is arguably better for the greater good.

If you were asked by a standards body to use software you personally thought was inferior to what you currently use now for the sake of global scientific risk reduction, would you agree? If you did not agree that the software you were being asked to use is actually functionally equivalent to what you are using now, would you agree? If you thought your research / publishing rival was influencing the standards committee to assign you the rubbish software, would you agree? etc.

I see a lot of human interaction factors involved in what you are suggesting.

edit:

Ignoring that, it is interesting to ponder modelling the comparative risk of deploying multiple less mature solutions vs fewer more mature solutions. Maturity comes from usage, and many software defect models predict that multiple less mature solutions present the user base with more total bugs than one single more mature solution. I think studies might need to be done on how many bugs are overlapping to see which is safer. If all solutions tend to suffer from similar symptom bugs, then many is (I think) by inspection worse than one. If bug symptoms are mostly unique to each individual implementation, then perhaps many can be better, even if total extant bugs are greater at any given point in time.
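One way to make that comparison concrete is a toy model, sketched below in Python. All numbers and the function name `fraction_at_risk` are hypothetical. The model assumes k implementations share the users equally, each has its own independent chance `p_unique` of a flaw, and there is a separate chance `p_common` of a flaw with the same symptom in every implementation.

```python
def fraction_at_risk(k, p_unique, p_common):
    """Return (expected fraction of users hit, probability every user is hit at once).

    k         number of implementations sharing the user base equally
    p_unique  chance that a given implementation has its own flaw (assumed)
    p_common  chance of a flaw common to all implementations (assumed)
    """
    # A common flaw hits everyone; otherwise each share is hit with p_unique.
    expected = p_common + (1 - p_common) * p_unique
    # Everyone is hit if the flaw is common, or if all k unique flaws coincide.
    all_at_once = p_common + (1 - p_common) * p_unique ** k
    return expected, all_at_once


# One mature solution versus three less mature ones (assumed three times buggier).
one_mature = fraction_at_risk(k=1, p_unique=0.01, p_common=0.0)
three_unique = fraction_at_risk(k=3, p_unique=0.03, p_common=0.0)
three_overlap = fraction_at_risk(k=3, p_unique=0.03, p_common=0.02)

for name, result in [("one mature", one_mature),
                     ("three, unique bugs", three_unique),
                     ("three, overlapping bugs", three_overlap)]:
    print(f"{name:<24} expected={result[0]:.4f}  all-at-once={result[1]:.6f}")
```

In this toy model the three less mature solutions expose users to more bugs in total, yet the chance of everyone being hit simultaneously drops sharply when the bugs are unique. Once a modest overlapping-bug probability is added, many is worse than one on both measures, which matches the distinction drawn above.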

Lord Crc · Sep 15, 2016

In Boost.Multiprecision one can select which one of several very different backends to use by simply changing one line of code and recompiling.

While not foolproof it certainly allows one to be more confident in the results if they don't change significantly when changing backends.
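The same cross-checking idea can be sketched in Python. This is an analogue only, not the Boost.Multiprecision API: it runs one routine with two unrelated arithmetic types from the standard library, `decimal.Decimal` and `fractions.Fraction`, and compares the answers. The routine `partial_sum_e` is a made-up example.

```python
from decimal import Decimal, getcontext
from fractions import Fraction


def partial_sum_e(number_type, terms=25):
    """Sum 1/n! for n = 0 .. terms-1 using whichever numeric type is passed in."""
    total = number_type(0)
    term = number_type(1)          # 1/0!
    for n in range(1, terms + 1):
        total += term
        term = term / number_type(n)   # next term, 1/n!
    return total


getcontext().prec = 40             # 40 significant digits for the Decimal backend

via_decimal = partial_sum_e(Decimal)    # rounded decimal arithmetic
via_fraction = partial_sum_e(Fraction)  # exact rational arithmetic

# Convert exactly to a common type before comparing the two results.
difference = abs(Fraction(via_decimal) - via_fraction)
print(f"backends differ by about {float(difference):.1e}")
```

Agreement between the two backends does not prove the result is right, since a mistake in `partial_sum_e` itself would show up in both, but a large disagreement would be a clear warning sign.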

anorlunda · Sep 16, 2016

Grinkle said:

This paper talks about Deployment metrics being an indicator of defect discovery rate, but it does not focus specifically on that aspect.

http://www.cs.cmu.edu/~paulluo/Papers/LiHerbslebShawISSRE.pdf

Thanks for the link. Yes, the paper looks at how P changes as the experience base grows. But it doesn't consider post deployment growth in E.

Lord Crc said:

In Boost.Multiprecision one can select which one of several very different backends to use by simply changing one line of code and recompiling.

While not foolproof it certainly allows one to be more confident in the results if they don't change significantly when changing backends.

That sounds like a very intelligent way to handle multiple back ends, thus achieving some diversity. I wonder what motivated them to do that.

Grinkle said:

Ignoring that, it is interesting to ponder modelling the comparative risk of deploying multiple less mature solutions vs fewer more mature solutions. Maturity comes from usage, and many software defect models predict that multiple less mature solutions present the user base with more total bugs than one single more mature solution. I think studies might need to be done on how many bugs are overlapping to see which is safer.

Yes indeed. The point of my OP was not to advocate any remediations, but rather to advocate computer science research. Many in software engineering love the 20,000-foot-level look at the broad playing field, and big data from a large number of projects.

I did think of one more scenario that paints the concern graphically. Suppose we learned that a significant fraction of US military weapons systems depended on a single software component, even though confidence in that component was extremely high. How much concern should that cause?

From a historical viewpoint, the Y2K bug comes to mind. The reason that Y2K caused so much concern, and so much diversion of resources to check for the bug, was precisely because it was so ubiquitous and cut across boundaries that we had believed made systems independent of each other. Y2K was an unsuspected common dependency.

Grinkle · Sep 16, 2016

anorlunda said:

Y2K was an unsuspected common dependency.

If diversity of solution implementation is a proposed robustness improvement, then Y2K strikes me as a counterexample to that proposal. Diversity of implementation did not offer any robustness. It is an example of many different implementations all potentially containing different coding bugs that lead to the same symptom (that symptom being inability to differentiate between different centuries when the century changes). In this case, if there were only a single date calculation algorithm that all the Earth's software used, it would have been a trivial fix. It was exactly the diversity of implementation in date calculation approaches that caused the concern. Each implementation needed examining and fixing.
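A small Python sketch of that point: two date routines written in unrelated ways, both of which drop the century and so fail with the same symptom. Both functions and their storage formats are hypothetical illustrations, not taken from any real system.

```python
def years_elapsed_a(start_yy: str, end_yy: str) -> int:
    """Implementation A: years stored as two-character strings such as "99"."""
    return int(end_yy) - int(start_yy)


def years_elapsed_b(start_date: int, end_date: int) -> int:
    """Implementation B: dates packed as YYMMDD integers such as 991231."""
    return end_date // 10000 - start_date // 10000


# Within one century both give the right interval.
print(years_elapsed_a("85", "99"), years_elapsed_b(850101, 990101))

# Across the century boundary (1 Jan 2000 packs to 000101, i.e. the integer 101)
# both give the same wrong, negative interval.
print(years_elapsed_a("85", "00"), years_elapsed_b(850101, 101))
```

The two routines share no code, yet each would have needed its own inspection and its own fix.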

Can you draw a couple speculative graphs of how you envision a scenario where P*E is helped by multiple implementations over time and a scenario where it is not? P obviously increases with each independent implementation, and the rate of maturation of each implementation decreases as the user base is diluted. I think you were saying this in your OP?

anorlunda · Sep 16, 2016

Grinkle said:

Y2K strikes me as a counterexample

The problem is dependencies on common things. In risk analysis we use the term "common-mode failures." In the case of Y2K, the commonality was not a shared component, but rather a shared method of expressing dates.

Grinkle said:

Can you draw a couple speculative graphs of how you envision a scenario where P*E is helped by multiple implementations over time and a scenario where it is not?

Consider the military example. Let's assume dependency on a software routine critical to 100% of the USA's weapons. The threat is that an enemy cyberwar unit discovers a vulnerability in that common thing. They could knock out 100% of our weapons at once. If we had the diversity of two independent implementations of that thing, then their vulnerabilities would differ. The enemy would likely be limited to knocking out 50% of our weapons on one occasion. Losing 50% at a time on two occasions is much less scary than losing 100% simultaneously.

Grinkle · Sep 16, 2016

anorlunda said:

a shared method of expressing dates.

No method of date storage or expression was shared. The methods were arrived at independently by independent code developers, and they all suffered from defects that depended on the specific implementation but potentially showed a common symptom. Shared vs independently arrived at is a key distinction and very relevant to your thesis. It goes to why I am saying Y2K is a counterexample to the argument that using different implementations for the same application will reduce the risk of being impacted by a code defect.
