Risk Index for Shared Components

  • Thread starter: anorlunda
  • Tags: Components, Index

Summary:
The discussion centers on the risks associated with shared software components, particularly in light of instances where flaws in widely used libraries have led to significant consequences, such as invalidating thousands of scientific studies. A proposed formula for assessing risk is introduced, where risk is defined as the product of the probability of a flaw and the exposure of the component's use. The conversation highlights the importance of considering both the growth of exposure and the reliability of components over time, suggesting that quality improvement efforts often focus too narrowly on reducing the probability of flaws. The potential dangers of relying on a single software component, especially in critical applications like military systems, are emphasized, alongside the idea that diversity in software solutions may not always mitigate risks effectively. Ultimately, the discussion advocates for further research in computer science to better understand and manage these risks.
anorlunda
In an earlier thread, Science Vulnerability to Bugs, I mentioned this case:
http://catless.ncl.ac.uk/Risks/29.60.html said:
Faulty image analysis software may invalidate 40,000 fMRI studies
Here is a similar case.
http://catless.ncl.ac.uk/Risks/29/59#subj8 said:
Severe flaws in widely used open source library put many projects at risk
In another recent case (I can't find the link), an author decided to un-license his public-domain contribution and withdrew it from publicly shared libraries, which broke a great many products that depended on it.

I am not a computer scientist, but I had a very computer-science-like follow-up thought on this subject.

We can define Risk = Probability * Exposure; call it R = P*E for short. When we apply that to a software component, P is the probability of a flaw in the software and E is the number of places where the component is used.

Our confidence in a component increases with time and with the diversity of use without flaws being reported, thus P is a function of time t and of E. With time E may also grow, so that E is a function of time t. The interesting question is which outraces the other. Most quality improvement programs focus exclusively on P.

Given those, we could compute R(t). We should then be able to use ##\frac{dR}{dt}## as an index of the acceptability of risk. ##\frac{dR}{dt}=0## is a likely choice for the boundary between acceptable and unacceptable risks. It also suggests that capping E is an alternative to lowering P as a remedy for excessive risk.
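Expanding that derivative (just the product rule applied to the definition above) makes the trade-off explicit:

$$\frac{dR}{dt}=E\frac{dP}{dt}+P\frac{dE}{dt},$$

so ##\frac{dR}{dt}\le 0## holds only while the fractional improvement ##-\frac{1}{P}\frac{dP}{dt}## at least keeps pace with the fractional growth ##\frac{1}{E}\frac{dE}{dt}##.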

This is the kind of question that I would expect to have been explored when shared, reusable software components first became popular in the 1980s.

My question is, has this subject been explored in computer science? If yes, are there links?
 
anorlunda said:
Most quality improvement programs focus exclusively on P.

Not sure I understand your thinking well enough to say whether I agree or disagree, but software is definitely released in stages of increasing E.

For example -

Unit test
Integration test at multiple levels
Handoff to a test group for independent testing
Beta user testing, often multiple rounds with increasing E in each round
Production release
 
Grinkle said:
Not sure I understand your thinking well enough to say whether I agree or disagree, but software is definitely released in stages of increasing E.

The link to the 40,000 ruined scientific studies is a better example of what I was thinking of. I believe the culprit was a flaw in a statistical library.

I'm thinking of an open-source stat function initially distributed to E=100 users that then becomes popular and gains E=1,000,000 users. I see no sliding scale of quality that keeps P*E from growing as things go viral and E skyrockets. A function with ##10^6## times as many users should be able to afford at least ##10^2## times the QA budget.
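To make that concrete, here is a toy calculation (the growth and reliability curves are invented for illustration, not data): let E(t) grow logistically toward a million users while P(t) falls as cumulative usage accumulates, and watch the sign of dR/dt.

```cpp
#include <cmath>
#include <cstdio>

// Toy model only: both curves are invented for illustration,
// not measurements of any real component.

// E(t): user base, logistic growth that "goes viral" around t = 5
double E(double t) {
    return 1e6 / (1.0 + std::exp(-(t - 5.0)));
}

// P(t): flaw probability, decaying as cumulative diverse usage accumulates
double P(double t) {
    return 0.01 / (1.0 + 0.5 * t * E(t) / 1e6);
}

int main() {
    double prevR = P(0) * E(0);
    for (int i = 1; i <= 20; ++i) {
        double t = 0.5 * i;
        double R = P(t) * E(t);
        // The sign of the finite difference approximates dR/dt: '>' means
        // exposure growth is currently outrunning the quality improvement.
        std::printf("t=%4.1f  E=%9.0f  P=%.6f  R=%8.1f  dR/dt %s 0\n",
                    t, E(t), P(t), R, (R > prevR) ? ">" : "<=");
        prevR = R;
    }
    return 0;
}
```

In this made-up model R climbs while the component goes viral and only starts falling once the user base saturates, which is exactly the window I am worried about.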

Not putting all our eggs in one basket is another way to express it. A stat library used by nearly everyone in science may be too risky. Just for the sake of diversity, there should be several competing packages from different origins, each with a large share of the users. I have heard of military applications where the government ordered instances of the same software written by different suppliers to reduce risk, but I have never heard of this practice gaining wide acceptance.
 
I am not an academician, but fwiw, I offer my free-market perspective.

Standards are proposed and, if adopted, may arguably improve the greater good. Far more standards are proposed than are adopted by industry, because individual players want to win: most proposed standards require some change from the status quo for many players, and some of that change is not perceived by some players as beneficial to their competitive position. It is often true that some players would be eliminated altogether if certain standards were adopted, even though the standard is arguably better for the greater good.

If you were asked by a standards body to use software you personally thought was inferior to what you use now, for the sake of global scientific risk reduction, would you agree? If you did not believe the software you were being asked to use was functionally equivalent to what you use now, would you agree? If you thought your research/publishing rival was influencing the standards committee to assign you the rubbish software, would you agree? And so on.

I see a lot of human interaction factors involved in what you are suggesting.

edit:

Ignoring that, it is interesting to ponder modelling the comparative risk of deploying multiple less mature solutions vs fewer more mature solutions. Maturity comes from usage, and many software defect models predict that multiple less mature solutions present the user base with more total bugs than one single more mature solution. I think studies might need to be done on how many bugs are overlapping to see which is safer. If all solutions tend to suffer from similar-symptom bugs, then having many is (I think) by inspection worse than having one. If bug symptoms are mostly unique to each individual implementation, then perhaps many can be better, even if the total number of extant bugs is greater at any given point in time.
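As a crude sketch, assuming an exponential reliability-growth model (my assumption for illustration, not an established result): if a single solution exercised by the whole user base ##U## retains on the order of ##B_0 e^{-kU}## undiscovered bugs, then ##N## independent solutions each exercised by ##U/N## users retain about

$$N\,B_0\,e^{-kU/N},$$

which is strictly larger for ##N>1##. Whether that is actually worse for users then comes down to the symptom-overlap question above.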
 
In Boost.Multiprecision one can select which of several very different backends to use simply by changing one line of code and recompiling.

While not foolproof, it certainly allows one to be more confident in the results if they don't change significantly when switching backends.
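Roughly like this; the surrounding code is just my sketch, but the two backend type aliases are real Boost.Multiprecision types:

```cpp
#include <iomanip>
#include <iostream>
#include <boost/multiprecision/cpp_dec_float.hpp>
// #include <boost/multiprecision/gmp.hpp>   // alternative backend (needs GMP installed)

// Swap the numeric backend by changing this one alias and recompiling:
using real_t = boost::multiprecision::cpp_dec_float_50;
// using real_t = boost::multiprecision::mpf_float_50;   // GMP-backed alternative

int main() {
    // Any computation of interest; here, a partial sum of 1/k^2.
    real_t sum = 0;
    for (int k = 1; k <= 100000; ++k)
        sum += real_t(1) / (real_t(k) * real_t(k));

    // If this result shifts noticeably when the backend alias is switched,
    // that is a warning sign about the computation (or a backend).
    std::cout << std::setprecision(40) << sum << '\n';
    return 0;
}
```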
 
Grinkle said:
This paper talks about Deployment metrics being an indicator of defect discovery rate, but it does not focus specifically on that aspect.

http://www.cs.cmu.edu/~paulluo/Papers/LiHerbslebShawISSRE.pdf

Thanks for the link. Yes, the paper looks at how P changes as the experience base grows, but it doesn't consider post-deployment growth in E.

Lord Crc said:
In Boost.Multiprecision one can select which of several very different backends to use simply by changing one line of code and recompiling.

While not foolproof, it certainly allows one to be more confident in the results if they don't change significantly when switching backends.
That sounds like a very intelligent way to handle multiple back ends, thus achieving some diversity. I wonder what motivated them to do that.

Grinkle said:
Ignoring that, it is interesting to ponder modelling the comparative risk of deploying multiple less mature solutions vs fewer more mature solutions. Maturity comes from usage, and many software defect models predict that multiple less mature solutions present the user base with more total bugs than one single more mature solution. I think studies might need to be done on how many bugs are overlapping to see which is safer.

Yes indeed. The point of my OP was not to advocate any particular remediation, but rather to advocate computer science research. Many in software engineering love the 20,000-foot-level look at the broad playing field, and big data from a large number of projects.

I did think of one more scenario that paints the concern graphically. Suppose we learned that a significant fraction of US military weapons systems depend on a single software component, even though confidence in that component is extremely high. How much concern should that cause?

From a historical viewpoint, the Y2K bug comes to mind. The reason Y2K caused so much concern, and so much diversion of resources to check for the bug, was precisely that it was so ubiquitous and cut across boundaries that we had believed made systems independent of each other. Y2K was an unsuspected common dependency.
 
anorlunda said:
Y2K was an unsuspected common dependency.

If diversity of solution implementation is a proposed robustness improvement, then Y2K strikes me as a counterexample to that proposal. Diversity of implementation did not offer any robustness. It is an example of many different implementations all potentially containing different coding bugs that lead to the same symptom (that symptom being inability to differentiate between different centuries when the century changes). In this case, if there were only a single date calculation algorithm that all the Earth's software used, it would have been a trivial fix. It was exactly the diversity of implementation in date calculation approaches that caused the concern. Each implementation needed examining and fixing.
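A made-up fragment of the kind of thing I mean (neither snippet is from any real system): two shops store and compare the year differently, yet both fail the same way at the rollover.

```cpp
#include <cstdio>
#include <cstring>

// Shop A: stores the year as a two-digit integer 0-99.
bool isLaterA(int yy_a, int yy_b) {
    return yy_a > yy_b;                  // 00 (meaning 2000) sorts before 99 (1999)
}

// Shop B: stores the year as a two-character field, compares lexicographically.
bool isLaterB(const char* yy_a, const char* yy_b) {
    return std::strcmp(yy_a, yy_b) > 0;  // "00" < "99" -- same wrong answer
}

int main() {
    // Both independently written comparisons say 2000 is NOT later than 1999.
    std::printf("Shop A: %d   Shop B: %d\n", isLaterA(0, 99), isLaterB("00", "99"));
    return 0;
}
```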

Can you draw a couple speculative graphs of how you envision a scenario where P*E is helped by multiple implementations over time and a scenario where it is not? P obviously increases with each independent implementation, and the rate of maturation of each implementation decreases as the user base is diluted. I think you were saying this in your OP?
 
Grinkle said:
Y2K strikes me as a counterexample
The problem is dependence on common things. In risk analysis we use the term "common-mode failures." In the case of Y2K, the commonality was not a shared component, but rather a shared method of expressing dates.

Grinkle said:
Can you draw a couple speculative graphs of how you envision a scenario where P*E is helped by multiple implementations over time and a scenario where it is not?
Consider the military example. Let's assume dependency on a software routine critical to 100% of the USA's weapons. The threat is that an enemy cyberwar unit discovers a vulnerability in that common thing: they could knock out 100% of our weapons at once. If we had the diversity of two independent implementations of that thing, their vulnerabilities would differ, and the enemy would likely be limited to knocking out 50% of our weapons on any one occasion. Losing 50% at a time on two occasions is much less scary than losing 100% simultaneously.
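To put a rough number on it, assuming (purely for illustration) that each independent implementation has the same independent chance ##p## of containing an exploitable flaw: with one shared implementation the chance of a total loss is ##p##, while with two independent implementations it drops to about ##p^2##, traded against a roughly ##2p(1-p)## chance of losing half.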
 
anorlunda said:
a shared method of expressing dates.

No method of date storage or expression was shared. The methods were arrived at independently by independent code developers, and they all suffered from defects that depended on the specific implementation but potentially showed a common symptom. Shared vs. independently arrived at is a key distinction and very relevant to your thesis. It goes to why I am saying Y2K is a counterexample to the argument that using different implementations for the same application will reduce the risk of being impacted by a code defect.
 
