# A way to avoid bugs?

1. Jul 26, 2014

### anorlunda

This post is 100% speculative. I am struck by the seeming contradiction between two facts.

1) In software circles we consider it a truism that any sizable, complex software system can never be 100% debugged. Even with infinite time and effort, it seems preposterous to claim that all the bugs could be found. Think of Linux, a word processor, an avionics app, a web browser, a health care exchange: anything non-trivial. If Microsoft had adopted the policy in 1981 of perfecting MS-DOS 1.1 before moving ahead, we would still be stuck there.

2) In hardware circles we consider apparent perfection the norm. I'm thinking of devices such as CPUs. I remember a time in the 90s when bugs in the Pentium chips caused lots of headlines. To me, the very fact that discovery of a bug makes headlines proves how rare such bugs are. Yet devices like CPUs are merely software programmed with an alphabet of resistors, transistors, wires, capacitors, and so on. They are apparently completely debugged using simulations that are analogous to interpreters for programs written in this circuit-alphabet language.

--

So here's the speculation. If we restricted ourselves to a very simple "machine language" alphabet analogous to resistors, transistors, wires, capacitors, and so on, could we design large and complex software systems that are much more bug free than traditional software?

In other words, I'm suggesting that the richness of expression in our programming languages is what leads to our inability to perfect the applications. It is interesting to note that a language of resistors/transistors ... has no jump operation, no arrays, no labels, although the language can be used to create a CPU that has all these things.

See what I'm saying? We can perfect software that implements a CPU (or some other chip), but we can't perfect software that the CPU (or chip) executes. If not the programming paradigm that separates the two, then what?
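The OP's restricted alphabet can be sketched in ordinary code. This is a hypothetical illustration (all function names invented): everything below is composed from a single NAND primitive, with no jumps, no labels, and no arrays, only composition, the way physical gates compose in a circuit.

```python
# Hypothetical sketch: every "gate" below is built from a single NAND
# primitive -- no jumps, labels, or arrays, only composition.
def nand(a, b):
    return not (a and b)

def inv(a):
    return nand(a, a)

def and_(a, b):
    return inv(nand(a, b))

def or_(a, b):
    return nand(inv(a), inv(b))

def xor(a, b):
    return and_(or_(a, b), nand(a, b))

def half_adder(a, b):
    # A 1-bit adder built purely from the NAND alphabet.
    return xor(a, b), and_(a, b)  # (sum, carry)

assert half_adder(True, True) == (False, True)
assert half_adder(True, False) == (True, False)
```

Each function here is a pure combinational mapping, so its full truth table can be checked exhaustively, which hints at why hardware-style designs are testable in a way general programs are not.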

2. Jul 26, 2014

### Chronos

Bug free software is actually not a priority. Pirated software will copy the bugs along with the code, which is like a fingerprint in a copyright infringement case.

3. Jul 27, 2014

### Bill Simpson

I claim the answer to the question you posed is "no."

People feel compelled to introduce more and more complicated products. Google for "lines of code" windows and you will see that recent versions of Microsoft products are estimated to contain 40 million lines of code or more each. If simple hardware were the answer to bug-free perfection, then imagine implementing ALL of Windows 7 or Office 2013 entirely from 2-input Nand gates: not creating a computer out of Nand gates and then writing software for it, but creating every menu, every font, every color, handling every button press, implementing every detail of those 40+ million lines of code out of those Nand gates. That seems roughly what you are suggesting by supposing a programming language no more complicated than a collection of Nand gates. (Half a century or more ago there actually was a computer built completely from hardware: the operating system, the editor, the file system, etc. were all implemented in hardware, but that was many lifetimes ago in computer years.) In companies that started from hardware, you could suggest that some new complicated feature be implemented entirely in hardware. This would almost certainly be instantly rejected because "that would be far too complicated to even imagine doing in hardware," so they toss it over the wall to the software guys and assume this will solve the problem, no matter how complicated. They grudgingly admit that software will always be a stream of mistakes, assume that software is "sort of free," yet also grudgingly admit that software is too expensive and too late no matter what the task is, all at the same time.

There are books out there ("Error Free Software" by Baber is one, though it is old) which describe how for centuries people built bridges and buildings that routinely fell down later. Then design principles were introduced which enabled most buildings and bridges to simply not fall down. Baber claims the same might be done for software. The problem with trying to do that today is that you are almost certainly not going to start from scratch and implement your own error-free operating system, all your own error-free application libraries, all your own error-free world-wide networking software, and all your own error-free client software for users to be able to access your great new idea. Instead you are going to build your house of cards on top of terrifying layer after layer after layer of hopefully mostly sort-of-debugged libraries, products, and code that you have about zero control over.

It is possible to make a software product that is a hundred or a thousand times more dependable than the typical stuff out there. Consider for a minute that big, successful, profitable Adobe has had almost monthly, sometimes even weekly, massive security screw-ups in something as simple as Adobe Reader or Adobe Flash, and that this has gone on for fifteen or more years! There is even a comedian who has a bit about "imagine if you had the energy of the Adobe updater." There are also a few products out there, very complicated products, which a collection of users hammer on hard every day, year after year; those users consider themselves very good if they find a single small error each year, and those errors are promptly and correctly fixed in the next update.

Many users, and I think almost all companies, all but gave up caring whether something is error free about the time that MS Windows 3 became popular. "Who cares if it works? Another thousand features is what people want and what matters."

Last edited: Jul 27, 2014
4. Jul 30, 2014

### neomahakala108

As far as I understand the topic (I've debugged software professionally):

1. Reality changes with time, requiring software to adjust; this introduces the idea of bug fixes in design, which carry over into bug fixes in implementation.
2. Once we fix all critical bugs, we move to a higher level of abstraction and fix the bugs occurring there for as long as we have time, before reality changes again, more or less.

5. Jul 30, 2014

### rcgldr

This would be rare. Most of the time, when using design languages such as VHDL, test scripts are written that can reach sections of the design that could not be tested through the actual pins on a chip. One goal is to get each state machine in the design to cycle through all of its states, but it's not possible to test all combinations of states across all the state machines. During the design phase, getting overall "coverage" between 90% and 95% is common; trying to go much beyond 95% is difficult and usually only done for mission-critical devices with a fairly simple architecture, like a pacemaker. Afterwards, test scripts are run to drive the pins on a chip, and some chips may have additional debug-only pins to help with verification.
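The state-coverage idea can be sketched outside of VHDL. This is a hypothetical toy (the state machine and event names are invented, not from any real tool); real simulators record visited states automatically.

```python
# Hypothetical toy state machine (a traffic-light controller).
TRANSITIONS = {
    ("red", "timer"): "green",
    ("green", "timer"): "yellow",
    ("yellow", "timer"): "red",
    ("green", "fault"): "red",
}
ALL_STATES = {"red", "green", "yellow"}

def run_stimulus(events, start="red"):
    """Drive the machine with a list of events; record every state visited."""
    state = start
    visited = {state}
    for ev in events:
        state = TRANSITIONS.get((state, ev), state)  # undefined events: stay put
        visited.add(state)
    return visited

visited = run_stimulus(["timer", "timer"])
coverage = len(visited) / len(ALL_STATES)
print(f"state coverage: {coverage:.0%}")  # prints "state coverage: 100%"
```

One machine with three states is trivial to cover, but ten such machines running together have 3^10 joint states, which is the combinatorial reason overall coverage tends to stall around 90-95%.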

For software, similar concepts can be used: unit testing of all functions rather than trying to test an entire program via the equivalent of scripts. Profiler-like tools can be used to verify that every line of code in a function (and every possible branch) is executed at least once when running a test script against the function or a debug build of the program. Again, one criterion for software is whether it's part of a mission-critical environment; if so, much more thorough testing of the software is done.
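A minimal sketch of branch-oriented unit testing, assuming a trivial invented function; coverage.py (run as `coverage run --branch`) is one possible tool for confirming no branch was missed, not something the post prescribes.

```python
def clamp(value, lo, hi):
    """Clamp value into [lo, hi] -- three branches to cover."""
    if value < lo:
        return lo
    if value > hi:
        return hi
    return value

# Unit tests chosen so every branch executes at least once; a coverage
# tool can then confirm that no branch was missed.
assert clamp(-5, 0, 10) == 0    # exercises the value < lo branch
assert clamp(50, 0, 10) == 10   # exercises the value > hi branch
assert clamp(7, 0, 10) == 7     # exercises the fall-through branch
```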

That Pentium division bug was a problem with a few missing entries in a lookup table used to optimize division. Trivia note: in a Simpsons Halloween episode around that time, France was insulted by the USA and fired their one and only nuclear missile at Washington DC. As the missile clears the silo, you see the "Intel inside" logo on the missile, and of course the missile goes off track and detonates over Springfield. Fortunately for the Simpsons, Homer just happens to be in a bomb shelter he's considering buying, and the rest of the family is protected by all the lead-based paint in their home.
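An invented analogy (not the real Pentium SRT divider) shows why a single bad lookup-table entry produces rare, silent errors that casual testing misses:

```python
# Invented analogy: a division routine that consults a reciprocal lookup
# table in which one entry is corrupted, the way the FDIV table had
# missing entries.
GOOD_TABLE = {d: 1.0 / d for d in range(1, 11)}
BUGGY_TABLE = dict(GOOD_TABLE)
BUGGY_TABLE[7] = 0.0  # a single bad entry out of ten

def table_divide(n, d, table):
    """Divide n by d using the precomputed reciprocal table."""
    return n * table[d]

# Most inputs never touch the bad entry, so casual testing looks clean...
assert table_divide(20, 4, BUGGY_TABLE) == 5.0
# ...but any input that hits it is silently, badly wrong.
assert table_divide(14, 7, BUGGY_TABLE) == 0.0  # should be 2.0
```

Only divisor 7 exposes the fault here; in the real chip only certain operand bit patterns did, which is part of why the bug survived into production.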

Last edited: Jul 30, 2014
6. Jul 30, 2014

### jbunniii

A software manager once told me: "if someone writes even one line of software, most likely there is at least one bug in it." A bit of an exaggeration but not all that far from the truth in my experience.

7. Jul 31, 2014

### borisgred

Perhaps this difference is at least in part due to the fact that consumers are more willing to tolerate faults in software than in hardware. If people were more demanding and discriminating, that would put competitive pressure on software developers.

The causes of this leniency are surely complicated, but I think software is generally a more fault-tolerant component of the overall system, and it's easier to fix it with an update later. Fixing hardware requires replacing the hardware, and physical labor is resource- and time-intensive. Applying a software patch is easy in comparison.

If you think the richness of programming languages is contributing to faults in development, perhaps we could switch to coding in Whitespace?

8. Jul 31, 2014

### Borg

I'll add my two bits. It takes many programmers to develop modern software and some are better than others.

Programmers are people with varying skills and attitudes, ranging from conscientious and skilled to lazy and incompetent. Fortunately, the latter are the exception.

Coding errors can also be due to management decisions such as unreasonable deadlines or poor coordination of resources.

Once an error gets into the system, it can be very difficult to remove. Will the person fixing a bug notice that it affects more than just what was reported? Will management decide that the other effects aren't worth fixing? Will the customer change the requirements so that whole sections of the code are now 'broken'?

9. Jul 31, 2014

### thankz

Bug-free software exists in mission-critical systems like the Space Shuttle or nuclear weapons. I'd expect those systems were coded in machine language. I'd also expect that in 50 years or so architectures will be universal and we'll go from bloatware to efficiency and stability as the new standard.

10. Aug 7, 2014

### Bill Simpson

Even mission critical software isn't immune.

A 20-year-old bug that went to Mars several times before being accidentally found.

11. Aug 8, 2014

### neomahakala108

This depends on the patch and on the physical labor involved.

12. Aug 8, 2014

### neomahakala108

There are proofs of correctness for software; they are just expensive in effort, money, time, resources, and perhaps more.

In many situations there are just use cases and tests, automated or not (a test security harness), that let users be comfortable with a fairly high level of faultlessness.

I wrote articles on my blog that form a basis for testing. They are Java-related and use Object-Oriented Programming.

1. Design by Contract: http://dragonfly-algorithm.blogspot.com/2013/11/design-by-contract.html
2. Conditions, Security Harness, Software Complexity: http://dragonfly-algorithm.blogspot.com/2014/07/conditions-security-harness-software.html

Both, and especially DbC, help to form the proper thinking required for tests in Java and other OOP technologies.
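The Design-by-Contract idea from the first link can be sketched as follows (in Python here, though the linked posts are Java-oriented; the example and its names are invented):

```python
def withdraw(balance, amount):
    """Design-by-Contract style (invented example): preconditions are the
    caller's obligations, the postcondition is this routine's obligation."""
    # Preconditions: checked before any work is done.
    assert amount > 0, "precondition violated: amount must be positive"
    assert amount <= balance, "precondition violated: cannot overdraw"

    new_balance = balance - amount

    # Postcondition: verified before the result escapes.
    assert 0 <= new_balance == balance - amount, "postcondition violated"
    return new_balance

assert withdraw(100, 30) == 70
```

A violated contract fails at the boundary where the mistake happened, rather than corrupting state that only surfaces much later.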

Last edited: Aug 8, 2014
13. Aug 8, 2014

### D H

Staff Emeritus
The Shuttle software was not bug free. The defect density was estimated at one defect per 400,000 lines of code. Compare with the industry standard for good quality software of one defect per 1,000 lines of code, or the typical 10-15 defects per 1,000 lines of code from a process-free organization.

That extremely low defect density came at an extremely high price: about $1000 per line of code, and that was in 1995 dollars. In today's dollars, that would be about $1600/LOC. Assuming a fully-loaded cost of $100/hour (2014 dollars), that means a programmer productivity of one delivered line of code every other day. Few programmers make $100/hour; that would mean a salary of about $200,000/year. However, I said fully-loaded cost, and that $1000/LOC figure represents a fully-loaded cost. Fully-loaded means salary + benefits + overhead (employer's share of Social Security, equipment, office space, ...) + G&A + profit. If anything, $100/hour is on the low side for a fully-loaded cost.

More modern estimates for safety-critical software using a modern language and modern tools are in the low hundreds of dollars per line of code, which corresponds to a productivity of one delivered line of code every hour or two.

There are ways to exceed the Shuttle cost of $1600/LOC. Toyota, for example. They went cheap on their development and had little corporate knowledge of the many gotchas involved in writing multi-threaded safety-critical software. The initial development likely cost less than $100/LOC. However, if you factor in the 1.2 billion dollar criminal fine, the hundreds of millions of dollars in lawsuit settlements, and the costs of all those recalls, the Toyota ETCS software easily exceeds $2000/LOC.
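The cost arithmetic above can be reproduced in a few lines; the dollar figures come from the post, while the 8-hour working day is my assumption.

```python
# Back-of-the-envelope: dollars per line divided by dollars per hour
# gives hours per delivered line of code.
cost_per_loc = 1600.0   # Shuttle cost per delivered line, 2014 dollars
loaded_rate = 100.0     # assumed fully-loaded cost, dollars per hour
hours_per_day = 8.0     # assumed working day

hours_per_loc = cost_per_loc / loaded_rate     # 16 hours per line
days_per_loc = hours_per_loc / hours_per_day   # one line every other day
print(f"{days_per_loc:.0f} working days per delivered line")
# prints "2 working days per delivered line"
```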

Why would you expect that? Nobody codes in machine language nowadays. They might code in assembly, but even that is pretty rare.

The Shuttle flight software was written in a language called HAL/S, short for "High-order Assembly Language/Shuttle". However, HAL/S was anything but an assembly language. It supported structured programming (if/then/else, for and while loops, etc) and data structures, and it provided extensive support for mathematical programming. You can't write "y = Ax + b" directly in assembly when A is a matrix and x and b are vectors. You could do just that, and even more complex mathematical expressions, in HAL/S.
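To see what "y = Ax + b" costs in a language without matrix support, here is a hypothetical hand-rolled version; every explicit loop is another chance for an off-by-one bug, which is exactly what HAL/S-style (or Matlab-style) expressions avoid.

```python
# What the one-line expression "y = A x + b" expands to when the
# language only understands scalars: nested loops the programmer must
# get right by hand.
def mat_vec_add(A, x, b):
    n = len(A)
    y = [0.0] * n
    for i in range(n):
        s = 0.0
        for j in range(len(x)):
            s += A[i][j] * x[j]   # accumulate the dot product for row i
        y[i] = s + b[i]           # then add the offset term
    return y

A = [[1.0, 2.0], [3.0, 4.0]]
x = [1.0, 1.0]
b = [0.5, 0.5]
assert mat_vec_add(A, x, b) == [3.5, 7.5]
```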

The Shuttle flight software was written in the 1980s. Since then, the Ada and C++ languages have appeared, along with even higher order programming tools such as Matlab/Simulink. Modern safety-critical systems are written in Ada, C++, C, and those higher order tools. Nobody uses machine language, and assembly, if it is used at all, is extremely limited.

14. Aug 8, 2014

### anorlunda

Thank you Mentor, the NASA data is helpful. Let me use another NASA example to focus on my original point.

A Mars lander (I forget the name) crashed. Investigations traced the cause to a bug: a routine was called with a value in English units, but the value was received and interpreted in metric units. My hypothesis is that if the probe's functions had been implemented 100% as hardware chips and tested by chip-industry standard practice, the bug would have been eliminated in advance. Obviously I can't prove that assertion, but I believe it to be true.

The question in my OP focuses on the seeming disparity in bug rates between software and hardware of comparable complexity and cost.

One could obtain a LOC count from a hardware project by counting the number of lines of simulation code needed to make a simulator. Have you seen such counts and comparisons between HW and SW projects?

Also, to clarify my OP: much software (perhaps most software) is not subject to this kind of analysis because people disagree about the difference between a feature and a bug. I'm reminded of Microsoft Excel debates where group A insisted that certain behavior of Excel was a bug while group B insisted that Excel behaved properly and that group A was wrong. My interest is in the cases where the requirements are not debatable.

p.s. Just curious. Did the original shuttle requirements foresee the Y2K transition?

15. Aug 8, 2014

### neomahakala108

One of the main benefits of high-level programming is fewer errors and simpler software than machine code... as long as none of the software components fail. This includes the operating system, software libraries, and other dependencies if there are such.

16. Aug 8, 2014

### D H

Staff Emeritus
That was the Mars Climate Orbiter, and you have your facts wrong. Yes, it was a mismatch between US customary and metric units, but the problem was on the ground, not on the satellite. The satellite did exactly what it was told to do. The error resulted from the output of one program being fed into another. The first program's output was a Δv in customary units. The second program, which formulated the commands to be sent to the satellite, expected inputs in metric units. The output of the first program was just a number (rather, three numbers: the x, y, and z components of the desired Δv). The input to the second program was just a bunch of numbers.
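One way to make such an interface mismatch fail loudly is to carry units with values rather than passing bare numbers. This is a hypothetical sketch (class, names, and structure invented, not from the actual MCO ground software):

```python
# Hypothetical sketch: values carry their units, and the receiving
# program's interface refuses bare numbers outright.
NS_PER_LBF_S = 4.448222  # newton-seconds per pound(force)-second

class Impulse:
    def __init__(self, value, unit):
        assert unit in ("N*s", "lbf*s"), f"unknown unit: {unit}"
        self.value, self.unit = value, unit

    def to_metric(self):
        return self.value if self.unit == "N*s" else self.value * NS_PER_LBF_S

def thruster_command(impulse):
    # The second program's boundary: a unit-less number cannot even get in.
    assert isinstance(impulse, Impulse), "bare number rejected: units unknown"
    return impulse.to_metric()

assert thruster_command(Impulse(1.0, "lbf*s")) == 4.448222
```

Passing a plain float here raises immediately at the interface, whereas the real mismatch was only discovered after the spacecraft was lost.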

How is your hypothesis possibly going to address the units problem?

To be brutally honest, your hypothesis is nonsense. You appear to have completely missed the point as to why we build computers in the first place. The reason we build computers is because we can essentially write an infinite number of programs even though the computer's vocabulary is finite. This is the subject of the Church–Turing thesis.

The finite and rather simple vocabulary of the computer's instruction set means that each instruction can be tested to ensure that it is doing exactly what was intended. Testing whether a program written using that vocabulary behaves as expected is an entirely different matter from testing whether the instruction set, memory, and associated circuitry behave as expected.
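The contrast can be made concrete: a 4-bit add "instruction" can be tested exhaustively, which no non-trivial program built from such instructions can. A minimal sketch:

```python
# A 4-bit adder has only 16 x 16 input pairs, so the "instruction" can
# be verified exhaustively. Programs built FROM such instructions have
# state spaces far too large for the same treatment.
def add4(a, b):
    total = a + b
    return total & 0xF, total >> 4  # (sum mod 16, carry-out bit)

for a in range(16):
    for b in range(16):
        s, carry = add4(a, b)
        assert s + 16 * carry == a + b  # every possible case checked
print("all 256 input pairs verified")
```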

17. Aug 8, 2014

### .Scott

In some cases, the programming language is deliberately limited in hopes of making mission-critical or life-dependent code more reliable. The most common example is the use of "ladder logic" for programming sensors and relays; automated industrial machinery is often programmed this way. The effect is to limit complexity and make the functionality graphically apparent and easy to review.
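Ladder logic can be approximated as rungs of boolean expressions scanned each cycle. This is a hypothetical sketch (rung and signal names invented), not any real PLC dialect:

```python
# Hypothetical sketch of ladder-logic evaluation: each rung is one
# boolean expression over the inputs, scanned top to bottom every cycle.
def scan_cycle(inputs):
    out = {}
    # Rung 1: motor runs only while start is pressed and no emergency stop.
    out["motor"] = inputs["start"] and not inputs["estop"]
    # Rung 2: alarm lamp on emergency stop or overtemperature.
    out["alarm"] = inputs["estop"] or inputs["overtemp"]
    return out

state = scan_cycle({"start": True, "estop": False, "overtemp": False})
assert state == {"motor": True, "alarm": False}
```

Because each rung is a small pure function of the inputs, every rung's truth table can be reviewed or tested exhaustively, which is the reliability argument for restricting the language.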

But, in contrast to your suggestion, the most application-specific and complex operations are the ones often implemented in software - because at a certain level of customization and complexity, it's the only practical alternative.

Perhaps the biggest differences between hardware and software are:
1) Hardware circuits are more commonly "unit tested", and the "units" tend to be built up in more gradual steps. This allows errors to be identified at an earlier stage, long before thousands of components are made only to be discarded.
2) In most cases, there are only a few readily identified and measured "global resources" in even a complicated hardware circuit: things like voltage supplies, grounds, clock signals, thermal management, etc. If any one component messes one of these up, it will affect the entire circuit, but it will also be easy to determine what is basically going wrong. With software, mess up a single library function or the program counter and the entire application is hosed. So the basic architecture of the software is more vulnerable to hidden faults with catastrophic effects.

Of course, a mission-critical system with software also includes hardware - and the system must be tolerant of both hardware and software failures. The design of such systems usually relies on the software to monitor all of the components of the system - even when a component is not currently in use. After all, redundancy only counts if you do something about the first failure before a second failure.

Mission-critical software only needs to meet the reliability requirements established for it in the overall design of the mission-critical system. As D H suggested, this is generally something short of perfection but well above common code.

18. Aug 9, 2014

### anorlunda

My hypothesis is perhaps best suited for academic investigation such as a thesis subject.

1) Determine whether some domains (such as chip design) produce statistically significantly more bug-free functionality at lower prices than other domains. Bugs per thousand LOC and $ per LOC could be the metrics.

2) If (1) is yes, then try to figure out why the difference exists. (My speculation regarding the richness of the language and testing methods may indeed be nonsense, but it could be one possibility to investigate.)

3) Try to apply the lessons from (2) to other domains. If successful, it is a win-win.

19. Aug 9, 2014

### Borg

You're still missing the point. You see that CPUs have fewer bugs and assume it has something to do with the way they are 'written' (CPUs don't have lines of code; they are arrangements of circuits). There is a reason why the CPU has to have fewer bugs: it is in use by every computer. A bug in a CPU will affect everyone. Software, on the other hand, is only in use by a subset of all computers. A failure in a software program doesn't affect all computers.

Try thinking of a CPU manufacturer as a company building a simple light switch. It has a simple, easily tested function: does the switch turn the power on and off? If you build it wrong, nobody will buy your switch. Now compare that to software programs, which are like the humans who use the switch. If I tell someone to turn off a light before going to bed, there are a lot of potential variables that could affect whether the light gets turned off properly. Does the person speak the same language, as in the Mars unit-of-measure example? Does the person dislike you and purposely do the opposite? I compare that one with Microsoft's attempt to subvert Java with its J++ software in the late 90's. Maybe the switch was installed on the ceiling where it can't be reached. I could go on, but the point is that none of these "software bugs" would really affect the light-switch manufacturer that builds a functioning light switch.

Most people would still have a functioning light switch that could be turned off. So ask yourself this: when you buy a light switch, would you rather pay 50 cents for the one that has been simply tested, or would you pay $15 for the one that comes with educating everyone in its proper use just because there were a few oddballs that can't do it right?

20. Aug 9, 2014

### neomahakala108

If you wish to reduce LOC, this can work, for example, with a dedicated programming language or a proper software library.