A way to avoid bugs?

  1. Jul 26, 2014 #1

    anorlunda

    Staff: Mentor

    This post is 100% speculative. I am struck by the apparent contradiction between two seeming facts.

    1) In software circles we consider it a truism that any sizable complex software system can never be 100% debugged. Even with infinite time and effort it seems preposterous to claim that all the bugs could be found. Think Linux, a word processor, an avionics app, a web browser, a health care exchange; anything non-trivial. If Microsoft had the policy in 1981 of perfecting MSDOS 1.1 before moving ahead, we would still be stuck there.

    2) In hardware circles we consider apparent perfection the norm. I'm thinking of devices such as CPUs. I remember a time in the 90s when bugs in the Pentium chips caused lots of headlines. To me that proves how rare it is if discovery of a bug causes headlines. Yet devices like CPUs are merely software programmed with the alphabet of resistors, transistors, wires, capacitors, and so on. They are apparently completely debugged using simulations that are analogous to interpreters of programs written in this circuit alphabet language.

    --

    So here's the speculation. If we restricted ourselves to a very simple "machine language" alphabet analogous to resistors, transistors, wires, capacitors, and so on, could we design large and complex software systems that are much more bug free than traditional software?

    In other words, I'm suggesting that the richness of expression in our programming languages is what leads to our inability to perfect the applications. It is interesting to note that a language of resistors/transistors ... has no jump operation, no arrays, no labels, although the language can be used to create a CPU that has all these things.

    See what I'm saying? We can perfect software that implements a CPU (or some other chip), but we can't perfect software that the CPU (or chip) executes. If it is not the programming paradigm that separates the two, then what?
     
  3. Jul 26, 2014 #2

    Chronos

    Science Advisor
    Gold Member

    Bug free software is actually not a priority. Pirated software will copy the bugs along with the code, which is like a fingerprint in a copyright infringement case.
     
  4. Jul 27, 2014 #3
    I claim the answer to the question you posed is "no."

    People feel compelled to introduce more and more complicated products. Google for "lines of code" windows and you will see that recent versions of Microsoft products are estimated to contain 40 million lines of code or more each. If simple hardware were the answer to bug-free perfection, then just imagine implementing ALL of Windows 7 or Office 2013 entirely from 2-input NAND gates: not creating a computer out of NAND gates and then writing software for it, but creating every menu, every font, every color, handling every button press, implementing every detail of those 40+ million lines of code out of those NAND gates. That seems roughly what you are suggesting by supposing a programming language which is no more complicated than a collection of NAND gates. (Half a century or more ago there actually was a computer built completely from hardware; the operating system, the editor, the file system, etc. were all implemented in hardware, but that was many lifetimes ago in computer years.)

    In companies that started from hardware, you could suggest that some new complicated feature be implemented entirely in hardware. This would almost certainly be rejected instantly because "that would be far too complicated to even imagine doing in hardware". So they toss it over the wall to the software guys and assume this will solve the problem, no matter how complicated. They grudgingly admit that software will always be a stream of mistakes, and they assume that software is "sort of free" while grudgingly admitting that software is too expensive and too late no matter what the task is, all at the same time.

    There are books out there ("Error Free Software" by Baber is one, though it is old) which describe how for centuries people built bridges and buildings which routinely fell down later. Then design principles were introduced which enabled most buildings and bridges to just not fall down. He claims the same might be done for software. The problem with trying to do that today is that you are almost certainly not going to start from scratch and implement your own error-free operating system, all your own error-free application libraries, all your own error-free worldwide networking software, and all your own error-free software for users to be able to access your great new idea. Instead you are going to build your house of cards on top of terrifying layer after layer of hopefully mostly sort-of-debugged libraries and products and code that you have about zero control over.

    It is possible to make a software product that is a hundred or a thousand times more dependable than the typical stuff out there. Consider for a minute that big successful profitable Adobe has had almost monthly, sometimes even weekly, massive security screw-ups in something as simple as Adobe Reader or Adobe Flash and that this has happened every month or every week for fifteen or more years! There is even a comedian who has a bit about "imagine if you had the energy of the Adobe updater." There are also a few products, very complicated products, out there which a collection of users hammer on hard every day, year after year and those users consider themselves very good if they find a single small error each year, and those errors are immediately correctly fixed in the next update.

    Many users and I think almost all companies almost gave up caring whether something is error free about the time that MS Windows 3 became popular. "Who cares if it works, another thousand features is what people want and what matters."
     
    Last edited: Jul 27, 2014
  5. Jul 30, 2014 #4

    neomahakala108

    Gold Member

    as far as i understand the topic (i've debugged software professionally):

    1. reality changes with time, requiring software to adjust - introducing idea of bugfixes in design, which carry on to bugfixes in implementation.
    2. once we fix all critical bugs, we go to higher level of abstraction to fix bugs occurring there for as long as we have time, before 'reality changes again', more or less.
     
  6. Jul 30, 2014 #5

    rcgldr

    Homework Helper

    This would be rare. Most of the time, when using design languages such as VHDL, test scripts are made that can get into sections of the design that could not be tested with the actual pins on a chip. One goal is to get each state machine in the design to cycle through all of its states, but it's not possible to test all combinations of states in all the state machines. During the design phase, getting overall "coverage" between 90% and 95% is common; trying to go much beyond 95% is difficult and usually only done for mission-critical devices with a fairly simple architecture, like a pacemaker. Afterwards, test scripts are run to drive the pins on a chip, and some chips may have additional debug-only pins to help with verification.

    For software, similar concepts can be used: unit testing of all functions, rather than trying to test an entire program via the equivalent of scripts. Profiler-like tools can be used to check that a test script exercises every line of code in a function (all possible branches taken) at least once when the script is run against the function or against a debug build of the program. Again, one criterion is whether the software is part of a mission-critical environment; if so, much more thorough testing of the software is done.
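
    As a rough illustration of the "every branch at least once" idea, here is a minimal Python sketch. The function `classify_reading` and its thresholds are hypothetical, invented only for this example; the point is that the test cases are chosen so each branch of the function is taken at least once.

```python
def classify_reading(value, low, high):
    """Classify a sensor reading against a closed range [low, high]."""
    if low > high:
        raise ValueError("low must not exceed high")
    if value < low:
        return "below"
    if value > high:
        return "above"
    return "ok"


def run_branch_tests():
    """Exercise every branch of classify_reading at least once."""
    assert classify_reading(5, 0, 10) == "ok"
    assert classify_reading(-1, 0, 10) == "below"
    assert classify_reading(11, 0, 10) == "above"
    # Boundary values sit on the edge of a branch, so they are tested too.
    assert classify_reading(0, 0, 10) == "ok"
    assert classify_reading(10, 0, 10) == "ok"
    # The error branch: an inverted range must be rejected.
    try:
        classify_reading(5, 10, 0)
    except ValueError:
        pass
    else:
        raise AssertionError("expected ValueError for an inverted range")
    return True


if __name__ == "__main__":
    print(run_branch_tests())
```

    A coverage tool (for Python, coverage.py run with its `--branch` option is one possibility) can then report whether any branch was missed. Full branch coverage of each function still says nothing about combinations of branches across functions, which mirrors the state-machine limitation described for hardware above.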

    That Pentium division bug was a problem with a few (five, as it turned out) entries in a lookup table used to speed up division. Trivia note: in a Simpsons Halloween episode around that time, France was insulted by the USA and fired their one and only nuclear missile at Washington DC. As the missile clears the silo, you see the "Intel inside" logo on the missile, and of course, the missile goes off track and detonates over Springfield. Fortunately for the Simpsons, Homer just happens to be in a bomb shelter he's considering buying, and the rest of the family is protected by all the lead-based paint in their home.
     
    Last edited: Jul 30, 2014
  7. Jul 30, 2014 #6

    jbunniii

    Science Advisor
    Homework Helper
    Gold Member

    A software manager once told me: "if someone writes even one line of software, most likely there is at least one bug in it." A bit of an exaggeration but not all that far from the truth in my experience.
     
  8. Jul 31, 2014 #7
    Perhaps this difference is at least in part due to the fact that consumers are more willing to tolerate faults in software than in hardware. If the people were more demanding and discriminating, that would put competitive pressure on the software developers.

    The causes of this leniency are surely complicated, but I think software is generally a more fault-tolerant component of the overall system, and it's easier to fix it with an update later. Fixing hardware requires replacing the hardware, and physical labor is resource- and time-intensive. Applying a software patch is easy in comparison.

    If you think the richness of programming languages is contributing to faults in development, perhaps we could switch to coding in whitespace? :biggrin:
     
  9. Jul 31, 2014 #8

    Borg

    Science Advisor
    Gold Member
    2017 Award

    I'll add my two bits. It takes many programmers to develop modern software and some are better than others.

    Programmers are people with varying skills and attitudes. There are programmers of all types, from conscientious and skilled to lazy and incompetent. Fortunately, the latter are the exception.

    Coding errors can also be due to management decisions such as unreasonable deadlines or poor coordination of resources.

    Once an error gets into the system, it can be very difficult to remove. Will the person fixing a bug notice that the bug affects more than just what was reported? Will management decide that the others aren't worth fixing? Will the customer decide to change the requirements so that whole sections of the code are now 'broken'?
     
  10. Jul 31, 2014 #9
    Bug-free software exists in mission-critical systems like the Space Shuttle or nuclear weapons. I'd expect those systems were coded in machine language. I'd expect that in 50 years or so architectures will be universal and we'll go from bloatware to efficiency and stability as the new standard.
     
  11. Aug 7, 2014 #10
    Even mission critical software isn't immune.

    A 20-year-old bug went to Mars several times before being found by accident.
     
  12. Aug 8, 2014 #11

    neomahakala108

    Gold Member

    this depends on patch & physical labor.
     
  13. Aug 8, 2014 #12

    neomahakala108

    Gold Member

    there are proofs of correctness for software, they are just expensive effort-, money-, time-, resources- wise, perhaps more.

    in many situations there are just use cases & tests, automated or not (test security harness) for software that let users be comfortable with fairly high faultlessness.

    i wrote articles that form basis for testing on my blog. it's Java-related & uses Object Oriented Programming.

    1. Design by Contract: http://dragonfly-algorithm.blogspot.com/2013/11/design-by-contract.html
    2. Conditions, Security Harness, Software Complexity: http://dragonfly-algorithm.blogspot.com/2014/07/conditions-security-harness-software.html

    both, especially DbC help to form proper thinking required for tests in Java & other OOP technologies.
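
    For readers who have not met Design by Contract, a minimal sketch follows. It is written in Python rather than Java, and the function `integer_sqrt` is a made-up example, not taken from the linked articles. The precondition states what the caller must guarantee; the postcondition states what the function promises in return.

```python
import math


def integer_sqrt(n):
    """Return the largest integer r such that r * r <= n."""
    # Precondition: the caller's obligation.
    assert isinstance(n, int) and n >= 0, "precondition: n must be a non-negative int"

    r = math.isqrt(n)

    # Postcondition: the function's promise, checked before returning.
    assert r * r <= n < (r + 1) * (r + 1), "postcondition violated"
    return r


if __name__ == "__main__":
    print(integer_sqrt(17))
```

    Writing the contract first tends to suggest the tests directly: one case per way the precondition can fail, plus cases that probe the boundaries of the postcondition.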
     
    Last edited: Aug 8, 2014
  14. Aug 8, 2014 #13

    D H

    Staff Emeritus
    Science Advisor

    The Shuttle software was not bug free. The defect density was estimated at one defect per 400,000 lines of code. Compare with the industry standard for good quality software of one defect per 1,000 lines of code, or the typical 10-15 defects per 1,000 lines of code from a process-free organization.

    That extremely low defect density came at an extremely high price, about $1000/line of code, and that was in 1995 dollars. In today's dollars, that would be about $1600/LOC. Assuming a fully-loaded cost of $100/hour (2014 dollars), that means 16 hours of effort per line, or a programmer productivity of one delivered line of code every other day. Few programmers make $100/hour; that means a salary of about $200,000/year. However, I said fully-loaded cost, and that $1000/LOC figure represents a fully-loaded cost. Fully-loaded means salary + benefits + overhead (employer's share of Social Security, equipment, office space, ...) + G&A + profit. If anything, $100/hour is on the low side for a fully-loaded cost.
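
    The arithmetic behind those figures can be checked with a short Python sketch. The 8-hour working day and the 2000-hour working year are assumptions made here for the calculation; the dollar figures are the ones quoted above.

```python
def hours_per_line(cost_per_loc, loaded_rate_per_hour):
    """Hours of fully-loaded effort behind one delivered line of code."""
    return cost_per_loc / loaded_rate_per_hour


def work_days_per_line(cost_per_loc, loaded_rate_per_hour, hours_per_day=8):
    """Same figure expressed in working days (8-hour day is an assumption)."""
    return hours_per_line(cost_per_loc, loaded_rate_per_hour) / hours_per_day


def salary_if_rate_were_pay(rate_per_hour, hours_per_year=2000):
    """Annual pay if the hourly rate were pure salary (2000 h/year assumed)."""
    return rate_per_hour * hours_per_year


if __name__ == "__main__":
    print(hours_per_line(1600, 100))       # hours per delivered line
    print(work_days_per_line(1600, 100))   # working days per delivered line
    print(salary_if_rate_were_pay(100))    # dollars per year
```

    With $1600/LOC and $100/hour this gives 16 hours, or two working days, per delivered line, which matches "one line every other day".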

    More modern estimates for safety-critical software using a modern language and modern tools are in the low hundreds of dollars per line of code, which corresponds to a productivity of one delivered line of code every hour or two. There are ways to exceed the Shuttle cost of $1600/LOC. Toyota, for example. They went cheap on their development and had little corporate knowledge of the many gotchas involved in writing multi-threaded safety-critical software. The initial development likely cost less than $100/LOC. However, if you factor in the 1.2 billion dollar criminal fine, the hundreds of millions of dollars in lawsuit settlements, and the costs of all those recalls, the Toyota ETCS software easily exceeds $2000/LOC.

    Why would you expect that? Nobody codes in machine language nowadays. They might code in assembly, but even that is pretty rare.

    The Shuttle flight software was written in a language called HAL/S, short for "High-order Assembly Language/Shuttle". However, HAL/S was anything but an assembly language. It supported structured programming (if/then/else, for and while loops, etc) and data structures, and it provided extensive support for mathematical programming. You can't write "y = Ax + b" directly in assembly when A is a matrix and x and b are vectors. You could do just that, and even more complex mathematical expressions, in HAL/S.
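
    To show what is being abstracted away, here is a sketch of "y = Ax + b" spelled out element by element. It is plain Python, not HAL/S, and the function name `affine` is invented for this example. In assembly, every multiply, add, index calculation, and loop counter below would be written by hand.

```python
def affine(A, x, b):
    """Compute y = A x + b for a matrix A (list of rows) and vectors x, b."""
    if len(A) != len(b) or any(len(row) != len(x) for row in A):
        raise ValueError("dimension mismatch")
    # One dot product per row of A, plus the matching element of b.
    return [sum(a * xi for a, xi in zip(row, x)) + bi
            for row, bi in zip(A, b)]


if __name__ == "__main__":
    print(affine([[1, 2], [3, 4]], [1, 1], [10, 20]))
```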

    The Shuttle flight software was written in the 1980s. Since then, the Ada and C++ languages have appeared, along with even higher order programming tools such as Matlab/Simulink. Modern safety-critical systems are written in Ada, C++, C, and those higher order tools. Nobody uses machine language, and assembly, if it is used at all, is extremely limited.
     
  15. Aug 8, 2014 #14

    anorlunda

    Staff: Mentor

    Thank you Mentor, the NASA data is helpful. Let me use another NASA example to focus on my original point.

    A Mars Lander (I forget the name) crashed. Investigations traced the cause to a bug. A routine was called with a value in English units, but the value was received and interpreted in metric units. My hypothesis is that if the probe's functions had been implemented 100% as hardware chips, and tested by chip-industry standard practice, the bug would have been eliminated in advance. Obviously, I can't prove that assertion, but I believe it to be true.

    The question of the OP focuses on the seeming disparity in the bug rate in SW and HW given comparable complexity and cost.

    One could obtain a LOC count from a hardware project by counting the number of lines of simulation code needed to make a simulator. Have you seen such counts and comparisons between HW and SW projects?

    Also to clarify my OP, much software (perhaps most software) is not subject to this kind of analysis because people disagree about the difference between a feature and a bug. I'm reminded of Microsoft Excel debates where group A insisted that certain behavior of Excel was a bug while group B of users insisted that Excel behaved properly and that the A group was wrong. My interest is in the cases where the requirements are not debatable.

    p.s. Just curious. Did the original shuttle requirements foresee the Y2K transition?
     
  16. Aug 8, 2014 #15

    neomahakala108

    Gold Member

    one of main benefits of High Level Programming is fewer errors & simpler software than the machine code... as long as all of software components won't fail. this includes Operating System, Software Libraries & other Dependencies if there are such.
     
  17. Aug 8, 2014 #16

    D H

    Staff Emeritus
    Science Advisor

    That was the Mars Climate Orbiter, and you have your facts wrong. Yes, it was a mismatch between US customary and metric units, but the problem was on the ground, not on the satellite. The satellite did exactly what it was told to do. The error resulted from the output of one program being fed into another. The first program's output was a Δv in customary units. The second program, which formulated the commands to be sent to the satellite, expected inputs in metric units. The output of the first program was just a number (rather, three numbers, the x, y and z components of the desired delta V). The input to the second program was just a bunch of numbers.
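
    The "just a bunch of numbers" interface can be contrasted with one that carries its units along. The Python sketch below is purely illustrative: the `DeltaV` type and the `build_command` function are hypothetical and are not a description of the actual ground software. It only shows that a bare number has no way to announce which unit system it came from, while a tagged value does.

```python
from dataclasses import dataclass

METRES_PER_FOOT = 0.3048  # exact, by definition of the international foot


@dataclass(frozen=True)
class DeltaV:
    """A velocity change, always stored internally in metres per second."""
    metres_per_second: float

    @classmethod
    def from_feet_per_second(cls, value):
        # Conversion happens once, at the boundary where the unit is known.
        return cls(value * METRES_PER_FOOT)

    @classmethod
    def from_metres_per_second(cls, value):
        return cls(value)


def build_command(delta_v):
    """Hypothetical second program: refuses anything that is not a DeltaV."""
    if not isinstance(delta_v, DeltaV):
        raise TypeError("expected a DeltaV, got a bare number")
    return f"BURN {delta_v.metres_per_second:.4f} m/s"


if __name__ == "__main__":
    print(build_command(DeltaV.from_feet_per_second(10.0)))
```

    Note that this is a software interface convention. Nothing about it depends on whether the logic underneath runs on a CPU or is cast in gates.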

    How is your hypothesis possibly going to address the units problem?

    To be brutally honest, your hypothesis is nonsense. You appear to have completely missed the point as to why we build computers in the first place. The reason we build computers is because we can essentially write an infinite number of programs even though the computer's vocabulary is finite. This is the subject of the Church–Turing thesis.

    The finite and rather simple vocabulary of the computer's instruction set means that each instruction can be tested to ensure that it is doing exactly what was intended. Testing whether a program written using that vocabulary behaves as expected is an entirely different matter from testing whether the instruction set, memory, and associated circuitry behave as expected.
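
    The difference in scale can be made concrete with a small Python sketch. The one-bit full adder below is built only from NAND gates (nine of them, an arrangement chosen for this example) and can be checked against every possible input, because there are only eight. The function `input_space_size` then counts the input combinations for a routine taking two 64-bit arguments, which is an assumed example of a very modest "program".

```python
from itertools import product


def nand(a, b):
    """Two-input NAND on bits 0/1."""
    return 1 - (a & b)


def full_adder(a, b, carry_in):
    """One-bit full adder built only from NAND gates."""
    n1 = nand(a, b)
    n2 = nand(a, n1)
    n3 = nand(b, n1)
    half_sum = nand(n2, n3)        # a XOR b
    n4 = nand(half_sum, carry_in)
    n5 = nand(half_sum, n4)
    n6 = nand(carry_in, n4)
    total = nand(n5, n6)           # a XOR b XOR carry_in
    carry_out = nand(n1, n4)       # (a AND b) OR ((a XOR b) AND carry_in)
    return total, carry_out


def exhaustive_check():
    """Test the adder on all inputs; return how many cases were checked."""
    cases = 0
    for a, b, c in product((0, 1), repeat=3):
        total, carry = full_adder(a, b, c)
        assert 2 * carry + total == a + b + c
        cases += 1
    return cases


def input_space_size(bits_per_argument, argument_count):
    """Number of distinct inputs for a routine with fixed-width arguments."""
    return 2 ** (bits_per_argument * argument_count)


if __name__ == "__main__":
    print(exhaustive_check())
    print(input_space_size(64, 2))
```

    Eight cases cover the adder completely. A routine with two 64-bit inputs already has 2^128 input combinations, so exhaustive testing is out of reach long before anything resembling an application is written.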
     
  18. Aug 8, 2014 #17
    In some cases, the programming language is limited in hopes of making mission-critical or life-dependent code more reliable. The most common example is the use of "ladder logic" in programming sensors and relays. Automated industrial machinery is often done this way. The effect is to limit the complexity and make the functionality graphically apparent and easy to review.

    But, in contrast to your suggestion, the most application-specific and complex operations are the ones often implemented in software - because at a certain level of customization and complexity, it's the only practical alternative.

    Perhaps the biggest differences in hardware and software are:
    1) Hardware circuits are more commonly "unit tested". And the "units" tend to be built up in more gradual steps. This allows errors to be identified at an earlier stage - long before thousands of components are made only to be discarded.
    2) In most cases, there are only a few readily identified and measured "global resources" in even a complicated hardware circuit: things like voltage supplies, grounds, clock signals, thermal management, etc. If any one component messes one of these up, it will affect the entire circuit, but it will also be easy to determine what is basically going wrong. With software, mess up a single library function or the program counter and the entire application is hosed. So the basic architecture of the software is more vulnerable to hidden faults with catastrophic effects.

    Of course, a mission-critical system with software also includes hardware - and the system must be tolerant of both hardware and software failures. The design of such systems usually relies on the software to monitor all of the components of the system - even when a component is not currently in use. After all, redundancy only counts if you do something about the first failure before a second failure.

    Mission-critical software only needs to meet the reliability requirements established for it in the overall design of the mission-critical system. As DH suggested, this is generally something short of perfection but well better than common code.
     
  19. Aug 9, 2014 #18

    anorlunda

    Staff: Mentor

    My hypothesis is perhaps best suited for academic investigation such as a thesis subject.

    1) Determine if some domains (such as chip design) produce statistically significantly more-bug free functionality at lower prices than other domains. Bugs per thousand LOC, and $ per LOC could be the metrics.

    2) If (1) is yes, then try to figure out why the difference. (My speculation regarding the richness of the language and testing methods may indeed be nonsense but it could be one possibility to investigate.)

    3) Try to apply the lessons from (2) into other domains. If successful, it is a win win.
     
  20. Aug 9, 2014 #19

    Borg

    Science Advisor
    Gold Member
    2017 Award

    You're still missing the point. You see that CPUs have fewer bugs and assume it has something to do with the way that it's 'written' (CPUs don't have lines of code - they are arrangements of circuits).

    There is a reason why the CPU has to have fewer bugs - because it is in use by every computer. A bug in a CPU will affect everyone. Software on the other hand is only in use by a subset of all computers. A failure in a software program doesn't affect all computers.

    Try thinking of a CPU manufacturer as a company building a simple light switch. It has a simple, easily tested function - does the switch turn the power on and off? If you build it wrong, nobody will buy your switch.

    Now compare that to the software programs which are the humans who use the switch. If I tell someone to turn off a light before going to bed, there are a lot of potential variables that could impact whether the light gets turned off properly. Does the person speak the same language as in the Mars unit of measure example? Does the person not like you and will purposely do the opposite? I compare that one with Microsoft's attempt to subvert Java with its J++ software in the late 90's. :tongue: Maybe the switch was installed on the ceiling where it can't be reached. I could go on but the point is that none of these "software bugs" would really affect the light switch manufacturer that builds a functioning light switch. Most people would still have a functioning light switch that could be turned off.

    So ask yourself this - when you buy a light switch, would you rather pay 50 cents for the one that has been simply tested or would you pay $15 for the one that comes with educating everyone in its proper use just because there were a few oddballs that can't do it right?
     
  21. Aug 9, 2014 #20

    neomahakala108

    Gold Member

    if you wish to reduce LOC, then this can work for example with a Dedicated Programming Language or proper Software Library.
     
  22. Aug 9, 2014 #21

    anorlunda

    Staff: Mentor

    Now I see why we disagree. I do indeed think that bugs are mostly a function of coding methods and testing methods rather than the application. You seem to think CPUs have fewer bugs because the designers wish it; you say "because it is in use by every computer."

    Think of it this way. A prerequisite to getting a bug free chip is getting a bug free simulation of that chip in an environment where tests can be run. It could be the circuit simulator Spice. Spice is the language interpreter, and the circuit simulated is the program. The program need not implement a CPU, it could be countless other embedded applications (that don't have human interfaces). Just because the elements are transistors, resistors and wires, doesn't mean that we can't implement any arbitrary function (the Turing machine demonstrates that).

    For example, electronic car control (think of Toyota's [sticky] throttle controller). Suppose a program needing 16 million transistors is equivalent in complexity to 1.6 million LOC in a high level language (just guessing at the factor of 10). In standard software practice an app with 1.6 million LOC would have hundreds or thousands of bugs. If we could reduce that by a factor of 10, it would be a great application. Remember that widespread driverless vehicles are supposedly coming soon. They need very complex, and very safe logic.
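
    As a rough check on "hundreds or thousands", the defect densities quoted earlier in the thread can be applied to the hypothetical 1.6 million LOC figure. This Python sketch just does that multiplication; the densities are the ones given above for the Shuttle, for good-quality software, and for a process-free organization, and the 1.6 million LOC is the guess from this post.

```python
def expected_defects(lines_of_code, defects, per_lines):
    """Expected defect count for a density of `defects` per `per_lines` lines."""
    return lines_of_code * defects / per_lines


if __name__ == "__main__":
    loc = 1_600_000
    print(expected_defects(loc, 1, 400_000))  # Shuttle-level density
    print(expected_defects(loc, 1, 1_000))    # good-quality industry figure
    print(expected_defects(loc, 10, 1_000))   # process-free, low end
    print(expected_defects(loc, 15, 1_000))   # process-free, high end
```

    On those figures the estimate runs from about 4 defects at Shuttle-level density, through 1,600 at the good-quality rate, up to 16,000 to 24,000 for a process-free organization.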

    How would it be reduced? First, by using a lower level language. Scott was on the right track when he mentioned ladder logic used in PLCs (programmable logic controllers). Pure ladder logic has no subroutines, no arguments, no loops, no lists, no choices of sequential/parallel operation. Second, by making use of the test harnesses and testing methods used by chip designers.

    The final product could be implemented in hardware or it could run as an interpreted program with Spice (or similar) as the interpreter. The program should have the same number of bugs either way.

    To repeat, it is the combination of more primitive elements plus accesses to powerful test tools and test protocols that I think would provide better results.

    Even if I am wrong about primitive elements improving things per se, application of the testing tools and protocols of the chip industry alone may offer improvements.

    I do not believe that the end application such as CPU versus throttle controller versus space probe, have intrinsically more or fewer bugs. Instead, I focus on the programming and testing methods, tools and protocols. Should that be controversial? I don't see why. Surely you can think of some languages, methods, and test policies that produce better or worse results.
     
  23. Aug 9, 2014 #22

    D H

    Staff Emeritus
    Science Advisor

    This is both spot on and incorrect at the same time.

    The incorrect part is that modern CPUs do have "lines of code". It's called microcode. The reason that Intel can keep releasing chips that use more or less the same instruction set they used decades ago is because Intel CPUs don't directly execute those machine language instructions. They instead execute microcode instructions. Testing whether the circuitry works as expected is a challenge, and Intel along with other CPU manufacturers go to great lengths to ensure that their circuitry works properly.

    Testing whether the microcode works as expected is a much tougher challenge. CPU manufacturers don't have this challenge completely ironed out. Because microcode lives in that fuzzy area between hardware and software, the microcode can fortunately be patched. Every once in a while, patches are indeed released by CPU manufacturers.


    This is your opinion. While you are entitled to your own opinions, you are not entitled to your own facts. You have missed several key points:
    • Why we have digital computers. It's so we can program them.
      This is the key reason that digital computers took over analog computers over 50 years ago. Analog computing mandated starting from scratch at the circuit level to solve a new problem. One can use the same digital computer to solve a wide range of problems.

    • Why testing at the chip level is very different from testing at the program level.
      Digital computers provide an instruction set, a vocabulary that lets us implement a von Neumann machine. Testing at the chip level checks whether the machine executes that finite instruction set as expected. This finite instruction set provides an amazing amount of power: the ability to write an infinite number of programs. Testing whether a program works as expected is a very, very different matter from testing whether that finite vocabulary works as expected.

    • Why assembly language, high order languages, and even higher order tools exist.
      The reason is productivity. Whether programmers are writing in microcode, a machine language, a high order language, or an even higher order tool, an amazing constant takes hold: People can reliably write about one line of code per hour, regardless of language. (There are exceptions. For example, people have slapped together hundreds of lines of garbage per hour, but the typical result is the initial release of healthcare.gov.) Each of the steps from microcode to machine language, machine language to high order language, and high order language to higher order tool represents a factor of ten, or more, in terms of human productivity. Forgoing this is ludicrous.

    Your hypothesis would force us all to use microcode, and then the statement "I think there is a world market for maybe five computers" apocryphally attributed to Thomas Watson would indeed be true.


    This thread is closed.
     