softwarenotperfect

Why Your Software is Never Perfect

[Total: 5    Average: 5/5]

We occasionally have students ask for help on software, “My software is perfect, but it doesn’t work!” Your software is never perfect. My own software is never perfect.

I recently found that I made someone’s top ten list with regard to software that I had written 37 years ago. It’s not a top ten list anyone would aspire to be listed on. Software that I wrote in 1979 is number two on this list of ten historical software bugs with extreme consequences. I learned some very important lessons from that experience.

Background.

My first job out of college was to document what everyone thought was a very complete and very well tested set of software for processing low level data solar backscatter ultraviolet / total ozone mapping system (SBUV/TOMS) on the NASA’s Nimbus 7 satellite. The entire development had already moved on to other projects with different employers. They left behind a large set of code and a very tall stack of computer printouts that contained their test results.

I started from the state of “What is this ‘FORTRAN’ language?” but quickly proceeded to “How can this code possibly work?” and from there to “There’s no way this code can possibly work!” I finally looked at that massive stack of test results on reaching that final stage of understanding. I was the first to do so except for the developers who had abandoned ship. Nobody else had looked at those tests results. They instead looked at the amazing thickness of the printouts.

Testing by thickness always has been and always will be a phenomenally bad idea. Some of those test printouts were slim; these were failed compilations. The rest were what were then called “ABEND dumps.” In those days, the equivalent of what is now called a segmentation fault resulted in the entire virtual memory for the process in question being printed out in hexadecimal. The result was a huge waste of paper. (The modern equivalent is a segfault and core dump.) Not one test indicated success.

This turned out to be a career maker for me. I made a name for myself by fixing that mess. As a result, I was subsequently given the privilege of working directly for the principal investigator of that pair of instruments and his team of scientists. Instead of the low-level computer science stuff involved with my first task, my next task truly did involve scientific programming.

Why the Nimbus 7 satellite did not discover the ozone hole.

Of the two ozone measuring instruments on the Nimbus 7 satellite, one (SBUV) had been flown previously, but the more precise instrument (TOMS) was brand new. The previously flown instrument sometimes yielded flaky results when the solar angle was low, and the team scientists were worried that the same would apply to this newer instrument. The scientific team did not want their good scientific names sullied by suspect data. As a result, they vehemently insisted that I filter out those suspect results by resetting data where the solar incidence angle was low and where the estimated ozone quantity lay outside a predetermined range to a value that meant “missing or invalid data”.

I argued that if I did what he asked that there would be no way to discover anomalies. “We can change your code if we discover anomalies.” I suggested that I produce two products, an unfiltered one for NASA internal use only and a filtered version for release to the wider research community. They did not want any part of that, either, on the basis that the unfiltered version would somehow get outside of NASA. “Do what I told you to do, or we will tell your employer to assign someone else to us.” I capitulated and did what he told me to do.

Karma!

The Nimbus 7 satellite did not discover the ozone hole. Credit for that discovery instead goes to Joseph Farnam, who simply pointed a device invented in the 1920s (a Dobsonmeter) up into the sky. Mr. Farnam received a very nice obituary in the New York Times. The SBUV/TOMS team will more or less die anonymously. That’s karma.

As I should have been more insistent with the scientific team, I too was stricken with karma. In 1986, curious minds at NASA wanted very much to know why their very expensive satellite did not discover what a person using a 1920s era device had discovered. The scientific team discovered that my name was all over the code that hid the ozone hole from NASA. (They conveniently forget why this was the case.) This made people high up in NASA want to talk to me, personally. Despite having switched employers three times and despite having moved 1400+ miles away from that initial job, I received numerous phone calls and even a few visits from people very high up in NASA that year. I told them why that code existed, and also how to fix it. Voila! After reprocessing the archived satellite data, the Antarctic springtime ozone hole appeared every year.

What I learned.

  • Lesson number one: Take responsibility for your code.
    Version control software provides the ability to establish blame (or credit) for who wrote / modified each and every line of code in the code base. Your name is the sole name attached to the code you write, not your boss’s name, nor that of your customer. You never know who’s going to come back to you seven years or more after the fact regarding code that you wrote. It’s your name that will be on the code, so take full responsibility for it. While I took full responsibility for fixing that very bad code right out of college, I did not take full responsibility for the code I wrote immediately afterwards. I should have.
  • Lesson number two: Your code is never perfect.
    As I noted at the outset, this site occasionally receives posts that start with “My code is perfect, but it doesn’t work right! Help me!” If your code doesn’t work right, it is not perfect, by definition. Typical code has a bug per one hundred lines. Well-crafted, well-inspected, and well-tested code typically has one bug per one thousand lines, or perhaps one bug per every ten thousand lines if done very carefully. Pushing beyond that one bug per a few thousands of lines of code is very hard and very expensive. The Space Shuttle flight software reportedly had less than one bug per two hundred thousand lines of code. This incredibly low error rate was achieved at the cost of writing code at the glacial rate of two lines of code per person per week, after taking into account the hours people spent writing and reviewing requirements, writing and reviewing test code, writing and reviewing the test results, and attending meeting after boring meeting. Even with all that, the Space Shuttle flight software was not perfect. It was however as close to perfection as code can possibly be. (Note: I did not participate in that process. It would have killed me.)
  • Lesson number three: Even if your code is perfect, it is not perfect.
    This is the difference between verification and validation. Verification asks whether the code does exactly what the requirements or user stories say that the code should do. There’s a hidden bug just waiting to manifest itself if the tests are incomplete (and the tests always are incomplete). While verification is hard, validation is even harder yet. Validation asks whether the requirements / user stories are correct. This is not something that typically can be automated. In the case of Nimbus 7, there was a faulty requirement to filter out suspect data. Because I initially balked at writing the code to implement this, there was an explicit test, written by me and reviewed by others, that ensured that the code filtered out those suspect values. Faulty requirements result not only in faulty code, but also in faulty tests that prove that the code behaves faultily, just as required.

 

 

52 replies
Newer Comments »
  1. jedishrfu
    jedishrfu says:

    Wonderful article! It brings back memories of working with GE and crashing the system while running in master mode with some software that was never designed to run in that environment. Guess who got the credit/blame for the crash. Even after it was explain to the technical staff one customer service rep pipes up at the end so it was xxxx who crashed the system right? The crash error actually had a bright side in it illustrated how another service was crashing things and we found the bug there too. But still the one who crashed it lives on…

  2. Borg
    Borg says:

    Nice article DH. One of the biggest lessons that I had to learn as a junior developer was that complexity did not mean that previous developers knew what they were doing. My first reaction would be that they must be really good programmers if they could write code that was difficult to follow. I would be afraid to make changes for fear of breaking something in all that complexity. These days, simplicity is my goal and I have no problem taking a chain saw to overly complex code as I’m doing on my current project. The link in my signature is my mantra for what not to do when writing software.

  3. Pepper Mint
    Pepper Mint says:

    I share the same thoughts with Borg. I realize the best part in coding or software development in general is to simplify things as much as possible but they must be guaranteed to have their basic provided functions preserved at the same time. This also helps to reduce costs for business development (e.g maintenance) and resource management (e.g more experts are no longer needed ), etc.

  4. QuantumQuest
    QuantumQuest says:

    Really nice article. I share same thoughts with Borg too. What I tried to do for myself to improve my skills from the outset, was to be pedantic enough on testing and especially proper commenting and documentation in the web world, where I began my professional coding endeavor. Back in that era there was a lot of improvisation, catchy things to include and proprietary things to follow in order to survive. Fortunately this “web 1” era has long gone. After a specific point , I jumped onto the bandwagon of enterprise software – mainly Java, but I kept on taking some medium web projects sometimes just for the fun of it. I found many times poorly documented code and ad-hoc solutions, courtesy of top developers. This does not help anyone. When such developers leave a company, their leftovers is just mess. Surviving these things, although part of the process, is something I personally try to avoid recreating. There is always room to write good code that can be followed by others. I also vote for simplicity provided that fits sufficiently to the project at hand. I definitely agree to the three final points that are made in this insight. Especially to the fact that there hasn’t been and can never exist perfect code. There are bags that will survive almost every testing process and manifest themselves at the wrong time. If we talk about real time software this can be fatal. Continuous testing and adaptations including modifications must me done in a regular basis.

  5. eltodesukane
    eltodesukane says:

    "It’s your name that will be on the code, so take full responsibility for it."Good advice, but usually this can not be done.How many times does a programmer says the code is not ready, but the employer says we release it now anyway?Same problem in design, architecture or else..If an architect is hired to replace a wonderful bay windows with a concrete wall, then the architect will do that.If this concrete wall is viewed as an ugly abomination, the decision maker has to be blamed, not the architect or designer hired to do it.

  6. Hornbein
    Hornbein says:

    I’ve been a pro programmer but my main interest is music. The standards are completely different. In music your stuff has to be pretty close to perfect if you want to make a living. In software, a total incompetent with credentials can make a living. The demand for programmers is so high that you can get away with anything. In music, demand is so low that you can be very talented and lose money.

    If the situation were reversed, software WOULD be perfect. You’d starve if it weren’t.

    Steve Morse won a ton of Best Guitarist polls. Keyboardist T Lavitz said of guitarist Steve Morse that in five years of rehearsing and performing very difficult music he never heard Steve make a mistake. Not once. Nevertheless he couldn’t make it as a solo act. Classical music is even more demanding, to a degree that’s almost inconceivable. If programming were like that, you’d have to start at age three then do it five hours a day for the rest of your life in order to have a chance to make it. And not a very good chance at that.

    Guitarist Mick Goodrick advised people to stay out of music if they could do anything else. Indeed, those who make it a career generally can’t do anything else. I think there is no space left over in the brain for anything else. It’s too demanding.

    When I grew up I found that many of those famous jazz musicians like Barney Kessell really made their money playing for TV commercials and stuff like that. It’s kept secret because it’s depressing. Entertainers can’t be depressing.

    Yes, I know, lots of pop stars have little musical ability. That’s different. Pop stardom has almost nothing to do with music.

  7. Borg
    Borg says:

    ”It’s your name that will be on the code, so take full responsibility for it.”
    Good advice, but usually this can not be done.
    How many times does a programmer says the code is not ready, but the employer says we release it now anyway?

    In my experience, there is rarely a case of not being able to do the job within the alloted time. If a developer gets sidetracked by other priorities, then the deadline is extended or someone else picks up the slack. Quite often, when I hear this excuse, it’s from someone who isn’t doing their job correctly – either they’re goofing off and not doing their job or they’re stuck and are too afraid (or proud) to ask for help. I have no pity for the goof-offs and the prideful can be their own worst enemy. For everyone else, a simple five minute discussion of how to tackle a problem can make all the difference. I have no problem asking a junior developer how something works if I think that he has a better insight into it.

  8. anorlunda
    anorlunda says:

    Wow. that was a fun read.@D-H. That’s one Insights article that is really insightful.

    Life isn’t fair. Most developers are forced to follow orders and meet the requirements handed down, as you were. More fortunate developers, are ahead of the curve. They create the future, then show users what they really wanted, contrary to what they asked for. Perhaps the most famous example of that was Steve Jobs and his team with the iPhone. But an even better example was Dan Bricklin and Bob Frankston with Visicalc.

  9. rcgldr
    rcgldr says:

    In my experience, there is rarely a case of not being able to do the job within the allotted time.

    Who’s setting this allotted time? The two main issues I’ve seen are overly optimistic schedules set by management, or unexpected technology issues, usually related to hardware.

    In DH’s example, the issue wasn’t software bugs, as the software was doing what it was asked to do, which was to mask certain types of anomalies.

    I was spoiled by my first job, back in 1973. It was a multi-processor / multi-tasking online database system for pharmacies (mostly prescriptions in the database, with support for insurance billing features). The system never experienced data corruption. There were instances of hardware failures that resulted in temporary down time, but the backup / recovery procedures developed using a test system, worked the first time they were used in an actual failure occurrence. This was a near mission critical environment. At the other extreme, I’ve worked in environments where a project was thrown together just to release something, and most of the software effort was spent tracking down and fixing bugs. In a few rare cases, there was a dual effort, the quick and dirty approach just to release something (like a beta release), in parallel with a proper development cycle to produce code that would replace the quick and dirty release.

  10. rootone
    rootone says:

    There was a time when I made most of my living from freelance programming.
    On one occasion I was tasked with putting right a buggy application when the original coder had departed and was seemly uncontactable,
    furthermore there was little in the way of any documentation other than a description of what the system was supposed to achieve.
    I told the client that it looked like there would need to be at minimum a couple weeks just going through the code, testing things and making notes.
    Client was not happy to be told that and said they would get somebody else to do the job.
    In the end I think what happened is the whole thing got rewritten from scratch by somebody.

Newer Comments »

Leave a Reply

Want to join the discussion?
Feel free to contribute!

Leave a Reply