Lessons From the Millennium Bug

MartynThomas · Dec 16, 2019

Twenty years ago, we were waiting to see whether the billions of dollars and thousands of person-years spent finding and correcting the Y2K problem (aka the Millennium Bug) had been successful.

To remind you: there were three major problems (and a few minor ones - for full details see the paper referenced later).

The use of two-digit years in data, which had been going on for decades and was extremely widespread. Dates in the 21st century would have years that were numerically lower than dates in the 20th century, which would lead to incorrect sorting of dates and incorrect calculations of elapsed periods. For example, food that had use-before dates in 2000 would seem to be decades old already, and credit cards and security passes that expired in 2000 or later would seem decades out of date already. The first such failures occurred in the late 1980s but the extent of the problem only became widely recognised in the 1990s.
Several BIOSs in widespread use failed if the current date was entered as 2000.
PCs and other devices that were used in process control or automation also failed when processing dates in 2000 or later (for example, lifts (elevators) that checked their maintenance status would fail once the date for next maintenance crossed the century boundary).
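
To make the first of these concrete, here is a minimal Python sketch of how two-digit years break elapsed-time calculations and sorting. The function name `years_elapsed_2digit` and the sample values are illustrative assumptions, not code from any real system.

```python
def years_elapsed_2digit(start_yy: int, end_yy: int) -> int:
    """Naive elapsed-years calculation on two-digit years (hypothetical helper)."""
    return end_yy - start_yy


# A card issued in 1998 ("98") that expires in 2002 ("02"):
print(years_elapsed_2digit(98, 2))   # -96 instead of the intended 4

# Sorting two-digit years as text puts 2000 ("00") before 1998 and 1999:
print(sorted(["99", "00", "98"]))    # ['00', '98', '99']
```

The negative result is why items dated 2000 or later could appear to be decades old.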

Fixing these problems was a massive task and it became a main board-level issue when company auditors started warning companies that without evidence of a successful Y2K remediation project they would refuse to sign off audits on a "continuing business" basis.

In the run-up to Y2K, one of the fixes was "windowing", where two-digit
years below 20 (for example) were treated as in the 21st century and
years of 20 and above as in the 20th. There was some speculation that
there might be problems when the end of the window arrived.
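
A minimal Python sketch of windowing, assuming a pivot of 20 as in the example above and assuming that 20 itself falls on the 20th-century side. The names `PIVOT` and `expand_year` are hypothetical, chosen only for illustration.

```python
PIVOT = 20  # assumed window boundary, taken from the "20" used as an example


def expand_year(yy: int, pivot: int = PIVOT) -> int:
    """Expand a two-digit year to four digits using a fixed window."""
    if not 0 <= yy <= 99:
        raise ValueError("expected a two-digit year")
    # Below the pivot: 21st century. At or above the pivot: 20th century.
    return 2000 + yy if yy < pivot else 1900 + yy


print(expand_year(19))  # 2019
print(expand_year(20))  # 1920: the end of the window, where 2020 is misread
```

With this pivot the fix holds until dates in 2020 start to appear, which is the end-of-window problem speculated about above.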

This might just be an example:
https://nakedsecurity.sophos.com/20...ers-should-update-now-to-dodge-y2k-style-bug/
even though the product was written after 2000, if a pre-existing library was used. If it is a Y2K end-of-window problem, there may be similar problems about to appear.

People who weren't involved in solving the Y2K problem seem to think it
wasn't a serious issue. But it was, and despite the billions of dollars
spent and the tens of thousands of people who worked to identify and fix
the problems, 15 nuclear reactors shut down on January 1st 2000. I led
the Y2K service line for Deloitte Consulting for a few years and I have
described what really happened. See:
https://s3-eu-west-1.amazonaws.com/...binary/2773/2017-04-04-MartynThomas_Y2K-T.pdf

Many of us who were intensively involved in Y2K projects are angry that our success is seen as evidence that there wasn't a problem. There was a serious threat and it holds these lessons for us today:

Having many seemingly independent systems all vulnerable to a single event can be catastrophic. Yet today, we have much of the economy dependent on a GPS signal that is trivially easy to jam and increasingly easy to spoof. (The signal, on reception, is weaker than the thermal noise in the receiver's electronics). We also have software monocultures, so that a fast-spreading malware can destroy many systems very quickly (NotPetya was a good example, but imagine if that had used a zero-day vulnerability instead of one that was already widely patched).
The Y2K failures that did occur didn't propagate because systems were not then tightly coupled. Today's just-in-time supply chains are much more vulnerable to cascade failures.
In the 1990s, I asked leading companies how confident they were that they could rebuild their key IT systems from source code successfully. It turned out that basic software engineering standards were poor. Today, software development still looks more like a craft activity than an engineering discipline (for example, 60 years after Edsger Dijkstra pointed out that testing can only show the presence of errors, not the absence, companies still rely almost exclusively on testing software rather than reasoning about it, and few companies require the use of strongly-typed languages that would avoid a wide range of programming errors).

So next time someone refers to the Y2K problem as a scare story or a scam, please tell them they are wrong and that it is time the lessons were learnt.

fresh_42 · Dec 16, 2019

MartynThomas said:

So next time someone refers to the Y2K problem as a scare story or a scam, please tell them they are wrong and that it is time the lessons were learnt.

Very nice note, thanks. Btw, the introduction of the Euro in 1998 was a similar nightmare, although restricted to monetary flows. People only saw the conversion factor of a single currency, as if we had just to multiply outputs. That it had been close to the millennial change had the advantage that programs only needed to be inspected once. And I saw assembler routines from the early 80's which had to be changed as well as COBOL spaghetti code over dozens of pages. In Germany IIRC they also changed the normalization of stocks in that night: from absolute values to relative notation. At least I never again saw the new year fireworks from above!

berkeman · Dec 16, 2019

Welcome to the PF, @MartynThomas

I still remember the Y2K preparation and that tense overnight heading into that New Year's Day. I was a Team Leader on the community Emergency Response Team in my town, and a leader of our HAM radio community. We had a number of joint briefings at the Police Department leading up to that overnight, with lots of city contingency planning (like if the 9-1-1 system went down) and pre-positioning of resources. The Fire Department even had all their engines and trucks out that night, doing "windshield surveys" watching for issues.

In the end, it was all Code-4 (no trouble found), and we all breathed a sigh of relief. Pretty surreal experience overall though...

anorlunda · Dec 16, 2019

I too worked on Y2K remediation.

Many of our clients had armies of employees with personal computers, and there the ugliness became apparent. Employees of that age (myself included) when presented with an upgraded PC began by copying all files and programs from the old computer. Over time, each PC became like an onion, with remnants of 2, 3, 4, or 5 earlier PCs on the disc. Of course, few or none of those older programs were really used, but how could the corporation know that? In 1999 when it came time to inspect all company-owned computers for Y2K vulnerabilities, it became a nightmare. The only way out was to scrub all the discs clean, wiping out all employees' favorite software and personal files, and making a clean install of known software.

I participated in some of those, wiping up to 375,000 PCs per weekend day. It made me cringe because I imagined 375,000 angry cries of anguish from those employees on Monday morning. Good thing they did not know my identity.

You're correct, that the world saw our success as evidence that the problem never existed. Too bad; we can't control that.

Ironically, I did achieve a degree of satisfaction in the days after 9/11/2001. Prior to Y2K, many or most backup/failover systems and procedures were garbage. They were Rube Goldberg lashups that failed so many tests that testing was halted. Of course, those weaknesses were never publicly acknowledged. I suffered personally from that when a 1987 (88?) fire in a Los Angeles money wire transfer office burned up a transfer with my life savings. Their backup didn't work.

Those and other widespread IT sins were exposed and remediated in the years leading up to Y2K. Y2K provided the motivation, money, and political cover to do a long needed clean-up.

20 months after Y2K, on 9/11, the heart of America's financial companies was wiped out in the World Trade Center. The reports I heard after the fact were that the off-site disaster recovery centers of all those companies worked flawlessly, and interruptions to critical financial services lasted only a few seconds. I'm confident that if 9/11 had occurred in 1997 rather than 2001 the outcome would have been very different.

But I did not document the citations for those things as carefully as the OP. I can't provide a bibliography of my sources. So, thanks for sharing @MartynThomas.

MartynThomas · Dec 16, 2019

Yes, it was a strange night. I had been providing assurance of the Y2K programme at NATS (the UK air traffic services provider). NATS set up an action room over that night and late on 31st Dec the Scottish Control Centre called to say they thought their radars had failed as they were getting no returns.
The radars were fine. There were no aircraft flying.

At 4am on January 1st 2000, the systems that measured runway visibility ("runway visual range" RVR) failed simultaneously on all NATS airfields. We had missed one error in the clock updating and when the RVR systems synchronised with the master system they found a discrepancy and shut down (as they had been programmed to do if a fault was detected). The problem was cleared with a reboot and caused no safety issues as no-one was flying anyway.

MartynThomas · Dec 16, 2019

In the mid 1990s, I walked out of a vacuous meeting of Deloitte Consulting Group management consultants and sat by the hotel lake wondering why I had wasted my time attending.

I wrote this doggerel in an attempt to capture the scorn I felt.

Song of the Year 2000

"What did you do in the Great War, Daddy? When the chips were down  and fortunes made and lost?"

"I 'Re-engineered the Enterprise', with strategies, and leadership, and SAP and never mind the cost!"

"But what did you do when  computers started saying  that the date was 1900  and the databases wrong?"

"I took a bear position  as the markets started falling,  and acquired some assets going for a song."

"So were you with the heroes  as they fought to change the systems,  so the hospitals kept working  and the people wouldn't die?"

"I had to strive for synergy,  and leverage and profit— and besides, the litigation risk  could blow us to the sky!"

"Then what were your priorities  when businesses were failing?  When the systems you had sold them, stopped  and brought them to their knees?"

"I passed them to a partner skilled  in corporate recovery,  who asked for bank securities  to guarantee his fees."

"But Daddy, when the programmers were  working nights and weekends,  to find and fix the problems and  to think out what to do, you were working there beside them, weren't you?  Rescuing the clients? I have told my friends at school I know  that you were fighting too!"

"My child, you must remember, I'm  a Management Consultant!  My time is too important to  be spent on such affairs.

I leave such work to engineers —their time is less expensive.  Now, clean your teeth and go to bed —and mind you say your prayers!"

PeroK · Dec 16, 2019

Mine was pithier:

Y2K or not Y2K that is the question.

Or, one that didn't go down very well at the time:

How do you know a potato is Y2K compliant?

PeroK · Dec 16, 2019

This thread jogged my memory of a system that wasn't midnight compliant! I kid you not. We had to shut it down at 11:55 every evening and start it up again at 12:05 am!

pbuk · Dec 16, 2019

Well done on the Y2K bug, can you fix cross-platform character encoding incompatibilities now?

DaveC426913 · Dec 16, 2019

Well I've learned my lesson.
I moved to the 5-digit year back in 02001 to avoid the Y10K bug.
Not getting caught with my pants down again.

jbriggs444 · Dec 16, 2019

DaveC426913 said:

Well I've learned my lesson.
I moved to the 5-digit year back in 02001 to avoid the Y10K bug.
Not getting caught with my pants down again.

Not until Y100K anyway.

pbuk · Dec 16, 2019

Of course there are those who didn't learn first time round, including NASA, and there is still work to do before the Y2038 problem crystallizes. When it does there will be a lot more devices with embedded systems (possibly more than the human population of the planet) than there were in 2000.
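
For readers who have not met it, the Y2038 problem comes from storing time as a signed 32-bit count of seconds since 1970-01-01 UTC. The following Python sketch simulates the wrap-around; `wrap_int32` is a hypothetical helper written for this illustration, since Python integers do not overflow on their own.

```python
import struct
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
INT32_MAX = 2**31 - 1  # largest value a signed 32-bit counter can hold


def wrap_int32(seconds: int) -> int:
    """Reinterpret a second count as a signed 32-bit two's-complement value."""
    return struct.unpack("<i", struct.pack("<I", seconds & 0xFFFFFFFF))[0]


last_good = EPOCH + timedelta(seconds=INT32_MAX)
after_wrap = EPOCH + timedelta(seconds=wrap_int32(INT32_MAX + 1))

print(last_good)   # 2038-01-19 03:14:07+00:00
print(after_wrap)  # 1901-12-13 20:45:52+00:00
```

One second after the last representable moment, a device using such a counter would believe it is December 1901.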

Tom.G · Dec 16, 2019

I was a hired gun in the middle of a 3 year Industrial Control project at the time. The control computers were already y2k qualified, so no worries.

One of the other businesses in the building was a Server Farm for Credit Card processing. They had been considering getting backup power generators in case the grid went down.

It was less than a week before Zero-Hour when one of their engineers asked me if generators were a good idea. I responded with a 'Probably Not'. Here is the reasoning: You MAY be able to find some generators, but what about electricians to install them? Assuming installation was possible in the remaining time, I asked "How are you going to power the Air Conditioners?", which were supplied by the building infrastructure.

The Result: No generators. No power outages. No computer crashes. Now everyone could clear the adrenaline from their systems!

Cheers,
Tom

berkeman · Dec 17, 2019

Tom.G said:

I asked "How are you going to power the Air Conditioners?"

Doh! We didn't think of that!

DEvens · Dec 17, 2019

In 2002 I spent most of the year recovering the contents of an analysis performed by another engineer in the mid 1990s.

The computer he had done the work on was literally destroyed and carried to the dump. This was because it was decided it was too complicated and expensive to Y2K qualify it. And the contents of the hard drives and backup tapes were considered proprietary. So, since it would have been difficult to transfer this content to another system, and the other system would not have been binary compatible with the binary executable files, the entire system was hard-erased (the terms used were "percussive erasure" and "combustive erasure") and trashed. Hard drives, backup tapes, and even most of the paper records, irretrievably gone.

I was left with a few hard copies of some of his input files. None of his source code or binaries. And none of his notes or records. In some cases, I was not even clear on what computer language he had used to do the analysis. Though it was mostly FORTRAN.

The engineer I was tracing the steps of was dead. So, short of contracting Tangina Barrons from the movie "Poltergeist", I was not going to get any help from him.

So, pardon me if when you say "Y2K" my eye twitches a little.

fresh_42 · Dec 17, 2019

DEvens said:

The engineer I was tracing the steps of was dead. So, short of contracting Tangina Barrons from the movie "Poltergeist", I was not going to get any help from him.

Sounds as if a séance would have been more promising.

sysprog · Dec 20, 2019

Me (consultant): This date conversion code is not Y2K compliant.
Boss: What's wrong with it?
Me: It checks the year for divisibility by 100, and for divisibility by 4, but not for divisibility by 400.

I was in fact wrong about that -- I was scrambling to get things done quickly, and I had hastily believed that a comment that said "not a leap year" when the year was divisible by 100 meant that the code wasn't testing for divisibility by 400. Later I realized that if the 4 digit year was divisible by 100 (which it would be if and only if the year ended in 00), then after a Load and Test Register instruction following the Divide Register by 100 instruction, a conditional Branch directly to the test for divisibility by 4 would not be taken, and a Shift Right Double Logical instruction would first be executed that made the subsequent test for divisibility by 4 reference the 2 century digits instead of the 2 year digits in the 4 digit year. Clearly any integer divisible by 100 is divisible by 4, and if the century of a divisible-by-100 year is divisible by 4, then the year is divisible by 400. It was a standard technique.

From https://www.rosettacode.org/wiki/Leap_year#360_Assembly:

Code:

LPCK CSECT                                                         
     USING LPCK,15                                                 
     STM  0,12,20(13)   STORE CALLER REGS                          
     LM   1,2,0(1)      R1 -> CCYY, R2 -> DOUBLE-WORD WORK AREA    
     PACK 0(8,2),0(4,1) PACK CCYY INTO WORK AREA                   
     CVB  0,0(2)        CONVERT TO BINARY (R0 = CCYY)              
     SRDL 0,32          R0|R1 = CCYY                               
     LA   2,100         R2 = 100                                   
     DR   0,2           DIVIDE CCYY BY 100: R0 = YY, R1 = CC            
     LTR  0,0           YY = 0? IF CCYY DIV BY 100, LY IFF DIV BY 400                                        
     BZ   A               YES: R0|R1 = CC; CCYY DIV BY 100, TEST CC                           
     SRDL 0,32            NO: R0|R1 = YY; CCYY NOT DIV BY 100, TEST YY                           
A    LA   2,4           DIVISOR = 4; DIVIDEND = YY, OR DIV BY 100 CC                                    
     DR   0,2           DIVIDE BY 4: R0 = REMAINDER, R1 = QUOTIENT 
     LR   15,0          LOAD REMAINDER: IF 0, THEN LEAP YEAR       
     LM   0,12,20(13)   RESTORE REGS                               
     BR   14                                                       
     END
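
For readers who do not read 360 assembler, the same technique can be sketched in Python: split the year into century and two-digit year, then test only one of them for divisibility by 4. The function name `is_leap_year` is an assumption for this illustration, and the sketch covers Gregorian years only.

```python
def is_leap_year(ccyy: int) -> bool:
    """Gregorian leap-year test using the century/year split from the listing."""
    cc, yy = divmod(ccyy, 100)
    # Century year (yy == 0): leap only if the century digits divide by 4,
    # which is the same as the full year dividing by 400.
    # Any other year: 100 * cc is a multiple of 4, so testing yy is enough.
    tested = cc if yy == 0 else yy
    return tested % 4 == 0


for year in (1900, 1996, 1999, 2000, 2100):
    print(year, is_leap_year(year))
```

This gives the same answers as the usual three-part rule (divisible by 4, except by 100, unless by 400) without a separate divide by 400.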

Boss: What's your solution?
Me: Insert a couple of lines to check for divisibility by 400.
Veteran In-house Programmer: Why not just take out the test for divisibility by 100, and I'll take full responsibility when the code fails in 2100.

DaveC426913 · Dec 20, 2019

I didn't quite follow the logic (run-on sentence!) but are you aware that it is a leap year when it's divisible by 400? So, yes you must account for divisible by 400 - as an exception to the exception.

Your boss is condoning the same solution that got us into this mess:

Programmer in 1975: "This code will fail when the year hits 2000."

Manager: "It's OK - that's ridiculously far into the future. Our programs will all be replaced loooong before then. Especially the banks. Banks hate legacy software. We sure won't still have programs in {BASIC, FORTRAN, COBOL, ASSEMBLER } will we? Haha."

Programmer: "Heh heh. I know right? Can you imagine us still programming in one of those languages while commuting in our flying cars and personal jet packs?"

sysprog · Dec 20, 2019

DaveC426913 said:

I didn't quite follow the logic (run-on sentence! ) but are you aware that it is a leap year when it's divisible by 400? So, yes you must account for divisible by 400 - as an exception to the exception.

Your boss is condoning the same solution that got us into this mess:

Programmer in 1975: "This code will fail when the year hits 2000."

Manager: "It's OK - that's ridiculously far into the future. Our programs will all be replaced loooong before then. Especially the banks. Banks hate legacy software. We sure won't still have programs in {BASIC, FORTRAN, COBOL, ASSEMBLER } will we? Haha."

Programmer: "Heh heh. I know right? Can you imagine us still programming in one of those languages while commuting in our flying cars and personal jet packs?"

That boss didn't condone that solution, but he conceded that it was funnier than mine. He had the foresight to be already concerned regarding mid-21st century clock problems.

darth boozer · Dec 22, 2019

There was also a lot of money made by unscrupulous "testers". I saw items in hospitals that had no form of clock at all, such as desk lamps and simple lightboxes containing only fluorescent tubes for examining X-ray pictures, being "tested" and "certified" as Y2K compliant at a charge of NZ$38 per item.

DaveC426913 · Dec 22, 2019

darth boozer said:

There was also a lot of money made by unscrupulous "testers". I saw items in hospitals that had no form of clock at all, such as desk lamps and simple lightboxes containing only fluorescent tubes for examining X-ray pictures, being "tested" and "certified" as Y2K compliant at a charge of NZ$38 per item.

I wouldn't be quite so quick to blame the testers themselves.

It is conceivable that the decision was made at a higher level - and that the decision was likely that "all devices with electronics must be certified". Not "just the obvious ones".

Consider the alternative:

Senior Manager is given a mandate to ensure hospital is 100% compliant. Decides for himself (not being an electrical engineer) that devices X, Y and Z "obviously" don't have clocks, so let's skip testing them. Senior manager has been explicitly derelict in his duty to fulfill the mandate, loses his job. Worst case: leaves himself open to litigation.

fresh_42 · Dec 22, 2019

DaveC426913 said:

Consider the alternative: ...

... and the fact that there likely had been an audit afterwards.

The lamps in runway lighting are exchanged after a certain timespan of usage, regardless of whether they are broken or not. It is at least thinkable that some lights count this timespan, and it might be coupled to the calendar as well. This is probably not the case, but do you want a consultant to decide this on his own?

pbuk · Dec 31, 2019

There was an article on this in today's print edition of the Telegraph (a newspaper in the UK). It slightly misses the mark with a focus on CPUs rather than data structures, but gets the level of concern about right IMHO.

anorlunda · Dec 31, 2019

The difficult problem of dealing with time is never going to go away.

Consider local time. Assume that you have a UTC clock. Can you write an algorithm that calculates the local time anywhere on Earth, for any time past/present/future? It is as much a political/social problem as technical. It cannot be neglected; there are many cases where machines must act based on local time rather than UTC.

Suppose you charge people by the hour and give them daily invoices. How many hours are in a day? That depends on daylight saving time, which depends on locality. The answer may also be different for mobile apps that cross time zones. How many seconds in a day? That depends on leap seconds.

Keeping track of time sounds so elementary and simple, but it is not.
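
The "how many hours are in a day" question can be shown with a small Python sketch. It assumes the caller already knows the UTC offsets in force at the start and end of the local day; a real program would look these up in a time zone database (for example through Python's `zoneinfo` module) rather than hard-code them. The function name `hours_in_local_day` and the US Eastern 2019 dates are illustrative assumptions.

```python
from datetime import datetime, timedelta, timezone


def hours_in_local_day(year, month, day, offset_start_h, offset_end_h):
    """Hours from local midnight to the next local midnight.

    offset_start_h and offset_end_h are the UTC offsets (in hours) in force
    at the start and at the end of that local day, supplied by the caller.
    """
    start = datetime(year, month, day,
                     tzinfo=timezone(timedelta(hours=offset_start_h)))
    next_day = datetime(year, month, day) + timedelta(days=1)
    end = next_day.replace(tzinfo=timezone(timedelta(hours=offset_end_h)))
    return (end - start) / timedelta(hours=1)


# Assumed example: US Eastern time in 2019 (UTC-5 standard, UTC-4 daylight).
print(hours_in_local_day(2019, 3, 10, -5, -4))   # 23.0 (clocks go forward)
print(hours_in_local_day(2019, 7, 1, -4, -4))    # 24.0
print(hours_in_local_day(2019, 11, 3, -4, -5))   # 25.0 (clocks go back)
```

An hourly invoice for "one day" therefore covers 23, 24, or 25 hours depending on the date and the locality, before leap seconds are even considered.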

DaveC426913 · Dec 31, 2019

anorlunda said:

The difficult problem of dealing with time is never going to go away.

Consider local time. Assume that you have a UTC clock. Can you write an algorithm that calculates the local time anywhere on Earth, for any time past/present/future? It is as much a political/social problem as technical. It cannot be neglected; there are many cases where machines must act based on local time rather than UTC.

Suppose you charge people by the hour and give them daily invoices. How many hours are in a day? That depends on daylight saving time, which depends on locality. The answer may also be different for mobile apps that cross time zones. How many seconds in a day? That depends on leap seconds.

Keeping track of time sounds so elementary and simple, but it is not.

Yep. Time in software is always a headache, and it doesn't have to be a time-zone-spanning app to bring on that headache.

Of course, one solution is to not have the app itself deal with it - it just works in UTC. The device, which is a physical, localized object that can't be in two places (or times) at once, can have its own localization helper.

That doesn't make the problem go away; it simply logically componentizes the problem.

fresh_42 · Dec 31, 2019

DaveC426913 said:

Of course, one solution is to not have the app itself deal with it - it just works in UTC.

Air traffic works with UTC, Russian train stations and schedules with UTC+3. Whenever a non-local time is necessary, people already use UTC.
