Intermittent CPU errors: Thank you, thank you, thank you

Discussion Overview

The discussion revolves around intermittent CPU errors encountered in industrial computers, exploring the frustrations and challenges faced by professionals in the field of industrial automation and computer systems. Participants share personal experiences related to CPU failures, the complexities of troubleshooting, and the implications of such errors in their work environments.

Discussion Character

  • Exploratory
  • Technical explanation
  • Debate/contested
  • Meta-discussion

Main Points Raised

  • One participant describes their struggle with an intermittent CPU error in an older industrial computer, expressing relief at identifying the issue after considerable uncertainty about its cause.
  • Another participant humorously relates their frustration with computer issues, suggesting a desire to use a hammer as a repair tool.
  • A story is shared about a man committing 'computercide' after his computer malfunctioned, illustrating the emotional toll of such technical failures.
  • Concerns are raised about the challenges of being the last person in a long design process, where previous mistakes by engineers and budget managers complicate troubleshooting efforts.
  • One participant recounts a particularly difficult job involving a French computer system, highlighting the issues of component compatibility and the financial burden of troubleshooting failures.
  • Another participant mentions their role in handling network functions and contrasts it with the operational perspective of others, noting the shared frustrations in the field.
  • Discussion includes the high costs associated with downtime in industrial systems, with one participant mentioning a system that incurs $10,000 for every minute it is down.
  • Concerns about liability and insurance coverage for professional errors are raised, with participants sharing their experiences and the risks associated with their work environments.
  • A mention of a young boy who passed away from cancer adds a somber note to the discussion, connecting personal loss to the broader context of the conversation.

Areas of Agreement / Disagreement

Participants express shared frustrations regarding CPU errors and the complexities of troubleshooting in industrial settings. However, there is no consensus on the best approaches to managing these issues or the implications of liability and insurance, indicating multiple competing views and unresolved questions.

Contextual Notes

Participants discuss various assumptions related to their specific fields and experiences, including the reliability of different computer systems and the challenges of working with legacy technology. The discussion also touches on the emotional and financial stakes involved in their work, which may influence their perspectives.

Who May Find This Useful

Professionals in industrial automation, computer engineering, and network management may find this discussion relevant, particularly those dealing with troubleshooting and operational challenges in complex systems.

Ivan Seeking
It turns out that I have been fighting just this problem with an older industrial computer [industrial automation stuff]. Due to my lack of familiarity with this particular processor, I couldn’t be sure if I had configuration problems or some other software-induced problems. I just observed a couple of faults that clinch the diagnosis. THANK GOD! This was starting to get nasty. I am sure Zantra and others can appreciate the frustration and potential cost [to me] of problems like this. Consulting work can get real ugly at times.

It is so rare that a processor makes a genuine mistake, especially the stuff I work with, that it takes an act of Congress to be sure that no other cause can be found. Industrial computers are very robust and the operating systems are rock solid. This is a very unusual event. Whewwww!
 
I can relate. Not just to CPU errors, but computer issues in general. Sometimes I just want to take my all-purpose computer repair tool (hammer) and fix it for good.
 
Originally posted by Zantra
I can relate. Not just to CPU errors, but computer issues in general. Sometimes I just want to take my all-purpose computer repair tool (hammer) and fix it for good.
I read a story of police responding to a home after getting a report of 'shots fired'. Upon arrival, a man opened the door holding a 12-gauge shotgun, and the officers could see behind him that his computer had been shot to hell and back. He quietly admitted to having committed 'computercide'.
 
Originally posted by Zantra
In memory of Fruit_loops 1994-2004

:frown:
Did you lose your buddy?

As for computer problems and hammers [or shotguns!], ain't it the truth!

I'm not sure how your work differs from mine, but typically, unless I design the system, I am the last person in a long design process that involves many engineers...and even worse, budget managers! All of the design and engineering mistakes, along with any errors in component selection, operational requirements, or installation, and any mistakes in the system specifications, become manifest when I begin to test my program. It is very difficult for me not to eat time for other people’s mistakes. When you throw equipment failures or flaws into the mix, it can get really, really ugly. Then, when I do finally prove that my work is fine and that other problems exist, no one wants to pay for the time. After 20 or 30 meetings and perhaps hundreds of phone conversations, it is often nearly impossible to determine exactly who is liable.

I had a really ugly job once that used Modicon/Telemecanique - a French computer system. I am used to American, German, and Japanese systems, but this one takes the prize. After eating about a month's worth of work, and after paying for about 3 weeks of hotels, it finally came out that the components used had never talked with each other before. The main CPU and the field units did not talk the same language...exactly. I got to be the factory R&D man - in the field [actually 25 feet underground] and at my expense.

Self-employment: When it’s good, it’s really good. When it’s bad, it’s really, really, really bad.

I have only seen one other intermittent CPU failure over the last ten years. In that case, for about three days it appeared that I had personally [mostly] shut down the cement business for the entire northwest sector of the US. I spent one weekend in a state of absolute panic. The failure was costing no less than $10,000 an hour, 24 hours a day, every day. I didn't understand what was happening, and the Port of Portland was breathing flames down my neck. At the time, I did not have the insurance to cover such an event.
 
I think we're in slightly different fields. Mine is more of an operational perspective. I handle the network related functions, and LAN/WAN connectivity.

You build 'em, I fix 'em when they break :wink: But it's the same principle. I'm sure it's frustrating when the buck stops at you, and I must admit there's something to be said for the safety of corporate culpability. Of course that's only the pro...

Oh and fruit_loops was the young boy who recently passed away from cancer
 
Originally posted by Zantra
Oh and fruit_loops was the young boy who recently passed away from cancer

Ah. Tsunami was telling me about that thread but I never caught his name.
 
Oh, and our company does have a system that requires 99.9 percent uptime, and it does cost the company $10,000 for every minute it's down. Luckily, that's not my job.
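
To put those two figures side by side, here is a rough back-of-the-envelope sketch in Python. It assumes the 99.9 percent target is measured over a 365-day year and that the cost scales linearly with downtime; neither assumption is stated in the post, and the function names are made up for illustration.

```python
# Assumption: uptime is measured over a non-leap year of 365 days.
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes


def allowed_downtime_minutes(uptime_percent: float) -> float:
    """Minutes per year a system may be down and still meet the uptime target."""
    return MINUTES_PER_YEAR * (1 - uptime_percent / 100)


def downtime_cost(minutes_down: float, cost_per_minute: float) -> float:
    """Total cost of an outage, assuming cost grows linearly with its length."""
    return minutes_down * cost_per_minute


budget = allowed_downtime_minutes(99.9)
print(round(budget, 1))                      # 525.6 minutes, about 8.8 hours
print(round(downtime_cost(budget, 10_000)))  # 5256000 dollars
```

Under those assumptions, a system that only just meets 99.9 percent uptime could still be down for almost nine hours a year, which at $10,000 a minute comes to a little over $5 million.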
 
Originally posted by Zantra
$10,000 for every minute it's down.

I don't think I could manage the insurance for that one. Technically though, the only things specified as beyond the scope of my insurance are Nuclear Power Plants and Nuclear Weapons systems.

I don't think I have ever experienced that kind of liability – $10,000/minute. Usually, my biggest concerns are catastrophic failures and other sources of injury or death. I remember turning down a job for a wire factory that was filled with incredibly dangerous open presses, cutters, and rollers; all real heavy-duty, limb-removing stuff. I swear this is true: I saw perhaps a dozen people with a missing body part, and they have a one-armed safety manager named Lefty. When I met Lefty, I left. This was one of the ugliest places I have ever seen.

How do you avoid or cover your liability for professional errors and omissions, or do you operate under your customer's insurance?
 
self edited so I didn't give away any secrets I'm not supposed to.
 