Determining where a STOP is coming from

  • Thread starter: newjerseyrunner

Discussion Overview

The discussion revolves around troubleshooting a recurring issue with a custom service on a server that unexpectedly stops due to a STOP error. Participants explore potential causes, diagnostic approaches, and the challenges of working with undocumented systems.

Discussion Character

  • Technical explanation
  • Debate/contested
  • Exploratory

Main Points Raised

  • The original poster (OP) suspects that the STOP error is triggered by an external process, as internal logging did not reveal any issues.
  • Some participants suggest that detailed information about the service and the operating system is necessary for effective troubleshooting.
  • One participant mentions that STOP codes are kernel mode errors and recommends checking for system updates and compatibility issues with the service.
  • Another participant proposes isolating the service in a virtual machine to monitor its behavior and suggests using a firewall and network analysis tools like Wireshark to capture relevant data.
  • Concerns are raised about the lack of documentation and the difficulties of inheriting undocumented systems, with one participant advocating for a complete rewrite of the software as a potential solution.

Areas of Agreement / Disagreement

Participants express varying degrees of uncertainty regarding the cause of the STOP errors, with no consensus on a definitive solution or approach. Multiple competing views on troubleshooting strategies are presented.

Contextual Notes

Participants note limitations in the information provided by the OP, which may hinder effective diagnosis. There are also references to potential dependencies on specific operating system versions and the need for documentation.

Who May Find This Useful

This discussion may be useful for system administrators, IT professionals, and developers dealing with legacy systems or troubleshooting service-related issues on servers.

newjerseyrunner
I'm having this terrible problem with a server that I inherited. The server has a custom service running on it, and this service handles service commands.

Every hour or so on some of the servers, I see the service go down. It always comes from a STOP. I assumed there must be something wrong with the service, so I added logging to every place where a STOP is generated internally. I got nothing.

That indicates to me that it's coming from another process. Is there any way to determine which process it is? It's too random to just sit and wait for it. I checked in the Windows automation tools and the scheduler, and there are no tasks in there.
 
Ouch. That sounds ugly. Good luck.
 
For a helpful answer, we need details about the service and the specific OS running on the server. In any case, you need at least the documentation for the service and how it interacts with that OS. If you can't find that, the Windows automation tools will be of no help.
 
All of those STOP codes are really kernel mode errors.

Homework time before you go nuts...

Did you try this? https://www.lifewire.com/blue-screen-error-codes-4065576
TechNet also has articles on each of the specific error codes and their potential causes.

1. Are the servers that crash completely patched?
2. Did the service work properly on a previous version of Windows? For example, was it written for something like Windows 2000 Server and is now running on Windows Server 2012?
2a. And was it never recompiled for the newer version, i.e., just ported as an executable? Is it on the same old box, or was it moved to a virtual machine (this killed us twice)?
3. Do you have source or is it from a vendor? Can you get an updated version?
4. Are there parameter files or control files? Can you tell what resources they require? IO_WAIT_QUEUE lengths and other crud like that.
5. Get the Sysinternals suite and see if you can determine the resources used. There are several tools for that. This sounds like resource exhaustion, though that is only a guess. Been there, done that.

And, no, I don't think it is another process. I believe the service is trying to handle a request that requires more resources than are available to the process, or it simply is not freeing resources correctly. The trigger is the interaction with a client, but the client is not necessarily the 'bad guy'. I say this because the problem goes away when you restart the service process. For an hour or so.
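
One rough way to test the "not freeing resources" guess is to sample the process's memory at intervals and check whether it trends upward between restarts. The sketch below is only an illustration under stated assumptions: it uses Python rather than a Sysinternals tool, it reads the English-locale column names printed by `tasklist /fo csv`, and the image name passed in is whatever your service's executable is called (none is given in this thread). Memory is only one resource; handle or thread counts would need a tool such as Process Explorer.

```python
import csv
import io
import subprocess
import time


def parse_mem_kb(tasklist_csv: str, image_name: str) -> int:
    """Sum the 'Mem Usage' column (KB) for rows matching image_name.

    Assumes the English column headers that `tasklist /fo csv` prints.
    """
    total = 0
    for row in csv.DictReader(io.StringIO(tasklist_csv)):
        if row.get("Image Name", "").lower() == image_name.lower():
            # Values look like "12,345 K"; keep only the digits.
            digits = "".join(ch for ch in row["Mem Usage"] if ch.isdigit())
            total += int(digits)
    return total


def growth_per_sample(samples):
    """Least-squares slope of the samples against their index."""
    n = len(samples)
    if n < 2:
        return 0.0
    mean_x = (n - 1) / 2
    mean_y = sum(samples) / n
    num = sum((i - mean_x) * (y - mean_y) for i, y in enumerate(samples))
    den = sum((i - mean_x) ** 2 for i in range(n))
    return num / den


def sample_memory(image_name, count=30, interval_seconds=60):
    """Collect `count` memory readings for image_name (Windows only)."""
    samples = []
    for _ in range(count):
        out = subprocess.run(
            ["tasklist", "/fo", "csv", "/fi", f"imagename eq {image_name}"],
            capture_output=True, text=True,
        ).stdout
        samples.append(parse_mem_kb(out, image_name))
        time.sleep(interval_seconds)
    return samples
```

Calling `growth_per_sample(sample_memory("yourservice.exe"))` (a placeholder name) gives an average change in KB per interval. A slope that stays clearly positive right up to each stop would support the leak theory; a flat line would point elsewhere.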
 
The OP really didn't give us enough to go on, at all. There's a server, running a service, that in turn monitors a service on other servers, and that service being monitored is receiving a command to stop. Honestly, with that vague a description it could be anything.

I will give some vague but hopefully useful advice - Bring up a VM and put only the OS and the monitored service/software on it. See if the service gets stopped. Add a firewall and record all communication coming into and out of the server. If the service does stop and the firewall's logs don't help enough, add Wireshark. In the meantime, pull the event logs from every server the service stops on - grab from the minute before to the minute after. If you script, write a quick PowerShell script that reports the status of the service and pulls the event logs the moment the status changes to stopped. Start looking for things in common in those event logs.
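
The watch-and-grab idea above could look roughly like the following. This is a sketch, not a tested tool: it is written in Python instead of PowerShell, `SERVICE_NAME` is a made-up placeholder, and it assumes the English output of `sc query` plus the standard `wevtutil qe` query syntax. On most Windows versions the Service Control Manager records state changes in the System log (event ID 7036 for "entered the stopped state", 7034 for an unexpected termination), so those are worth looking for in the dumps.

```python
import re
import subprocess
import time
from datetime import datetime, timedelta, timezone

SERVICE_NAME = "MyCustomService"  # hypothetical; use the real service name


def parse_state(sc_output: str) -> str:
    """Extract the state word (RUNNING, STOPPED, ...) from `sc query` output."""
    match = re.search(r"STATE\s*:\s*\d+\s+(\w+)", sc_output)
    return match.group(1) if match else "UNKNOWN"


def event_query(moment: datetime, margin=timedelta(minutes=1)) -> str:
    """Build an XPath filter covering moment - margin to moment + margin (UTC)."""
    utc = moment.astimezone(timezone.utc)
    fmt = "%Y-%m-%dT%H:%M:%S.000Z"
    start = (utc - margin).strftime(fmt)
    end = (utc + margin).strftime(fmt)
    return (
        "*[System[TimeCreated[@SystemTime>='" + start
        + "' and @SystemTime<='" + end + "']]]"
    )


def dump_events(log_name: str, moment: datetime) -> str:
    """Return the text of all events in log_name around `moment`."""
    result = subprocess.run(
        ["wevtutil", "qe", log_name, "/q:" + event_query(moment), "/f:text"],
        capture_output=True, text=True,
    )
    return result.stdout


def watch(poll_seconds=5):
    """Poll the service; on a change to STOPPED, save the surrounding events."""
    last = None
    while True:
        out = subprocess.run(
            ["sc", "query", SERVICE_NAME], capture_output=True, text=True
        ).stdout
        state = parse_state(out)
        if state == "STOPPED" and last != "STOPPED":
            stopped_at = datetime.now(timezone.utc)
            time.sleep(60)  # let the "minute after" accumulate first
            for log_name in ("System", "Application"):
                path = f"{log_name}-{stopped_at:%Y%m%dT%H%M%S}.txt"
                with open(path, "w", encoding="utf-8") as handle:
                    handle.write(dump_events(log_name, stopped_at))
        last = state
        time.sleep(poll_seconds)
```

Run `watch()` from an elevated prompt on each affected server and leave it going. After a few stops, compare the saved files for events that show up every time.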
 
Inheriting undocumented stuff is always a nightmare.
I once refused a job because there was not sufficient information about the faulting software.
I advised that a rewrite from scratch would be the best idea. I didn't get the job.
 