When implementing a service you need a good understanding of how Windows services work. If you get it wrong then your service won’t work properly or, worse, can cause problems in Windows. Services must be responsive and be good citizens when working with the Service Control Manager (SCM). The .NET implementation hides a lot of these details, but there is complexity under the hood that you must be aware of. But first, a brief review of how Windows services work.
Windows Services Internals (Brief)
All services run under the control of the SCM. The SCM is responsible for managing the lifetime of a service. All interaction with a service must go through the SCM. The SCM must be thread safe since any number of processes may be interacting with a service at once. To ensure that a single service does not cause the entire system to grind to a halt, the SCM manages each service on a separate thread. The exact internal details are not formally documented but we know that the SCM uses threads to work with each service.
Each service is in one of several different states such as started, paused or stopped. The SCM relies on the state of a service to determine what the service will and will not support. Since state changes can take a while most states have a corresponding pending state such as start pending or stop pending. The SCM expects a service to update its state as it runs. For example when the SCM tells a service to start the service is expected to move to the start pending state and, eventually, the started state. The SCM won’t wait forever for a service to respond. If a service does not transition fast enough then the SCM considers the service hung. To allow for longer state changes a service must periodically notify the SCM that it needs more time.
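To make the pending-state bookkeeping concrete, here is a small platform-neutral model in Python. It is only a sketch and not Windows API code: ToyScm, report and is_hung are hypothetical names, and the hang rule is an assumption since the SCM's real logic is not documented. A native service reports progress through the dwCheckPoint and dwWaitHint fields of SERVICE_STATUS, and the model borrows that idea: a pending service gets a fresh deadline only when it reports a higher checkpoint.

```python
class ToyScm:
    """Hypothetical stand-in for the SCM's hang detection (assumed rule, not the real one)."""

    def __init__(self):
        self.state = "STOPPED"
        self.checkpoint = 0
        self.deadline_ms = None  # None means no state change is in progress

    def report(self, now_ms, state, checkpoint=0, wait_hint_ms=0):
        """Record a status update from the service at time now_ms."""
        pending = state.endswith("_PENDING")
        progressed = state != self.state or checkpoint > self.checkpoint
        if not pending:
            # A final state (RUNNING, STOPPED, ...) ends the transition.
            self.deadline_ms = None
        elif progressed:
            # Progress buys the service another wait_hint_ms of time.
            self.deadline_ms = now_ms + wait_hint_ms
        self.state = state
        self.checkpoint = checkpoint

    def is_hung(self, now_ms):
        """A service is considered hung if it is pending and past its deadline."""
        return self.deadline_ms is not None and now_ms > self.deadline_ms


scm = ToyScm()
scm.report(0, "START_PENDING", checkpoint=1, wait_hint_ms=2000)
print(scm.is_hung(1500))   # still inside the first 2000 ms window
scm.report(1500, "START_PENDING", checkpoint=2, wait_hint_ms=2000)
print(scm.is_hung(3000))   # the second report moved the deadline to 3500 ms
print(scm.is_hung(4000))   # no further progress, so the deadline has passed
scm.report(4000, "RUNNING")
print(scm.is_hung(10000))  # a final state clears the deadline
```

Times are passed in explicitly rather than read from a clock so the behavior is easy to follow. The point is the shape of the contract: enter the pending state, keep reporting progress, and finish in a final state.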
One particularly important state change is the stop request. When Windows shuts down the SCM sends a stop request to all services. Every service is expected to stop quickly. The SCM gives each service a (configurable) amount of time to stop before it is forcibly terminated. If it weren’t for this behavior a hung or errant service could cause Windows shutdown to freeze.
A Day In the Life Of a Service
A service is normally a standard Windows process and hence has a normal process entry point (main or WinMain). However, a single process can host multiple services (many of the Windows services work this way), so the process entry point is not the service entry point. Instead a service process must register the list of services it supports and their entry points with the SCM via a call to StartServiceCtrlDispatcher. This function, which is a blocking call, hooks the process up to the SCM and doesn’t return until all listed services are stopped. It takes a table that pairs each service name with its entry point (normally called ServiceMain). When the SCM needs to start a service it calls the entry point on a separate thread (hence each service gets its own thread in addition to the process’s main thread). The entry point is required to call RegisterServiceCtrlHandlerEx to register a function that handles service requests (the control handler). It also must set the service state to start pending. Finally it should initialize the service and then exit. The thread will go away but the service will continue to run.
One caveat to the startup process is that it must be quick. The SCM uses an internal lock to serialize startup. Therefore services cannot start at the same time, and a slow-starting service can stall the startup of every service behind it. For this reason the general algorithm is to set the state to start pending, spawn a worker thread to do the real work and then set the service to running. Any other variant can slow the entire system down.
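The general algorithm above can be sketched in a few lines. This is again a platform-neutral Python model, not Windows or .NET code: ToyService and its members are hypothetical names, and the worker loop is a placeholder for whatever the service really does. The start method does nothing except flip the state, launch the worker and return.

```python
import threading


class ToyService:
    """Hypothetical service whose start() returns as soon as the worker is launched."""

    def __init__(self):
        self.state = "STOPPED"
        self.ticks = 0
        self._stop_requested = threading.Event()
        self._worker = None

    def start(self):
        self.state = "START_PENDING"
        self._stop_requested.clear()
        # The real work lives on its own thread so start() can return at once.
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()
        self.state = "RUNNING"

    def _run(self):
        # Placeholder work loop: wake every 10 ms until a stop is requested.
        while not self._stop_requested.wait(0.01):
            self.ticks += 1

    def stop(self):
        self.state = "STOP_PENDING"
        self._stop_requested.set()  # wakes the worker immediately
        self._worker.join()
        self.state = "STOPPED"


svc = ToyService()
svc.start()
print(svc.state)  # start() has already returned while the worker keeps running
svc.stop()
print(svc.state)
```

Because the worker waits on an event rather than sleeping, a stop request is noticed right away, which keeps the stop path as quick as the start path.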
All future communication with the service will go through the control handler function. Each time the function is called (which can be on different threads) the service will generally change state. This will normally involve changing to the pending state, doing the necessary work and then setting the service to the new state. Note that in all cases the SCM expects the service to respond quickly.
In .NET the ServiceBase class hides most of the state details from a developer. To ensure that the service is a good citizen, the .NET implementation wraps all of this in a few virtual methods that handle start, stop, pause, etc. All a developer needs to do is implement each one. The base class handles setting the state to pending and to the final state, with the virtual call sandwiched in between. However, the developer is still responsible for requesting additional time if needed. Even the registration process is handled by the framework: the developer simply calls ServiceBase.Run and passes in the service(s) to host.
All is wonderful and easy in .NET land, or is it? If you read the documentation carefully you’ll see a statement that says the base implementation hides all the details of threading so you can just implement the state methods as needed, but this is not entirely true. All the implementations except OnStart behave the same way. When the control handler is called it sets the service to the pending state, executes the corresponding virtual method asynchronously and returns. Hence the thread used to send the request is not the same thread that handles the request and ultimately sets the service state. This makes sense and meets the requirements of the SCM. More importantly, it means the service can take as long as it needs to perform the request without negatively impacting the SCM.
The start request is markedly different. When the start request is received the base class moves the service to the start pending state, executes the OnStart virtual method asynchronously and then…waits for it to complete before moving the service to the started state. See the difference? The start request thread won’t actually return until OnStart completes. Why does the implementation bother to call the method asynchronously just to block waiting for it to complete? Perhaps the goal was to make all the methods behave symmetrically in terms of thread use. Perhaps the developers didn’t want the service to see the real SCM thread. Nevertheless, it could have used a synchronous call and behaved the same way.
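The difference is easier to see side by side. The following Python model imitates the behavior as described here and is not the actual ServiceBase source: ToyServiceBase, GatedService, handle_start and handle_stop are all hypothetical names. Both handlers run the callback on another thread, but only the start handler waits for it.

```python
import threading


class ToyServiceBase:
    """Hypothetical model of the dispatch behavior described in this post."""

    def __init__(self):
        self.state = "STOPPED"

    # Stand-ins for the virtual methods a subclass overrides.
    def on_start(self):
        pass

    def on_stop(self):
        pass

    def handle_start(self):
        self.state = "START_PENDING"
        runner = threading.Thread(target=self.on_start)
        runner.start()
        runner.join()  # the requesting thread is stuck here until on_start returns
        self.state = "RUNNING"

    def handle_stop(self):
        self.state = "STOP_PENDING"

        def work():
            self.on_stop()
            self.state = "STOPPED"

        runner = threading.Thread(target=work)
        runner.start()
        return runner  # returned right away; the final state is set later


class GatedService(ToyServiceBase):
    """Both callbacks block until the gate opens, standing in for slow work."""

    def __init__(self):
        super().__init__()
        self.gate = threading.Event()

    def on_start(self):
        self.gate.wait()

    def on_stop(self):
        self.gate.wait()


svc = GatedService()

# Start: something else has to open the gate, because handle_start blocks.
threading.Timer(0.1, svc.gate.set).start()
svc.handle_start()
print(svc.state, svc.gate.is_set())  # RUNNING True

# Stop: handle_stop returns while on_stop is still waiting on the gate.
svc.gate.clear()
pending = svc.handle_stop()
print(svc.state)  # STOP_PENDING
svc.gate.set()
pending.join()
print(svc.state)  # STOPPED
```

In the model, handle_start cannot return before the gate opens, so a slow on_start holds up whoever sent the request. handle_stop hands the work off and returns with the service still in the pending state.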
What does this mean for a service developer? It means your OnStart method still needs to run very fast (create a thread and get out) even in the .NET implementation, even though all the other control methods can be implemented without regard for the SCM. If OnStart takes too long then it’ll block the SCM. More importantly, if OnStart cannot finish quickly it needs to periodically request additional time using RequestAdditionalTime to avoid the SCM thinking it is hung.
When implementing a service in .NET it is still important to understand how native services and the SCM work together. The OnStart method must be fast and well behaved to avoid causing problems with Windows. The other control methods are less restrictive but still require awareness of how the base implementation works. Writing a service is trivial as far as coding goes but services require a great deal of care in order to ensure they behave properly. This doesn’t even get into the more complex issues of security, installation, event logging and error handling which are broad topics unto themselves.