We have a customer with a scenario where a set of services runs on either SERVER 1 or SERVER 2. If any service in the set fails, everything in the set should be stopped and the entire set should be started on the other server. Below I’ll describe how to implement this.
In my example I’ll assume there are three services in each set: Service A, Service B and Service C. This pattern will scale to any number, though it is admittedly a little tedious to set up initially.
The first step is to use the Start, Stop or Restart a Service action to create actions that start Service A, Service B and Service C specifically on SERVER 1 and on SERVER 2. That is 6 actions. In addition, we need 6 more that use the same action type but stop the services instead. Note that normally the services could be started on the monitored server with far fewer actions, but in this case we want to stop and start the services on a different server, so this level of specificity is needed.
In total, that is three actions to start the services on SERVER 1, three to stop them on SERVER 1, three to start them on SERVER 2 and three to stop them on SERVER 2.
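To see where the count of 12 comes from, the action matrix can be enumerated in a few lines. This is only an illustrative Python sketch and does not call the monitoring product; the names `SERVERS`, `SERVICES`, `OPERATIONS` and `build_actions` are hypothetical.

```python
from itertools import product

# Hypothetical names, used only to illustrate the counting.
SERVERS = ["SERVER 1", "SERVER 2"]
SERVICES = ["Service A", "Service B", "Service C"]
OPERATIONS = ["Start", "Stop"]


def build_actions(servers, services, operations):
    """Return one (operation, service, server) tuple per action to create."""
    return [
        (operation, service, server)
        for server, operation, service in product(servers, operations, services)
    ]


actions = build_actions(SERVERS, SERVICES, OPERATIONS)
for operation, service, server in actions:
    print(f"{operation} {service} on {server}")
print(len(actions), "actions in total")
```

Adding a fourth service or a third server only changes the input lists, which is why the pattern scales even though creating the actions by hand is tedious.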
To make these actions easier to deal with, I recommend adding the sets of three into an Action List action. So you’ll have four Action Lists:
- Stop services on SERVER 1
- Start services on SERVER 1
- Stop services on SERVER 2
- Start services on SERVER 2
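The grouping into four Action Lists can be sketched the same way. Again, this is a plain Python model under my own naming (`build_action_lists` is hypothetical), not something the product exposes.

```python
# Hypothetical names, used only to illustrate the grouping.
SERVERS = ["SERVER 1", "SERVER 2"]
SERVICES = ["Service A", "Service B", "Service C"]


def build_action_lists(servers, services):
    """Map each Action List name to the per-service actions it contains."""
    action_lists = {}
    for server in servers:
        for operation in ("Stop", "Start"):
            name = f"{operation} services on {server}"
            action_lists[name] = [
                f"{operation} {service} on {server}" for service in services
            ]
    return action_lists


action_lists = build_action_lists(SERVERS, SERVICES)
for name, members in action_lists.items():
    print(name, "->", members)
```

Each list holds exactly one action per service, so a monitor only has to reference two list names instead of six individual actions.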
With all of that done, we can now set up the monitors. First we’ll create a Service Monitor on SERVER 1 that is set to watch Service A, Service B and Service C, and then create the same kind of monitor on SERVER 2.
Now click the Actions button. We want to make sure that when the monitor on SERVER 1 detects a down service, it runs the following Action Lists:
- Stop services on SERVER 1
- Start services on SERVER 2
We only want the actions to run once while in the alert state (i.e. while the monitor is detecting that the services are not running), so choose the radio button shown below:
On the monitor for SERVER 2, we’ll set up actions similarly, except that when it fires actions it will run:
- Stop services on SERVER 2
- Start services on SERVER 1
Now we have what we want: when a watched service in a set on SERVER 1 stops, all services in the set will be stopped, and the set of services will be started on SERVER 2. And vice versa.
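The resulting behavior can be modeled as a small state change. The sketch below is a simplified Python model, not the product's implementation: it assumes each server's state is just the set of services running there, and `run_failover` and `OTHER_SERVER` are hypothetical names.

```python
# Hypothetical model: each server maps to the set of services running on it.
SERVICES = {"Service A", "Service B", "Service C"}
OTHER_SERVER = {"SERVER 1": "SERVER 2", "SERVER 2": "SERVER 1"}


def run_failover(running, failed_server):
    """Apply the two Action Lists attached to failed_server's monitor.

    A new mapping is returned; the input mapping is left unchanged.
    """
    target = OTHER_SERVER[failed_server]
    new_state = {server: set(services) for server, services in running.items()}
    new_state[failed_server] = set()    # "Stop services on <failed_server>"
    new_state[target] = set(SERVICES)   # "Start services on <target>"
    return new_state


# Service B has just stopped on SERVER 1; A and C are still running there.
state = {"SERVER 1": {"Service A", "Service C"}, "SERVER 2": set()}
state = run_failover(state, "SERVER 1")
print(state)
```

Calling `run_failover(state, "SERVER 2")` afterwards moves the whole set back to SERVER 1, which is the "vice versa" case.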