I'm trying to improve our monitoring by utilizing some advanced features of Nagios.
In this case, I have 4 nodes that have monitored processes running. Generally, it would be a critical issue if any of those services stopped running, UNLESS a file exists on a specific host.
I believe this can be addressed by a service dependency if I create a service that checks whether that file exists. The problem is, I'm not sure how to configure the services to work.
Environment:
Nagios: 4.3.2
Hosts: A, B, C, D
Processes: A: proc1, proc2; B: proc3; C: proc4; D: proc5
Maintenance file: A:/tmp/MAINT
Logic:
Processes: If process x is not running on host y, create a CRITICAL alert UNLESS file A:/tmp/MAINT exists.
MAINT file: If /tmp/MAINT exists, generate a warning as the normal condition is that we're NOT in maintenance.
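To make the MAINT check concrete, here is a minimal sketch of what such a plugin could look like. It assumes the script runs locally on host A (for example through an agent such as NRPE, which is an assumption and not something settled above). The function name check_maint is hypothetical. It relies only on the standard Nagios plugin exit codes (0 = OK, 1 = WARNING).

```python
#!/usr/bin/env python3
"""Sketch of a Nagios plugin: WARNING while the maintenance file exists."""
import os
import sys

# Standard Nagios plugin exit codes.
OK = 0
WARNING = 1

# Path taken from the environment described above.
MAINT_FILE = "/tmp/MAINT"


def check_maint(path=MAINT_FILE):
    """Return (exit_code, status_line) for the maintenance-file check.

    check_maint is a hypothetical helper name used for illustration.
    """
    if os.path.exists(path):
        # Normal condition is "not in maintenance", so the file's presence warns.
        return WARNING, f"WARNING - {path} exists, maintenance in progress"
    return OK, f"OK - {path} not present"


if __name__ == "__main__":
    code, message = check_maint()
    print(message)   # Nagios shows the first line of output as the status text
    sys.exit(code)   # the exit code determines the service state
```

With this in place, the "Check for MAINT file" service would sit in WARNING for as long as the file exists and return to OK once it is removed.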
Each of the processes has a service definition which is assigned to the proper hosts. These work fine at generating critical alarms as it stands; they're just mostly false alarms right now because the MAINT file exists but isn't being factored in. My first swag at the dependency was:
define servicedependency {
    host_name                       HostA
    service_description             Check for MAINT file
    dependent_service_description   proc1
    execution_failure_criteria      o   ; do not check dependent service if we're OK
    notification_failure_criteria   o   ; Notify only if no CLUSTER_DOWN file (normal condition)
}

The way I read the documentation on service dependencies, execution_failure_criteria represents the state of the dependent service (the process checks on the hosts), so if all the processes are running and returning OK, then we don't have to check for the existence of the MAINT file. If we're NOT OK, then check the dependent service and only THEN decide if the failed dependent process needs to generate alerts.
Is that right, or did I misconfigure the dependency?
Thanks.