host in 2 groups with clashing service checks - overwrite ?

This support forum board is for support questions relating to Nagios XI, our flagship commercial network monitoring solution.
stucky
Posts: 31
Joined: Mon Apr 20, 2015 11:30 am

host in 2 groups with clashing service checks - overwrite ?

Post by stucky »

Hello
I have a basic problem I have never been able to solve in Nagios. I have defined a hostgroup called "default_linux" which monitors a few basic mountpoints.
I associated a service check with that group to achieve that. Now I can add a host to the group and these will get monitored.
However, they all use the same threshold of -w 10% -c 5%. The problem is that I might have a subgroup of hosts in there where /var is always at 6% left and that's ok.
The service checks for /var on these hosts would always be in WARNING so I created a new group called "default_linux_du_var_5%_1%" with -w 5% -c 1%.
Except that if I add a host there, Nagios simply adds a second check for /var (one with the standard 10%/5% thresholds and another with 5%/1%), which defeats the purpose.
I could remove this host from the "default_linux" group, but then it stops monitoring the other basic mount points. I guess the question is:

How do I tell Nagios how to deal with hosts that are in several groups with clashing service checks for the same service? One would have to be ranked "higher" than the other.
I thought templates were the answer, but even if one template is a child of the other, the moment I place a host into both groups Nagios simply adds the 2 clashing services to the monitoring list. This has always been a challenge. It seems I'd have to create a flat group for _every_ single possible scenario that I might need and add all relevant services to that group. Tomorrow I might need the same scenario but with different contact groups, so I'd have to create yet another group just for that. That cannot be the correct way, but I don't see how service templates help me here.
Do I have to associate each individual host with a set of services rather than hostgroups? What's the purpose of host groups then?

How are others dealing with this?

thx
jdalrymple
Skynet Drone
Posts: 2620
Joined: Wed Feb 11, 2015 1:56 pm

Re: host in 2 groups with clashing service checks - overwrit

Post by jdalrymple »

The behavior you're experiencing is desired by some people. Say you have Oracle database servers and MSSQL database servers. It's common for Oracle to fill its tablespace and run at or near 100% full disks all the time, but MSSQL is different. At the end of the day you'll still want to aggregate those 2 hostgroups into 1 called "database servers" so that you can check common shared services such as OS disk space, memory, load, etc. That is how hostgroups were intended to work.

The problem is that you don't have clashing services. Just because the check_command starts with the same binary doesn't mean that the 2 services are in any way related. The only way I can think to do what you're trying to achieve is with custom object variables:

http://nagios.sourceforge.net/docs/nagi ... tvars.html

You could define a custom object variable in your host definition such as:

Code: Select all

define host{
	host_name	 server
	_cfree	    15%
	_dfree	    5%
	...
	}

Code: Select all

define service{
	hostgroup_name	 servers
	description	Free Space C Drive
	check_command	check_disk!$HOSTADDRESS$!$_HOSTCFREE$
	...
	}
Does that make any sense?

-- edit --

Changed host_name to hostgroup_name as it makes more sense in this user's use-case.
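
Applied to the /var case from the first post, the same idea could be sketched as below. This is only an illustration under a few assumptions: the variable names _VAR_WARN and _VAR_CRIT, the template name linux_du_defaults and the command name check_var_by_ssh are all made up, the check is assumed to run over check_by_ssh, and the remote plugin path is a placeholder. Substitute whatever command your /var check already uses.

```
# Host template holding the default /var thresholds.
# _VAR_WARN and _VAR_CRIT are illustrative names, not built-in directives.
define host {
    name            linux_du_defaults
    _VAR_WARN       10%
    _VAR_CRIT       5%
    register        0
}

# A host that is fine with a fuller /var overrides just those two values.
define host {
    host_name       server
    use             linux_du_defaults
    _VAR_WARN       5%
    _VAR_CRIT       1%
    ...
}

# Hypothetical command: the macros resolve against whichever host is being checked.
# The remote path to check_disk is a placeholder.
define command {
    command_name    check_var_by_ssh
    command_line    $USER1$/check_by_ssh -H $HOSTADDRESS$ -C "/usr/local/nagios/libexec/check_disk -w $_HOSTVAR_WARN$ -c $_HOSTVAR_CRIT$ -p /var"
}

# One service for the whole hostgroup, so only one /var check per host.
define service {
    hostgroup_name          default_linux
    service_description     du /var
    check_command           check_var_by_ssh
    ...
}
```

With this layout the host stays in default_linux and keeps its other mount point checks; only the two custom variables differ per host.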
stucky
Posts: 31
Joined: Mon Apr 20, 2015 11:30 am

Re: host in 2 groups with clashing service checks - overwrit

Post by stucky »

jda

thx for your input. I'm surprised though. Isn't this the most basic use case ?
Let's say I started out with a few simple groups of hosts that are alike. One day a year later one host starts acting up a bit and I need to overwrite one service check at that level.
I don't have any custom vars defined. People must have run into this before.
The only way I could fix that is to remove this host from the group that includes this service check and create a custom group just for this particular adjustment.
I'm curious to know what other folks typically do when they encounter this unexpectedly.

I will explore the custom var idea more though.
jolson
Attack Rabbit
Posts: 2560
Joined: Thu Feb 12, 2015 12:40 pm

Re: host in 2 groups with clashing service checks - overwrit

Post by jolson »

There is another way to approach this problem.

If another service is defined with the same name as your existing service, you can attach it to a host to overwrite the inherited service that comes from the hostgroup.

For example, if you had these two services:
2015-04-20 14_28_40-Nagios XI - Nagios Core Config Manager.png
The top service is in hostgroup "NLS-cluster", and the bottom one is not a part of any hostgroup.

I make a new host, "Nagios Log Server 2", and assign it to the group "NLS-cluster", which causes it to inherit all of that hostgroup's services, including the 'Check for Elasticsearch' service.

If I then take the bottom service and assign it to that same host:
2015-04-20 14_36_29-Nagios XI - Nagios Core Config Manager.png
It will overwrite the original inherited service.

The easiest way to get a service up and running would be to 'copy' it and rename it to match the other service exactly. You can add some sort of identifier in the 'Display Name' field if you want to.

I figured I would let you know that this is possible. Thanks!
Twits Blog
Show me a man who lives alone and has a perpetually clean kitchen, and 8 times out of 9 I'll show you a man with detestable spiritual qualities.
stucky
Posts: 31
Joined: Mon Apr 20, 2015 11:30 am

Re: host in 2 groups with clashing service checks - overwrit

Post by stucky »

Jda
I cannot confirm that. Here is my setup:
2 hosts:

define host {
host_name linux-test01.ps.am.sony.com
use xiwizard_generic_host
address linux-test01.ps.am.sony.com
register 1
}

define host {
host_name linux-test02.ps.am.sony.com
use xiwizard_generic_host
address linux-test02.ps.am.sony.com
register 1
}

and here is my service, which has 2 different varieties defined. The service is called "custom_check_by_ssh_du_var" and it is defined twice under the same name but with different descriptions.

[root@prdmgtmon01 ~]# cat /usr/local/nagios/etc/services/custom_check_by_ssh_du_var.cfg
define service {
service_description du /var w10 c5
use xiwizard_generic_service
hostgroup_name default_linux
check_command custom_check_xi_by_ssh!'./check_disk -w 10% -c 5% -p /var'!!!!!!!
register 1
}

define service {
host_name linux-test02.ps.am.sony.com
service_description du /var w20 c10
use xiwizard_generic_service
check_command custom_check_xi_by_ssh!'./check_disk -w 20% -c 10% -p /var'!!!!!!!
register 1
}

It still just adds both /var checks to linux-test02 as before.
jdalrymple
Skynet Drone
Posts: 2620
Joined: Wed Feb 11, 2015 1:56 pm

Re: host in 2 groups with clashing service checks - overwrit

Post by jdalrymple »

jolson wrote: If another service is defined with the same name as your existing service
stucky wrote:

Code: Select all

[root@prdmgtmon01 ~]# cat /usr/local/nagios/etc/services/custom_check_by_ssh_du_var.cfg
define service {
service_description du /var w10 c5
use xiwizard_generic_service
hostgroup_name default_linux
check_command custom_check_xi_by_ssh!'./check_disk -w 10% -c 5% -p /var'!!!!!!!
register 1
}

define service {
host_name linux-test02.ps.am.sony.com
service_description du /var w20 c10
use xiwizard_generic_service
check_command custom_check_xi_by_ssh!'./check_disk -w 20% -c 10% -p /var'!!!!!!!
register 1
}
You did not give your services the same name.

Hostgroups are intended to group LIKE-hosts. So you might have 1000 Windows hosts where you don't want the C drive to go over 80%, but the D drive requirements will differ wildly between them, so they should be grouped differently:

Code: Select all

[ -- Fileservers Alert D at 98% full -- ][ -- SQLServers Alert D at 80% full -- ][ -- Webservers Alert D at 90% full -- ]
[ --      non App No load alert      -- ][ --                    App servers Alert when load > 90%                   -- ]
[ --                                       Windows Servers Alert at C 80% full                                       -- ]
Note that groups can be nested. The idea is that if you find services being defined differently for different hosts, those hosts are probably not alike enough to justify being in the same hostgroup. Again, use nesting so that they share a hostgroup where appropriate.
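
A rough sketch of that nesting in object config, using the hostgroup_members directive. All host names, group names and check_command values here are placeholders for illustration, not taken from a real setup:

```
# Leaf groups hold the hosts that really are alike.
define hostgroup {
    hostgroup_name      fileservers
    alias               File servers
    members             fs01,fs02
}

define hostgroup {
    hostgroup_name      sqlservers
    alias               SQL servers
    members             sql01,sql02
}

# The parent group pulls in every host from the leaf groups.
define hostgroup {
    hostgroup_name      windows_servers
    alias               All Windows servers
    hostgroup_members   fileservers,sqlservers
}

# Shared check, attached once at the parent level.
define service {
    hostgroup_name          windows_servers
    service_description     Free Space C Drive
    check_command           check_c_drive_80
    ...
}

# Group-specific checks, attached at the leaf level.
define service {
    hostgroup_name          fileservers
    service_description     Free Space D Drive
    check_command           check_d_drive_98
    ...
}

define service {
    hostgroup_name          sqlservers
    service_description     Free Space D Drive
    check_command           check_d_drive_80
    ...
}
```

Each host ends up with one C drive check from the parent group and one D drive check from its own leaf group, so nothing is defined twice.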

As for others' solutions, I'll offer my own "personal" favorite: most checks with thresholds are remote checks (NRPE/NSclient/NCPA), and *personally* I prefer to hard-code my thresholds in my host configs, so my check_command is just "check_disk" with no arguments whatsoever. You'll get plenty of differing opinions on favored ways of handling this.
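
One reading of that approach, sketched for NRPE. This assumes the agent is NRPE, that the plugin lives under /usr/local/nagios/libexec on the monitored host, and that a check_nrpe command taking the remote command name as its first argument is defined on the Nagios server. The group and service names are borrowed from the original question purely for illustration:

```
# On the monitored host, in nrpe.cfg: the thresholds live with the host.
# A host that tolerates a fuller /var simply gets different numbers here.
command[check_disk]=/usr/local/nagios/libexec/check_disk -w 5% -c 1% -p /var

# On the Nagios server: one service for the whole hostgroup, no thresholds at all.
define service {
    hostgroup_name          default_linux
    service_description     du /var
    use                     generic-service
    check_command           check_nrpe!check_disk
}
```

The trade-off is that the thresholds are no longer visible in the Nagios configuration; they have to be managed on each host.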
jolson
Attack Rabbit
Posts: 2560
Joined: Thu Feb 12, 2015 12:40 pm

Re: host in 2 groups with clashing service checks - overwrit

Post by jolson »

My services are defined as follows:

Code: Select all

define service {
        service_description             Check for Elasticsearch
        use                             generic-service
        hostgroup_name                  NLS-cluster
        check_command                   check_nrpe!check_for_elasticsearch!!!!!!!
        max_check_attempts              5
        check_interval                  5
        retry_interval                  1
        check_period                    xi_timeperiod_24x7
        notification_interval           60
        notification_period             xi_timeperiod_24x7
        contacts                        nagiosadmin
        _xiwizard                       nrpe
        register                        1
        }

define service {
        host_name                       Nagios Log Server 2
        service_description             Check for Elasticsearch
        use                             generic-service
        check_command                   check_nrpe!check_for_elasticsearch!!!!!!!
        max_check_attempts              6
        check_interval                  6
        retry_interval                  1
        check_period                    xi_timeperiod_24x7
        notification_interval           60
        notification_period             xi_timeperiod_24x7
        contacts                        nagiosadmin
        _xiwizard                       nrpe
        register                        1
        }
Note that the description needs to match. To tell the two services apart, use the display_name or Config Name variables. In the GUI those variables are shown as 'Display name' and 'Config Name' respectively:
2015-04-20 15_33_18-Nagios XI - Nagios Core Config Manager.png
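
Applied to the two /var definitions posted earlier in the thread, that would look roughly like the sketch below. The service_description values are made identical so the host-level service replaces the hostgroup one, and the thresholds move into display_name. The display_name values are only a suggestion:

```
# Hostgroup-wide default for /var.
define service {
    hostgroup_name          default_linux
    service_description     du /var
    display_name            du /var w10 c5
    use                     xiwizard_generic_service
    check_command           custom_check_xi_by_ssh!'./check_disk -w 10% -c 5% -p /var'!!!!!!!
    register                1
}

# Host-specific definition with the same service_description,
# which takes the place of the hostgroup service on this one host.
define service {
    host_name               linux-test02.ps.am.sony.com
    service_description     du /var
    display_name            du /var w20 c10
    use                     xiwizard_generic_service
    check_command           custom_check_xi_by_ssh!'./check_disk -w 20% -c 10% -p /var'!!!!!!!
    register                1
}
```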
stucky
Posts: 31
Joined: Mon Apr 20, 2015 11:30 am

Re: host in 2 groups with clashing service checks - overwrit

Post by stucky »

Thank you guys for the great feedback so quickly !

Ok getting somewhere now. I thought the service name itself had to match. This works, but I noticed that the display name doesn't seem to show up anywhere in the GUI.
I use check_by_ssh, and the problem is that this plugin doesn't return the threshold values as part of its output, so in the GUI you can't tell that one /var check is using different thresholds from the other /var check.
That's why I had added the thresholds to the descriptions, like "du /var w10 c5" vs. "du /var w20 c10". Now the GUI just shows the generic "du /var".

Q1. Where is the "Display Name" supposed to show up ? I would really like to see which "/var" check I'm looking at.

thx
jolson
Attack Rabbit
Posts: 2560
Joined: Thu Feb 12, 2015 12:40 pm

Re: host in 2 groups with clashing service checks - overwrit

Post by jolson »

Q1. Where is the "Display Name" supposed to show up ? I would really like to see which "/var" check I'm looking at.
I cannot think of a way to make the 'display_name' variable show up instead of the service description - but it is planned to be released in the next major version of XI.

There is a workaround that I found, using the Actions component. Please follow the screenshots below:

Access components:
2015-04-20 16_39_52-Nagios XI - Administration.png
Edit the 'Actions' component:
2015-04-20 16_40_02-Nagios XI - Administration.png
Use the variables in place below, and press 'Apply Settings':
2015-04-20 16_40_24-Nagios XI - Administration.png
You can now see the 'display_name' in the service details:
2015-04-20 16_40_39-Nagios XI.png
stucky
Posts: 31
Joined: Mon Apr 20, 2015 11:30 am

Re: host in 2 groups with clashing service checks - overwrit

Post by stucky »

Not elegant, but I'll take it for now. I'm also playing with nested host groups and that appears to work as expected.
Now I just wish the nested relationship of host groups were in any way indicated graphically.
For now I'll try to make it work with specific naming but it'd be nice to have in the future.
Is this in the pipe for later releases ?