NSClient++ service monitoring

Post by **pyranha** » Tue Feb 05, 2019 12:28 pm

I am using Nagios Core to monitor a Windows IIS server. In general everything works fine, but occasionally one of the w3wp.exe processes drives the CPU to 100%, at which point NSClient++ becomes unresponsive. The server then still appears to be up, but no disk, CPU, or memory checks return results.

Is there a way to monitor the status of the NSClient++ service so when it becomes unavailable I will be notified?

I have looked through the documentation and have not found a way to accomplish this.

Thanks.

Post by **lmiltchev** » Tue Feb 05, 2019 1:58 pm

pyranha wrote: "Is there a way to monitor the status of the NSClient++ service so when it becomes unavailable I will be notified?"

Well, if the NSClient++ service stopped, all of your checks would fail, and you would be notified for sure.

I don't think you need to install a different agent just to monitor this one service. You could set up a new service that uses check_nt to query the agent's version, and run it a lot more frequently than the rest of the checks. For example:

/usr/local/nagios/libexec/check_nt -H <client ip> -s <password> -p 12489 -v CLIENTVERSION
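
One way to make sure a hung agent is reported as a failure is to run the check under a hard timeout. The following is only a minimal Python sketch under stated assumptions, not something shipped with Nagios or NSClient++: `run_agent_check` is a hypothetical helper name, and the return values follow the usual plugin exit-code convention (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). The command to run is passed in as a list, so the check_nt call above could be used unchanged.

```python
import subprocess

# Conventional Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def run_agent_check(command, timeout_seconds):
    """Run a check command and return (exit_code, message).

    Hypothetical helper: a check that does not finish within
    timeout_seconds is reported as CRITICAL instead of hanging.
    """
    try:
        completed = subprocess.run(
            command,
            capture_output=True,
            text=True,
            timeout=timeout_seconds,
        )
    except subprocess.TimeoutExpired:
        # No reply in time: treat the agent as unavailable.
        return CRITICAL, f"CRITICAL - no reply within {timeout_seconds}s"
    except OSError as exc:
        # The check command itself could not be started.
        return UNKNOWN, f"UNKNOWN - could not run check: {exc}"

    message = completed.stdout.strip() or completed.stderr.strip()
    # Any exit code outside the plugin convention is mapped to UNKNOWN.
    code = completed.returncode
    if code not in (OK, WARNING, CRITICAL, UNKNOWN):
        code = UNKNOWN
    return code, message


# Example call (placeholders kept exactly as in the command above):
# run_agent_check(
#     ["/usr/local/nagios/libexec/check_nt", "-H", "<client ip>",
#      "-s", "<password>", "-p", "12489", "-v", "CLIENTVERSION"],
#     10,
# )
```

The point of the wrapper is that "no answer" and "bad answer" both end up as a non-OK result, so the service can reach a hard state and go through the normal notification logic.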

Post by **pyranha** » Tue Feb 05, 2019 3:36 pm

We have the version check enabled and it works as long as the CPU is not 100%.
We also have a generic-service defined (see below) but it is not behaving as I would suspect.
I would assume Nagios would send an alert if it did not hear back from, or could not reach, NSClient++.

Post by **lmiltchev** » Tue Feb 05, 2019 4:54 pm

pyranha wrote: "We also have a generic-service defined (see below) but it is not behaving as I would suspect."

Can you elaborate on this ("but it is not behaving as I would suspect")? This is just a template. I am not sure if you are using it at all in your "version check" service. Unless you show us your service config, along with all of the other relevant configs, commands, and templates used by this service, we won't know how to help you.

Nagios will notify you when the host/service goes into a hard, non-OK state, provided you have a monitoring contact added to the object, notifications are enabled, etc. There are many filters that need to be passed in order for the notifications to go out. Please read more on notifications here:

https://assets.nagios.com/downloads/nag ... tions.html

Post by **pyranha** » Tue Feb 05, 2019 6:00 pm

To elaborate: when the host's CPU utilization reaches 100%, the NSClient++ service is unable to respond. I can still ping the host, but NSClient++ does not run any commands on it because the CPU is at 100%. So the state I end up with is an IIS server that is hung due to CPU utilization, but no alert, because NSClient++ is not sending anything back. I hope this makes sense. This is the only condition where Nagios does not alert us when we expect it to. All other conditions seem to work.

Some background: I have Nagios Core deployed across 7 disparate sites, monitoring a wide variety of infrastructure for small to midsize customers at a Managed Services provider. This one just has me stumped.

I have the following service defined, which uses the generic-service template.

define service{
    use                     generic-service
    hostgroup_name          windows-servers-sterling
    service_description     NSClient++ Version
    check_command           check_nt!CLIENTVERSION
}

Post by **pyranha** » Tue Feb 05, 2019 8:43 pm

When Do Notifications Occur?
The decision to send out notifications is made in the service check and host check logic. The calculations for whether a notification is to be sent are only triggered when processing a host or service check corresponding to that notification; they are not triggered simply because the <notification_interval> has passed since a previous notification was sent. Host and service notifications occur in the following instances:

When a hard state change occurs. More information on state types and hard state changes can be found here.
When a host or service remains in a hard non-OK state and the time specified by the <notification_interval> option in the host or service definition has passed since the last notification was sent out (for that specified host or service).

If the host is in hard non-OK state, notifications for services on this host won't be sent out.

The previous line makes me think Nagios is behaving as designed.
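
The rules quoted above can be modelled roughly as follows. This is a deliberately simplified Python sketch, not Nagios's actual filter chain: `ServiceCheck` and `should_notify` are hypothetical names, and contacts, time periods, and the other notification filters are left out on purpose.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ServiceCheck:
    is_ok: bool                # True when the check result is OK
    is_hard: bool              # True once the state is a hard state
    hard_state_changed: bool   # True when this check caused a hard state change
    minutes_since_last_notification: Optional[float]  # None if never notified


def should_notify(check, host_in_hard_non_ok, notification_interval):
    """Simplified model of the quoted rules (hypothetical helper)."""
    # Service notifications are not sent while the host is in a hard non-OK state.
    if host_in_hard_non_ok:
        return False
    # Both quoted rules only apply to hard states.
    if not check.is_hard:
        return False
    # Rule 1: a hard state change occurred.
    if check.hard_state_changed:
        return True
    # Rule 2: still in a hard non-OK state and the interval has elapsed.
    if check.is_ok or check.minutes_since_last_notification is None:
        return False
    # An interval of 0 is treated here as "do not re-notify".
    return (notification_interval > 0
            and check.minutes_since_last_notification >= notification_interval)


# Hypothetical case: the host is UP and a service check has just gone hard CRITICAL.
failed = ServiceCheck(is_ok=False, is_hard=True, hard_state_changed=True,
                      minutes_since_last_notification=None)
print(should_notify(failed, host_in_hard_non_ok=False, notification_interval=60))
```

Under this model a host that still answers ping does not suppress service notifications, so the quoted rule would only explain missing alerts if the host itself were in a hard non-OK state.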

Post by **lmiltchev** » Wed Feb 06, 2019 9:45 am

pyranha wrote: "If the host is in hard non-OK state, notifications for services on this host won't be sent out. The previous line makes me think Nagios is behaving as designed."

Yes, you are correct. If the host is in a hard non-OK state, Nagios won't send notifications about the services on this host.

Post by **pyranha** » Wed Feb 06, 2019 12:48 pm

I may be confusing the issue. The host is seen as OK by Nagios (green in the web interface), but since the CPU is at 100%, NSClient++ is not responding. No alert is being sent for the CPU being at 100%. If you let me know which files would help in diagnosing this problem, I would be happy to provide them.

Post by **lmiltchev** » Wed Feb 06, 2019 1:28 pm

Can you zip up the entire nagios directory (e.g. /usr/local/nagios), and PM me the zip file? I am looking for the nagios.log, and all of the config files that are relevant for this case.

Who is the contact that is supposed to be receiving notifications? Does he/she receive notifications for other hosts/services?

Nagios Support Forum
