check_nt intermittent timeouts

Post by MichielvM

Hi all,

We have one host that is monitored by several service checks.
Among those are Windows service checks, which use check_xi_service_nsclient (i.e. check_nt).
On an average day I get quite a few CRITICAL alerts reporting "CRITICAL - Socket timeout after 10 seconds", followed by an OK one or two checks later.
I've tinkered with the "-t" parameter, raising it as high as 60 seconds, to no avail.
I've also restarted the NSClient service a few times, with the same result.
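Since raising -t made no difference, one way to see whether the NSClient listener itself intermittently stops accepting connections is to probe its TCP port from the Nagios server on a schedule and record failures and latency. The following is a minimal Python sketch under stated assumptions: it only times the TCP connect and does not speak the check_nt protocol, the function names are hypothetical, and the host name and port 12489 in the usage comment are placeholders for whatever your check_xi_service_nsclient command actually uses.

```python
import socket
import time


def probe(host: str, port: int, timeout: float) -> tuple[bool, float]:
    """Try one TCP connect to host:port; return (succeeded, seconds taken).

    Only connection setup is measured; no check_nt request is sent.
    """
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            ok = True
    except OSError:  # covers refused connections and connect timeouts
        ok = False
    return ok, time.monotonic() - start


def sample(host: str, port: int, timeout: float,
           count: int, pause: float) -> list[tuple[bool, float]]:
    """Probe `count` times, sleeping `pause` seconds between attempts."""
    results = []
    for i in range(count):
        results.append(probe(host, port, timeout))
        if i < count - 1:
            time.sleep(pause)
    return results


def summarize(results: list[tuple[bool, float]]) -> tuple[int, float]:
    """Return (number of failed probes, slowest probe in seconds)."""
    failures = sum(1 for ok, _ in results if not ok)
    slowest = max((elapsed for _, elapsed in results), default=0.0)
    return failures, slowest


# Usage (HYPOTHETICAL host/port; substitute the values from your check command):
#   results = sample("staame-veeam01", 12489, timeout=10.0, count=60, pause=60.0)
#   failures, slowest = summarize(results)
#   print(f"{failures}/{len(results)} probes failed; slowest took {slowest:.2f}s")
```

If connects succeed quickly while check_nt still times out, the stall is more likely inside the agent's request handling than in the network path.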

I should point out that this particular Windows host is the only one at this site; the other hosts are VMS servers.
We're in the process of migrating to Nagios. Our old monitoring tool (OpManager) does not report any of this behavior, and inspecting the server shows nothing abnormal.
We use this host as a gateway between the Nagios event log checks and the VMS hosts. Those checks are OK all the time.

I suspect something is wrong with Nagios itself, or at least with the check_nt plugin.

Here is an extract of the service config. I haven't listed all the services, just these two:
STVMS4-Warning (100% OK)
Uptime (random timeouts)

define service {
        host_name                       staame-veeam01
        service_description             STVMS4-Warning
        use                             xiwizard_windowseventlog_service
        max_check_attempts              1
        check_interval                  1
        retry_interval                  1
        check_period                    24x7
        notification_interval           1
        notification_period             standbyuren
        notification_options            w,
        notifications_enabled           0
        contacts                        Team 3
        stalking_options                o,w,c,u,
        icon_image                      windowseventlog.png
        _xiwizard                       windowseventlog
        register                        1
        }

define service {
        host_name                       staame-veeam01
        service_description             Uptime
        use                             xiwizard_windowsserver_nsclient_service
        check_command                   check_xi_service_nsclient!!UPTIME!!!!!!
        max_check_attempts              5
        check_interval                  5
        retry_interval                  1
        check_period                    24x7
        notification_interval           60
        notification_period             standbyuren
        notifications_enabled           0
        contacts                        Team 1
        _xiwizard                       windowsserver
        register                        1
        }
I see nothing out of the ordinary.

XI is 2012R2.9
Core version is 3.5.0
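The two definitions above differ in how quickly a failure counts. In Nagios, a non-OK result first puts a service into a SOFT state and is retried every retry_interval; only after max_check_attempts consecutive non-OK results does it become a HARD state, which is what triggers notifications. The Python sketch below is a simplified model of that rule (it ignores details such as soft recoveries and flapping, and the function name is hypothetical): with max_check_attempts 5, as on Uptime, a timeout that clears within one or two retries never reaches HARD, whereas with max_check_attempts 1, as on STVMS4-Warning, every non-OK result is HARD immediately.

```python
def classify(results: list[str], max_check_attempts: int) -> list[tuple[str, str]]:
    """Label each check result as SOFT or HARD, starting from a HARD OK.

    Simplified model: an OK result always resets the attempt counter and is
    reported as HARD; Nagios's separate SOFT-recovery handling is ignored.
    """
    labelled = []
    attempt = 0  # consecutive non-OK results so far
    for state in results:
        if state == "OK":
            attempt = 0
            labelled.append((state, "HARD"))
        else:
            attempt += 1
            state_type = "HARD" if attempt >= max_check_attempts else "SOFT"
            labelled.append((state, state_type))
    return labelled


# One timeout that clears on the next check, then one that fails twice before clearing.
history = ["OK", "CRITICAL", "OK", "CRITICAL", "CRITICAL", "OK"]
uptime_like = classify(history, max_check_attempts=5)    # like the Uptime service
eventlog_like = classify(history, max_check_attempts=1)  # like STVMS4-Warning
```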

Post by sreinhardt

Did altering the timeout have any effect on these potentially invalid alerts? I can tell you that it's likely not something in the core engine. More than likely it is either an NSClient issue (which version are you running?) or an issue with check_nt, which according to the NSClient++ developer is deprecated in favor of check_nrpe.

Am I correct in understanding that this is an intermediary NSClient system, i.e. you use it to check other systems without the Nagios server having to interact with them directly? In what time range do the issues occur: all the time, or during certain periods of the day? Do they follow a trend, such as alerting around the same times each day, or do they seem completely random? Is it always a timeout that alerts you to this issue?
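To answer whether the timeouts cluster at certain times of day, the SERVICE ALERT entries in the Nagios log can be bucketed by hour. A minimal Python sketch, assuming the standard Core log line format `[epoch] SERVICE ALERT: host;service;state;state_type;attempt;output`; the function name is hypothetical, the log path in the usage comment is the assumed default, and hours are reported in UTC, so shift them to local time if needed.

```python
import re
from collections import Counter
from datetime import datetime, timezone

# [epoch] SERVICE ALERT: host;service;state;state_type;attempt;plugin output
ALERT = re.compile(
    r"^\[(\d+)\] SERVICE ALERT: ([^;]*);([^;]*);([^;]*);([^;]*);(\d+);(.*)$"
)


def timeouts_by_hour(lines, host: str, service: str) -> Counter:
    """Count SERVICE ALERT lines mentioning 'Socket timeout' per UTC hour."""
    counts: Counter = Counter()
    for line in lines:
        match = ALERT.match(line.strip())
        if match is None:
            continue  # not a service alert (e.g. notifications, warnings)
        epoch, alert_host, alert_service, _state, _type, _attempt, output = match.groups()
        if alert_host == host and alert_service == service and "Socket timeout" in output:
            hour = datetime.fromtimestamp(int(epoch), tz=timezone.utc).hour
            counts[hour] += 1
    return counts


# Usage (assumed default log path; adjust if yours differs):
#   with open("/usr/local/nagios/var/nagios.log") as log:
#       for hour, n in sorted(timeouts_by_hour(log, "staame-veeam01", "Uptime").items()):
#           print(f"{hour:02d}:00 UTC  {n}")
```

A roughly flat distribution supports "seemingly random"; a spike at particular hours points toward scheduled activity on the host.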

Post by MichielvM

Hi Spenser,

Yes, this is an intermediary system. The failing checks, however, are local, i.e. they monitor the intermediary system itself.
Its intermediary role is simply to receive SNMP traps sent from the VMS servers, which are converted to events using KIWI.
That part is stable.

The behaviour is seemingly random.
Tinkering with the timeout (-t 10, 30, 60) has no effect.
I've downgraded NSClient to 0.3.9, also with no effect.

I took your advice about check_nrpe, and things look stable now. ;)

Post by tmcdonald

Shall we keep the topic open or are you set to have it closed?

Post by MichielvM

As far as I am concerned, it can be closed. :)