delay in service check

Posted: Fri Mar 04, 2016 10:56 am
by amit.ahuja
I have a service, "File_Check", that checks every minute whether a file exists on multiple servers. But I've noticed a delay in the check process: for some of the servers it doesn't run every minute.
[Attachment: 2016-03-04_10-51-45.png]

Re: delay in service check

Posted: Fri Mar 04, 2016 11:27 am
by hsmith
Can we see the configuration for this service check? I imagine the check_interval may be wrong.
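
For anyone wanting to check this across many hosts at once, here is a rough Python sketch that pulls check_interval out of service definitions. The function names and the sample host are made up for illustration, and it assumes the stock interval_length of 60 seconds in nagios.cfg, so check_interval 1 means one minute.

```python
import re

# Matches "define service { ... }" blocks. Assumes one directive per line
# and no nested braces inside a definition.
SERVICE_BLOCK = re.compile(r"define\s+service\s*\{(.*?)\}", re.DOTALL)


def parse_services(text):
    """Return one dict of directive -> value per service definition."""
    services = []
    for body in SERVICE_BLOCK.findall(text):
        directives = {}
        for line in body.splitlines():
            line = line.split(";", 1)[0].strip()  # drop inline ';' comments
            if not line or line.startswith("#"):
                continue
            parts = line.split(None, 1)
            directives[parts[0]] = parts[1].strip() if len(parts) > 1 else ""
        services.append(directives)
    return services


def interval_seconds(service, interval_length=60):
    """check_interval is counted in units of interval_length seconds."""
    return float(service.get("check_interval", 0)) * interval_length


# Hypothetical sample; point parse_services() at your own object files instead.
SAMPLE = """
define service {
        host_name                       examplehost   ; hypothetical host
        service_description             File_Check
        check_interval                  1
        retry_interval                  2
        }
"""

for svc in parse_services(SAMPLE):
    print(svc.get("host_name"), svc.get("service_description"),
          f"{interval_seconds(svc):.0f}s between checks")
```

If every host reports the same interval, the configuration is probably not the cause and the scheduler or the system itself is the next place to look.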

Re: delay in service check

Posted: Fri Mar 04, 2016 1:06 pm
by amit.ahuja

Code:

define service {
        host_name                       testbox
        service_description             File_Check
        check_command                   check_nrpe!check_file_exists!-a /www/html/Keepalive.html!!!!!!
        max_check_attempts              3
        check_interval                  1
        retry_interval                  2
        check_period                    24x7
        notification_interval           15
        contact_groups                  support
        notification_period             24x7
        notifications_enabled           0
        notification_options            w,c,r
        _xiwizard                       nrpe
        register                        1
        }

Re: delay in service check

Posted: Fri Mar 04, 2016 1:46 pm
by rkennedy
Can you please post the definition that relates to 'vews016'? This one appears to be for testbox, and it checks a different file.

Re: delay in service check

Posted: Fri Mar 04, 2016 2:14 pm
by amit.ahuja
It's the same configuration as the other hosts. Some hosts are checked every minute, some are not.

Code:

define service {
        host_name                       vews016
        service_description             File_Check
        check_command                   check_nrpe!check_file_exists!-a /macys.war/macyshc.html!!!!!!
        max_check_attempts              3
        check_interval                  1
        retry_interval                  2
        check_period                    24x7
        notification_interval           15
        contact_groups                  support
        notification_period             24x7
        notifications_enabled           0
        notification_options            w,c,r
        _xiwizard                       nrpe
        register                        1
        }

Re: delay in service check

Posted: Fri Mar 04, 2016 2:17 pm
by hsmith
Can you please post your /usr/local/nagios/etc/nagios.cfg file here for review?

Re: delay in service check

Posted: Fri Mar 04, 2016 2:43 pm
by amit.ahuja
sure.

Re: delay in service check

Posted: Fri Mar 04, 2016 3:19 pm
by rkennedy
This looks fine as well. I wonder if something is going on in your system.

Can you PM over a profile? (Admin -> System Profile -> Download Profile)

EDIT: profile received

Re: delay in service check

Posted: Tue Mar 08, 2016 3:49 pm
by rkennedy
Just to confirm, do you have 32 GB of RAM allocated to this machine?

I am seeing a few errors -

Code:

Mar  4 08:48:04 MA100DLVMON812 nagios: wproc: 'Core Worker 21908' seems to be choked. ret = -1; bufsize = 117: errno = 11 (Resource temporarily unavailable)

Code:

160303  7:30:07 [Warning] Disk is full writing './nagios/nagios_logentries.TMD' (Errcode: 28). Waiting for someone to free space... (Expect up to 60 secs delay for server to continue after freeing disk space)

Additionally, at the top of your process list I saw this:

Code:

USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root     24976 51.6  2.5 984824 844964 pts/2   S+   16:49   0:07 vim nagios_logentries.MYD
root     17385  0.1  0.7 2897740 233684 ?      Sl    2015 358:58 /opt/IBM/ITM/lx8266/lz/bin/klzagent

What is the output of df -H?
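
If you'd rather script the check than read df by eye, something along these lines could work. It is only a sketch: the path list and the 90% threshold are assumptions, not values from this system. The percentage is computed as used / (used + available), which should land close to df's Use% column.

```python
import os
import shutil


def percent_used(used, free):
    """Share of the space usable by unprivileged processes that is in use."""
    usable = used + free
    return 100.0 * used / usable if usable else 0.0


def check_paths(paths, warn_at=90.0):
    """Return (path, percent_used, over_threshold) for each existing path."""
    results = []
    for path in paths:
        if not os.path.exists(path):
            continue  # skip mount points that are not present on this machine
        usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
        pct = percent_used(usage.used, usage.free)
        results.append((path, pct, pct >= warn_at))
    return results


# Assumed paths; adjust to wherever the Nagios and MySQL data actually live.
for path, pct, over in check_paths(["/", "/var"]):
    flag = "FULL?" if over else "ok"
    print(f"{path}: {pct:.1f}% used [{flag}]")
```

An "Errcode: 28" from MySQL means the filesystem holding that table ran out of space, so the mount that holds the MySQL data directory is the one to watch.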

Re: delay in service check

Posted: Tue Mar 08, 2016 4:42 pm
by amit.ahuja
Yes, I do have 32 GB allocated to this VM. I saw that /var was full and cleaned it up. I also changed some performance settings and adjusted the reaper settings in nagios.cfg. It's working now.
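
The exact values changed are not recorded here. For anyone landing on this later, a minimal Python sketch like the one below lists the reaper and concurrency directives currently set in nagios.cfg so they can be compared before and after tuning. The helper names are made up, and the choice of which directives to list is an assumption about what "reaper settings" covers.

```python
import os

CONFIG_PATH = "/usr/local/nagios/etc/nagios.cfg"  # path mentioned earlier in the thread

# Directives assumed to be the relevant ones; extend as needed.
TUNING_KEYS = (
    "check_result_reaper_frequency",
    "max_check_result_reaper_time",
    "max_concurrent_checks",
)


def read_main_config(text):
    """Parse 'name=value' lines into a dict; a repeated name keeps its last value."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, value = line.split("=", 1)
        settings[name.strip()] = value.strip()
    return settings


def tuning_report(settings):
    """Return the tuning directives, marking any that are absent."""
    return {key: settings.get(key, "(not set)") for key in TUNING_KEYS}


# Only reads the file if it exists on this machine.
if os.path.isfile(CONFIG_PATH):
    with open(CONFIG_PATH) as handle:
        report = tuning_report(read_main_config(handle.read()))
    for key, value in report.items():
        print(f"{key} = {value}")
```

Nagios needs a restart after nagios.cfg is edited for new values to take effect.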

Thanks