Host check time outs in bulk

udaykumar
Posts: 66
Joined: Thu Jan 11, 2018 12:55 am

Post by udaykumar »

Hello Team,

We are receiving host down alerts in bulk every day, but when we check, the servers are running fine.
When we look at the alert history of the servers in the Nagios console, we see the attached message.

This affects not only hosts: services are also triggering service check timeouts, with the same message as in the attachment.
Can you please let us know what needs to be done to stop this issue from recurring?

We are using Nagios Core version 3.0.6.
Attachments
Error message getting triggered for host check and service check time out errors
HST message.PNG (7.03 KiB)
kyang

Post by kyang »

You say you are using Core 3.0.6, but your screenshot shows 3.2.3?

What OS are you on?

Could you also run one of these commands, depending on your OS? Please post the output, or let us know if neither works.

Code:

rpm -qa | grep nagios
OR

Code:

dpkg --list | grep nagios
Thank you!
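
If it is unclear which package manager a host uses, the two queries above can be wrapped in one small script. This is only a sketch: the function names `detect_pkg_manager` and `list_nagios_packages` are made up for illustration, and it assumes a POSIX shell where `command -v` is available. It matches case-insensitively because package names may capitalise "Nagios". Note that a Nagios built from source will not appear in either package list.

```shell
#!/bin/sh
# Hypothetical helpers for illustration; they are not part of Nagios.

# Print "rpm", "dpkg", or "unknown" depending on which tool is on PATH.
# rpm is checked first; that ordering is an assumption of this sketch.
detect_pkg_manager() {
    if command -v rpm >/dev/null 2>&1; then
        echo rpm
    elif command -v dpkg >/dev/null 2>&1; then
        echo dpkg
    else
        echo unknown
    fi
}

# List installed packages whose name contains "nagios" in any case.
# Returns non-zero if nothing matches or no package tool is found.
list_nagios_packages() {
    case "$(detect_pkg_manager)" in
        rpm)  rpm -qa | grep -i nagios ;;
        dpkg) dpkg --list | grep -i nagios ;;
        *)    echo "no rpm or dpkg found" >&2; return 1 ;;
    esac
}
```

Calling `list_nagios_packages` prints any matching packages; an empty result with a non-zero exit status means no matching package is installed.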
udaykumar

Post by udaykumar »

Hi,

Sorry for the confusion; we are using version 3.2.3.

Neither of the commands you provided worked in our environment, so we tried:

[root@ objects]# rpm -qa | grep -i nagios
perl-Nagios-Plugin-0.27-1.el5.rf


We are using this OS:

LSB Version: :core-4.0-amd64:core-4.0-ia32:core-4.0-noarch:graphics-4.0-amd64:graphics-4.0-ia32:graphics-4.0-noarch:printing-4.0-amd64:printing-4.0-ia32:printing-4.0-noarch
Distributor ID: RedHatEnterpriseServer
Description: Red Hat Enterprise Linux Server release 5.11 (Tikanga)
Release: 5.11
kyang

Post by kyang »

Please run this command and post the output.

Code:

ps -aef | grep nagios.cfg
How many hosts and services do you have?
udaykumar

Post by udaykumar »

Hi,

The output of the command is below:

[root@ ~]# ps -aef | grep nagios.cfg
nagios 3448 1 8 Feb10 ? 05:25:42 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios 19484 3448 0 19:29 ? 00:00:00 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios 19486 3448 0 19:29 ? 00:00:00 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios 19550 3448 0 19:29 ? 00:00:00 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios 19939 3448 0 19:29 ? 00:00:00 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios 19947 3448 44 19:29 ? 00:00:01 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios 19948 3448 44 19:29 ? 00:00:01 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios 19950 3448 45 19:29 ? 00:00:01 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios 19951 3448 41 19:29 ? 00:00:01 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios 19952 3448 44 19:29 ? 00:00:01 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios 19953 3448 39 19:29 ? 00:00:01 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios 19954 3448 29 19:29 ? 00:00:01 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios 19955 3448 33 19:29 ? 00:00:01 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios 19956 3448 33 19:29 ? 00:00:01 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios 20021 3448 0 19:29 ? 00:00:00 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
root 20036 19961 0 19:29 pts/4 00:00:00 grep nagios.cfg


We are monitoring 1284 hosts and 9100 services, covering all types of operating systems and network devices.
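
To compare listings like the one above over time, the process count can be summarised rather than read by eye. The following is a minimal sketch, assuming `ps -ef` output with the usual eight columns (UID PID PPID C STIME TTY TIME CMD); `count_nagios_procs` is a hypothetical helper name. It treats a process with PPID 1 as the daemon and anything else as a child.

```shell
#!/bin/sh
# Hypothetical helper: reads `ps -ef` style output on stdin and prints
# "<daemons> <children>" for processes whose command ends in bin/nagios.
# Lines such as the header or the grep process itself do not match.
count_nagios_procs() {
    awk '$8 ~ /bin\/nagios$/ {
             if ($3 == 1) daemons++    # parent is init: the main daemon
             else children++           # forked from another process
         }
         END { printf "%d %d\n", daemons, children }'
}

# Usage:
#   ps -ef | count_nagios_procs
```

For the listing posted above, this would print `1 14`: one daemon (PID 3448) and fourteen children.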
kyang

Post by kyang »

That is a lot of Nagios processes running.

Please run these commands to clear that up.

Code:

service nagios stop
killall -9 nagios
service nagios start
After running those commands, please post the output of this one.

Code:

ps -ef | head -1 && ps -ef | grep bin/nagios
udaykumar

Post by udaykumar »

Hi,

[rootwindows]# ps -ef | head -1 && ps -ef | grep bin/nagios
UID PID PPID C STIME TTY TIME CMD
nagios 31126 1 31 18:24 ? 00:00:01 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios 31129 31126 0 18:24 ? 00:00:00 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios 31131 31126 0 18:24 ? 00:00:00 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
root 31204 17825 0 18:24 pts/0 00:00:00 grep bin/nagios
mcapra
Posts: 3739
Joined: Thu May 05, 2016 3:54 pm

Post by mcapra »

Has this improved the machine's performance at all? It may be that the machine is just overloaded. Having many child processes isn't necessarily a problem, but if you have several long-running checks, you can eat up resources pretty fast.
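
One rough way to test the overload theory is to compare the load average with the number of CPUs. The sketch below assumes Linux (`/proc/loadavg` and `/proc/cpuinfo`); `load_state` and `current_load_state` are hypothetical helper names, and treating a load above 1.0 per CPU as "overloaded" is a rule-of-thumb threshold chosen for illustration, not a Nagios setting.

```shell
#!/bin/sh
# Hypothetical helper: load_state <load_average> <cpu_count>
# Prints "overloaded" when the load per CPU exceeds 1.0, otherwise "ok".
load_state() {
    awk -v load="$1" -v cpus="$2" 'BEGIN {
        if (cpus > 0 && load / cpus > 1)
            print "overloaded"
        else
            print "ok"
    }'
}

# Read the 1-minute load average and CPU count from /proc (Linux only).
current_load_state() {
    load_state "$(cut -d' ' -f1 /proc/loadavg)" \
               "$(grep -c '^processor' /proc/cpuinfo)"
}

# Usage:
#   current_load_state
```

Running `current_load_state` around the times the bulk timeouts appear would show whether the alerts line up with periods of high load.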
Former Nagios employee
https://www.mcapra.com/
tgriep
Madmin
Posts: 9177
Joined: Thu Oct 30, 2014 9:02 am

Post by tgriep »

@udaykumar, let us know if this fixed the issue.
Be sure to check out our Knowledgebase for helpful articles and solutions!
udaykumar

Post by udaykumar »

Hi,
The issue is not fixed. As you said, it may be because of overload.
Is there any other way to fix the issue?