Host check time outs in bulk

Posted: Wed Feb 07, 2018 9:49 pm
by udaykumar
Hello Team,

We are receiving host down alerts in bulk every day, but when we check, the servers are running fine.
When we look at the alert history of the servers in the Nagios console, we see the attached message.

This affects not only hosts: services are also triggering service check timeouts, with the same message as attached.
Can you please let us know what needs to be done to stop this issue from recurring?

We are using Nagios Core version 3.0.6.

Re: Host check time outs in bulk

Posted: Thu Feb 08, 2018 10:41 am
by kyang
First, you say you are using Core 3.0.6, but your screenshots show 3.2.3?

What OS are you on?

Could you also run one of these commands, depending on your OS? Please post the output, or let us know if neither works.

Code:

rpm -qa | grep nagios

OR

Code:

dpkg --list | grep nagios

Thank you!
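
The two queries above can be folded into one helper that picks whichever package manager is present. This is only a sketch: the function names `filter_nagios` and `list_nagios_packages` are made up for illustration, and the case-insensitive match (`grep -i`) is an added assumption, on the grounds that a package name may spell Nagios with a capital N.

```shell
# Case-insensitive filter, so capitalized package names are not missed.
# (Assumption: the original commands used a case-sensitive grep.)
filter_nagios() {
    grep -i nagios
}

# Query whichever package manager is available (rpm or dpkg).
# Hypothetical helper; returns 2 if neither tool is installed.
list_nagios_packages() {
    if command -v rpm >/dev/null 2>&1; then
        rpm -qa | filter_nagios
    elif command -v dpkg >/dev/null 2>&1; then
        dpkg --list | filter_nagios
    else
        echo "neither rpm nor dpkg was found" >&2
        return 2
    fi
}
```

Calling `list_nagios_packages` with no arguments prints any matching packages. An empty result only means no matching package is registered with the package manager.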

Re: Host check time outs in bulk

Posted: Fri Feb 09, 2018 3:26 am
by udaykumar
Hi,

Sorry for the confusion; we are using version 3.2.3.

Neither of the commands you provided worked in my environment, so we tried:

[root@ objects]# rpm -qa | grep -i nagios
perl-Nagios-Plugin-0.27-1.el5.rf


We are using the following OS:

LSB Version: :core-4.0-amd64:core-4.0-ia32:core-4.0-noarch:graphics-4.0-amd64:graphics-4.0-ia32:graphics-4.0-noarch:printing-4.0-amd64:printing-4.0-ia32:printing-4.0-noarch
Distributor ID: RedHatEnterpriseServer
Description: Red Hat Enterprise Linux Server release 5.11 (Tikanga)
Release: 5.11

Re: Host check time outs in bulk

Posted: Fri Feb 09, 2018 2:44 pm
by kyang
Please run this command and post the output.

Code:

ps -aef | grep nagios.cfg

How many hosts and services do you have?

Re: Host check time outs in bulk

Posted: Mon Feb 12, 2018 3:31 am
by udaykumar
Hi,

The output of the command is below:

[root@ ~]# ps -aef | grep nagios.cfg
nagios 3448 1 8 Feb10 ? 05:25:42 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios 19484 3448 0 19:29 ? 00:00:00 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios 19486 3448 0 19:29 ? 00:00:00 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios 19550 3448 0 19:29 ? 00:00:00 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios 19939 3448 0 19:29 ? 00:00:00 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios 19947 3448 44 19:29 ? 00:00:01 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios 19948 3448 44 19:29 ? 00:00:01 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios 19950 3448 45 19:29 ? 00:00:01 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios 19951 3448 41 19:29 ? 00:00:01 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios 19952 3448 44 19:29 ? 00:00:01 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios 19953 3448 39 19:29 ? 00:00:01 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios 19954 3448 29 19:29 ? 00:00:01 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios 19955 3448 33 19:29 ? 00:00:01 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios 19956 3448 33 19:29 ? 00:00:01 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios 20021 3448 0 19:29 ? 00:00:00 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
root 20036 19961 0 19:29 pts/4 00:00:00 grep nagios.cfg


We are handling 1284 hosts and 9100 services, which include all types of OS and network devices.

Re: Host check time outs in bulk

Posted: Mon Feb 12, 2018 3:42 pm
by kyang
That is a lot of Nagios processes running.

Please run these commands to clear that up.

Code:

service nagios stop
killall -9 nagios
service nagios start

After running those commands, please post the output of this one.

Code:

ps -ef | head -1 && ps -ef | grep bin/nagios
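
The `ps` check above also lists the `grep` command itself, and the number of Nagios processes has to be counted by eye. A rough sketch of a counter that reads `ps -ef` output on standard input is shown here. The function name `count_nagios_procs` is hypothetical, and the sketch assumes the usual `ps -ef` column order (UID PID PPID C STIME TTY TIME CMD) and that the main daemon is the process whose parent PID is 1.

```shell
# Count the main Nagios daemon and its forked children in `ps -ef` output.
# Assumptions: column 3 is PPID, column 8 is the command path, and the
# main daemon has PPID 1. The grep line is skipped because its command
# is "grep", not a path ending in bin/nagios.
count_nagios_procs() {
    awk '$8 ~ /bin\/nagios$/ {
             if ($3 == 1) masters++; else children++
         }
         END { printf "masters=%d children=%d\n", masters, children }'
}
```

It could be run as `ps -ef | count_nagios_procs`. A child count that stays high over repeated runs would suggest checks are piling up rather than finishing.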

Re: Host check time outs in bulk

Posted: Wed Feb 14, 2018 2:24 am
by udaykumar
hi,

[root@ windows]# ps -ef | head -1 && ps -ef | grep bin/nagios
UID PID PPID C STIME TTY TIME CMD
nagios 31126 1 31 18:24 ? 00:00:01 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios 31129 31126 0 18:24 ? 00:00:00 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios 31131 31126 0 18:24 ? 00:00:00 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
root 31204 17825 0 18:24 pts/0 00:00:00 grep bin/nagios

Re: Host check time outs in bulk

Posted: Wed Feb 14, 2018 2:18 pm
by mcapra
Has this improved the machine's performance at all? It may be that the machine is just overloaded. Having many child processes isn't necessarily a problem, but if you have several long-running checks, you can eat up resources pretty fast.
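
One rough way to test the overload theory is to compare the 1-minute load average with the number of CPUs. The sketch below is an illustration only: the function name `is_overloaded` is made up, and "load greater than CPU count" is a common rule of thumb, not a limit defined by Nagios.

```shell
# Succeeds (exit status 0) when the load average exceeds the CPU count.
# $1 = 1-minute load average, $2 = number of CPUs
# Hypothetical helper; the threshold is a rule of thumb.
is_overloaded() {
    awk -v load="$1" -v cpus="$2" 'BEGIN { exit !(load > cpus) }'
}

# Example use on a Linux host (reads /proc, so it is left commented out):
# load=$(cut -d' ' -f1 /proc/loadavg)
# cpus=$(grep -c '^processor' /proc/cpuinfo)
# if is_overloaded "$load" "$cpus"; then
#     echo "load $load exceeds $cpus CPUs"
# fi
```

If the load stays well above the CPU count while checks are timing out, that would be consistent with the machine being too busy to finish checks in time.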

Re: Host check time outs in bulk

Posted: Wed Feb 14, 2018 5:29 pm
by tgriep
@udaykumar, let us know if this fixed the issue.

Re: Host check time outs in bulk

Posted: Fri Feb 23, 2018 1:12 am
by udaykumar
Hi,
The issue is not fixed. As you said, it may be because of overload.
Is there any other solution to fix the issue?