Host check time outs in bulk
Hello Team,
We receive host down alerts in bulk every day, but when we check, the servers are running fine.
When we look at the alert history of the servers in the Nagios console, we see the attached message.
It is not only hosts: services are also triggering service check timeouts, and the message is the same as the attached one.
Can you please let us know what needs to be done to stop this issue from recurring?
We are using Core version 3.0.6.
Attachment: HST message.PNG (error message triggered for host check and service check timeout errors)
Re: Host check time outs in bulk
At first you said you are using Core 3.0.6, but your screenshot shows 3.2.3?
What OS are you on?
Could you also run one of these commands, depending on your OS? Please post the output, or let us know if neither works.
Code: Select all
rpm -qa | grep nagios
OR
Code: Select all
dpkg --list | grep nagios
Thank you!
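If you are not sure which of the two applies, a small wrapper can pick for you. This is only a sketch: the function names `pkg_query_cmd` and `list_nagios_packages` are made up for illustration, and it assumes that a system with `rpm` on the PATH is RPM-based, which is not always true.

```shell
#!/bin/sh
# Print the package-listing command available on this system.
# Assumption: RPM-based systems ship `rpm`, Debian-based systems ship `dpkg`.
pkg_query_cmd() {
    if command -v rpm >/dev/null 2>&1; then
        echo "rpm -qa"
    elif command -v dpkg >/dev/null 2>&1; then
        echo "dpkg --list"
    else
        return 1
    fi
}

# List installed packages whose names mention nagios (case-insensitive).
list_nagios_packages() {
    cmd=$(pkg_query_cmd) || { echo "neither rpm nor dpkg found" >&2; return 1; }
    $cmd | grep -i nagios
}
```

Calling `list_nagios_packages` prints the matching packages, if any. Keep in mind that a Nagios built from source will probably not appear in either package database.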
Re: Host check time outs in bulk
Hi,
Sorry for the confusion; we are using version 3.2.3.
Neither of the commands you provided returns anything in my environment,
so we tried:
[root@ objects]# rpm -qa | grep -i nagios
perl-Nagios-Plugin-0.27-1.el5.rf
Our OS is:
LSB Version: :core-4.0-amd64:core-4.0-ia32:core-4.0-noarch:graphics-4.0-amd64:graphics-4.0-ia32:graphics-4.0-noarch:printing-4.0-amd64:printing-4.0-ia32:printing-4.0-noarch
Distributor ID: RedHatEnterpriseServer
Description: Red Hat Enterprise Linux Server release 5.11 (Tikanga)
Release: 5.11
Re: Host check time outs in bulk
Please run this command and post the output.
Code: Select all
ps -aef | grep nagios.cfg
How many hosts and services do you have?
Re: Host check time outs in bulk
Hi,
The output of the command is below:
[root@ ~]# ps -aef | grep nagios.cfg
nagios 3448 1 8 Feb10 ? 05:25:42 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios 19484 3448 0 19:29 ? 00:00:00 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios 19486 3448 0 19:29 ? 00:00:00 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios 19550 3448 0 19:29 ? 00:00:00 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios 19939 3448 0 19:29 ? 00:00:00 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios 19947 3448 44 19:29 ? 00:00:01 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios 19948 3448 44 19:29 ? 00:00:01 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios 19950 3448 45 19:29 ? 00:00:01 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios 19951 3448 41 19:29 ? 00:00:01 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios 19952 3448 44 19:29 ? 00:00:01 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios 19953 3448 39 19:29 ? 00:00:01 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios 19954 3448 29 19:29 ? 00:00:01 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios 19955 3448 33 19:29 ? 00:00:01 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios 19956 3448 33 19:29 ? 00:00:01 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios 20021 3448 0 19:29 ? 00:00:00 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
root 20036 19961 0 19:29 pts/4 00:00:00 grep nagios.cfg
We are monitoring 1284 hosts and 9100 services, which include all types of operating systems and network devices.
Re: Host check time outs in bulk
That is a lot of Nagios processes running.
Please run these commands to clear that up.
Code: Select all
service nagios stop
killall -9 nagios
service nagios start
After running those commands, please post the output of this one.
Code: Select all
ps -ef | head -1 && ps -ef | grep bin/nagios
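To compare the process count before and after the restart without counting lines by eye, something like the following may help. It is a rough sketch: `count_children` and `main_nagios_pid` are hypothetical helper names, and it assumes the usual `ps -ef` column layout (PID in column 2, PPID in column 3) and that the main daemon's parent is PID 1.

```shell
#!/bin/sh
# Count processes in `ps -ef` output (read from stdin) whose PPID equals $1.
# Assumes column 2 is the PID and column 3 is the PPID.
count_children() {
    awk -v parent="$1" '$3 == parent { n++ } END { print n + 0 }'
}

# Print the PID of the main daemon: the first bin/nagios process
# whose parent is PID 1 (assumed to be init).
main_nagios_pid() {
    awk '$3 == 1 && $0 ~ /bin\/nagios / { print $2; exit }'
}

# Example usage against the live process table:
#   snapshot=$(ps -ef)
#   pid=$(printf '%s\n' "$snapshot" | main_nagios_pid)
#   printf '%s\n' "$snapshot" | count_children "$pid"
```

Taking one `ps -ef` snapshot and feeding it to both functions avoids the process table changing between the two calls. A high count on its own is not proof of a problem, since Nagios forks children to run checks, but a count that keeps growing would be worth reporting here.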
Re: Host check time outs in bulk
Hi,
[rootwindows]# ps -ef | head -1 && ps -ef | grep bin/nagios
UID PID PPID C STIME TTY TIME CMD
nagios 31126 1 31 18:24 ? 00:00:01 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios 31129 31126 0 18:24 ? 00:00:00 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios 31131 31126 0 18:24 ? 00:00:00 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
root 31204 17825 0 18:24 pts/0 00:00:00 grep bin/nagios
Re: Host check time outs in bulk
Has this improved the machine's performance at all? It may be that the machine is just overloaded. Having many child processes isn't necessarily a problem, but if you have several long-running checks, you can eat up resources pretty fast.
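For a rough sense of the load involved, you can estimate how many checks per second the server has to run on average. The sketch below uses the 1284 hosts and 9100 services reported earlier in this thread. The 300-second check interval is purely an assumption for illustration, and `checks_per_second` is a made-up helper name. Real intervals vary per host and service, so treat the result as a ballpark only.

```shell
#!/bin/sh
# Estimate the average number of active checks per second.
# $1 = hosts, $2 = services, $3 = check interval in seconds (assumed uniform).
checks_per_second() {
    LC_ALL=C awk -v h="$1" -v s="$2" -v i="$3" \
        'BEGIN { printf "%.1f\n", (h + s) / i }'
}

# 1284 hosts and 9100 services come from this thread;
# the 300-second interval is an assumption, not a value from the poster's config.
checks_per_second 1284 9100 300   # prints 34.6
```

If each check takes a few seconds to return, that many checks per second means a large number of plugin processes running at once, which fits the overload theory. Plug in your real intervals to get a better figure.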
Former Nagios employee
https://www.mcapra.com/
Re: Host check time outs in bulk
@udaykumar, let us know if this fixed the issue.
Re: Host check time outs in bulk
Hi,
The issue is not fixed. As you said, it may be due to overload.
Is there any other way to fix the issue?