Host check time outs in bulk

Postby udaykumar » Wed Feb 07, 2018 9:49 pm

Hello Team,

We are receiving host down alerts in bulk every day, but when we check, the servers are running fine.
When we look at the alert history of the servers in the Nagios console, we see the attached message.

This affects not only hosts: services are also triggering service check timeouts, with the same message as attached.
Can you please let us know what needs to be done to stop this issue from recurring?

We are using Nagios Core version 3.0.6.
Attachments
HST message.PNG
Error message getting triggered for host check and service check time out errors
udaykumar
 
Posts: 11
Joined: Thu Jan 11, 2018 12:55 am

Re: Host check time outs in bulk

Postby kyang » Thu Feb 08, 2018 10:41 am

You say you are using Core 3.0.6, but your screenshot shows 3.2.3?

What OS are you on?

Could you also run one of these commands, depending on your OS? Please post the output, or let us know if neither works.
rpm -qa | grep nagios


OR

dpkg --list | grep nagios


Thank you!
Be sure to check out our Knowledgebase for helpful articles and solutions!
kyang
Support Tech
 
Posts: 1514
Joined: Tue Jul 25, 2017 3:35 pm

Re: Host check time outs in bulk

Postby udaykumar » Fri Feb 09, 2018 3:26 am

Hi,

Sorry for the confusion; we are using version 3.2.3.

Neither of the commands you provided worked in my environment.

So we tried:

[root@ objects]# rpm -qa | grep -i nagios
perl-Nagios-Plugin-0.27-1.el5.rf
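
A note on the difference between the two results: grep without -i is case-sensitive, and the only matching package here is spelled with a capital N. The sketch below wraps the same idea in a small helper. The names filter_nagios and list_packages are hypothetical, and the rpm/dpkg detection is an assumption about typical systems rather than anything stated in this thread.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, not part of Nagios.

# Case-insensitive filter over a package list read from stdin.
# "|| true" keeps the exit status at 0 when nothing matches.
filter_nagios() {
    grep -i 'nagios' || true
}

# Print installed packages with whichever tool exists
# (assumption: the host uses either rpm or dpkg).
list_packages() {
    if command -v rpm >/dev/null 2>&1; then
        rpm -qa
    elif command -v dpkg-query >/dev/null 2>&1; then
        dpkg-query -W -f='${Package}-${Version}\n'
    else
        echo "no supported package manager found" >&2
        return 1
    fi
}
```

Usage would be: list_packages | filter_nagios. If only plugin packages show up, as above, Nagios Core itself was probably not installed from a package (for example, it may have been built from source).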


The OS we are using is:

LSB Version: :core-4.0-amd64:core-4.0-ia32:core-4.0-noarch:graphics-4.0-amd64:graphics-4.0-ia32:graphics-4.0-noarch:printing-4.0-amd64:printing-4.0-ia32:printing-4.0-noarch
Distributor ID: RedHatEnterpriseServer
Description: Red Hat Enterprise Linux Server release 5.11 (Tikanga)
Release: 5.11

Re: Host check time outs in bulk

Postby kyang » Fri Feb 09, 2018 2:44 pm

Please run this command and post the output.

ps -aef | grep nagios.cfg


How many hosts and services do you have?

Re: Host check time outs in bulk

Postby udaykumar » Mon Feb 12, 2018 3:31 am

Hi,

The output of the command is below:

[root@ ~]# ps -aef | grep nagios.cfg
nagios 3448 1 8 Feb10 ? 05:25:42 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios 19484 3448 0 19:29 ? 00:00:00 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios 19486 3448 0 19:29 ? 00:00:00 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios 19550 3448 0 19:29 ? 00:00:00 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios 19939 3448 0 19:29 ? 00:00:00 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios 19947 3448 44 19:29 ? 00:00:01 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios 19948 3448 44 19:29 ? 00:00:01 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios 19950 3448 45 19:29 ? 00:00:01 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios 19951 3448 41 19:29 ? 00:00:01 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios 19952 3448 44 19:29 ? 00:00:01 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios 19953 3448 39 19:29 ? 00:00:01 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios 19954 3448 29 19:29 ? 00:00:01 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios 19955 3448 33 19:29 ? 00:00:01 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios 19956 3448 33 19:29 ? 00:00:01 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios 20021 3448 0 19:29 ? 00:00:00 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
root 20036 19961 0 19:29 pts/4 00:00:00 grep nagios.cfg
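
A listing like the one above is easier to read when the children are counted per parent. This is a minimal sketch, assuming the standard ps -ef column order (UID, PID, PPID, ...); count_nagios_children is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch only: count_nagios_children is a hypothetical helper.
# Reads "ps -ef" lines on stdin and reports the main Nagios daemon
# (the bin/nagios process whose parent PID is 1) and how many
# bin/nagios processes have that daemon as their parent.
count_nagios_children() {
    awk '
        /bin\/nagios/ && $3 == 1 { parent = $2; next }  # main daemon
        /bin\/nagios/            { kids[$3]++ }         # tally by parent PID
        END { printf "parent=%s children=%d\n", parent, kids[parent] }
    '
}
```

Usage would be: ps -ef | count_nagios_children. For the listing above it reports parent=3448 children=14. If more than one daemon has parent PID 1, this sketch only reports the last one it sees.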


We are monitoring 1284 hosts and 9100 services, which include all types of OS and network devices.
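
For a rough sense of scale, the numbers above can be turned into an average check rate. This is back-of-the-envelope arithmetic only: the 300-second interval is an assumed value (the actual check intervals were not posted), and checks_per_second is a hypothetical helper.

```shell
#!/bin/sh
# Sketch only: checks_per_second is a hypothetical helper.
# $1 = number of checked objects, $2 = check interval in seconds.
# Prints the average number of checks that must start each second.
checks_per_second() {
    awk -v n="$1" -v i="$2" 'BEGIN {
        if (i <= 0) exit 1          # refuse a zero or negative interval
        printf "%.2f\n", n / i
    }'
}

# Assumption: every object is checked once every 300 seconds.
checks_per_second 9100 300   # services
checks_per_second 1284 300   # hosts
```

Under that assumed interval, this works out to roughly 30 service checks and 4 host checks per second on average, which gives a baseline for judging whether the server can keep up.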

Re: Host check time outs in bulk

Postby kyang » Mon Feb 12, 2018 3:42 pm

That is a lot of Nagios processes running.

Please run these commands to clear that up.
service nagios stop
killall -9 nagios
service nagios start


After running those commands, please post the output of this one.

ps -ef | head -1 && ps -ef | grep bin/nagios

Re: Host check time outs in bulk

Postby udaykumar » Wed Feb 14, 2018 2:24 am

hi,

[rootwindows]# ps -ef | head -1 && ps -ef | grep bin/nagios
UID PID PPID C STIME TTY TIME CMD
nagios 31126 1 31 18:24 ? 00:00:01 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios 31129 31126 0 18:24 ? 00:00:00 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios 31131 31126 0 18:24 ? 00:00:00 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
root 31204 17825 0 18:24 pts/0 00:00:00 grep bin/nagios

Re: Host check time outs in bulk

Postby mcapra » Wed Feb 14, 2018 2:18 pm

Has this improved the machine's performance at all? It may be that the machine is just overloaded. Having many child processes isn't necessarily a problem, but if you have several long-running checks, you can eat up resources pretty fast.
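
One quick way to test the overload theory is to compare the load average with the number of CPU cores. This is a minimal sketch, assuming a Linux host that exposes /proc/loadavg; is_overloaded is a hypothetical helper, and "load above core count" is only a rule of thumb, not a Nagios threshold.

```shell
#!/bin/sh
# Sketch only: is_overloaded is a hypothetical helper.
# Succeeds (exit 0) when the load average ($1) exceeds the core count ($2).
is_overloaded() {
    awk -v load="$1" -v cores="$2" 'BEGIN { exit !(load > cores) }'
}

# Only run the live check where /proc/loadavg exists (assumption: Linux).
if [ -r /proc/loadavg ]; then
    load=$(cut -d' ' -f1 /proc/loadavg)    # 1-minute load average
    cores=$(getconf _NPROCESSORS_ONLN)     # online logical CPUs
    if is_overloaded "$load" "$cores"; then
        echo "load $load exceeds $cores cores: checks may be starved of CPU"
    else
        echo "load $load is within $cores cores"
    fi
fi
```

A load that stays well above the core count while the timeouts occur would support the overload explanation; a low load would point elsewhere, such as network latency or slow individual checks.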
Former Nagios employee
http://www.mcapra.com/
mcapra
 
Posts: 3084
Joined: Thu May 05, 2016 3:54 pm

Re: Host check time outs in bulk

Postby tgriep » Wed Feb 14, 2018 5:29 pm

@udaykumar, let us know if this fixed the issue.
tgriep
Madmin
 
Posts: 6447
Joined: Thu Oct 30, 2014 9:02 am

Re: Host check time outs in bulk

Postby udaykumar » Fri Feb 23, 2018 1:12 am

Hi,
The issue is not fixed. As you said, it may be because of overload.
Is there any other solution to fix the issue?
