Nagios Host/Service Check Timeouts

Dusan.Mandic
Posts: 60
Joined: Mon Apr 06, 2020 2:30 pm

Nagios Host/Service Check Timeouts

Post by Dusan.Mandic »

Hello,

We seem to be having periods of time where some of our hosts/services are timing out (60/30 secs). Most of these aren't causing HARD notifications, but they show up on our monitoring board and have drawn the attention of our support team. I don't see anything unusual in our system statistics or monitoring engine stats. Database usage looks normal as well. System profile included.

Code:

[root@bnalmnag702 var]# less eventman.log

            [contact] => xxxxxxxxx
            [contactemail] => [email protected]
            [type] => PROBLEM
            [escalated] => 0
            [author] => 
            [comments] => 
            [host] => Mxxxxxx
            [hostaddress] => 10.200.170.100
            [hostalias] => Gxxxxxxx
            [hostdisplayname] => Mxxxxx
            [hoststate] => DOWN
            [hoststateid] => 1
            [lasthoststate] => DOWN
            [lasthoststateid] => 1
            [hoststatetype] => HARD
            [currentattempt] => 5
            [maxattempts] => 5
            [hosteventid] => 1167267
            [hostproblemid] => 372100
            [hostoutput] => (Host check timed out after 31.01 seconds)
            [longhostoutput] => 
            [datetime] => Tue Aug 24 03:52:58 CDT 2021
Moderator's Note: The profile has been shared with the support team but has been removed from the public forum.
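
For anyone wanting to gauge how often this is happening, a rough sketch like the following counts the timed-out check outputs in an eventman.log-style file. The helper name `count_timeouts` and the sample lines are assumptions for illustration only (the "PING OK" line is made up); on a real system you would point it at the actual log instead.

```shell
# Hypothetical helper: count lines reporting a timed-out check in a log file.
count_timeouts() {
    grep -c 'timed out after' "$1"
}

# Stand-in sample data in the same shape as the excerpt above.
sample=$(mktemp)
cat > "$sample" <<'EOF'
            [hoststate] => DOWN
            [hostoutput] => (Host check timed out after 31.01 seconds)
            [datetime] => Tue Aug 24 03:52:58 CDT 2021
            [hoststate] => UP
            [hostoutput] => PING OK
EOF

timeouts=$(count_timeouts "$sample")
echo "timed-out checks: $timeouts"
rm -f "$sample"
```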
dchurch
Posts: 858
Joined: Wed Oct 07, 2020 12:46 pm
Location: Yo mama

Re: Nagios Host/Service Check Timeouts

Post by dchurch »

The nagios user account's password is probably set up to expire, and it cannot be configured that way. The system uses that account to run processes automatically, and once the password expires it can no longer do so.

Please log in as root and run the following commands to fix the issue:

Code:

passwd --delete nagios
chage -I -1 -m 0 -M -1 -E -1 nagios
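
To confirm the change took effect, the account's aging settings can be listed afterwards. This is only a sketch: it assumes the account is named nagios as above, and the guard for a missing user is added purely so the snippet runs harmlessly on a machine without that account.

```shell
# Account to inspect; "nagios" matches the commands above.
check_user=nagios

if id "$check_user" >/dev/null 2>&1; then
    # chage -l lists the password aging settings (expiry, inactivity, min/max days).
    aging=$(chage -l "$check_user" 2>&1)
else
    aging="user $check_user not found on this machine"
fi
echo "$aging"
```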
Dusan.Mandic
Posts: 60
Joined: Mon Apr 06, 2020 2:30 pm

Re: Nagios Host/Service Check Timeouts

Post by Dusan.Mandic »

I have executed the above change and I'm still witnessing timeouts.
pbroste
Posts: 1288
Joined: Tue Jun 01, 2021 1:27 pm

Re: Nagios Host/Service Check Timeouts

Post by pbroste »

Hello @Dusan.Mandic

Thanks for reaching out. We would like to take a look at your Nagios XI System Profile so we can see what is going on.

To send us your system profile:
  • Log in to the Nagios XI GUI using a web browser.
  • Click the "Admin" > "System Profile" menu.
  • Click the "Download Profile" button.
  • Save the profile.zip file and send it via private message.
Thanks,
Perry
Dusan.Mandic
Posts: 60
Joined: Mon Apr 06, 2020 2:30 pm

Re: Nagios Host/Service Check Timeouts

Post by Dusan.Mandic »

Sent, thanks.
pbroste
Posts: 1288
Joined: Tue Jun 01, 2021 1:27 pm

Re: Nagios Host/Service Check Timeouts

Post by pbroste »

Hello Dusan.Mandic,

Thanks for sending over the System Profile. On review, we see that things are set for best overall performance (touching on the previous support case).

We see that the message "Connection timed out after 15001 milliseconds" appears in the log messages.

We want to look at your database tables to see whether anything has grown out of control since the last time this was sent over.

Code:

echo "select table_name as 'Table', round(((data_length + index_length) / 1024 / 1024), 2) 'Size in MB' from information_schema.TABLES where table_schema in ('nagios', 'nagiosql', 'nagiosxi');" | mysql -h 127.0.0.1 -uroot -pnagiosxi --table

We would also like the following details from your environment:

Code:

chage -l nagios >/tmp/info.txt
grep nag /etc/group >>/tmp/info.txt
ls -al /var/log >>/tmp/info.txt
ls -al /usr/local/nagiosxi/var >>/tmp/info.txt
ls -alr /usr/local/nagios/var >>/tmp/info.txt
cat /etc/sudoers >>/tmp/info.txt
cat /etc/hosts >>/tmp/info.txt

Please PM me the results in /tmp/info.txt.

A couple of questions about your environment. First, we see that you are using the SumoCollector wrapper; what is it used for? Is there any chance of moving it to another server, or temporarily disabling it to see whether that resolves the issue? Second, are you using a proxy?

Thanks,
Perry
pbroste
Posts: 1288
Joined: Tue Jun 01, 2021 1:27 pm

Re: Nagios Host/Service Check Timeouts

Post by pbroste »

Hello Dusan.Mandic,

Thanks for sending over the info that we requested. Things look good. We also wanted to know whether there is a proxy running, and to check pidstat for the 'SumoCollector wrapper' as well.

Please watch this for a bit to see what is going on:

Code:

watch -n 1 pidstat
Thanks,
Perry
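
If the full pidstat output is too noisy to watch, a narrower view might help. The sketch below uses pidstat's -C option to show only tasks whose command name contains a given string; the pattern "wrapper" is an assumption about how the SumoCollector wrapper process is named on this server, so adjust it to whatever ps actually shows.

```shell
# Assumed fragment of the SumoCollector wrapper's command name; adjust as needed.
proc_pattern="wrapper"

if command -v pidstat >/dev/null 2>&1; then
    # -u: CPU usage; -C: only commands whose name contains the pattern.
    # Sample once per second, three times.
    report=$(pidstat -u -C "$proc_pattern" 1 3 2>&1)
else
    report="pidstat not found (it ships with the sysstat package)"
fi
echo "$report"
```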
Dusan.Mandic
Posts: 60
Joined: Mon Apr 06, 2020 2:30 pm

Re: Nagios Host/Service Check Timeouts

Post by Dusan.Mandic »

No proxy.

I didn't see it pop up in watch, so I uploaded a snippet. Let me know if you need a bigger capture.
pbroste
Posts: 1288
Joined: Tue Jun 01, 2021 1:27 pm

Re: Nagios Host/Service Check Timeouts

Post by pbroste »

Hello @Dusan.Mandic

Thanks for sending along the results. They appear to be within range, so we want to dive into some debugging.

Let's go ahead and enable debugging in '/usr/local/nagios/etc/nagios.cfg'. Set the following lines to:

Code:

debug_level=-1
debug_verbosity=2
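
If you would rather script the edit than make it by hand, something along these lines could work. It is a sketch only: the helper name `set_debug` is made up, it assumes GNU sed (for -i) and that both settings already exist in the file, and it runs here against a throwaway stand-in file rather than the real nagios.cfg.

```shell
# Hypothetical helper: set the two debug options in a nagios.cfg-style file.
set_debug() {
    sed -i \
        -e 's/^debug_level=.*/debug_level=-1/' \
        -e 's/^debug_verbosity=.*/debug_verbosity=2/' \
        "$1"
}

# Stand-in config with placeholder starting values, not the real file.
tmpcfg=$(mktemp)
printf 'debug_level=0\ndebug_verbosity=1\n' > "$tmpcfg"

set_debug "$tmpcfg"
level=$(grep '^debug_level=' "$tmpcfg")
verbosity=$(grep '^debug_verbosity=' "$tmpcfg")
echo "$level"
echo "$verbosity"
rm -f "$tmpcfg"
```

On the server itself the argument would be /usr/local/nagios/etc/nagios.cfg; taking a backup copy first is a sensible precaution.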

Then restart the nagios service:

Code:

systemctl restart nagios

Then wait for a check to time out and send us the nagios.log and nagios.debug files, combined into one file like this:

Code:

cat /usr/local/nagios/var/nagios.log /usr/local/nagios/var/nagios.debug > /tmp/results.txt

Also please send the following:

Code:

ethtool -S eth0 > /tmp/ethtoolresults.txt
Thanks,
Perry
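
When reading the ethtool output, the counters of interest are usually the error and drop ones. A small filter such as the sketch below can pull out the non-zero ones. The helper name and the sample counters are made up for illustration; real counter names vary by NIC driver.

```shell
# Hypothetical helper: print error/drop counters with a non-zero value.
nonzero_error_counters() {
    awk -F: 'tolower($1) ~ /err|drop/ && ($2 + 0) > 0 { sub(/^[ \t]+/, ""); print }'
}

# Made-up sample in the general shape of `ethtool -S` output.
flagged=$(nonzero_error_counters <<'EOF'
NIC statistics:
     rx_packets: 1000
     rx_errors: 0
     rx_dropped: 12
     tx_errors: 3
EOF
)
echo "$flagged"
```

Against the real interface this would be used as `ethtool -S eth0 | nonzero_error_counters`.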
Dusan.Mandic
Posts: 60
Joined: Mon Apr 06, 2020 2:30 pm

Re: Nagios Host/Service Check Timeouts

Post by Dusan.Mandic »

What's the log bloat going to look like at this verbosity? Our last timed-out check occurred 8/26. We're currently in a change freeze, so I'll have to wait to enact this.
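
One thing that may bear on the bloat question, though nobody in this thread confirmed it: Nagios Core's nagios.cfg has a max_debug_file_size option (in bytes) that caps the debug file, which is renamed with a .old extension once it grows past that size. The sketch below reads that value from a stand-in config; the helper name `debug_cap` and the sample value are assumptions for illustration.

```shell
# Hypothetical helper: read max_debug_file_size from a nagios.cfg-style file.
debug_cap() {
    awk -F= '$1 == "max_debug_file_size" { print $2 }' "$1"
}

# Stand-in config; on the server, use /usr/local/nagios/etc/nagios.cfg instead.
sample_cfg=$(mktemp)
cat > "$sample_cfg" <<'EOF'
debug_level=-1
debug_verbosity=2
max_debug_file_size=1000000
EOF

cap=$(debug_cap "$sample_cfg")
echo "debug file cap: $cap bytes"
rm -f "$sample_cfg"
```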