
Bulk Alerts - Timed out

Posted: Thu Nov 11, 2021 4:31 am
by lanxessinfy
Hi,

We are receiving bulk alerts for a particular team's servers, stating: "Service check timed out after 60.01 seconds."

After 3 or 4 checks, the service state returns to normal.

Could you please provide a solution?

I observed that most of those services are configured with check_nrpe -2 or check_xi_service_http_cert.

Thank you!

Re: Bulk Alerts - Timed out

Posted: Thu Nov 11, 2021 4:16 pm
by ssax
Please PM me a copy of your profile.zip, you can download it from Admin > System Profile by clicking the Download Profile button.

Please run one of the checks from an SSH session on the XI server. Are you able to replicate the timeout from the command line as well? Try adding -v to the end of the command to see whether it shows any verbose output when it times out.
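Because the timeouts are intermittent, a single run may not reproduce them. Below is a minimal sketch of a loop that runs the same check several times and counts the non-zero exits. The function name `repeat_check` and the `REPEAT_DELAY` variable are made up for illustration; they are not part of Nagios or its plugins.

```shell
# Run a command COUNT times and report how many runs exited non-zero.
# Usage: repeat_check COUNT COMMAND [ARGS...]
# Nagios plugins exit 0 for OK and 1/2/3 for WARNING/CRITICAL/UNKNOWN,
# so any non-zero exit is counted as a failed run here.
repeat_check() {
    count="$1"
    shift
    failures=0
    i=1
    while [ "$i" -le "$count" ]; do
        # Capture stdout and stderr so the plugin message is logged with its exit code.
        output=$("$@" 2>&1)
        status=$?
        printf '%s run=%d exit=%d %s\n' "$(date '+%H:%M:%S')" "$i" "$status" "$output"
        if [ "$status" -ne 0 ]; then
            failures=$((failures + 1))
        fi
        i=$((i + 1))
        # Pause between runs; REPEAT_DELAY is a hypothetical knob, defaulting to 2 seconds.
        sleep "${REPEAT_DELAY:-2}"
    done
    echo "failures=$failures/$count"
}
```

For example, `repeat_check 20 <your check command and arguments>` prints one timestamped line per run, followed by a summary line such as `failures=3/20`.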

We'll need to figure out why they are timing out. If the checks are known to occasionally take 120 seconds because of load, network, off-server jobs, etc., we'll need to adjust some timeouts on the plugin and in nagios.cfg.
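For reference, the "timed out after 60.01 seconds" message suggests the limit being hit is the `service_check_timeout` directive in nagios.cfg. Below is a minimal sketch for reading its current value. The function name `get_service_check_timeout` is made up for illustration, and the default path is an assumption based on the usual XI install location, so pass your own path if it differs.

```shell
# Print the active service_check_timeout value from a Nagios main config file.
# The default path below is an assumption; pass the real path as the first argument.
get_service_check_timeout() {
    cfg="${1:-/usr/local/nagios/etc/nagios.cfg}"
    # Match only uncommented "service_check_timeout=<number>" lines.
    # If the directive appears more than once, report the last occurrence.
    sed -n 's/^[[:space:]]*service_check_timeout[[:space:]]*=[[:space:]]*\([0-9][0-9]*\).*/\1/p' "$cfg" | tail -n 1
}
```

If the value needs raising, the plugin's own `-t` value would normally be kept below it, so that the plugin reports its own timeout before Nagios ends the check.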

Re: Bulk Alerts - Timed out

Posted: Fri Nov 12, 2021 3:54 am
by lanxessinfy
Hi,

Shared the profile, please check.

[root@xxxxxLNGIOSAD]# /usr/local/nagios/libexec/check_http -H x.x.x.x -t 60 -C 30 -p 443 -v
SSL initialized
SSL OK - Certificate 'xxxxxxx' will expire in 138 days on 2022-03-30 15:17 +0200/CEST.

[root@xxxxxxLNGIOSAD]# /usr/local/nagios/libexec/check_http -H x.x.x.x -t 60 -C 30 -p 443 -v
CRITICAL - Socket timeout

[root@xxxxxxLNGIOSAD]# /usr/local/nagios/libexec/check_nrpe -2 -H x.x.x.x -t 300-c CheckMem -a MaxWarn=90% MaxCrit=95% ShowAll type=physical -v
I (0.4.4.23 2016-04-05) seem to be doing fine...

Please find the output.
The first and second commands were run 2 seconds apart, yet the second one got a socket timeout. If I run it again, I get output.

For the third command, I sometimes get the output "I (0.4.4.23 2016-04-05) seem to be doing fine..." and sometimes "Socket timed out after 299 sec".

We are facing this issue for almost 200 services.
Please provide a suitable solution.

Thanks

Re: Bulk Alerts - Timed out

Posted: Fri Nov 12, 2021 2:24 pm
by ssax
Socket timeout from the command line sounds like it could be a network issue.

Do you have any IPS/security devices that could be throttling it?

I see these processes using 100% CPU:

Code:

    8:  4614 root      20   0       0      0      0 R 100.0  0.0  30076:37 kcs-evbsyn+
    9:  4612 root      20   0       0      0      0 R 100.0  0.0  30076:52 kcs-evbsyn+
   10:  4613 root      20   0       0      0      0 R 100.0  0.0  30076:44 kcs-evbsyn+
   11:  4615 root      20   0       0      0      0 R 100.0  0.0  30076:35 kcs-evbsyn+
Please disable your security software and see if that resolves the issue.

What is the output of these commands when run as root?

Code:

netstat -s
ethtool -S eth0
ulimit -a
su -s /bin/bash -c 'ulimit -a' nagios
su -s /bin/bash -c 'ulimit -a' mysql

Re: Bulk Alerts - Timed out

Posted: Mon Nov 15, 2021 2:35 am
by lanxessinfy
Hi,

We no longer see any CrowdStrike processes running.

Please find the output of the commands as requested.

Thanks!

Re: Bulk Alerts - Timed out

Posted: Mon Nov 15, 2021 2:54 pm
by ssax
Please create a ticket for this and include a link back to this forum thread so we can get a remote session set up:

https://support.nagios.com/tickets/

Thank you!