Bulk Alerts - Timed out

This support forum board is for support questions relating to Nagios XI, our flagship commercial network monitoring solution.
Locked
lanxessinfy
Posts: 68
Joined: Tue Nov 24, 2020 5:55 am

Bulk Alerts - Timed out

Post by lanxessinfy »

Hi,

We are receiving bulk alerts for a particular server team's hosts, stating "Service check timed out after 60.01 seconds".

After 3 or 4 checks, the service state returns to normal.

Could you please provide a solution?

I observed that most of those services are configured with check_nrpe -2 or check_xi_service_http_cert.

Thank you!
ssax
Dreams In Code
Posts: 7682
Joined: Wed Feb 11, 2015 12:54 pm

Re: Bulk Alerts - Timed out

Post by ssax »

Please PM me a copy of your profile.zip, you can download it from Admin > System Profile by clicking the Download Profile button.

Please run one of the checks from an SSH session on the XI server. Are you able to replicate the timeout from the command line as well? Try adding -v to the end of the command to see whether it shows any verbose output when it times out.

We'll need to figure out why they are timing out. If it's known that a check will occasionally take 120 seconds because of load, network conditions, off-server jobs, etc., we'll need to adjust the timeouts on the plugin and in nagios.cfg.
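
For reference, the 60-second limit in the alert most likely corresponds to the service_check_timeout directive in nagios.cfg. Here is a minimal sketch for checking the current value, assuming the config lives at /usr/local/nagios/etc/nagios.cfg (the function name show_service_timeout is only illustrative):

```shell
# Print the service_check_timeout line(s) from a Nagios main config file.
# The default path below is an assumption; pass another path as $1 if needed.
show_service_timeout() {
    cfg=${1:-/usr/local/nagios/etc/nagios.cfg}
    # Match the directive at the start of a line, ignoring commented-out copies.
    grep -E '^[[:space:]]*service_check_timeout[[:space:]]*=' "$cfg"
}
```

If the checks really do need longer, both this value and the plugin's own -t value would probably have to be raised, since whichever limit is shorter fires first.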
lanxessinfy
Posts: 68
Joined: Tue Nov 24, 2020 5:55 am

Re: Bulk Alerts - Timed out

Post by lanxessinfy »

Hi,

I have shared the profile; please check.

[root@xxxxxLNGIOSAD]# /usr/local/nagios/libexec/check_http -H x.x.x.x -t 60 -C 30 -p 443 -v
SSL initialized
SSL OK - Certificate 'xxxxxxx' will expire in 138 days on 2022-03-30 15:17 +0200/CEST.

[root@xxxxxxLNGIOSAD]# /usr/local/nagios/libexec/check_http -H x.x.x.x -t 60 -C 30 -p 443 -v
CRITICAL - Socket timeout

[root@xxxxxxLNGIOSAD]# /usr/local/nagios/libexec/check_nrpe -2 -H x.x.x.x -t 300-c CheckMem -a MaxWarn=90% MaxCrit=95% ShowAll type=physical -v
I (0.4.4.23 2016-04-05) seem to be doing fine...

Please find the output above.
The first and second commands were run only 2 seconds apart, yet the second one returned a socket timeout. If I run it again, I get output.

For the third command, I sometimes get "I (0.4.4.23 2016-04-05) seem to be doing fine..." and sometimes a socket timeout after 299 seconds.

We are facing this issue on almost 200 services.
Please provide a suitable solution.

Thanks
ssax
Dreams In Code
Posts: 7682
Joined: Wed Feb 11, 2015 12:54 pm

Re: Bulk Alerts - Timed out

Post by ssax »

Socket timeout from the command line sounds like it could be a network issue.

Do you have any IPS/security devices that could be throttling it?
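
To get a feel for how often the timeout happens from the command line, the check can be run in a loop and the failures counted. A rough sketch follows; repeat_check is a hypothetical helper name, and it treats any non-zero exit code (WARNING, CRITICAL or UNKNOWN) as a failure:

```shell
# Run a command COUNT times and report how many runs exited non-zero.
# Usage: repeat_check COUNT COMMAND [ARGS...]
# The body runs in a subshell so its variables do not leak into the caller.
repeat_check() (
    count=$1
    shift
    failures=0
    i=1
    while [ "$i" -le "$count" ]; do
        # Discard plugin output; only the exit status is tallied.
        if ! "$@" >/dev/null 2>&1; then
            failures=$((failures + 1))
        fi
        i=$((i + 1))
    done
    echo "$failures/$count runs failed"
)

# Example (same check as posted above, run 20 times):
# repeat_check 20 /usr/local/nagios/libexec/check_http -H x.x.x.x -t 60 -C 30 -p 443
```

Running it once from the XI server and once from another machine on a different network path could help show whether the failures follow the server or the network.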

I see these processes using 100% CPU:

Code:

    8:  4614 root      20   0       0      0      0 R 100.0  0.0  30076:37 kcs-evbsyn+
    9:  4612 root      20   0       0      0      0 R 100.0  0.0  30076:52 kcs-evbsyn+
   10:  4613 root      20   0       0      0      0 R 100.0  0.0  30076:44 kcs-evbsyn+
   11:  4615 root      20   0       0      0      0 R 100.0  0.0  30076:35 kcs-evbsyn+
Please disable your security software and see if that resolves the issue.

What is the output of these commands when run as root?

Code:

netstat -s
ethtool -S eth0
ulimit -a
su -s /bin/bash -c 'ulimit -a' nagios
su -s /bin/bash -c 'ulimit -a' mysql
lanxessinfy
Posts: 68
Joined: Tue Nov 24, 2020 5:55 am

Re: Bulk Alerts - Timed out

Post by lanxessinfy »

Hi,

We no longer see any CrowdStrike processes running.

Please find the output of the commands as requested.

Thanks!
ssax
Dreams In Code
Posts: 7682
Joined: Wed Feb 11, 2015 12:54 pm

Re: Bulk Alerts - Timed out

Post by ssax »

Please create a ticket for this and include a link back to this forum thread so we can get a remote session set up:

https://support.nagios.com/tickets/

Thank you!