Hi,
We are receiving bulk alerts for a particular team's servers, stating: Service check timed out after 60.01 seconds.
After 3 or 4 checks the service state returns to normal.
Could you please provide a solution?
I observed that most of those services are configured with check_nrpe -2 or check_xi_service_http_cert.
Thank you!
Bulk Alerts - Timed out
-
lanxessinfy
- Posts: 68
- Joined: Tue Nov 24, 2020 5:55 am
Re: Bulk Alerts - Timed out
Please PM me a copy of your profile.zip. You can download it from Admin > System Profile by clicking the Download Profile button.
Please run one of the checks from an SSH session on the XI server. Are you able to replicate the timeout from the command line as well? Try adding -v to the end of the command to see if it shows any verbose output when it times out.
We'll need to figure out why they are timing out. If it's known that a check will occasionally take 120 seconds because of load/network/off-server jobs/etc., we'll need to adjust some timeouts on the plugin and in nagios.cfg.
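Since the timeouts are intermittent, a single manual run may not reproduce them. A minimal sketch for running the same check several times and tallying the results is below. The helper name `repeat_check` is hypothetical and not part of Nagios; it only relies on the plugin convention that exit status 0 means OK.

```shell
# repeat_check RUNS COMMAND [ARGS...]
# Hypothetical helper (not part of Nagios): runs COMMAND the given number
# of times and prints how many runs exited 0 (OK) versus non-zero.
# The function body is a subshell, so its variables do not leak out.
repeat_check() (
    runs=$1
    shift
    ok=0
    failed=0
    i=0
    while [ "$i" -lt "$runs" ]; do
        # Nagios plugins exit 0 for OK; any other status counts as a failure here.
        if "$@" >/dev/null 2>&1; then
            ok=$((ok + 1))
        else
            failed=$((failed + 1))
        fi
        i=$((i + 1))
    done
    echo "runs=$runs ok=$ok failed=$failed"
)

# Example invocation (host left elided, as elsewhere in this thread):
# repeat_check 20 /usr/local/nagios/libexec/check_http -H x.x.x.x -t 60 -C 30 -p 443
```

The "timed out after 60.01 seconds" message is consistent with a 60 second limit on service checks. In Nagios Core that limit is the `service_check_timeout` directive in nagios.cfg; assuming the usual install path, the current value can be read with `grep '^service_check_timeout' /usr/local/nagios/etc/nagios.cfg`.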
-
lanxessinfy
- Posts: 68
- Joined: Tue Nov 24, 2020 5:55 am
Re: Bulk Alerts - Timed out
Hi,
I have shared the profile; please check.
[root@xxxxxLNGIOSAD]# /usr/local/nagios/libexec/check_http -H x.x.x.x -t 60 -C 30 -p 443 -v
SSL initialized
SSL OK - Certificate 'xxxxxxx' will expire in 138 days on 2022-03-30 15:17 +0200/CEST.
[root@xxxxxxLNGIOSAD]# /usr/local/nagios/libexec/check_http -H x.x.x.x -t 60 -C 30 -p 443 -v
CRITICAL - Socket timeout
[root@xxxxxxLNGIOSAD]# /usr/local/nagios/libexec/check_nrpe -2 -H x.x.x.x -t 300-c CheckMem -a MaxWarn=90% MaxCrit=95% ShowAll type=physical -v
I (0.4.4.23 2016-04-05) seem to be doing fine...
Please find the output above.
The 1st and 2nd commands were run only 2 seconds apart, yet the second one returned a socket timeout; if I run it again, I get output.
For the 3rd command I get the output "I (0.4.4.23 2016-04-05) seem to be doing fine..." and sometimes a socket timeout after 299 seconds.
We are facing this issue for almost 200 services.
Please provide a suitable solution.
Thanks
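To see whether the failing runs consume the full timeout while the successful ones return within a second or two, each run can be logged with a timestamp and its elapsed time. The sketch below is one way to do that; `timed_run` is a hypothetical helper name, and the log path in the usage comment is an assumption.

```shell
# timed_run COMMAND [ARGS...]
# Hypothetical helper: runs COMMAND once and prints a timestamp, the exit
# status, and the elapsed time in whole seconds. Output of COMMAND itself
# is discarded so that each run produces exactly one log line.
timed_run() (
    start=$(date +%s)
    "$@" >/dev/null 2>&1
    status=$?
    end=$(date +%s)
    echo "$(date '+%Y-%m-%d %H:%M:%S') status=$status elapsed=$((end - start))s"
)

# Example: sample the check every 30 seconds and append to a log file.
# The log path is an assumption; stop the loop with Ctrl-C.
# while true; do
#     timed_run /usr/local/nagios/libexec/check_http -H x.x.x.x -t 60 -C 30 -p 443 \
#         >> /tmp/check_http_timing.log
#     sleep 30
# done
```

Comparing the timestamps of the slow runs against other activity on the server or the network may help narrow down what is happening at those moments.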
Re: Bulk Alerts - Timed out
Socket timeout from the command line sounds like it could be a network issue.
Do you have any IPS/security devices that could be throttling it?
I see these using 100%:
Code: Select all
8: 4614 root 20 0 0 0 0 R 100.0 0.0 30076:37 kcs-evbsyn+
9: 4612 root 20 0 0 0 0 R 100.0 0.0 30076:52 kcs-evbsyn+
10: 4613 root 20 0 0 0 0 R 100.0 0.0 30076:44 kcs-evbsyn+
11: 4615 root 20 0 0 0 0 R 100.0 0.0 30076:35 kcs-evbsyn+
Please disable your security software and see if that resolves the issue.
What is the output of these commands as root:
Code: Select all
netstat -s
ethtool -S eth0
ulimit -a
su -s /bin/bash -c 'ulimit -a' nagios
su -s /bin/bash -c 'ulimit -a' mysql
-
lanxessinfy
- Posts: 68
- Joined: Tue Nov 24, 2020 5:55 am
Re: Bulk Alerts - Timed out
Hi,
Now we don't see any CrowdStrike processes running.
Please find the output of the commands as requested.
Thanks!
Re: Bulk Alerts - Timed out
Please create a ticket for this and include a link back to this forum thread so we can get a remote session set up:
https://support.nagios.com/tickets/
Thank you!