
How do I not alert on Socket timeout after 30 seconds.

Posted: Mon Jul 15, 2019 7:56 am
by mkeey
We get a lot of these failures in Nagios "Socket timeout after 30 seconds." Would like to turn these notifications off for services. How can I do that?

Re: How do I not alert on Socket timeout after 30 seconds.

Posted: Mon Jul 15, 2019 10:37 am
by mkeey
We have had this issue since we’ve implemented Nagios. Services alert with a Socket Timeout and then moments later the Host alerts that it is down/unavailable.
This has caused a lot of tension with the support teams and I don’t really blame them.

1) Opened a case with support and were told to implement the config file parameter…
host_down_disable_service_checks=1

This is supposed to make services dependent on hosts. We tested it and it did indeed work. This was implemented some time ago on ALL Nagios XI configurations.

However, we still had instances of services alerting when a host is unavailable.

2) We have been working with the teams to code downtimes for maintenance windows and for when CRs are being carried out. We’ve coded recurring and scheduled downtimes, and that has also reduced the number of service notifications. There is still a lot of room for improvement on this, and sometimes patching is performed outside of these windows. That’s a procedural issue that we can’t really code around or fix. It’s up to the Platform/Midrange teams to get us those maintenance windows so we can code the downtimes. This has helped some.

However, we still had instances of services alerting when a host is unavailable.

3) Opened a case with support and they said it may be a timing issue. We need to ensure that our Hosts check more frequently than our services. Updated the hosts to check every 3 minutes vs. every 5 minutes. This seemed to reduce the number of service notifications so it was implemented across all Nagios XI configurations.

However, we still are having instances of services alerting when a host is unavailable. How can we force the Host check BEFORE the Service checks occur?
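
For readers who want to confirm that the parameter from step 1 is actually set on each server, here is a minimal Python sketch. It assumes the main config uses plain key=value lines with # comments; the sample text is a hypothetical excerpt, and letting the last assignment win is a choice of this sketch, not confirmed Nagios behavior.

```python
def read_directive(cfg_text, name):
    """Return the value assigned to `name` in key=value config text, or None.

    Assumption of this sketch: if the directive appears more than once,
    the last assignment is reported.
    """
    value = None
    for raw in cfg_text.splitlines():
        line = raw.strip()
        # Skip blank lines and comment lines.
        if not line or line.startswith("#"):
            continue
        key, sep, val = line.partition("=")
        if sep and key.strip() == name:
            value = val.strip()
    return value


# Hypothetical excerpt; on a real server the text would be read from the
# main Nagios config file instead.
sample_cfg = """\
# hypothetical excerpt
check_external_commands=1
host_down_disable_service_checks=1
"""

print(read_directive(sample_cfg, "host_down_disable_service_checks"))
```

A result of None would mean the directive is missing or commented out in that file.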

Re: How do I not alert on Socket timeout after 30 seconds.

Posted: Mon Jul 15, 2019 1:55 pm
by tgriep
What version of XI are you running on the server?

Can you run a State History report for the Host and Service that generated the Notification and post it here?
Download it as a CSV file and make sure the settings are set to Both and Any for the State Type and State.

Can you post the settings for the host check and the service check from the /usr/local/nagios/var/objects.cache file?

Thanks

Re: How do I not alert on Socket timeout after 30 seconds.

Posted: Wed Jul 17, 2019 6:19 am
by mkeey
What version of XI are you running on the server?
---- v5.5.5 for all XI's

Can you run a State History report for the Host and Service that generated the Notification and post it here? Download it as a CSV file and make sure the settings are set to Both and Any for the State Type and State.
---- Not sure how to do that. Can you provide step-by-step instructions please?

Can you post the settings for the host check and the service check from the /usr/local/nagios/var/objects.cache file?
---- Sent to you via private message as there may be proprietary data in that file

Re: How do I not alert on Socket timeout after 30 seconds.

Posted: Wed Jul 17, 2019 8:57 am
by tgriep
Please PM me the name of the Host and Services that were sending the Notifications with the Socket Timeout messages.

To run a State History Report, in the XI GUI, go to the Reports Menu and select the State History report link.
Select the Host and run the report.
Click on the Download link to download it as a CSV file.

Re: How do I not alert on Socket timeout after 30 seconds.

Posted: Wed Jul 17, 2019 9:49 am
by mkeey
You've been PM'd

Re: How do I not alert on Socket timeout after 30 seconds.

Posted: Wed Jul 17, 2019 10:23 am
by tgriep
Thanks for the State History report. It did not show any hosts being down, so I cannot troubleshoot the issue, as I cannot see that the Host Check went down while the service checks failed.
You should upgrade your XI server to at least version 5.5.11, as there are bugs in the version of Core that 5.5.5 is running that could cause the issue you are having.

Re: How do I not alert on Socket timeout after 30 seconds.

Posted: Mon Jul 22, 2019 7:28 am
by mkeey
Thanks for the reply and I appreciate that there weren't any Host alerts. Fact still remains that we get a lot of socket timeouts on Services. I need to turn these off. Here is an example from yesterday where a team received 9 incident records for disk space alerts all due to "socket time outs". How can I stop these from being created?



type time information
Service Recovery 7/21/2019 18:50 SERVICE ALERT: HOSTSERVER;CPU Stats;OK;HARD;1;CPU STATISTICS OK: user=28.81% system=35.37% iowait=3.60% idle=32.23%
Service Critical 7/21/2019 18:46 SERVICE ALERT: HOSTSERVER;CPU Stats;CRITICAL;SOFT;1;CHECK_NRPE STATE CRITICAL: Socket timeout after 30 seconds.
Service Recovery 7/21/2019 18:41 SERVICE ALERT: HOSTSERVER;CPU Stats;OK;HARD;1;CPU STATISTICS OK: user=30.16% system=34.91% iowait=0.03% idle=34.91%
Service Critical 7/21/2019 18:38 SERVICE ALERT: HOSTSERVER;CPU Stats;CRITICAL;SOFT;1;CHECK_NRPE STATE CRITICAL: Socket timeout after 30 seconds.
Service Recovery 7/21/2019 18:33 SERVICE ALERT: HOSTSERVER;CPU Stats;OK;HARD;1;CPU STATISTICS OK: user=25.71% system=45.84% iowait=0.03% idle=28.43%
Service Critical 7/21/2019 18:30 SERVICE ALERT: HOSTSERVER;CPU Stats;CRITICAL;SOFT;1;CHECK_NRPE STATE CRITICAL: Socket timeout after 30 seconds.
Service Recovery 7/21/2019 18:24 SERVICE ALERT: HOSTSERVER;CPU Stats;OK;HARD;1;CPU STATISTICS OK: user=26.43% system=51.96% iowait=0.03% idle=21.59%
Service Critical 7/21/2019 18:21 SERVICE ALERT: HOSTSERVER;CPU Stats;CRITICAL;SOFT;1;CHECK_NRPE STATE CRITICAL: Socket timeout after 30 seconds.
Service Recovery 7/21/2019 18:16 SERVICE ALERT: HOSTSERVER;CPU Stats;OK;HARD;1;CPU STATISTICS OK: user=30.52% system=37.89% iowait=0.03% idle=31.56%
Service Critical 7/21/2019 18:13 SERVICE ALERT: HOSTSERVER;CPU Stats;CRITICAL;SOFT;1;CHECK_NRPE STATE CRITICAL: Socket timeout after 30 seconds.
Service Recovery 7/21/2019 18:07 SERVICE ALERT: HOSTSERVER;CPU Stats;OK;HARD;1;CPU STATISTICS OK: user=31.57% system=35.72% iowait=0.05% idle=32.66%
Service Critical 7/21/2019 18:04 SERVICE ALERT: HOSTSERVER;CPU Stats;CRITICAL;SOFT;1;CHECK_NRPE STATE CRITICAL: Socket timeout after 30 seconds.
Service Recovery 7/21/2019 17:59 SERVICE ALERT: HOSTSERVER;CPU Stats;OK;HARD;1;CPU STATISTICS OK: user=30.28% system=38.56% iowait=0.03% idle=31.14%
Service Critical 7/21/2019 17:56 SERVICE ALERT: HOSTSERVER;CPU Stats;CRITICAL;SOFT;1;CHECK_NRPE STATE CRITICAL: Socket timeout after 30 seconds.
Service Recovery 7/21/2019 17:51 SERVICE ALERT: HOSTSERVER;CPU Stats;OK;HARD;1;CPU STATISTICS OK: user=28.53% system=34.46% iowait=0.03% idle=36.99%
Service Critical 7/21/2019 17:47 SERVICE ALERT: HOSTSERVER;CPU Stats;CRITICAL;SOFT;1;CHECK_NRPE STATE CRITICAL: Socket timeout after 30 seconds.
Service Recovery 7/21/2019 17:42 SERVICE ALERT: HOSTSERVER;CPU Stats;OK;HARD;1;CPU STATISTICS OK: user=27.40% system=32.30% iowait=0.08% idle=40.22%
Service Critical 7/21/2019 17:39 SERVICE ALERT: HOSTSERVER;CPU Stats;CRITICAL;SOFT;1;CHECK_NRPE STATE CRITICAL: Socket timeout after 30 seconds.
Service Recovery 7/21/2019 17:34 SERVICE ALERT: HOSTSERVER;CPU Stats;OK;HARD;1;CPU STATISTICS OK: user=35.75% system=26.16% iowait=0.03% idle=27.63%
Service Critical 7/21/2019 17:30 SERVICE ALERT: HOSTSERVER;CPU Stats;CRITICAL;SOFT;1;CHECK_NRPE STATE CRITICAL: Socket timeout after 30 seconds.
Service Recovery 7/21/2019 17:25 SERVICE ALERT: HOSTSERVER;CPU Stats;OK;HARD;1;CPU STATISTICS OK: user=29.12% system=46.18% iowait=0.03% idle=14.34%
Service Critical 7/21/2019 17:22 SERVICE ALERT: HOSTSERVER;CPU Stats;CRITICAL;SOFT;1;CHECK_NRPE STATE CRITICAL: Socket timeout after 30 seconds.
Service Recovery 7/21/2019 17:17 SERVICE ALERT: HOSTSERVER;CPU Stats;OK;HARD;1;CPU STATISTICS OK: user=38.27% system=28.51% iowait=0.03% idle=33.20%
Service Critical 7/21/2019 17:14 SERVICE ALERT: HOSTSERVER;CPU Stats;CRITICAL;SOFT;1;CHECK_NRPE STATE CRITICAL: Socket timeout after 30 seconds.
Service Recovery 7/21/2019 10:40 SERVICE ALERT: HOSTSERVER;CPU Stats;OK;HARD;1;CPU STATISTICS OK: user=13.71% system=27.37% iowait=0.20% idle=58.69%
Service Critical 7/21/2019 10:37 SERVICE ALERT: HOSTSERVER;CPU Stats;CRITICAL;SOFT;1;CHECK_NRPE STATE CRITICAL: Socket timeout after 30 seconds.
Service Recovery 7/21/2019 0:32 SERVICE ALERT: HOSTSERVER;/opt/ctmagent Disk Usage;OK;HARD;1;DISK OK - free space: /opt/ctmagent 671 MB (66.23% inode=100%):
Service Recovery 7/21/2019 0:32 SERVICE ALERT: HOSTSERVER;/ Disk Usage;OK;HARD;1;DISK OK - free space: / 7364 MB (71.98% inode=100%):
Service Recovery 7/21/2019 0:32 SERVICE ALERT: HOSTSERVER;/var/log Disk Usage;OK;HARD;1;DISK OK - free space: /var/log 4533 MB (88.72% inode=100%):
Service Recovery 7/21/2019 0:31 SERVICE ALERT: HOSTSERVER;/var Disk Usage;OK;HARD;1;DISK OK - free space: /var 7955 MB (77.76% inode=100%):
Service Recovery 7/21/2019 0:30 SERVICE ALERT: HOSTSERVER;CPU Stats;OK;HARD;1;CPU STATISTICS OK: user=21.91% system=18.57% iowait=0.05% idle=59.45%
Service Recovery 7/21/2019 0:30 SERVICE ALERT: HOSTSERVER;/srv/bit9 Disk Usage;OK;HARD;1;DISK OK - free space: /srv/bit9 1122 MB (73.84% inode=100%):
Service Recovery 7/21/2019 0:30 SERVICE ALERT: HOSTSERVER;/tmp Disk Usage;OK;HARD;1;DISK OK - free space: /tmp 4052 MB (99.18% inode=100%):
Service Recovery 7/21/2019 0:28 SERVICE ALERT: HOSTSERVER;/home Disk Usage;OK;HARD;1;DISK OK - free space: /home 1495 MB (73.37% inode=100%):
Service Recovery 7/21/2019 0:28 SERVICE ALERT: HOSTSERVER;Memory Usage;OK;HARD;1;OK - 20620 / 32009 MB (64%) Free Memory, Used: 11389 MB, Shared: 56 MB, Buffers + Cached: 907 MB
Service Recovery 7/21/2019 0:28 SERVICE ALERT: HOSTSERVER;/opt/tivoli Disk Usage;OK;HARD;1;DISK OK - free space: /opt/tivoli 252 MB (24.86% inode=100%):
Service Recovery 7/21/2019 0:28 SERVICE ALERT: HOSTSERVER;/opt Disk Usage;OK;HARD;1;DISK OK - free space: /opt 564 MB (94.55% inode=100%):
Service Critical 7/21/2019 0:27 SERVICE ALERT: HOSTSERVER;CPU Stats;CRITICAL;SOFT;7;CHECK_NRPE: Error - Could not connect to 10.191.8.231: Connection reset by peer
Service Critical 7/21/2019 0:24 SERVICE ALERT: HOSTSERVER;CPU Stats;CRITICAL;SOFT;6;CHECK_NRPE STATE CRITICAL: Socket timeout after 30 seconds.
Service Critical 7/21/2019 0:23 SERVICE ALERT: HOSTSERVER;Memory Usage;CRITICAL;SOFT;4;CHECK_NRPE STATE CRITICAL: Socket timeout after 30 seconds.
Service Critical 7/21/2019 0:20 SERVICE ALERT: HOSTSERVER;CPU Stats;CRITICAL;SOFT;5;CHECK_NRPE STATE CRITICAL: Socket timeout after 30 seconds.
Service Critical 7/21/2019 0:17 SERVICE ALERT: HOSTSERVER;Memory Usage;CRITICAL;SOFT;3;CHECK_NRPE STATE CRITICAL: Socket timeout after 30 seconds.
Service Critical 7/21/2019 0:17 SERVICE ALERT: HOSTSERVER;/opt/ctmagent Disk Usage;CRITICAL;HARD;5;CHECK_NRPE STATE CRITICAL: Socket timeout after 30 seconds.
Service Notification 7/21/2019 0:17 SERVICE NOTIFICATION: INCIDENTSYSTEM;HOSTSERVER;/opt/ctmagent Disk Usage;CRITICAL;xi_service_notification_handler;CHECK_NRPE STATE CRITICAL: Socket timeout after 30 seconds.
Service Critical 7/21/2019 0:17 SERVICE ALERT: HOSTSERVER;/ Disk Usage;CRITICAL;HARD;5;CHECK_NRPE STATE CRITICAL: Socket timeout after 30 seconds.
Service Notification 7/21/2019 0:17 SERVICE NOTIFICATION: INCIDENTSYSTEM;HOSTSERVER;/ Disk Usage;CRITICAL;xi_service_notification_handler;CHECK_NRPE STATE CRITICAL: Socket timeout after 30 seconds.
Service Critical 7/21/2019 0:17 SERVICE ALERT: HOSTSERVER;CPU Stats;CRITICAL;SOFT;4;CHECK_NRPE STATE CRITICAL: Socket timeout after 30 seconds.
Service Critical 7/21/2019 0:17 SERVICE ALERT: HOSTSERVER;/var/log Disk Usage;CRITICAL;HARD;5;CHECK_NRPE STATE CRITICAL: Socket timeout after 30 seconds.
Service Notification 7/21/2019 0:17 SERVICE NOTIFICATION: INCIDENTSYSTEM;HOSTSERVER;/var/log Disk Usage;CRITICAL;xi_service_notification_handler;CHECK_NRPE STATE CRITICAL: Socket timeout after 30 seconds.
Service Critical 7/21/2019 0:16 SERVICE ALERT: HOSTSERVER;/var Disk Usage;CRITICAL;HARD;5;CHECK_NRPE STATE CRITICAL: Socket timeout after 30 seconds.
Service Notification 7/21/2019 0:16 SERVICE NOTIFICATION: INCIDENTSYSTEM;HOSTSERVER;/var Disk Usage;CRITICAL;xi_service_notification_handler;CHECK_NRPE STATE CRITICAL: Socket timeout after 30 seconds.
Service Critical 7/21/2019 0:15 SERVICE ALERT: HOSTSERVER;/srv/bit9 Disk Usage;CRITICAL;HARD;5;CHECK_NRPE STATE CRITICAL: Socket timeout after 30 seconds.
Service Notification 7/21/2019 0:15 SERVICE NOTIFICATION: INCIDENTSYSTEM;HOSTSERVER;/srv/bit9 Disk Usage;CRITICAL;xi_service_notification_handler;CHECK_NRPE STATE CRITICAL: Socket timeout after 30 seconds.
Service Critical 7/21/2019 0:15 SERVICE ALERT: HOSTSERVER;/opt/ctmagent Disk Usage;CRITICAL;SOFT;4;CHECK_NRPE STATE CRITICAL: Socket timeout after 30 seconds.
Service Critical 7/21/2019 0:15 SERVICE ALERT: HOSTSERVER;/tmp Disk Usage;CRITICAL;HARD;5;CHECK_NRPE STATE CRITICAL: Socket timeout after 30 seconds.
Service Notification 7/21/2019 0:15 SERVICE NOTIFICATION: INCIDENTSYSTEM;HOSTSERVER;/tmp Disk Usage;CRITICAL;xi_service_notification_handler;CHECK_NRPE STATE CRITICAL: Socket timeout after 30 seconds.
Service Critical 7/21/2019 0:15 SERVICE ALERT: HOSTSERVER;/ Disk Usage;CRITICAL;SOFT;4;CHECK_NRPE STATE CRITICAL: Socket timeout after 30 seconds.
Service Critical 7/21/2019 0:14 SERVICE ALERT: HOSTSERVER;/var/log Disk Usage;CRITICAL;SOFT;4;CHECK_NRPE STATE CRITICAL: Socket timeout after 30 seconds.
Service Critical 7/21/2019 0:14 SERVICE ALERT: HOSTSERVER;/var Disk Usage;CRITICAL;SOFT;4;CHECK_NRPE STATE CRITICAL: Socket timeout after 30 seconds.
Service Critical 7/21/2019 0:14 SERVICE ALERT: HOSTSERVER;CPU Stats;CRITICAL;SOFT;3;CHECK_NRPE STATE CRITICAL: Socket timeout after 30 seconds.
Service Critical 7/21/2019 0:14 SERVICE ALERT: HOSTSERVER;/home Disk Usage;CRITICAL;HARD;5;CHECK_NRPE STATE CRITICAL: Socket timeout after 30 seconds.
Service Notification 7/21/2019 0:14 SERVICE NOTIFICATION: INCIDENTSYSTEM;HOSTSERVER;/home Disk Usage;CRITICAL;xi_service_notification_handler;CHECK_NRPE STATE CRITICAL: Socket timeout after 30 seconds.
Service Critical 7/21/2019 0:13 SERVICE ALERT: HOSTSERVER;/opt/tivoli Disk Usage;CRITICAL;HARD;5;CHECK_NRPE STATE CRITICAL: Socket timeout after 30 seconds.
Service Notification 7/21/2019 0:13 SERVICE NOTIFICATION: INCIDENTSYSTEM;HOSTSERVER;/opt/tivoli Disk Usage;CRITICAL;xi_service_notification_handler;CHECK_NRPE STATE CRITICAL: Socket timeout after 30 seconds.
Service Critical 7/21/2019 0:13 SERVICE ALERT: HOSTSERVER;/srv/bit9 Disk Usage;CRITICAL;SOFT;4;CHECK_NRPE STATE CRITICAL: Socket timeout after 30 seconds.
Service Critical 7/21/2019 0:13 SERVICE ALERT: HOSTSERVER;/opt Disk Usage;CRITICAL;HARD;5;CHECK_NRPE STATE CRITICAL: Socket timeout after 30 seconds.
Service Notification 7/21/2019 0:13 SERVICE NOTIFICATION: INCIDENTSYSTEM;HOSTSERVER;/opt Disk Usage;CRITICAL;xi_service_notification_handler;CHECK_NRPE STATE CRITICAL: Socket timeout after 30 seconds.
Service Critical 7/21/2019 0:12 SERVICE ALERT: HOSTSERVER;/opt/ctmagent Disk Usage;CRITICAL;SOFT;3;CHECK_NRPE STATE CRITICAL: Socket timeout after 30 seconds.
Service Critical 7/21/2019 0:12 SERVICE ALERT: HOSTSERVER;/tmp Disk Usage;CRITICAL;SOFT;4;CHECK_NRPE STATE CRITICAL: Socket timeout after 30 seconds.
Service Critical 7/21/2019 0:12 SERVICE ALERT: HOSTSERVER;/ Disk Usage;CRITICAL;SOFT;3;CHECK_NRPE STATE CRITICAL: Socket timeout after 30 seconds.
Service Critical 7/21/2019 0:12 SERVICE ALERT: HOSTSERVER;/var/log Disk Usage;CRITICAL;SOFT;3;CHECK_NRPE STATE CRITICAL: Socket timeout after 30 seconds.
Service Critical 7/21/2019 0:12 SERVICE ALERT: HOSTSERVER;Memory Usage;CRITICAL;SOFT;2;CHECK_NRPE STATE CRITICAL: Socket timeout after 30 seconds.
Service Critical 7/21/2019 0:11 SERVICE ALERT: HOSTSERVER;/var Disk Usage;CRITICAL;SOFT;3;CHECK_NRPE STATE CRITICAL: Socket timeout after 30 seconds.
Service Critical 7/21/2019 0:11 SERVICE ALERT: HOSTSERVER;/home Disk Usage;CRITICAL;SOFT;4;CHECK_NRPE STATE CRITICAL: Socket timeout after 30 seconds.
Service Critical 7/21/2019 0:10 SERVICE ALERT: HOSTSERVER;/opt/tivoli Disk Usage;CRITICAL;SOFT;4;CHECK_NRPE STATE CRITICAL: Socket timeout after 30 seconds.
Service Critical 7/21/2019 0:10 SERVICE ALERT: HOSTSERVER;/srv/bit9 Disk Usage;CRITICAL;SOFT;3;CHECK_NRPE STATE CRITICAL: Socket timeout after 30 seconds.
Service Critical 7/21/2019 0:10 SERVICE ALERT: HOSTSERVER;CPU Stats;CRITICAL;SOFT;2;CHECK_NRPE STATE CRITICAL: Socket timeout after 30 seconds.
Service Critical 7/21/2019 0:10 SERVICE ALERT: HOSTSERVER;/opt Disk Usage;CRITICAL;SOFT;4;CHECK_NRPE STATE CRITICAL: Socket timeout after 30 seconds.
Service Critical 7/21/2019 0:10 SERVICE ALERT: HOSTSERVER;/opt/ctmagent Disk Usage;CRITICAL;SOFT;2;CHECK_NRPE STATE CRITICAL: Socket timeout after 30 seconds.
Service Critical 7/21/2019 0:10 SERVICE ALERT: HOSTSERVER;/tmp Disk Usage;CRITICAL;SOFT;3;CHECK_NRPE STATE CRITICAL: Socket timeout after 30 seconds.
Service Critical 7/21/2019 0:10 SERVICE ALERT: HOSTSERVER;/ Disk Usage;CRITICAL;SOFT;2;CHECK_NRPE STATE CRITICAL: Socket timeout after 30 seconds.
Service Critical 7/21/2019 0:10 SERVICE ALERT: HOSTSERVER;/var/log Disk Usage;CRITICAL;SOFT;2;CHECK_NRPE STATE CRITICAL: Socket timeout after 30 seconds.
Service Critical 7/21/2019 0:09 SERVICE ALERT: HOSTSERVER;/var Disk Usage;CRITICAL;SOFT;2;CHECK_NRPE STATE CRITICAL: Socket timeout after 30 seconds.
Service Critical 7/21/2019 0:09 SERVICE ALERT: HOSTSERVER;/home Disk Usage;CRITICAL;SOFT;3;CHECK_NRPE STATE CRITICAL: Socket timeout after 30 seconds.
Service Critical 7/21/2019 0:08 SERVICE ALERT: HOSTSERVER;/opt/tivoli Disk Usage;CRITICAL;SOFT;3;CHECK_NRPE STATE CRITICAL: Socket timeout after 30 seconds.
Service Critical 7/21/2019 0:08 SERVICE ALERT: HOSTSERVER;/srv/bit9 Disk Usage;CRITICAL;SOFT;2;CHECK_NRPE STATE CRITICAL: Socket timeout after 30 seconds.
Service Critical 7/21/2019 0:08 SERVICE ALERT: HOSTSERVER;/opt Disk Usage;CRITICAL;SOFT;3;CHECK_NRPE STATE CRITICAL: Socket timeout after 30 seconds.
Service Critical 7/21/2019 0:07 SERVICE ALERT: HOSTSERVER;/opt/ctmagent Disk Usage;CRITICAL;SOFT;1;CHECK_NRPE STATE CRITICAL: Socket timeout after 30 seconds.
Service Critical 7/21/2019 0:07 SERVICE ALERT: HOSTSERVER;/tmp Disk Usage;CRITICAL;SOFT;2;CHECK_NRPE STATE CRITICAL: Socket timeout after 30 seconds.
Service Critical 7/21/2019 0:07 SERVICE ALERT: HOSTSERVER;/ Disk Usage;CRITICAL;SOFT;1;CHECK_NRPE STATE CRITICAL: Socket timeout after 30 seconds.
Service Critical 7/21/2019 0:07 SERVICE ALERT: HOSTSERVER;/var/log Disk Usage;CRITICAL;SOFT;1;CHECK_NRPE STATE CRITICAL: Socket timeout after 30 seconds.
Service Critical 7/21/2019 0:07 SERVICE ALERT: HOSTSERVER;CPU Stats;CRITICAL;SOFT;1;CHECK_NRPE STATE CRITICAL: Socket timeout after 30 seconds.
Service Critical 7/21/2019 0:07 SERVICE ALERT: HOSTSERVER;Memory Usage;CRITICAL;SOFT;1;CHECK_NRPE STATE CRITICAL: Socket timeout after 30 seconds.
Service Critical 7/21/2019 0:06 SERVICE ALERT: HOSTSERVER;/var Disk Usage;CRITICAL;SOFT;1;CHECK_NRPE STATE CRITICAL: Socket timeout after 30 seconds.
Service Critical 7/21/2019 0:06 SERVICE ALERT: HOSTSERVER;/home Disk Usage;CRITICAL;SOFT;2;CHECK_NRPE STATE CRITICAL: Socket timeout after 30 seconds.
Service Critical 7/21/2019 0:05 SERVICE ALERT: HOSTSERVER;/opt/tivoli Disk Usage;CRITICAL;SOFT;2;CHECK_NRPE STATE CRITICAL: Socket timeout after 30 seconds.
Service Critical 7/21/2019 0:05 SERVICE ALERT: HOSTSERVER;/srv/bit9 Disk Usage;CRITICAL;SOFT;1;CHECK_NRPE STATE CRITICAL: Socket timeout after 30 seconds.
Service Critical 7/21/2019 0:05 SERVICE ALERT: HOSTSERVER;/opt Disk Usage;CRITICAL;SOFT;2;CHECK_NRPE STATE CRITICAL: Socket timeout after 30 seconds.
Service Critical 7/21/2019 0:05 SERVICE ALERT: HOSTSERVER;/tmp Disk Usage;CRITICAL;SOFT;1;CHECK_NRPE STATE CRITICAL: Socket timeout after 30 seconds.
Service Critical 7/21/2019 0:04 SERVICE ALERT: HOSTSERVER;/home Disk Usage;CRITICAL;SOFT;1;CHECK_NRPE STATE CRITICAL: Socket timeout after 30 seconds.
Service Critical 7/21/2019 0:03 SERVICE ALERT: HOSTSERVER;/opt/tivoli Disk Usage;CRITICAL;SOFT;1;CHECK_NRPE STATE CRITICAL: Socket timeout after 30 seconds.
Service Critical 7/21/2019 0:03 SERVICE ALERT: HOSTSERVER;/opt Disk Usage;CRITICAL;SOFT;1;CHECK_NRPE STATE CRITICAL: Socket timeout after 30 seconds.
Information 7/21/2019 0:00 CURRENT SERVICE STATE: HOSTSERVER;Ping;OK;HARD;1;OK - 10.191.8.231: rta 0.160ms, lost 0%
Information 7/21/2019 0:00 CURRENT SERVICE STATE: HOSTSERVER;Memory Usage;OK;HARD;1;OK - 9594 / 32009 MB (29%) Free Memory, Used: 22414 MB, Shared: 48 MB, Buffers + Cached: 9115 MB
Information 7/21/2019 0:00 CURRENT SERVICE STATE: HOSTSERVER;CPU Stats;OK;HARD;1;CPU STATISTICS OK: user=14.34% system=28.95% iowait=0.05% idle=56.66%
Information 7/21/2019 0:00 CURRENT SERVICE STATE: HOSTSERVER;/var/log Disk Usage;OK;HARD;1;DISK OK - free space: /var/log 4538 MB (88.81% inode=100%):
Information 7/21/2019 0:00 CURRENT SERVICE STATE: HOSTSERVER;/var Disk Usage;OK;HARD;1;DISK OK - free space: /var 7950 MB (77.71% inode=100%):
Information 7/21/2019 0:00 CURRENT SERVICE STATE: HOSTSERVER;/tmp Disk Usage;OK;HARD;1;DISK OK - free space: /tmp 4052 MB (99.18% inode=100%):
Information 7/21/2019 0:00 CURRENT SERVICE STATE: HOSTSERVER;/srv/bit9 Disk Usage;OK;HARD;1;DISK OK - free space: /srv/bit9 954 MB (62.78% inode=100%):
Information 7/21/2019 0:00 CURRENT SERVICE STATE: HOSTSERVER;/opt/tivoli Disk Usage;OK;HARD;1;DISK OK - free space: /opt/tivoli 252 MB (24.86% inode=100%):
Information 7/21/2019 0:00 CURRENT SERVICE STATE: HOSTSERVER;/opt/ctmagent Disk Usage;OK;HARD;1;DISK OK - free space: /opt/ctmagent 671 MB (66.23% inode=100%):
Information 7/21/2019 0:00 CURRENT SERVICE STATE: HOSTSERVER;/opt Disk Usage;OK;HARD;1;DISK OK - free space: /opt 564 MB (94.55% inode=100%):
Information 7/21/2019 0:00 CURRENT SERVICE STATE: HOSTSERVER;/home Disk Usage;OK;HARD;1;DISK OK - free space: /home 1495 MB (73.37% inode=100%):
Information 7/21/2019 0:00 CURRENT SERVICE STATE: HOSTSERVER;/ Disk Usage;OK;HARD;1;DISK OK - free space: / 7364 MB (71.98% inode=100%):
Information 7/21/2019 0:00 CURRENT HOST STATE: HOSTSERVER;UP;HARD;1;OK - 10.191.8.231: rta 0.178ms, lost 0%
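
A log like the one above is easier to reason about once the SOFT and HARD timeout entries are counted per service. The following is a rough Python sketch that does this; it assumes each entry follows the "SERVICE ALERT: host;service;state;state type;attempt;output" layout seen in the report, and the function names are made up for illustration.

```python
import re
from collections import Counter

# Field layout taken from the SERVICE ALERT lines in the report above.
ALERT_RE = re.compile(
    r"SERVICE ALERT: (?P<host>[^;]+);(?P<service>[^;]+);(?P<state>[^;]+);"
    r"(?P<state_type>SOFT|HARD);(?P<attempt>\d+);(?P<output>.*)$"
)


def parse_alert(line):
    """Return a dict of alert fields, or None if the line is not a SERVICE ALERT."""
    match = ALERT_RE.search(line)
    if match is None:
        return None
    record = match.groupdict()
    record["attempt"] = int(record["attempt"])
    return record


def summarize_timeouts(lines):
    """Count socket-timeout alerts per (service, state type)."""
    counts = Counter()
    for line in lines:
        record = parse_alert(line)
        if record and "Socket timeout" in record["output"]:
            counts[(record["service"], record["state_type"])] += 1
    return counts


# A few entries copied from the report above.
sample = [
    "Service Critical 7/21/2019 0:17 SERVICE ALERT: HOSTSERVER;/opt/ctmagent Disk Usage;CRITICAL;HARD;5;CHECK_NRPE STATE CRITICAL: Socket timeout after 30 seconds.",
    "Service Notification 7/21/2019 0:17 SERVICE NOTIFICATION: INCIDENTSYSTEM;HOSTSERVER;/opt/ctmagent Disk Usage;CRITICAL;xi_service_notification_handler;CHECK_NRPE STATE CRITICAL: Socket timeout after 30 seconds.",
    "Service Critical 7/21/2019 0:15 SERVICE ALERT: HOSTSERVER;/opt/ctmagent Disk Usage;CRITICAL;SOFT;4;CHECK_NRPE STATE CRITICAL: Socket timeout after 30 seconds.",
    "Service Critical 7/21/2019 0:17 SERVICE ALERT: HOSTSERVER;CPU Stats;CRITICAL;SOFT;4;CHECK_NRPE STATE CRITICAL: Socket timeout after 30 seconds.",
    "Service Recovery 7/21/2019 0:30 SERVICE ALERT: HOSTSERVER;CPU Stats;OK;HARD;1;CPU STATISTICS OK: user=21.91% system=18.57% iowait=0.05% idle=59.45%",
]

for (service, state_type), count in sorted(summarize_timeouts(sample).items()):
    print(f"{service}: {state_type} x{count}")
```

In the full report, each SERVICE NOTIFICATION entry follows an alert that reached HARD on attempt 5, so a per-service count of HARD timeouts is a quick way to see which checks generated incident records.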

Re: How do I not alert on Socket timeout after 30 seconds.

Posted: Mon Jul 22, 2019 9:17 am
by tgriep
The socket timeouts are generated when the check_nrpe plugin cannot connect to the NRPE agent on the remote host.
If the remote host is down, the agent is not running, or a firewall or network connection issue is in the way, the plugin will return that message.

The notifications came from a disk check that failed on the server. Take a look at the remote server to see if there were any disk errors that could have caused the issue.
Another thing to try: I would guess that the command that is run on the remote server is using the check_disk plugin.
Try increasing the timeout for the plugin by adding -t 30 to the arguments that are sent to the NRPE agent on the remote server.
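
As an illustration of where a timeout argument can go, the Python sketch below assembles a check_nrpe command line with an explicit -t value. It shows only the timeout option of check_nrpe itself; the advice above may instead refer to the timeout of the plugin run by the agent, which this sketch does not cover. The plugin path and the NRPE command name "check_disk" are assumptions, and the host address is the one that appears in the log.

```python
import shlex


def build_check_nrpe_argv(host, command, timeout_s=30,
                          plugin_path="/usr/local/nagios/libexec/check_nrpe"):
    """Build an argument list for check_nrpe with an explicit timeout.

    Assumptions: plugin_path is a typical source-install location, and
    `command` is whatever command name the remote NRPE agent defines.
    """
    if timeout_s <= 0:
        raise ValueError("timeout_s must be positive")
    return [plugin_path, "-H", host, "-t", str(timeout_s), "-c", command]


argv = build_check_nrpe_argv("10.191.8.231", "check_disk")
print(shlex.join(argv))
```

The resulting list could be passed to subprocess.run for a manual test from the XI server. Keeping the timeout below the retry interval avoids a check still running when the next retry is due.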

Re: How do I not alert on Socket timeout after 30 seconds.

Posted: Thu Jul 25, 2019 9:48 am
by mkeey
We already changed the default timeout from 20 seconds to 30 seconds. Don't want to go much more than that as we have monitors that retry every 60 seconds.

One other thing we found was that a Socket Timeout sends a notification after a single failure. All of our hosts and services are coded to alert after 5 or more failures. Why does a socket timeout ignore those criteria?



NOTIFICATION: servicenow_prd;REMOTEHOST;CPU Usage;CRITICAL;xi_service_notification_handler;CPU Load 95% (5 min average)
Service Notification,7/24/2019 23:43,SERVICE NOTIFICATION: supportteam;REMOTEHOST;CPU Usage;CRITICAL;xi_service_notification_handler;CPU Load 95% (5 min average)
Service Critical,7/24/2019 23:40,SERVICE ALERT: REMOTEHOST;CPU Usage;CRITICAL;HARD;1;CRITICAL - Socket timeout

Service Recovery,7/24/2019 21:26,SERVICE ALERT: REMOTEHOST;CPU Usage;OK;HARD;1;CPU Load 86% (5 min average)
Service Warning,7/24/2019 21:23,SERVICE ALERT: REMOTEHOST;CPU Usage;WARNING;SOFT;1;CPU Load 92% (5 min average)

Service Recovery,7/24/2019 21:18,SERVICE ALERT: REMOTEHOST;CPU Usage;OK;HARD;1;CPU Load 84% (5 min average)
Service Critical,7/24/2019 21:15,SERVICE ALERT: REMOTEHOST;CPU Usage;CRITICAL;SOFT;1;CPU Load 97% (5 min average)

Service Recovery,7/24/2019 21:05,SERVICE ALERT: REMOTEHOST;CPU Usage;OK;HARD;1;CPU Load 75% (5 min average)
Service Warning,7/24/2019 21:02,SERVICE ALERT: REMOTEHOST;CPU Usage;WARNING;SOFT;1;CPU Load 93% (5 min average)
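
To make the SOFT/HARD attempt numbers in these logs easier to read, here is a deliberately simplified Python model of the retry counter. It is a sketch under stated assumptions, not the actual Nagios scheduling logic: it ignores host state, transitions between WARNING and CRITICAL, and soft recoveries, and the function name is made up for illustration.

```python
def classify(results, max_check_attempts):
    """Label each check result with (state, state type, attempt).

    Simplified model: a non-OK result stays SOFT until the attempt counter
    reaches max_check_attempts, then becomes HARD. An OK result resets the
    counter and is reported as HARD;1, as in the logs above.
    """
    timeline = []
    attempt = 0
    for state in results:
        if state == "OK":
            attempt = 0
            timeline.append(("OK", "HARD", 1))
            continue
        # Count consecutive non-OK results, capped at the configured maximum.
        attempt = min(attempt + 1, max_check_attempts)
        state_type = "HARD" if attempt >= max_check_attempts else "SOFT"
        timeline.append((state, state_type, attempt))
    return timeline


# Five consecutive failures with max_check_attempts=5: SOFT;1..4, then HARD;5.
for entry in classify(["CRITICAL"] * 5, max_check_attempts=5):
    print(entry)

# A single failure with max_check_attempts=1 goes straight to HARD;1.
print(classify(["CRITICAL"], max_check_attempts=1))
```

In this model, the disk checks from 7/21 match a limit of 5 attempts, while a CRITICAL;HARD;1 entry is what a limit of 1 would produce. The excerpt does not establish that this is the cause here, so the max_check_attempts value of the affected service in objects.cache is worth checking.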