Our datacenter lost connectivity for ~40mins this afternoon. We were greeted with a mountain of notifications when the firewall came back up, of course, but since connectivity has been restored, each of the instances monitored in AWS via IPSec tunnel have reported all failed services.
- host reports "up" in Nagios
- client can ping server
- server can ping client
- ncpa_listener/_passive are both running on the clients
The only way for us to connect to these instances is via VPN to our datacenter, & then through the IPSec tunnel. The tunnel reports up, and indeed, I can ssh to each of the affected clients without issue.
Code: Select all
[root@nagios libexec]# ./check_ncpa.py -H <clientIP> -t <token> -M 'processes'
UNKNOWN: Execution exceeded timeout threshold of 60s
Any help would be greatly appreciated.