Flapping issue...
Posted: Mon Apr 27, 2015 11:47 am
Hello:
We are using Nagios XI 2014R2.6. One Host, which was running NSClient++ 0.3.9 64-bit, started reporting Critical Status Information in the past couple of weeks. I have now installed the "newer" 0.4.1.105 64-bit version to see if that would clear the issue.
Note: I confirmed with my colleagues, who don't work with the server, that they have not upgraded/made any changes to that Host/server. I have not done any work on that Host/server until now when I installed the newer NSClient++.
The Memory Usage, the CPU Usage, the Drive C: Disk Usage, and the IIS Web Server checks appear to flap and give the following Status Information message:
“CRITICAL - Socket timeout after 10 seconds”.
The Critical clears after a bit, with the following:
1) Memory Usage - “Memory usage: total:32592.11 MB - used: 2519.97 MB (8%) - free: 30072.13 MB (92%)” <-- which is within the thresholds (warn at 90%, critical at 95%).
2) IIS Web Server - “W3SVC: Started” (the Windows server web service).
3) CPU Usage - “CPU Load 0% (5 min average)” <-- this was just the value at the moment I looked.
4) Drive C: Disk Usage - “C:\ - total: 68.33 Gb - used: 51.48 Gb (75%) - free 16.85 Gb (25%)” <-- which is within the thresholds (warn at 90%, critical at 95%).
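To double-check that the recovered values really are inside the thresholds, here is a minimal Python sketch of the arithmetic. The function names are my own, and comparing with ">=" is an assumption about how the plugin applies -w and -c, not something taken from its source.

```python
def used_percent(used, total):
    """Percentage of the total that is in use."""
    return 100.0 * used / total


def classify(used_pct, warn=90.0, crit=95.0):
    """Map a usage percentage to a Nagios-style state.

    Assumption: a value equal to a threshold already triggers it.
    """
    if used_pct >= crit:
        return "CRITICAL"
    if used_pct >= warn:
        return "WARNING"
    return "OK"


# Figures copied from the Status Information lines above.
mem_pct = used_percent(2519.97, 32592.11)   # MB used / MB total
disk_pct = used_percent(51.48, 68.33)       # Gb used / Gb total

print(round(mem_pct), classify(mem_pct))
print(round(disk_pct), classify(disk_pct))
```

Both values come out well below the 90% warning level, so the Critical states appear to come from the timeout itself rather than from the measured usage.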
This doesn’t happen with the other Hosts and Services that are set up and reporting with Nagios XI (just over 300 Windows and Linux servers/Hosts and over 1300 total Services being monitored).
Suggestions??
Thank you in advance.
UPDATE: These are the checks used:
Drive C: Disk Usage:
https://<my_web_site_URL>/nagiosxi/includes/components/xicore/status.php?show=servicedetail&host=<my_Server_FQDN>&service=Drive+C%3A+Disk+Usage&dest=auto
Status Information: "CRITICAL - Socket timeout after 10 seconds"
check_xi_service_nsclient!<edited>!USEDDISKSPACE!-l C -w 90 -c 95
IIS Web Server:
https://<my_web_site_URL>/nagiosxi/includes/components/xicore/status.php?show=servicedetail&host=<my_Server_FQDN>&service=IIS+Web+Server&dest=auto
Status Information: "CRITICAL - Socket timeout after 10 seconds"
check_xi_service_nsclient!<edited>!SERVICESTATE!-l W3SVC -d SHOWALL
Memory Usage:
https://<my_web_site_URL>/nagiosxi/includes/components/xicore/status.php?show=servicedetail&host=<my_Server_FQDN>&service=Memory+Usage&dest=auto
Status Information: "could not fetch information from server"
check_xi_service_nsclient!<edited>!MEMUSE!-w 90 -c 95
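In case it helps narrow this down, below is a rough Python sketch that could be run from the Nagios XI server to time a plain TCP connection to the agent, independent of the check plugin. The function name probe is hypothetical, and port 12489 is an assumption (it is the usual NSClient++ port, but this Host may be configured differently). It only tests whether the port accepts a connection within the timeout; it does not speak the NSClient protocol.

```python
import socket
import time


def probe(host, port=12489, timeout=10.0):
    """Return the seconds taken to open a TCP connection, or None on failure.

    Assumptions: 12489 is the agent's listening port, and 10 seconds
    mirrors the timeout reported in the Status Information.
    """
    start = time.monotonic()
    try:
        # create_connection raises OSError (including timeouts) on failure.
        with socket.create_connection((host, port), timeout=timeout):
            return time.monotonic() - start
    except OSError:
        return None


# Example use (replace the placeholder with the real Host address):
#   elapsed = probe("<my_Server_FQDN>")
#   print("no connection" if elapsed is None else f"connected in {elapsed:.3f}s")
```

Running this repeatedly while the checks are flapping would show whether the connection itself is slow or refused, or whether the delay happens after the connection is made.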