Flapping issue...

This support forum board is for support questions relating to Nagios XI, our flagship commercial network monitoring solution.
PhilG
Posts: 286
Joined: Thu Jan 16, 2014 10:24 am

Flapping issue...

Post by PhilG »

Hello:
We are using Nagios XI 2014R2.6. One Host, which started reporting Critical Status Information over the past couple of weeks, was running NSClient++ 0.3.9 64-bit. I have now installed the "newer" 0.4.1.105 64-bit version to see if that would clear the issue.
Note: I confirmed with my colleagues, who don't work with the server, that they have not upgraded/made any changes to that Host/server. I have not done any work on that Host/server until now when I installed the newer NSClient++.

The Memory Usage, the CPU Usage, the Drive C: Disk Usage, and the IIS Web Server checks appear to flap and give the following Status Information message:
“CRITICAL - Socket timeout after 10 seconds”.
The Critical clears after a bit, with the following:
1). Memory Usage - “Memory usage: total:32592.11 MB - used: 2519.97 MB (8%) - free: 30072.13 MB (92%)” <-- which is within the thresholds (warn at 90%, critical at 95%).
2). IIS Web Server - “W3SVC: Started” (the Windows server web service).
3). CPU Usage - “CPU Load 0% (5 min average)” <-- I'm sure this was just a point-in-time reading from when I looked.
4). Drive C: Disk Usage - “C:\ - total: 68.33 Gb - used: 51.48 Gb (75%) - free 16.85 Gb (25%)” <-- which is within the thresholds (warn at 90%, critical at 95%).

This doesn’t happen with the other Hosts and Services that are set up and reporting with Nagios XI (just over 300 Windows and Linux servers/Hosts and over 1300 total Services being monitored).

Suggestions??

Thank you in advance.


UPDATE: These are the checks used:
Drive C: Disk Usage:
https://<my_web_site_URL>/nagiosxi/includes/components/xicore/status.php?show=servicedetail&host=<my_Server_FQDN>&service=Drive+C%3A+Disk+Usage&dest=auto
Status Information: "CRITICAL - Socket timeout after 10 seconds"
check_xi_service_nsclient!<edited>!USEDDISKSPACE!-l C -w 90 -c 95


IIS Web Server:
https://<my_web_site_URL>/nagiosxi/includes/components/xicore/status.php?show=servicedetail&host=<my_Server_FQDN>&service=IIS+Web+Server&dest=auto
Status Information: "CRITICAL - Socket timeout after 10 seconds"
check_xi_service_nsclient!<edited>!SERVICESTATE!-l W3SVC -d SHOWALL


Memory Usage:
https://<my_web_site_URL>/nagiosxi/includes/components/xicore/status.php?show=servicedetail&host=<my_Server_FQDN>&service=Memory+Usage&dest=auto
Status Information: "could not fetch information from server"
check_xi_service_nsclient!<edited>!MEMUSE!-w 90 -c 95
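
For anyone comparing these service definitions with a manual plugin run, here is a rough sketch of how the `!`-separated fields line up with check_nt arguments. It assumes the XI command simply passes the three fields through to check_nt on port 12489, which is what the manual check_nt command later in this thread suggests; check your own command definition before relying on it. `expand_nsclient` is a made-up helper name, and the host and password are placeholders.

```shell
#!/bin/sh
# Hypothetical helper: print the check_nt command line that a
# "check_xi_service_nsclient!<password>!<variable>!<options>" entry is
# ASSUMED to expand to. Plugin path and port are taken from this thread.
expand_nsclient() {
    host=$1
    check=$2
    rest=${check#*!}        # drop the command name up to the first "!"
    password=${rest%%!*}    # first field: NSClient++ password
    rest=${rest#*!}
    variable=${rest%%!*}    # second field: check_nt variable (-v)
    options=${rest#*!}      # third field: remaining plugin options
    printf '%s\n' "/usr/local/nagios/libexec/check_nt -H $host -s $password -p 12489 -v $variable $options"
}

expand_nsclient '<client ip>' 'check_xi_service_nsclient!<password>!MEMUSE!-w 90 -c 95'
```

Running the printed command by hand from the Nagios server is a quick way to see whether the timeout comes from the plugin call itself or from something in the XI scheduling.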
lmiltchev
Bugs find me
Posts: 13589
Joined: Mon May 23, 2011 12:15 pm

Re: Flapping issue...

Post by lmiltchev »

Let's rule out firewall issues. Run the following command and show us the output:

nmap <client ip> -p 12489
Also, show the output of:

time /usr/local/nagios/libexec/check_nt -H <client ip> -s <password> -p 12489 -v USEDDISKSPACE -l C -w 90 -c 95
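
Since the timeouts come and go, a single run may well succeed. A minimal sketch for sampling the same check repeatedly and counting the runs that do not come back OK; `sample_check` is a hypothetical helper, and the run count and pause are arbitrary values to adjust.

```shell
#!/bin/sh
# Hypothetical helper: run a command COUNT times, PAUSE seconds apart,
# and report how many runs returned a non-zero exit status
# (for a Nagios plugin, anything other than OK).
# Usage: sample_check COUNT PAUSE command [args...]
sample_check() {
    count=$1
    pause=$2
    shift 2
    failures=0
    i=0
    while [ "$i" -lt "$count" ]; do
        i=$((i + 1))
        if "$@" >/dev/null 2>&1; then
            :
        else
            status=$?
            failures=$((failures + 1))
            # Timestamp each failed run so it can be matched against XI alerts.
            echo "$(date '+%H:%M:%S') run $i failed (exit status $status)" >&2
        fi
        # Pause between runs, but not after the last one.
        [ "$i" -lt "$count" ] && sleep "$pause"
    done
    echo "$failures of $count runs failed"
}

# Example (placeholders as in the command above):
# sample_check 60 5 /usr/local/nagios/libexec/check_nt -H <client ip> -s <password> -p 12489 -v USEDDISKSPACE -l C -w 90 -c 95
```

If some runs fail here while the host stays reachable, that points at the agent on the client rather than at the firewall.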

Re: Flapping issue...

Post by PhilG »

Sorry for the delay.

Here are the results of your requests:
1). Response from "nmap <client ip> -p 12489":

Starting Nmap 5.51 ( http://nmap.org ) at 2015-04-30 16:19 CDT
Nmap scan report for <Server FQDN> (Server_IP)
Host is up (0.00067s latency).
PORT      STATE SERVICE
12489/tcp open  unknown

Nmap done: 1 IP address (1 host up) scanned in 0.09 seconds


2). Response from "time /usr/local/nagios/libexec/check_nt -H <client ip> -s <password> -p 12489 -v USEDDISKSPACE -l C -w 90 -c 95":

C:\ - total: 68.33 Gb - used: 51.49 Gb (75%) - free 16.84 Gb (25%) | 'C:\ Used Space'=51.49Gb;61.50;64.91;0.00;68.33

real 0m0.004s
user 0m0.002s
sys 0m0.001s
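
The part after the `|` is performance data in the usual Nagios plugin layout (value;warn;crit;min;max), so the `-w 90 -c 95` percentages appear there as absolute sizes. A quick sanity check of that arithmetic using the figures above; `disk_thresholds` is a made-up helper name.

```shell
#!/bin/sh
# Hypothetical helper: turn percentage thresholds into absolute sizes
# and compute the used percentage from total and used space.
# Usage: disk_thresholds TOTAL USED WARN_PCT CRIT_PCT
disk_thresholds() {
    awk -v total="$1" -v used="$2" -v warn="$3" -v crit="$4" 'BEGIN {
        printf "warn=%.2f crit=%.2f used_pct=%.0f\n",
            total * warn / 100, total * crit / 100, used / total * 100
    }'
}

disk_thresholds 68.33 51.49 90 95
# prints: warn=61.50 crit=64.91 used_pct=75
```

These match the 61.50 and 64.91 in the perfdata, so the thresholds are being applied as intended and the drive is nowhere near them.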
jdalrymple
Skynet Drone
Posts: 2620
Joined: Wed Feb 11, 2015 1:56 pm

Re: Flapping issue...

Post by jdalrymple »

It sounds like a funky network issue - like maybe some arp poisoning (multiple machines using 1 IP) or some such - that almost can't be the case though if you're not getting host alerts. What is your host check command and interval?

Re: Flapping issue...

Post by PhilG »

jdalrymple wrote:It sounds like a funky network issue - like maybe some arp poisoning (multiple machines using 1 IP) or some such - that almost can't be the case though if you're not getting host alerts. What is your host check command and interval?
"Monitor the host with this command" - this is not configured/has a blank field. Same goes with many of my other Hosts, but those work with no issues.

Re: Flapping issue...

Post by jdalrymple »

PhilG wrote:"Monitor the host with this command" - this is not configured/has a blank field.
Inherited from template no doubt. I suggest you read up on templates and inheritance if you care to understand how that works.
PhilG wrote:Same goes with many of my other Hosts, but those work with no issues.
Are you saying that you are getting host alerts for the hosts with problematic services?

Re: Flapping issue...

Post by PhilG »

jdalrymple wrote:
PhilG wrote:"Monitor the host with this command" - this is not configured/has a blank field.
Are you saying that you are getting host alerts for the hosts with problematic services?
Well, it is a Microsoft Windows IIS server. Need I say any more? ;)

Re: Flapping issue...

Post by jdalrymple »

I'm sorry, I'm not quite following. Does that mean that you are indeed also getting host alerts in addition to your service alerts?

Re: Flapping issue...

Post by PhilG »

jdalrymple wrote:I'm sorry, I'm not quite following. Does that mean that you are indeed also getting host alerts in addition to your service alerts?

Host is fine. I find it odd that this is the only Host that is getting
" CRITICAL - Socket timeout after 10 seconds"
on CPU Usage, C: disk partition, IIS Web Server, and whatever else it monitors. Sometimes it's all services, sometimes it's not.

Re: Flapping issue...

Post by jdalrymple »

Based on what you've said, my only suggestions are to either revert to the older NSClient++ version (or maybe upgrade to a newer one), or head over to http://forums.nsclient.org and see whether the developer has any suggestions.

The absence of host alerts narrows the problem down almost entirely to the NSClient++ service itself, unless of course you've fiddled with your host check command. Since you weren't aware that it was inherited from a template, I'd guess that's almost gotta be a "no".