Re: (Return code of 255 is out of bounds)
Posted: Mon Nov 09, 2015 8:29 am
by jkinning
XI is actively checking everything at 5-minute check intervals.
Re: (Return code of 255 is out of bounds)
Posted: Mon Nov 09, 2015 11:18 am
by lmiltchev
Can you show one of the "failing" checks run from the command line along with the output of it? Also, post the nsclient.ini file. Hide sensitive info.
Re: (Return code of 255 is out of bounds)
Posted: Mon Nov 09, 2015 1:11 pm
by jkinning
Code:
./check_nrpe -H blockmasterl1t -t 30 -c check_load
CHECK_NRPE: Socket timeout after 30 seconds.
I am running nrpe under xinetd on Linux. I am experiencing similar issues with Windows using NSClient++.
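Since the failures come and go, one run may not show much. A minimal sketch for repeating the same check and recording the exit code and elapsed time of each run follows. `time_check` is a hypothetical helper name, not part of NRPE or Nagios XI, and elapsed time is only measured in whole seconds.

```shell
#!/bin/sh
# Hypothetical helper: run a command several times and print, for each run,
# its exit code and how long it took in whole seconds.
# Usage: time_check <runs> <command> [args...]
time_check() {
    runs=$1
    shift
    i=1
    while [ "$i" -le "$runs" ]; do
        start=$(date +%s)
        # Discard the plugin output; only the exit code and timing are logged.
        "$@" >/dev/null 2>&1
        rc=$?
        end=$(date +%s)
        echo "run=$i rc=$rc seconds=$((end - start))"
        i=$((i + 1))
    done
}
```

For example, `time_check 10 ./check_nrpe -H blockmasterl1t -t 30 -c check_load` would repeat the command above ten times, which should make it easier to see whether the slow runs cluster together or are spread out.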
Re: (Return code of 255 is out of bounds)
Posted: Mon Nov 09, 2015 2:54 pm
by hsmith
What is the output of the following commands?
Code:
free -m
top | head -n5
df -h
df -ih
lscpu
Re: (Return code of 255 is out of bounds)
Posted: Mon Nov 09, 2015 3:59 pm
by jkinning
free -m
             total       used       free     shared    buffers     cached
Mem:         32440      28568       3872         34        205      19001
-/+ buffers/cache:       9361      23079
Swap:         2015         78       1937
top | head -n5
top - 15:57:44 up 50 days, 3:38, 1 user, load average: 1.45, 1.45, 1.64
Tasks: 473 total, 1 running, 472 sleeping, 0 stopped, 0 zombie
Cpu(s): 3.7%us, 1.7%sy, 0.0%ni, 93.8%id, 0.6%wa, 0.0%hi, 0.2%si, 0.0%st
Mem: 33219220k total, 29240444k used, 3978776k free, 210264k buffers
Swap: 2064380k total, 80196k used, 1984184k free, 19457568k cached
df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/mapper/VolGroup-lv_root
                      283G   30G  239G  12% /
tmpfs                 3.9G     0  3.9G   0% /dev/shm
/dev/sda1             477M  140M  312M  32% /boot
/dev/sr0               62M   62M     0 100% /mnt
df -ih
Filesystem            Inodes IUsed IFree IUse% Mounted on
/dev/mapper/VolGroup-lv_root
                         18M  151K   18M    1% /
tmpfs                   984K     1  984K    1% /dev/shm
/dev/sda1               126K    62  125K    1% /boot
/dev/sr0                   0     0     0     - /mnt
lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 16
On-line CPU(s) list: 0-15
Thread(s) per core: 1
Core(s) per socket: 1
Socket(s): 16
NUMA node(s): 2
Vendor ID: GenuineIntel
CPU family: 6
Model: 45
Stepping: 7
CPU MHz: 2200.000
BogoMIPS: 4400.00
Hypervisor vendor: VMware
Virtualization type: full
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 20480K
NUMA node0 CPU(s): 0-15
NUMA node1 CPU(s):
Re: (Return code of 255 is out of bounds)
Posted: Mon Nov 09, 2015 4:20 pm
by hsmith
For kicks, can we try to scale back the number of CPUs you have inside of this VM?
Here is a neat little article describing why I think this could be part of the issue.
http://www.gabesvirtualworld.com/how-to ... rformance/
I think it may be time to consider offloading the DB if you're having performance issues as well.
https://assets.nagios.com/downloads/nag ... Server.pdf
Re: (Return code of 255 is out of bounds)
Posted: Tue Dec 15, 2015 5:57 pm
by jkinning
What is the maximum or recommended host and service count for a single Nagios XI instance? From what I have seen skimming the support forums, there are users with much larger environments for whom a single Nagios XI server works. Is there a way to check whether I have this set up correctly? Is there something that would stand out to help me diagnose the 255 timeouts I am seeing? It isn't just 1 or 2 clients, and it varies. One notification may come from HostA, but later HostA is fine and I get the notification from HostB, etc.
I'm not sure how to tell whether this is a Nagios XI issue, a network issue, or a VM issue. I've increased my template check interval from 5 minutes to 6, and that appears to lower the count, but I am still getting them. My next step was going to be to try 7 minutes to see if that helps even more. Is there a good rule of thumb for check intervals? Maybe make non-prod 10-minute checks and keep prod at 5 minutes.
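A rough way to reason about the interval change, as a sketch only: if every check runs once per interval, the average rate is the total number of checks divided by the interval. The figure of 6000 checks used in the note below is made up for illustration and does not come from this thread, and `checks_per_minute` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: average number of checks the server must run per
# minute, assuming every check runs exactly once per interval.
# Usage: checks_per_minute <total_checks> <interval_minutes>
checks_per_minute() {
    if [ "$2" -le 0 ]; then
        echo "interval must be a positive number of minutes" >&2
        return 1
    fi
    # awk handles the division so non-integer results are rounded, not truncated.
    awk -v n="$1" -v m="$2" 'BEGIN { printf "%.0f\n", n / m }'
}
```

With the made-up 6000 checks, `checks_per_minute 6000 5` prints 1200 and `checks_per_minute 6000 6` prints 1000, so going from 5 to 6 minutes trims the average rate by about a sixth. Moving a group of checks to 10 minutes halves the rate for that group.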
Re: (Return code of 255 is out of bounds)
Posted: Tue Dec 15, 2015 7:46 pm
by Box293
About 20,000 checks is when Nagios starts to struggle.
Have you tried increasing your timeout for NRPE?
In CCM, change the -t value in the check_nrpe command from
-t 30 to something like
-t 55
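One thing worth checking before raising the value, offered as a sketch rather than a definitive procedure: Nagios Core has its own `service_check_timeout` directive in nagios.cfg (60 seconds in the stock sample config, as far as I know), and the NRPE timeout presumably needs to stay below it so the plugin gives up before Nagios kills it. `nrpe_timeout_fits` is a hypothetical helper name, and the config path in the note below is the usual XI location, so adjust it if yours differs.

```shell
#!/bin/sh
# Hypothetical helper: compare a proposed check_nrpe -t value against the
# service_check_timeout directive found in a nagios.cfg file.
# Usage: nrpe_timeout_fits <nrpe_timeout_seconds> <path_to_nagios_cfg>
nrpe_timeout_fits() {
    # Take the last uncommented service_check_timeout=N line in the file.
    limit=$(sed -n 's/^service_check_timeout=\([0-9][0-9]*\).*/\1/p' "$2" | tail -n 1)
    if [ -z "$limit" ]; then
        echo "service_check_timeout not found in $2" >&2
        return 2
    fi
    if [ "$1" -lt "$limit" ]; then
        echo "ok: -t $1 is below service_check_timeout=$limit"
    else
        echo "too high: -t $1 is not below service_check_timeout=$limit"
        return 1
    fi
}
```

For example: `nrpe_timeout_fits 55 /usr/local/nagios/etc/nagios.cfg`.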
Here's a presentation on Nagios XI Best Practices. It has some recommendations that might help:
https://www.youtube.com/watch?v=6WlZrG-_sAI