(Return code of 255 is out of bounds)

jkinning
Posts: 747
Joined: Wed Oct 09, 2013 2:54 pm

Re: (Return code of 255 is out of bounds)

Post by jkinning »

XI is actively checking everything at 5-minute check intervals.
lmiltchev
Bugs find me
Posts: 13589
Joined: Mon May 23, 2011 12:15 pm

Post by lmiltchev »

Can you run one of the "failing" checks from the command line and post the command along with its output? Also, post your nsclient.ini file, with any sensitive info removed.
jkinning
Posts: 747
Joined: Wed Oct 09, 2013 2:54 pm

Post by jkinning »

Code:

./check_nrpe -H blockmasterl1t -t 30 -c check_load
CHECK_NRPE: Socket timeout after 30 seconds.

I am running NRPE under xinetd on Linux. I am seeing similar issues on Windows hosts running NSClient++.
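
When reproducing a failing check by hand, it can help to record the exit code and elapsed time alongside the output, since Nagios only accepts plugin return codes 0 to 3 and reports anything else as "out of bounds". Below is a minimal Python sketch, assuming check_nrpe sits in the current directory as in the command above. The helper name run_check is hypothetical, not part of any Nagios tooling.

```python
import subprocess
import time

# Standard Nagios plugin return codes; anything else is reported as out of bounds.
STATES = {0: "OK", 1: "WARNING", 2: "CRITICAL", 3: "UNKNOWN"}


def run_check(argv, timeout=60):
    """Run a plugin command and return (exit_code, state, seconds, output)."""
    start = time.monotonic()
    proc = subprocess.run(
        argv,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        text=True,
        timeout=timeout,
    )
    elapsed = time.monotonic() - start
    state = STATES.get(proc.returncode, "out of bounds")
    return proc.returncode, state, elapsed, proc.stdout.strip()


if __name__ == "__main__":
    # Same command as in the post above; adjust the path and host as needed.
    cmd = ["./check_nrpe", "-H", "blockmasterl1t", "-t", "30", "-c", "check_load"]
    try:
        code, state, secs, out = run_check(cmd)
        print(f"exit={code} ({state}) after {secs:.1f}s: {out}")
    except (OSError, subprocess.TimeoutExpired) as exc:
        print(f"could not run the check: {exc}")
```

Running this against a few of the hosts that alert intermittently would show whether the 255 always coincides with a check that takes the full timeout, or whether it also appears on checks that fail quickly.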
hsmith
Agent Smith
Posts: 3539
Joined: Thu Jul 30, 2015 11:09 am
Location: 127.0.0.1

Post by hsmith »

What is the output of the following commands?

Code:

free -m
top | head -n5
df -h 
df -ih 
lscpu
jkinning
Posts: 747
Joined: Wed Oct 09, 2013 2:54 pm

Post by jkinning »

free -m
             total       used       free     shared    buffers     cached
Mem:         32440      28568       3872         34        205      19001
-/+ buffers/cache:       9361      23079
Swap:         2015         78       1937

top | head -n5
top - 15:57:44 up 50 days, 3:38, 1 user, load average: 1.45, 1.45, 1.64
Tasks: 473 total, 1 running, 472 sleeping, 0 stopped, 0 zombie
Cpu(s): 3.7%us, 1.7%sy, 0.0%ni, 93.8%id, 0.6%wa, 0.0%hi, 0.2%si, 0.0%st
Mem: 33219220k total, 29240444k used, 3978776k free, 210264k buffers
Swap: 2064380k total, 80196k used, 1984184k free, 19457568k cached

df -h
Filesystem                    Size  Used Avail Use% Mounted on
/dev/mapper/VolGroup-lv_root  283G   30G  239G  12% /
tmpfs                         3.9G     0  3.9G   0% /dev/shm
/dev/sda1                     477M  140M  312M  32% /boot
/dev/sr0                       62M   62M     0 100% /mnt

df -ih
Filesystem                   Inodes IUsed IFree IUse% Mounted on
/dev/mapper/VolGroup-lv_root    18M  151K   18M    1% /
tmpfs                          984K     1  984K    1% /dev/shm
/dev/sda1                      126K    62  125K    1% /boot
/dev/sr0                          0     0     0     - /mnt

lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 16
On-line CPU(s) list: 0-15
Thread(s) per core: 1
Core(s) per socket: 1
Socket(s): 16
NUMA node(s): 2
Vendor ID: GenuineIntel
CPU family: 6
Model: 45
Stepping: 7
CPU MHz: 2200.000
BogoMIPS: 4400.00
Hypervisor vendor: VMware
Virtualization type: full
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 20480K
NUMA node0 CPU(s): 0-15
NUMA node1 CPU(s):
hsmith
Agent Smith
Posts: 3539
Joined: Thu Jul 30, 2015 11:09 am
Location: 127.0.0.1

Post by hsmith »

For kicks, can we try to scale back the number of CPUs you have inside of this VM?

Here is a neat little article describing why I think this could be part of the issue.

http://www.gabesvirtualworld.com/how-to ... rformance/

I think it may be time to consider offloading the DB if you're having performance issues as well.

https://assets.nagios.com/downloads/nag ... Server.pdf
jkinning
Posts: 747
Joined: Wed Oct 09, 2013 2:54 pm

Post by jkinning »

What is the maximum or recommended number of hosts and services for a single Nagios XI instance? From skimming the support forums, I have seen users with much larger environments running fine on a single Nagios XI server. Is there a way to check whether I have this set up correctly? Is there anything that would stand out to help me diagnose the 255 timeouts I am seeing? It isn't limited to one or two clients, and it varies: one notification may come from HostA, but later HostA is fine and I get the notification from HostB, and so on.

I'm not sure how to tell whether this is a Nagios XI issue, a network issue, or a VM issue. I've increased the check interval in my templates from 5 minutes to 6, and that appears to lower the count, but I am still getting them. My next step was going to be to try 7 minutes to see if that helps even more. Is there a good rule of thumb for check intervals? Maybe 10-minute checks for non-prod and 5-minute checks for prod.
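
The effect of stretching the check interval can be estimated with simple arithmetic: the average check rate is the number of active checks divided by the interval. A rough Python sketch follows. The 5,000-check figure is purely illustrative (the thread does not give the real count), and the function name is hypothetical.

```python
def checks_per_second(total_checks, interval_minutes):
    """Average number of active checks the scheduler must start each second."""
    return total_checks / (interval_minutes * 60)


if __name__ == "__main__":
    total = 5000  # illustrative only; substitute your own host + service count
    for minutes in (5, 6, 7, 10):
        rate = checks_per_second(total, minutes)
        print(f"{minutes:>2} min interval: {rate:.1f} checks/s")
```

Going from 5 to 6 minutes lowers the average rate by one sixth (about 17%), which would be consistent with seeing fewer timeouts without eliminating them.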
Box293
Too Basu
Posts: 5126
Joined: Sun Feb 07, 2010 10:55 pm
Location: Deniliquin, Australia

Post by Box293 »

About 20,000 checks is when Nagios starts to struggle.

Have you tried increasing your timeout for NRPE?

In CCM, change -t 30 in the check_nrpe command to something like -t 55.
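
The change amounts to editing the -t argument in the command line of the check_nrpe command definition. Here is a small Python sketch of that substitution. It assumes a command line like the one below (check the actual definition in CCM, which may differ). It also assumes the NRPE timeout should stay below Nagios' service_check_timeout, taken here to be 60 seconds. The helper name is hypothetical.

```python
import re


def set_nrpe_timeout(command_line, seconds, service_check_timeout=60):
    """Return command_line with its '-t <seconds>' value replaced."""
    # Assumption: the plugin should give up before Nagios kills the check.
    if seconds >= service_check_timeout:
        raise ValueError("NRPE timeout should stay below service_check_timeout")
    new_line, count = re.subn(r"(\s-t\s+)\d+", r"\g<1>" + str(seconds), command_line)
    if count != 1:
        raise ValueError("expected exactly one '-t <seconds>' argument")
    return new_line


if __name__ == "__main__":
    # Assumed command line; verify against the real check_nrpe command in CCM.
    assumed = "$USER1$/check_nrpe -H $HOSTADDRESS$ -t 30 -c $ARG1$ $ARG2$"
    print(set_nrpe_timeout(assumed, 55))
```

Whether 55 seconds suits a given setup depends on the configured service_check_timeout, so the 60-second value used here should be verified in nagios.cfg.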

Here's a presentation on Nagios XI best practices. It has some recommendations that might help:
https://www.youtube.com/watch?v=6WlZrG-_sAI