Limit to number of services in a service group?

nico
Posts: 2
Joined: Thu Mar 22, 2012 6:31 am

Post by nico »

Hello,

My first post here on the forum, so I'll start with a nice problem that I have been tripping over.

I run a small Nagios installation with a bit more than 15000 services and more than 400 hosts. I have been running Nagios on our systems for more than 10 years now.

Now to the problem. I have one service group with 2893 services attached to it. Recently, all services in this group started generating strange errors like these (I have masked out some sensitive information):
Mar 22 06:39:30 XXX nagios3: Warning: Unable to move file '/var/lib/nagios3/spool/checkresults/checkRLYSDN' to check results queue.
Mar 22 06:39:33 XXX nagios3: Error: Unable to rename file '/var/lib/nagios3/spool/checkresults/checkzInUQ2' to '/var/lib/nagios3/spool/checkresults/c485Wid': No such file or directory

and this:
Mar 22 06:48:50 XXX nagios3: Warning: Attempting to execute the command "/usr/bin/printf "%b" "***** Nagios *****\n\nNotification Type: PROBLEM\n\nService:XXXXX\nHost: XXXXXX\nState: UNKNOWN for 0d 0h 0m 11s\nAddress: XXXX\n\nInfo:\n\nXXXXXX\n\nDate/Time: Thu Mar 22 06:48:50 CET 2012\n\nACK by: \nComment: \n" | /usr/bin/mail -s "PROBLEM -XXXX: XXXX is UNKNOWN" [email protected]" resulted in a return code of 127. Make sure the script or binary you are trying to execute actually exists...

The same notification commands and service checks run fine on services that are not members of this large service group. Also, if I remove a service's membership in the service group, the problem goes away for that service. This started occurring without any changes other than adding new services to the service group, which makes it look like some sort of limit in either Nagios or the OS (Debian/Linux).
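One way to test the "OS limit" idea is to compare the size of a 2893-member list against the kernel's limits for starting a process. The following is only a rough sketch. It assumes, without my having confirmed it, that the full member list ends up in the environment or arguments of each check and notification command. The member names are made up, and the per-string limit of 128 KiB is the usual value on Linux with 4 KiB pages, so treat it as an assumption too.

```python
import os

# Assumed per-string limit for one argument or environment variable on Linux
# with 4 KiB pages (32 pages). Verify this for your own kernel.
ASSUMED_MAX_SINGLE_STRING = 32 * 4096


def estimate_member_list_bytes(members):
    """Bytes needed to pass a comma-separated member list as one string."""
    # +1 accounts for the terminating NUL byte of a C string.
    return len(",".join(members).encode("utf-8")) + 1


def arg_max():
    """Limit on combined argv + envp size for exec, or None if unavailable."""
    try:
        return os.sysconf("SC_ARG_MAX")
    except (ValueError, OSError, AttributeError):
        return None


if __name__ == "__main__":
    # Hypothetical "host,service" pairs; 2893 is the group size from the post.
    members = [f"host{i:04d},service{i:04d}" for i in range(2893)]
    needed = estimate_member_list_bytes(members)
    print("member list bytes:", needed)
    print("SC_ARG_MAX:", arg_max())
    print("over assumed per-string limit:", needed > ASSUMED_MAX_SINGLE_STRING)
```

With real host and service names substituted for the made-up ones, a result close to or above either limit would support the theory; a result far below both would point elsewhere.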

Any suggestions or hints are welcome, thanks!

/Nico
jsmurphy
Posts: 989
Joined: Wed Aug 18, 2010 9:46 pm

Re: Limit to number of services in a service group?

Post by jsmurphy »

I can't say that I've ever seen behaviour like that... but I've also never tried to put 3000 services in a single service group before. What version are you running? What's the load on the server like? I've seen similar errors when I had the misfortune of encountering high disk latency... so maybe those are some places to start.
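If you want to rule out the disk side quickly, something like the sketch below times the same kind of create-and-rename operation that the log messages complain about, and prints the load average alongside it. It is a rough probe and not how Nagios itself handles check results; the function name and the `probe` file prefix are made up for illustration. The demo runs against the system temp directory, so point it at your checkresults spool directory yourself if you want to test that.

```python
import os
import tempfile
import time


def time_create_and_rename(directory, rounds=100):
    """Return the slowest create + rename + delete round trip, in seconds."""
    worst = 0.0
    for _ in range(rounds):
        start = time.perf_counter()
        # Create a small file, rename it, then clean up after ourselves.
        fd, src = tempfile.mkstemp(prefix="probe", dir=directory)
        os.write(fd, b"x")
        os.close(fd)
        dst = src + ".ok"
        os.rename(src, dst)
        os.remove(dst)
        worst = max(worst, time.perf_counter() - start)
    return worst


if __name__ == "__main__":
    # Replace with the spool path from your logs to probe that filesystem.
    target = tempfile.gettempdir()
    print("load average (1, 5, 15 min):", os.getloadavg())
    print("slowest round trip in", target, ":", time_create_and_rename(target))
```

Run it as a user that can write to the directory. Round trips that regularly take a large fraction of a second would suggest the disk is worth a closer look.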

It might also be worth asking on the nagios-devel mailing list; they would know if there's any kind of soft limit on what Nagios can handle.
nico
Posts: 2
Joined: Thu Mar 22, 2012 6:31 am

Re: Limit to number of services in a service group?

Post by nico »

I'm running version 3.2.3 and the load on the server is ~0.20. Check latency and execution times typically average under 0.5s. It's running on a dual quad-core server with 24GB RAM and an Areca hardware RAID controller.
jsmurphy
Posts: 989
Joined: Wed Aug 18, 2010 9:46 pm

Re: Limit to number of services in a service group?

Post by jsmurphy »

Yeah, barely breaking a sweat. As I said, the only time I encountered a similar issue was when my disk I/O latency was through the roof, but I don't think that's in any way related to your problem. You might need to ask on the nagios-devel mailing list for this one, I'm afraid.