Issues with my initial Core Rollout - Help

a31modela
Posts: 18
Joined: Thu Jun 24, 2010 12:40 pm

Post by a31modela »

I have started to roll out Nagios Core in a distributed environment and have started seeing some major issues that are stopping the rollout.

Central Server - SuSE 11 VMware server
Dual CPU
4 GB RAM

Remote Servers are Dual CPU SuSE 10


I am currently out to 75 remote locations, each set up as a distributed server for 10 local SuSE devices. Each distributed server runs checks on the devices in its location by ssh to the device, where the checks all reside. Only the distributed server is actually running Nagios. The distributed server sends results back to the central server using send_nsca.
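
For readers following along, here is a minimal sketch of what each distributed server hands to send_nsca for a service check: one tab-delimited line of host name, service description, return code, and plugin output. The function name and the sample host and service names are hypothetical and not taken from this setup.

```python
# Sketch only: builds the tab-delimited service check line that send_nsca
# reads on standard input. Host and service names below are made up.

VALID_STATES = {0: "OK", 1: "WARNING", 2: "CRITICAL", 3: "UNKNOWN"}


def format_service_result(host, service, return_code, output):
    """Return one passive service check line for send_nsca."""
    if return_code not in VALID_STATES:
        raise ValueError(f"return code must be 0-3, got {return_code!r}")
    # Tabs or newlines inside the plugin output would break the field
    # layout, so collapse all whitespace runs to single spaces.
    clean_output = " ".join(output.split())
    return f"{host}\t{service}\t{return_code}\t{clean_output}\n"


if __name__ == "__main__":
    line = format_service_result(
        "store042-pos01", "Disk Space", 1, "DISK WARNING - / at 91%"
    )
    print(line, end="")
```

The host name and service description have to match what is defined on the central server, otherwise the central Nagios has nothing to attach the result to.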

I currently show about 1100 hosts on the central server, with about 6 checks per device in the remote locations.

My issues are these:

1. I cannot ping or ssh anywhere from the Nagios central server. I keep getting "No buffer space available". If I run arp on the central server, I see every client listed in there. Rebooting the server did not clear the table, and I have not been able to manually remove any entries from the ARP table.

2. I initially had a ping check running from the central server to the hosts, which I have since stopped, hoping this would address a CPU usage issue as well as correct the ARP / buffer space issue. I am still seeing the ARP issues, however.

3. I commented out the check-host-alive check in the templates and the command.cfg file and restarted Nagios on the central server, but apparently I am now no longer receiving check results from my distributed servers. I am, however, now able to ping and ssh from the central server.

Is Nagios killing my server's ARP table? I am uncommenting the check-host-alive check and restarting to see if that gets my checks back online.
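
One way to see how full the ARP table is: on Linux, "No buffer space available" is often (though not always) a sign that the kernel neighbour table has reached its hard limit, gc_thresh3. The sketch below counts the rows in /proc/net/arp and compares them with that limit. The function names are hypothetical, and it assumes a Linux host that exposes both files.

```python
# Sketch only: compare the number of ARP entries with the kernel's hard
# limit for the neighbour table. Assumes Linux with /proc available.
from pathlib import Path

ARP_PATH = Path("/proc/net/arp")
GC_THRESH3_PATH = Path("/proc/sys/net/ipv4/neigh/default/gc_thresh3")


def count_arp_entries(arp_text):
    """Count data rows in /proc/net/arp style text (first row is a header)."""
    rows = [line for line in arp_text.splitlines() if line.strip()]
    return max(len(rows) - 1, 0)


def table_usage(arp_text, gc_thresh3):
    """Return (entry count, fraction of the gc_thresh3 limit in use)."""
    entries = count_arp_entries(arp_text)
    return entries, entries / gc_thresh3


if __name__ == "__main__":
    if ARP_PATH.exists() and GC_THRESH3_PATH.exists():
        limit = int(GC_THRESH3_PATH.read_text())
        entries, ratio = table_usage(ARP_PATH.read_text(), limit)
        print(f"{entries} ARP entries, {ratio:.0%} of gc_thresh3={limit}")
    else:
        print("ARP or gc_thresh3 file not found; is this a Linux host?")
```

If the count sits near the limit, the more useful question is usually why the server is resolving that many addresses directly, rather than how to raise the limit.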

Also, if I shut down Nagios on the central server, the distributed servers will still run their checks locally and try to send the results to the central server with send_nsca. If the central server is offline, do the NSCA results simply get dropped on the remote distributed server, or do I run the risk of filling up the directory and ultimately that partition?

Sorry for the rambling,

Steve

Re: Issues with my initial Core Rollout - Help

Post by a31modela »

Apparently the problem was that when the central server was built, the server's own IP address was entered as the default route, which was screwing up my local ARP table.
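
For anyone who wants to check for the same misconfiguration, here is a minimal sketch that reads the default route from /proc/net/route style text and flags a gateway that is one of the machine's own addresses. The function names and sample addresses are hypothetical. It assumes Linux and a little-endian host such as x86, because the kernel prints these addresses as hex in host byte order.

```python
# Sketch only: find default routes whose gateway is one of our own IPs.
# Assumes Linux /proc/net/route format and a little-endian host (x86).
import socket
import struct
from pathlib import Path


def decode_hex_ip(hex_ip):
    """Convert a /proc/net/route hex field such as '0101A8C0' to dotted quad."""
    return socket.inet_ntoa(struct.pack("<I", int(hex_ip, 16)))


def default_gateways(route_text):
    """Return [(interface, gateway_ip)] for every default route."""
    gateways = []
    for line in route_text.splitlines()[1:]:  # first line is the header
        fields = line.split()
        if len(fields) < 8:
            continue
        iface, dest, gateway, mask = fields[0], fields[1], fields[2], fields[7]
        # A default route has destination 0.0.0.0 and netmask 0.0.0.0.
        if dest == "00000000" and mask == "00000000":
            gateways.append((iface, decode_hex_ip(gateway)))
    return gateways


def gateways_pointing_at_self(route_text, local_ips):
    """Return the default routes whose gateway is one of local_ips."""
    return [(i, g) for i, g in default_gateways(route_text) if g in local_ips]


if __name__ == "__main__":
    route_file = Path("/proc/net/route")
    if route_file.exists():
        for iface, gw in default_gateways(route_file.read_text()):
            print(f"default via {gw} dev {iface}")
```

Pass in the server's own addresses as local_ips; any match is worth a second look at the network configuration.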

I am still wondering what happens on the distributed server to the results queued for send_nsca if there is no central server to send to. Do they age off?

Thanks,

Steve
mguthrie
Posts: 4380
Joined: Mon Jun 14, 2010 10:21 am

Re: Issues with my initial Core Rollout - Help

Post by mguthrie »

Passive results will not get cached anywhere if the central server is down; they'll just get dropped until it comes back up. Unfortunately, that's a weakness of passive checks.
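
If dropped results are a concern, one possible workaround is a small wrapper on each distributed server that spools a result to a file when delivery fails and retries the backlog on the next run. This is only a sketch, not something NSCA provides: the wrapper, the spool file, the max_lines cap, and the send_nsca and send_nsca.cfg paths are all assumptions. The cap is there so the spool cannot fill the partition during a long outage.

```python
# Sketch only: spool passive results locally when send_nsca fails.
# The wrapper and paths are hypothetical; adjust them for your install.
import subprocess
from pathlib import Path


def send_via_nsca(line, central_host,
                  binary="/usr/local/nagios/bin/send_nsca",
                  config="/usr/local/nagios/etc/send_nsca.cfg"):
    """Pipe one result line to send_nsca; True if it exited with status 0."""
    try:
        proc = subprocess.run([binary, "-H", central_host, "-c", config],
                              input=line, text=True, capture_output=True)
    except OSError:
        return False  # binary missing or not executable
    return proc.returncode == 0


def deliver_or_spool(line, spool_file, send, max_lines=1000):
    """Send the backlog plus the new line in order; spool whatever fails.

    Returns the number of lines left in the spool file.
    """
    pending = []
    if spool_file.exists():
        pending = spool_file.read_text().splitlines(keepends=True)
    pending.append(line)

    remaining = []
    for index, item in enumerate(pending):
        if not send(item):
            # Stop at the first failure; keep it and everything after it.
            remaining = pending[index:]
            break

    # Keep only the newest max_lines results so the spool stays bounded.
    remaining = remaining[-max_lines:]
    spool_file.write_text("".join(remaining))
    return len(remaining)
```

A caller would pass something like `lambda l: send_via_nsca(l, "central.example.com")` as the send argument, where the host name is a placeholder.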