Spike in host down showing in Host Status Summary

This support forum board is for support questions relating to Nagios XI, our flagship commercial network monitoring solution.
uhiadmin
Posts: 85
Joined: Sat Jan 15, 2011 9:01 am

Post by uhiadmin »

Linux distribution and version?
1. CentOS 5.5, 64-bit
2. Manual install
3. GNOME installed

To Nagios XI Tech Support Specialist:
Our Host Status Summary spikes to 500 hosts down when only three are actually down. In CPU Stats, I/O Wait turns red when it reaches a high percentage. Do I have to adjust the php.ini file now that we have added more hosts and services?

Total Hosts: 3367
scottwilkerson
DevOps Engineer
Posts: 19396
Joined: Tue Nov 15, 2011 3:11 pm
Location: Nagios Enterprises

Re: Spike in host down showing in Host Status Summary

Post by scottwilkerson »

This sounds like your Nagios XI machine may be overloaded. Can you post a screenshot of the server statistics on the XI homepage? Also, how many CPUs does this system have, how much RAM, and how frequently do your checks run?

Also, you may want to take a look at the following page on boosting XI performance:
http://assets.nagios.com/downloads/nagi ... p#boosting
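
The CPU and RAM figures requested above can be read from the shell. A minimal sketch, assuming a Linux host where /proc/cpuinfo and /proc/meminfo are readable; the function names cpu_count and mem_total_mb are made up for illustration, and the check frequency still has to be looked up in the Nagios configuration.

```shell
#!/bin/sh
# Hypothetical helpers; the file argument defaults to the standard /proc path.

# Number of logical CPUs: one "processor" line per CPU in /proc/cpuinfo.
cpu_count() {
    grep -c '^processor' "${1:-/proc/cpuinfo}"
}

# Total RAM in MB: MemTotal is reported in kB in /proc/meminfo.
mem_total_mb() {
    awk '/^MemTotal:/ { print int($2 / 1024) }' "${1:-/proc/meminfo}"
}
```

Calling cpu_count and mem_total_mb with no arguments reads the live system.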
Former Nagios employee
Creator:
Human Design Website
Get Your Human Design Chart
uhiadmin
Posts: 85
Joined: Sat Jan 15, 2011 9:01 am

Re: Spike in host down showing in Host Status Summary

Post by uhiadmin »

Sure, I will send one to you when it happens again. The last time this happened we adjusted something in the php.ini file. Do we have to do that again now that we have added 2000 more hosts to be monitored?



Current settings in php.ini:

max_execution_time = 30 ; Maximum execution time of each script, in seconds
max_input_time = 60 ; Maximum amount of time each script may spend parsing request data
memory_limit = 128M ; Maximum amount of memory a script may consume

What they need to be changed to:

max_execution_time = 60 ; Maximum execution time of each script, in seconds
max_input_time = 60 ; Maximum amount of time each script may spend parsing request data
memory_limit = 256M ; Maximum amount of memory a script may consume

4. Restart the Apache web server:

service httpd restart
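
The php.ini change described in this post could be scripted roughly as follows. This is only a sketch: /etc/php.ini is an assumed path, show_php_limits and set_php_ini are hypothetical helper names, and nothing in this thread confirms that raising these limits is related to the host-down spike.

```shell
#!/bin/sh
# Hypothetical helpers for illustration; /etc/php.ini is an assumed default.

# Print the three settings discussed above from the given php.ini.
show_php_limits() {
    grep -E '^[[:space:]]*(max_execution_time|max_input_time|memory_limit)[[:space:]]*=' "${1:-/etc/php.ini}"
}

# Set KEY to VALUE in FILE. A .bak copy of the file as it was before this
# call is kept. The trailing "; comment" on the edited line is dropped.
set_php_ini() {
    file=$1 key=$2 value=$3
    cp -p "$file" "$file.bak" || return 1
    sed "s|^[[:space:]]*$key[[:space:]]*=.*|$key = $value|" "$file.bak" > "$file"
}

# Example (run as root, then restart Apache as in the post):
#   set_php_ini /etc/php.ini max_execution_time 60
#   set_php_ini /etc/php.ini memory_limit 256M
#   show_php_limits /etc/php.ini
```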
scottwilkerson
DevOps Engineer
Posts: 19396
Joined: Tue Nov 15, 2011 3:11 pm
Location: Nagios Enterprises

Re: Spike in host down showing in Host Status Summary

Post by scottwilkerson »

Before making a recommendation I'd like to know what is getting loaded.

It sounds like the high load on your server is coming from disk I/O.

Take a look at
http://assets.nagios.com/downloads/nagi ... p#boosting

It collects a number of tips for improving performance on your XI server.
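
To put a number on the disk I/O suspicion, the I/O wait share can be computed from two samples of the aggregate "cpu" line in /proc/stat. A rough sketch, assuming the usual Linux field order (user, nice, system, idle, iowait, irq, softirq, steal); iowait_pct is a hypothetical helper name.

```shell
#!/bin/sh
# Print the iowait share (integer percent) between two "cpu ..." lines
# sampled from /proc/stat at different times.
iowait_pct() {
    printf '%s\n%s\n' "$1" "$2" | awk '
        {
            # Sum user..steal only (fields 2-9); later guest fields are
            # already counted inside user/nice on newer kernels.
            total = 0
            for (i = 2; i <= 9 && i <= NF; i++) total += $i
            io = $6   # iowait is the fifth counter after the "cpu" label
        }
        NR == 1 { t0 = total; io0 = io }
        NR == 2 {
            dt = total - t0
            if (dt > 0) printf "%d\n", (io - io0) * 100 / dt
            else print 0
        }'
}

# Example:
#   a=$(grep '^cpu ' /proc/stat); sleep 5; b=$(grep '^cpu ' /proc/stat)
#   iowait_pct "$a" "$b"
```

A value that stays high across several samples would support the overload theory; a single sample says little.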
uhiadmin
Posts: 85
Joined: Sat Jan 15, 2011 9:01 am

Re: Spike in host down showing in Host Status Summary

Post by uhiadmin »

Attached is what I am seeing from the system. Let me know if you can see the image.
scottwilkerson
DevOps Engineer
Posts: 19396
Joined: Tue Nov 15, 2011 3:11 pm
Location: Nagios Enterprises

Re: Spike in host down showing in Host Status Summary

Post by scottwilkerson »

That's what I thought.

I would recommend looking at this post and the one below it for some suggestions.
uhiadmin
Posts: 85
Joined: Sat Jan 15, 2011 9:01 am

Re: Spike in host down showing in Host Status Summary

Post by uhiadmin »

It's showing a spike again.
mguthrie
Posts: 4380
Joined: Mon Jun 14, 2010 10:21 am

Re: Spike in host down showing in Host Status Summary

Post by mguthrie »

Can you run:

service nagios stop
killall -9 nagios
service nagios start
I just want to verify that there aren't multiple instances of nagios running.
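
Before killing anything, the running Nagios processes can be listed for inspection. The sketch below filters ps output for command lines that run a binary named nagios against a nagios.cfg file; that signature and the helper name nagios_daemon_lines are assumptions for illustration. Nagios also forks children while running checks, so a count above one is not proof of duplicate daemons on its own; entries with long elapsed times that persist across several runs are the ones worth a closer look.

```shell
#!/bin/sh
# Read `ps -eo pid,ppid,etime,args` style lines on stdin and print those
# whose executable is named "nagios" and whose arguments mention nagios.cfg.
nagios_daemon_lines() {
    awk '$4 ~ /(^|\/)nagios$/ && $0 ~ /nagios\.cfg/ { print }'
}

# Example:
#   ps -eo pid,ppid,etime,args | nagios_daemon_lines
#   ps -eo pid,ppid,etime,args | nagios_daemon_lines | wc -l
```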

Also, I'm noticing that the check results for those down hosts do appear to be valid, so I'm wondering if the actual monitoring server is losing connectivity for a few minutes or seconds. If so, Nagios would have a HUGE number of event handlers, retries, and notifications to deal with, and I'm guessing this would cause your CPU spike.