CPU usage high and checks delayed

This support forum board is for support questions relating to Nagios XI, our flagship commercial network monitoring solution.
kyang

Re: CPU usage high and checks delayed

Post by kyang »

Does the XI notification log show these notifications being sent around the time you receive them? Is that how you can tell it's catching up on old alerts, or is there some other sign?

Do the problems at the location that went down still exist?
snapon_admin
Posts: 952
Joined: Mon Jun 10, 2013 10:39 am
Location: Kenosha, WI

Post by snapon_admin »

The problems at that location no longer exist. When I say it's catching up, I mean that I'll look at a service check and it will say "Next check 12:40" when it's currently 12:55. The checks still seem to be running, but they're behind, and because they're trying to catch up I keep getting high CPU usage and high load. Another way I can tell that things are behind: the scheduled events over time graph starts off looking pretty normal after I restart Nagios but eventually drops to almost nothing, and the Monitoring Engine check statistics show 0s for 1-min, 5-min, and 15-min active checks.
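
For anyone who wants to put a number on "behind" rather than eyeballing individual checks, here is a minimal sketch that counts service checks whose next_check time is already in the past. It assumes status.dat is at the usual default, /usr/local/nagios/var/status.dat, and that each servicestatus block carries a next_check=<epoch> line; the function name overdue_checks and the 60-second grace period are made up for this example.

```shell
#!/bin/sh
# Sketch only: count service checks whose scheduled next_check has passed.
# overdue_checks is a hypothetical helper name, not part of Nagios.
# Usage: overdue_checks STATUS_FILE NOW_EPOCH [GRACE_SECONDS]
overdue_checks() {
    status_file=$1
    now=$2
    grace=${3:-60}   # assumed tolerance before a check counts as late
    awk -v now="$now" -v grace="$grace" '
        $1 == "servicestatus" { in_svc = 1; next }   # start of a service block
        $1 == "}"             { in_svc = 0; next }   # end of any block
        in_svc && $1 ~ /^next_check=/ {
            split($1, kv, "=")
            ts = kv[2] + 0
            if (ts > 0) {                            # 0 means no check is scheduled
                total++
                if (ts + grace < now) late++
            }
        }
        END { printf "%d of %d service checks overdue\n", late, total }
    ' "$status_file"
}

# Example call (path is an assumption about your install):
#   overdue_checks /usr/local/nagios/var/status.dat "$(date +%s)"
```

If the overdue count keeps climbing after the outage is over, that would match the symptom described above.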

Post by snapon_admin »

Here's a look at our Active Service checks graph. The top one is the last 48 hours with the time of the outage I explained highlighted (approximately 2 PM CST) and the bottom graph is what it typically looks like (showing the last 7 days).

Post by snapon_admin »

Additional info on check latency and scheduled events over time.

Post by snapon_admin »

This happens pretty much any time we have a major outage anywhere. I really just need a way to tell Nagios to stop playing catch-up and start running new checks. There's gotta be a queue somewhere I can clear or something, right?
dwasswa

Post by dwasswa »

Hi @snapon_admin,

Let's try clearing out the system and then restarting it, because it seems to be stuck.

Please run the commands below:

Code:

service nagios stop
service ndo2db stop
service crond stop
pkill -9 -u nagios

If your server uses the Postgres database, run the command below:

Code:

echo "truncate table xi_events; truncate table xi_meta; truncate table xi_eventqueue;" | psql nagiosxi nagiosxi

If you are using MySQL, run the command below instead:

Code:

echo "truncate table xi_events; truncate table xi_meta; truncate table xi_eventqueue;" | mysql -u root -pnagiosxi nagiosxi

Then:

Code:

service crond start
service ndo2db start
service nagios start
service npcd restart

Please follow the steps above and let me know if it solves your issue.

If it does, you can run the same commands the next time this happens.
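
Since these steps may need repeating after each major outage, here is a rough sketch that collects them into one function which only prints the commands in order, so they can be reviewed before anything is run. The function name xi_reset_plan and the pgsql/mysql argument are made up for this example; the commands, table names, and credentials are copied from the steps above and may differ on your install.

```shell
#!/bin/sh
# Sketch only: print the stop / truncate / start sequence from this post.
# xi_reset_plan is a hypothetical helper; it prints commands and runs nothing.
# Usage: xi_reset_plan pgsql|mysql
xi_reset_plan() {
    db=$1
    sql='truncate table xi_events; truncate table xi_meta; truncate table xi_eventqueue;'
    case "$db" in
        pgsql) client='psql nagiosxi nagiosxi' ;;
        mysql) client='mysql -u root -pnagiosxi nagiosxi' ;;
        *) echo "usage: xi_reset_plan pgsql|mysql" >&2; return 1 ;;
    esac
    # Same order as above: stop everything, clear the XI tables, start again.
    cat <<EOF
service nagios stop
service ndo2db stop
service crond stop
pkill -9 -u nagios
echo "$sql" | $client
service crond start
service ndo2db start
service nagios start
service npcd restart
EOF
}
```

Running xi_reset_plan mysql just shows the list. Once it looks right for your server, the output could be piped to sh as root, but note that nothing in this sketch stops the sequence if one of the steps fails.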
tmcdonald
Posts: 9117
Joined: Mon Sep 23, 2013 8:40 am

Post by tmcdonald »

In addition to what @dwasswa posted, if you truly want to "reset" all the checks so they start running immediately, you could remove the status.dat and retention.dat files, which carry the state information. This is a very heavy approach: it removes comments, states (so everything goes back to pending), downtime, etc., so it's somewhat of a nuclear option. If that is what you want, it's the closest you can get to "I just added all these hosts and services from scratch, then applied my configs", with the benefit of keeping your performance data.
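
A minimal sketch of that reset, with the two files moved aside instead of deleted so they can be put back if losing comments and downtime turns out to hurt. The helper name stash_state_files and the backup location are made up for this example, and /usr/local/nagios/var is only the usual default location for these files; check the status_file and state_retention_file settings in nagios.cfg for the real paths.

```shell
#!/bin/sh
# Sketch only: move status.dat and retention.dat out of the way.
# stash_state_files is a hypothetical helper name, not part of Nagios.
# Usage: stash_state_files VAR_DIR BACKUP_DIR
stash_state_files() {
    var_dir=$1
    backup_dir=$2
    mkdir -p "$backup_dir" || return 1
    for f in status.dat retention.dat; do
        if [ -f "$var_dir/$f" ]; then
            mv "$var_dir/$f" "$backup_dir/$f" || return 1
            echo "moved $f"
        fi
    done
}

# Intended order (paths are assumptions; Nagios must be stopped first,
# otherwise it writes the retained state back out on shutdown):
#   service nagios stop
#   stash_state_files /usr/local/nagios/var "/root/nagios-state-$(date +%Y%m%d%H%M%S)"
#   service nagios start
```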
Former Nagios employee

Post by snapon_admin »

I have a ticket open for this on the new ticketing system, so for the sake of organization and a rapid, fluid response we may want to lock this thread and keep replies in one place. I have replied on the ticket (334811).
Locked