Nagios Support Forum

Posted: **Tue Nov 07, 2017 1:51 pm**

Does the XI notification log show these notifications being sent around the time you receive them? Is that how you can tell it's catching up on old alerts, or is there another indicator?

Do the problems at the location that went down still exist?

Posted: **Tue Nov 07, 2017 1:57 pm**

The problems at that location no longer exist. When I say it's catching up, I mean that I'll look at a service check and it will say "Next check 12:40" when it's currently 12:55. The checks still seem to be running, but they're behind, and because they're trying to catch up I keep getting high CPU usage and high load. Another way I can tell that things are behind is that the scheduled events over time graph starts off looking pretty normal after I restart Nagios but eventually drops to almost nothing, and the Monitoring Engine check statistics show 0s for 1-min, 5-min, and 15-min active checks.
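For anyone wanting to put a number on "behind", here is a rough sketch that counts services whose next_check timestamp is already in the past. It assumes the usual status.dat layout of `servicestatus { ... }` blocks with a `next_check=<epoch>` line; the function name and the path in the usage comment are illustrative, so check status_file in your nagios.cfg first.

```shell
# Sketch only: count services whose scheduled next_check is already in the past.
# Assumes status.dat contains "servicestatus {" blocks with "next_check=<epoch>".
count_overdue_services() {
    status_file="$1"
    now="${2:-$(date +%s)}"   # optional second argument overrides "now"
    awk -v now="$now" '
        /^[ \t]*servicestatus[ \t]*[{]/ { in_svc = 1; next }
        /^[ \t]*[}]/                    { in_svc = 0 }
        in_svc && /^[ \t]*next_check=/ {
            split($0, kv, "=")
            # next_check=0 means nothing is scheduled, so skip it
            if (kv[2] + 0 > 0 && kv[2] + 0 < now + 0) overdue++
        }
        END { print overdue + 0 }
    ' "$status_file"
}

# Example (path is an assumed default):
#   count_overdue_services /usr/local/nagios/var/status.dat
```

A handful of overdue services right after a restart is normal; a count that keeps growing would match the "Next check 12:40 at 12:55" symptom described above.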

Posted: **Tue Nov 07, 2017 2:12 pm**

Here's a look at our Active Service checks graph. The top one is the last 48 hours with the time of the outage I explained highlighted (approximately 2 PM CST) and the bottom graph is what it typically looks like (showing the last 7 days).

Posted: **Tue Nov 07, 2017 2:16 pm**

Additional info on check latency and scheduled events over time.

Posted: **Tue Nov 07, 2017 3:06 pm**

This happens pretty much any time we have a major outage anywhere. I really just need a way to tell Nagios to stop playing catch up and start running new checks. There's gotta be a queue somewhere I can clear or something right?

Posted: **Tue Nov 07, 2017 4:42 pm**

Hi @snapon_admin,

Let's try clearing out the system and then restarting it, because it seems to be stuck.

Please run the commands below:

service nagios stop
service ndo2db stop
service crond stop
pkill -9 -u nagios

If your server is using the Postgres database, you would run the command below:

echo "truncate table xi_events; truncate table xi_meta; truncate table xi_eventqueue;" | psql nagiosxi nagiosxi

If you are using MySQL, you would run the command below:

echo "truncate table xi_events; truncate table xi_meta; truncate table xi_eventqueue;" | mysql -u root -pnagiosxi nagiosxi

Then:

service crond start
service ndo2db start
service nagios start
service npcd restart

Please follow the steps above and let me know if it solves your issue.

If it does, just run the same commands the next time this happens.

Posted: **Tue Nov 07, 2017 4:47 pm**

In addition to what @dwasswa posted, if you truly want to "reset" all the checks so they start running immediately, you could remove the status.dat and retention.dat files, which carry the state information. This is a very heavy approach: it removes comments, states (so everything goes back to pending), downtime, etc., so it's somewhat of a nuclear option. If that is what you want to do, then this is the closest you can get to "I just added all these hosts and services from scratch then applied my configs" with the benefit of keeping your performance data.
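As a rough sketch of that reset, the function below moves the two files into a timestamped backup directory rather than deleting them, so they can be put back if needed. The function name is made up for illustration, and /usr/local/nagios/var is only the assumed default location; confirm status_file and state_retention_file in your nagios.cfg before running anything like this.

```shell
# Sketch only: NAGIOS_VAR is an assumed default, verify it against nagios.cfg.
NAGIOS_VAR="${NAGIOS_VAR:-/usr/local/nagios/var}"

# Move status.dat and retention.dat aside instead of deleting them.
# Prints the backup directory on success.
set_aside_state_files() {
    var_dir="$1"
    backup_dir="$var_dir/state-backup-$(date +%Y%m%d-%H%M%S)"
    mkdir -p "$backup_dir" || return 1
    for f in status.dat retention.dat; do
        if [ -f "$var_dir/$f" ]; then
            mv "$var_dir/$f" "$backup_dir/" || return 1
        fi
    done
    echo "$backup_dir"
}

# Typical use (as root, with the engine stopped so the files are not rewritten):
#   service nagios stop
#   set_aside_state_files "$NAGIOS_VAR"
#   service nagios start
```

Stopping Nagios first matters because the engine writes its retention data on shutdown; moving the files while it is still running would just have them recreated.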

Posted: **Tue Nov 07, 2017 5:28 pm**

I have a ticket open for this on the new ticketing system so for the sake of organization and rapid/fluid response we may want to lock this thread up and keep replies in one place. I have replied on the ticket (334811).

CPU usage high and checks delayed
