Unable to acknowledge or schedule downtime

TBT · Post by **TBT** » Wed Jan 17, 2018 9:45 am

We are unable to acknowledge alarms or schedule downtime on them in Nagios XI 5.4.8.

Acknowledgment lets our technicians enter a reason and submit, but no comments or acknowledgements appear in the interface under the element, and the status remains an unhandled alarm. Issuing Scheduled Downtime produces the same result. Neither action affects the element (host/service), nor are the changes reflected on the respective Incident Management pages.

Even though the audit log shows these commands are issued, no action is taken. Oddly enough, we're experiencing this on more than one XI server.

Please advise.

dwhitfield · Post by **dwhitfield** » Wed Jan 17, 2018 12:32 pm

This could be a permissions issue, a db issue, or a performance issue, so it could be different things on different XI servers. The forum only allows a couple of attachments at a time. Since there are multiple XI servers involved, this might be a good candidate for a ticket: https://support.nagios.com/tickets/

Happy to try to work on this through the forum though. The first thing to try is to work through https://assets.nagios.com/downloads/nag ... tabase.pdf

If you have postgres for the nagiosxi database on any of those servers, then you'll want to run a vacuum on postgres.
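As a rough illustration only, a vacuum of that database could be scripted along these lines. This is a minimal sketch, assuming `psql` is on the PATH and that the database and the role are both named `nagiosxi` (the role name is an assumption; check your own install). The helper names are hypothetical. It defaults to a dry run that just returns the command line.

```python
import shlex
import subprocess


def vacuum_command(database="nagiosxi", user="nagiosxi", analyze=True):
    """Build a psql command line that runs VACUUM on one database.

    The default database/user names are assumptions, not confirmed values.
    """
    sql = "VACUUM ANALYZE;" if analyze else "VACUUM;"
    return ["psql", "-U", user, "-d", database, "-c", sql]


def run_vacuum(dry_run=True, **kwargs):
    """Return the command as a string; execute it only when dry_run is False."""
    cmd = vacuum_command(**kwargs)
    if not dry_run:
        # check=True raises CalledProcessError if psql exits non-zero.
        subprocess.run(cmd, check=True)
    return shlex.join(cmd)


if __name__ == "__main__":
    # Dry run: prints the command without touching the database.
    print(run_vacuum())
```

Calling `run_vacuum(dry_run=False)` would actually execute the command, so review the printed command line first.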

Can you PM me the System Profiles from the non-working systems? You can download a profile by going to Admin > System Config > System Profile and clicking the ***Download Profile*** button towards the top. If for whatever reason you *cannot* download the profile, please put the output of View System Info (5.3.4+, Show Profile if older) in the thread; that will at least get us some info. The profile gives us access to many of the logs we would otherwise ask for individually. If security is a concern, you can unzip the profile, take out what you like, and then zip it up again. We may end up needing something you remove, but we can ask for that specifically.

You can also generate a profile manually using the script at /usr/local/nagiosxi/html/includes/components/profile/getprofile.sh

That should generate a profile in /usr/local/nagiosxi/var/components/ which you can get off the server with an application such as FileZilla.
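If you want to script that, something like the following could run the script and report the newest file in the output directory. It is a sketch under assumptions: the two paths are the ones quoted above, the script is assumed to run without arguments (this may differ between XI versions), and `newest_file` / `build_profile` are hypothetical helper names.

```python
import subprocess
from pathlib import Path

# Paths as quoted in this thread.
PROFILE_SCRIPT = Path("/usr/local/nagiosxi/html/includes/components/profile/getprofile.sh")
OUTPUT_DIR = Path("/usr/local/nagiosxi/var/components")


def newest_file(directory):
    """Return the most recently modified regular file in directory, or None."""
    files = [p for p in Path(directory).iterdir() if p.is_file()]
    return max(files, key=lambda p: p.stat().st_mtime, default=None)


def build_profile(script=PROFILE_SCRIPT, output_dir=OUTPUT_DIR):
    """Run the profile script, then return the newest file it left behind.

    Assumes the script needs no arguments; it may also need root privileges.
    """
    subprocess.run(["bash", str(script)], check=True)
    return newest_file(output_dir)

# Usage on the XI server (not executed here):
#   print(build_profile())
```

The newest-file lookup is a guess at which file is the profile; if other components write to the same directory, check the file name before sending it.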

After you PM the profile, please update this thread. Updating this thread is the only way for it to show back up on our dashboard.

If you get an error that PROFILE BUILD FAILED, please see https://support.nagios.com/kb/article.p ... ategory=44

TBT · Post by **TBT** » Thu Jan 18, 2018 3:39 pm

I've PM'd you a system profile from one XI host for your review. Hopefully something stands out and we can address the other hosts one by one. Most likely a DB issue.

tgriep · Post by **tgriep** » Fri Jan 19, 2018 9:50 am

@dwhitfield is out of the office today and we don't have access to his personal email box so we cannot access the System Profile.
Can you post it instead?

TBT · Post by **TBT** » Fri Jan 19, 2018 11:17 am

tgriep wrote:@dwhitfield is out of the office today and we don't have access to his personal email box so we cannot access the System Profile.
Can you post it instead?

I'd rather not post it publicly. I'll PM it to you momentarily. Also, FYI, your ticketing system wouldn't allow attachments with Firefox when I tried to submit that way yesterday.

tgriep · Post by **tgriep** » Fri Jan 19, 2018 3:33 pm

I received the profile and put it in the share so we can access it.

TBT · Post by **TBT** » Mon Jan 22, 2018 11:17 am

tgriep wrote:I received the profile and put it in the share so we can access it.

Where are we at with this support request?

dwhitfield · Post by **dwhitfield** » Mon Jan 22, 2018 12:39 pm

I notice you have postgres. Did you run the vacuum as I suggested? Although the article is about a different issue, you can find very complete instructions for running a vacuum at https://support.nagios.com/kb/article.php?id=25 if you scroll down a bit.

If the vacuum doesn't resolve the issue, please let us know.

TBT · Post by **TBT** » Tue Jan 23, 2018 9:18 am

dwhitfield wrote:I notice you have postgres. Did you run the vacuum as I suggested? Although the article is about a different issue, you can find very complete instructions for running a vacuum at https://support.nagios.com/kb/article.php?id=25 if you scroll down a bit.

If the vacuum doesn't resolve the issue, please let us know.

No, I did not; I've been waiting for a review of the profile. To my understanding, Nagios XI has three databases associated with its installation, nagiosxi (Postgres) being one of them. As we use an off-loaded MySQL database configuration for the other two, I didn't want to head down the wrong path.

While preparing to run the vacuum, I retested acknowledge and schedule downtime on the affected systems, and both functions are working again. This raises some questions.

1. Is there a maintenance task which could have run automatically and addressed the issue?
2. What causes the Postgres DB to behave in such a manner and why does a vacuum (reclaim storage) fix it?
3. Should we still run a vacuum on the Postgres Databases or continue to monitor the potential non-issue?
4. Can you briefly explain what is stored in the various databases (nagiosxi, ndoutils, nagiosql)?

Thank you,

dwhitfield · Post by **dwhitfield** » Tue Jan 23, 2018 10:16 am

1. Possibly. There is a dbmaint task that runs, but it runs every 5 minutes. It could be that there was a day with a lot of db entries, and in the intervening time they got cleared. The maintenance is not very smart. It just clears out old stuff, and sometimes that fixes things "magically", as you are presumably seeing. Aside from the space and performance issues, that's one reason it's set to run.

2. Well, sometimes a straight vacuum doesn't fix it, which is why those instructions are so long. This is not postgres specific. In fact, mysql is much worse in this regard, or at least the MyISAM storage engine is.

3. I'd just keep it as a troubleshooting tool in case it gets stuck again. Seems like a waste of time to me if you aren't having issues.

4.
A. nagiosxi is all of the stuff specific to XI (by which I mean, Core doesn't see it). Things like the XI users.
B. nagios (the ndoutils database) is used for display.
C. nagiosql is the CCM db.
