Hi,
Today I tried to generate an availability report, but the numbers seemed off (100 % uptime? I know that's not the case). Also, some of the reports seemed highly sensitive to the "First Assumed Service State" option, e.g. going from 100 % "Ok" to 100 % "Warning".
I tried to generate a legacy availability report instead, which turned out to be quite interesting. Reports from the last few days look all right, but older reports show a lot of "Undetermined" time. A report covering the first three months of this year shows 97 % "Undetermined" for every single service.
I am not quite sure how to troubleshoot this issue. What do you think is the cause of this behaviour?
I suspect this could be some kind of database issue.
Could it be the case that Nagios has been unable to write availability information correctly for a long period of time? Where does Nagios expect to find this information? We can access the state history without problems.
If this is indeed a case of availability data missing from the database, is it possible to regenerate this data from the state history?
Our system is running NagiosXI 5.6.5 on CentOS 7, 64-bit, manual install.
Edit: While investigating this issue I upgraded XI to version 5.6.12. The behaviour described above remains unchanged.
Availability report shows large amount of Undetermined Time
Re: Availability report shows large amount of Undetermined T
The availability reports are generated from /usr/local/nagios/var/nagios.log and the logs in /usr/local/nagios/var/archives. Can you share some screenshots highlighting an example of the odd results, and include the settings used for the report? I'd like to see these and the logs that make up the report (they should be PM'd to me).
As of May 25th, 2018, all communications with Nagios Enterprises and its employees are covered under our new Privacy Policy.
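Since the reports are built from nagios.log plus the files in /usr/local/nagios/var/archives, one way to spot the cause of "Undetermined" time is to list which days in a report window have no plain archive file. The Python sketch below assumes the archive files follow the usual Nagios Core pattern nagios-MM-DD-YYYY-HH.log and counts only uncompressed .log files as usable by the reports; both are assumptions, and the helper names archive_date and missing_days are hypothetical.

```python
import re
import sys
from datetime import date, timedelta
from pathlib import Path

# Assumption: archives are named like nagios-MM-DD-YYYY-HH.log.
# Compressed copies (e.g. *.log.gz) intentionally do not match; this sketch
# only counts plain .log files as something the reports can read.
ARCHIVE_RE = re.compile(r"^nagios-(\d{2})-(\d{2})-(\d{4})-\d{2}\.log$")


def archive_date(name):
    """Return the date encoded in an archive filename, or None if it doesn't match."""
    m = ARCHIVE_RE.match(name)
    if m is None:
        return None
    month, day, year = (int(g) for g in m.groups())
    try:
        return date(year, month, day)
    except ValueError:  # e.g. nagios-13-40-2020-00.log
        return None


def missing_days(names, start, end):
    """List every day in [start, end] that has no matching archive filename."""
    present = {d for d in map(archive_date, names) if d is not None}
    gaps = []
    day = start
    while day <= end:
        if day not in present:
            gaps.append(day)
        day += timedelta(days=1)
    return gaps


if __name__ == "__main__":
    archive_dir = Path(sys.argv[1] if len(sys.argv) > 1 else "/usr/local/nagios/var/archives")
    if archive_dir.is_dir():
        end = date.today() - timedelta(days=1)
        start = end - timedelta(days=89)  # the 90 days before today
        gaps = missing_days((p.name for p in archive_dir.iterdir()), start, end)
        print(f"{len(gaps)} of 90 days have no uncompressed archive log")
        for day in gaps:
            print("  missing:", day.isoformat())
    else:
        print(f"{archive_dir} is not a directory")
```

Run it as `python3 check_archives.py /usr/local/nagios/var/archives` (the script name is arbitrary). It only looks at the date in each filename, which may be the rotation date rather than the day the events were logged, so a gap may appear shifted by a day.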
Re: Availability report shows large amount of Undetermined T
I have sent you a PM with nagios.log, the contents of the archives-directory, as well as (parts of) availability reports covering April.
The archives directory contains logs from the last 14 days. Is this expected?
Still, the availability reports don't appear to read data older than the 6th.
Does it only read the logs which haven't been zipped yet?
Edit: I found a scheduled task in /etc/cron.daily that zips and removes old logs from the archives directory. My guess is that this is not an official Nagios file; probably a cost-cutting measure on our part, with unintended consequences.
That said, is there a way to restore old nagios.log files from the contents of the XI database?
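If the cron job left compressed copies somewhere rather than deleting the data outright, putting uncompressed copies back into the archives directory may make that history visible to the reports again. This minimal sketch assumes the job produced gzip files named like `<original>.log.gz` (if it used the zip format, Python's zipfile module would be needed instead); the function name restore_gzipped_logs is hypothetical, and it skips any log that already exists in the destination unless told otherwise.

```python
import gzip
import shutil
import sys
from pathlib import Path


def restore_gzipped_logs(src_dir, dest_dir, overwrite=False):
    """Decompress every *.log.gz in src_dir into dest_dir; return restored names."""
    src, dest = Path(src_dir), Path(dest_dir)
    restored = []
    for gz_path in sorted(src.glob("*.log.gz")):
        target = dest / gz_path.name[: -len(".gz")]  # strip the .gz suffix
        if target.exists() and not overwrite:
            continue  # never clobber a log that is already in place
        with gzip.open(gz_path, "rb") as fin, open(target, "wb") as fout:
            shutil.copyfileobj(fin, fout)  # stream, so large logs aren't held in memory
        restored.append(target.name)
    return restored


if __name__ == "__main__":
    if len(sys.argv) == 3:
        for name in restore_gzipped_logs(sys.argv[1], sys.argv[2]):
            print("restored", name)
    else:
        print("usage: restore_logs.py SRC_DIR DEST_DIR")
```

For example, `python3 restore_logs.py /path/to/compressed /usr/local/nagios/var/archives`. The restored files will be owned by whoever runs the script, so they may need a `chown` to the user Nagios runs as afterwards.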
Re: Availability report shows large amount of Undetermined T
Nope, not an official file.
There's not a way to replay the DB back to the archives at this point in time.
How far does your State History report go back?
Re: Availability report shows large amount of Undetermined T
Our state history goes back two years. We were thus quite surprised when we noticed that the availability reports only accessed data from the last couple of days. We had assumed that they would be based on data found in the event history.
We are primarily interested in generating reports on the last month or so of history, which should be possible in a week or two. It would of course be nice to have a tool that could restore the older files (primarily for archiving purposes), but it is not business critical.
I hope you will consider expanding the description of nagios.log in the log file documentation to mention that this log is used as the data source for certain reports. I would consider that a useful addition, as it might prevent others from indiscriminately deleting these "archived" logs the way we did.
Re: Availability report shows large amount of Undetermined T
That is a good suggestion. I'll ping the KB team to have it updated.