Our Nagios XI installation has started acting up: it no longer appears to take in new service check results. That is, the status in the web front-end isn't updating (most obvious in the "Last check" column, but as far as I can tell none of the fields are updated).
Looking at the performance graph of a service shows that data is in fact being collected, as the graph is up to date. However, we are not sure whether state history is actually being saved.
It will typically hang for up to an hour, and then suddenly resume as if nothing had happened. Then it may suddenly hang again.
We followed the troubleshooting steps outlined in https://support.nagios.com/kb/article.php?id=19 and came to the conclusion that this may be a problem with ndo2db, as the information in "classic" Nagios (non-XI) is up-to-date. When looking in /etc/mariadb/mariadb.log we see no mention of crashed tables.
We have enabled debug logging for ndo2db (debug_level=3) and notice that this log is at a standstill while the hang occurs, and resumes printing once the system works again. I don't see anything in this log describing the actual problem, though. Sometimes we get the following in our nagios.log:
[1512126403] ndomod: Error writing to data sink! Some output may get lost...
[1512126403] ndomod: Please check remote ndo2db log, database connection or SSL Parameters
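In case it helps with lining these messages up against the hang periods: the bracketed number at the start of each nagios.log line is a Unix epoch timestamp in seconds. Below is a minimal sketch that converts the ndomod lines to readable UTC times. The function name `ndomod_events` and the sample list are only for illustration and are not part of Nagios or NDOUtils.

```python
import re
from datetime import datetime, timezone

# Matches lines such as "[1512126403] ndomod: Error writing to data sink! ..."
LINE_RE = re.compile(r"^\[(\d+)\]\s+(ndomod: .*)$")


def ndomod_events(lines):
    """Yield (UTC datetime, message) for every ndomod line in `lines`.

    Hypothetical helper: assumes the bracketed prefix is epoch seconds.
    """
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match:
            when = datetime.fromtimestamp(int(match.group(1)), tz=timezone.utc)
            yield when, match.group(2)


if __name__ == "__main__":
    # The two lines from our nagios.log, used here as sample input.
    sample = [
        "[1512126403] ndomod: Error writing to data sink! Some output may get lost...",
        "[1512126403] ndomod: Please check remote ndo2db log, database connection or SSL Parameters",
    ]
    for when, message in ndomod_events(sample):
        print(when.isoformat(), message)
```

The same function accepts an open file object, so the real log can be passed in place of the sample list to see whether the errors cluster around the times the front-end stops updating.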
System:
CentOS 7, 64-bit
Nagios XI 5.4.11 (manually installed by downloading installation-script)