
Nagios XI - Last Check Time Not Updating

Summary

There are many possible causes for this issue. Most often it is due to a connection problem with the backend historical database, crashed database tables, problems with Nagios Core check scheduling or execution, or a lack of system resources (which can cause orphaned checks).

Editing Files

In many steps of this article you will be required to edit files. This documentation will use the vi text editor. When using the vi editor:

  • To make changes, first press i on the keyboard to enter insert mode
  • Press Esc to exit insert mode
  • When you have finished, save the changes in vi by typing :wq and pressing Enter

More Details

The typical workflow can be explained as follows:

Nagios Core schedules and runs a check. Once the check has run and its output is returned, the ndomod NEB module pushes the check result to the ndo2db daemon by placing it in the kernel message queue. The ndo2db daemon then connects to the MySQL "nagios" (NDOUtils) historical database and inserts the check result. The Nagios XI PHP scripts then query the "nagios" database for the status information displayed in the frontend (in contrast, the Nagios Core CGIs read the status.dat file directly).

Thus, two broad kinds of problems can interfere with updating the "Last Check" time in the XI UI:

  1. The check is failing to be scheduled or executed.
  2. ndo2db is failing to insert the check result into the "nagios" MySQL database, or the Nagios XI frontend database query is failing.
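
A rough way to see which of these two stages is stalling is to watch the kernel message queue that sits between ndomod and ndo2db. The sketch below assumes that queue is a System V message queue listed by the standard ipcs -q command, and that "messages" is the sixth field of each queue row (as in util-linux output); queued_messages is a hypothetical helper name. A total that keeps growing across several runs suggests ndo2db is not draining results into the "nagios" database.

```shell
# Hypothetical helper: sum the "messages" column of `ipcs -q` output read on stdin.
# Assumes the util-linux column layout: key msqid owner perms used-bytes messages.
queued_messages() {
    # Queue rows start with a hexadecimal key such as 0x0001e240; header rows do not.
    awk '$1 ~ /^0x/ { total += $6 } END { print total + 0 }'
}
```

For example, run ipcs -q | queued_messages a few times, a minute or so apart, and compare the totals.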

Troubleshooting

The first troubleshooting step is to verify whether the checks are actually being scheduled and executed. If they are not, the issue usually lies with the Nagios Core engine. If they are, it is most likely a database issue.

The easiest way to verify this is to check the Nagios Core web frontend to see if the "Last Check" time is updating. Browse to:

http://<server_ip_or_hostname>/nagios/

Check the details of any object that is currently experiencing issues with "Last Check" times. If the Core interface displays accurate "Last Check" times, proceed to Step 2 below. If the Core interface shows the same problem as the XI interface, follow Step 1 below.
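
The same information can also be read from the Core status file on the server. The sketch below assumes the status file is at /usr/local/nagios/var/status.dat (pass another path as the first argument if yours differs) and uses a hypothetical helper, stale_service_checks, that prints host;service;seconds-since-last-check for every service whose last check is older than a threshold (600 seconds by default, an arbitrary choice).

```shell
# Hypothetical helper: list services in status.dat whose last check is older than
# max_age seconds. The default path is an assumption; pass another file as $1.
stale_service_checks() {
    local status_file="${1:-/usr/local/nagios/var/status.dat}"
    local max_age="${2:-600}"
    local now
    now=$(date +%s)
    awk -v now="$now" -v max_age="$max_age" '
        # Start of a service block: reset the fields we care about.
        /^servicestatus[[:space:]]*[{]/ { in_svc = 1; host = ""; svc = ""; last = ""; next }
        # End of a service block: report it if the last check is too old.
        in_svc && /^[[:space:]]*[}]/ {
            if (last != "" && now - last > max_age) print host ";" svc ";" (now - last)
            in_svc = 0
            next
        }
        # Inside a service block: split key=value on the first equals sign only.
        in_svc {
            line = $0
            sub(/^[[:space:]]+/, "", line)
            eq = index(line, "=")
            if (eq == 0) next
            key = substr(line, 1, eq - 1)
            val = substr(line, eq + 1)
            if (key == "host_name") host = val
            else if (key == "service_description") svc = val
            else if (key == "last_check") last = val
        }
    ' "$status_file"
}
```

For example, stale_service_checks /usr/local/nagios/var/status.dat 900 lists services not checked in the last 15 minutes. If the ages keep growing between runs, the checks are not being executed and Step 1 applies.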

1. The check is failing to be scheduled or executed

Issues with the Nagios Core auto-rescheduler directives:

A few bugs were introduced along with the auto-rescheduling feature in Nagios Core 4.0.8 (released 08/12/2014), which is used in Nagios XI 2014R1.4 (released 08/14/2014). Systems affected by these bugs will have a nagios.log file filled with errors about rescheduled checks. With the directives originally added to nagios.cfg, rescheduled checks could be rescheduled over and over without ever executing. The original /usr/local/nagios/etc/nagios.cfg directives were:

auto_reschedule_checks=1
auto_rescheduling_interval=30
auto_rescheduling_window=180

Reducing the auto_rescheduling_window to 45 should resolve this issue:

auto_reschedule_checks=1
auto_rescheduling_interval=30
auto_rescheduling_window=45
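
The change can also be made non-interactively. This minimal sketch (the helper name set_rescheduling_window is hypothetical, and GNU sed is assumed for sed -i) keeps a .bak copy of nagios.cfg, rewrites the directive or appends it if missing, and prints the resulting auto-rescheduling lines for a visual check.

```shell
# Hypothetical helper: set auto_rescheduling_window in nagios.cfg, keeping a backup.
set_rescheduling_window() {
    local cfg="${1:-/usr/local/nagios/etc/nagios.cfg}"
    local window="${2:-45}"
    # Keep a copy of the original file (with permissions) before editing.
    cp -p "$cfg" "$cfg.bak" || return 1
    if grep -q '^auto_rescheduling_window=' "$cfg"; then
        # Replace the existing value in place (GNU sed).
        sed -i "s/^auto_rescheduling_window=.*/auto_rescheduling_window=$window/" "$cfg"
    else
        # Directive not present: append it.
        printf 'auto_rescheduling_window=%s\n' "$window" >> "$cfg"
    fi
    # Show all auto-rescheduling directives so the result can be checked by eye.
    grep '^auto_resched' "$cfg"
}
```

Before restarting, running /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg is a reasonable way to confirm the edited configuration still verifies.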

Once the above changes are made to nagios.cfg, restart Nagios Core using one of the commands below:

RHEL 6 | CentOS 6 | Oracle Linux 6 | Ubuntu 14

service nagios restart


RHEL 7 | CentOS 7 | Oracle Linux 7 | Debian | Ubuntu 16/18

systemctl restart nagios.service
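
The two restart commands above differ only by init system. A minimal sketch, assuming a systemd host can be recognized by the presence of the systemctl command and the /run/systemd/system directory, picks the matching command and refuses to restart if nagios -v reports configuration errors (it relies on nagios -v exiting non-zero in that case). Both helper names are hypothetical; the binary and nagios.cfg paths are the Nagios XI defaults used elsewhere in this article.

```shell
# Hypothetical helper: print the restart command that matches this host's init system.
nagios_restart_cmd() {
    if command -v systemctl >/dev/null 2>&1 && [ -d /run/systemd/system ]; then
        echo "systemctl restart nagios.service"
    else
        echo "service nagios restart"
    fi
}

# Hypothetical helper: verify nagios.cfg first, and only restart if verification passes.
verify_and_restart_nagios() {
    if ! /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg >/dev/null; then
        echo "nagios.cfg failed verification; not restarting" >&2
        return 1
    fi
    # Word splitting of the printed command is intentional here.
    $(nagios_restart_cmd)
}
```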

Resource Issues forcing the rescheduling of checks:

If the system ulimit settings are too restrictive, checks may be orphaned and forced to reschedule. This behaviour can usually be identified by lines in the nagios.log file similar to:

[1331905537] Warning: The check of service 'SERVICE' on host 'NAMESERVER' looks like it was orphaned (results never came back). I'm scheduling an immediate check of the service...
[1331755699] Warning: The check of service 'SWAP' on host 'nameserver' could not be performed due to a fork() error: 'Resource temporarily unavailable'. The check will be rescheduled.
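
How many lines count as "many" is a judgement call, but counting them is quick. In this sketch count_resource_warnings is a hypothetical helper, /usr/local/nagios/var/nagios.log is assumed to be the log location, and the search strings are assumed to match the Nagios Core warning text of the kind shown above (matched case-insensitively).

```shell
# Hypothetical helper: count orphaned-check and fork() warnings in nagios.log.
count_resource_warnings() {
    local log="${1:-/usr/local/nagios/var/nagios.log}"
    if [ ! -r "$log" ]; then
        echo "cannot read $log" >&2
        return 1
    fi
    local orphaned fork_errors
    # grep -c prints 0 (and exits 1) when nothing matches; "|| true" keeps that harmless.
    orphaned=$(grep -ci 'looks like it was orphaned' "$log" || true)
    fork_errors=$(grep -ci 'fork() error' "$log" || true)
    echo "orphaned=$orphaned fork_errors=$fork_errors"
}
```

Running it again some time after the limits below have been raised (for example against a newer, rotated log) should show the counts no longer growing.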

If many of these lines exist in nagios.log, increase the user resource limits (ulimits) as follows:

Edit the file /etc/security/limits.conf and define / update the following settings:

#locked memory 
* hard memlock 128
* soft memlock 128

#open files
* soft nofile 10000
* hard nofile 10000
root hard nofile 10000
root soft nofile 10000

#max user processes
* hard nproc 4096
* soft nproc 4096

#stack size
* hard stack 20480
* soft stack 20480

If a setting does not exist, add the line. Once you have made the changes, save the file and reboot the server.

After the server has rebooted, execute the following command to verify that the new settings are in place:

ulimit -a
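
Note that ulimit -a reports the limits of the current shell only. On Linux, the limits the running Nagios daemon actually received can be read from /proc/<pid>/limits; on systemd-based systems these may differ from limits.conf, because services are not started through a PAM login session. In this minimal sketch show_process_limits is a hypothetical helper that, by default, picks the oldest process named exactly nagios via pgrep.

```shell
# Hypothetical helper: show selected resource limits of a running process.
show_process_limits() {
    # Default to the oldest process named exactly "nagios" (the parent daemon).
    local pid="${1:-$(pgrep -o -x nagios)}"
    if [ -z "$pid" ] || [ ! -r "/proc/$pid/limits" ]; then
        echo "no readable limits for PID '${pid}'" >&2
        return 1
    fi
    # Header row plus the limits that correspond to the limits.conf entries above.
    grep -E '^(Limit|Max locked memory|Max open files|Max processes|Max stack size)' "/proc/$pid/limits"
}
```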

2. ndo2db is failing to insert the check result into the "nagios" MySQL database

Crashed tables in the "nagios" database:

Crashed tables can be identified by checking the MySQL/MariaDB logs located at:

/var/log/mysqld.log

or, for MariaDB, in the directory:

/var/log/mariadb/

The relevant errors should resemble:

141127 10:40:24 [ERROR] /usr/libexec/mysqld: Table './nagios/nagios_logentries' is marked as crashed and last (automatic?) repair failed
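
To list every table the logs have reported as crashed, rather than reading the logs line by line, a sketch like the following can help. crashed_tables is a hypothetical helper that assumes the message format shown above; the MariaDB log file name used in the usage example is an assumption and may differ on your system.

```shell
# Hypothetical helper: print each table reported as crashed in the given log files, once.
crashed_tables() {
    # -h drops file-name prefixes; unreadable or missing files are silently skipped.
    grep -h 'is marked as crashed' "$@" 2>/dev/null |
        sed -n "s/.*Table '\([^']*\)' is marked as crashed.*/\1/p" |
        sort -u
}
```

For example: crashed_tables /var/log/mysqld.log /var/log/mariadb/*.log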

Repair the tables with the following command:

cd /usr/local/nagiosxi/scripts/
./repair_databases.sh

Check For Multiple Nagios Processes

After following the steps above, make sure that multiple nagios processes are not running.

Execute this command to check:

ps -ef | grep nagios.cfg | grep -v grep

The following output is healthy:

nagios    5713     1  0 08:40 ?        00:00:00 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios    5723  5713  0 08:40 ?        00:00:00 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg

In the first line, the process with PID 5713 is the parent process.

The second line has PID 5723, but its parent PID (PPID) is 5713, which makes it a child of the parent process; this is normal behavior. On heavily loaded systems you may see multiple child processes, which is also normal.
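
The parent/child distinction can also be checked mechanically. The sketch below reads the output of ps -eo pid,ppid,args (standard procps options) and counts processes whose command is the nagios binary but whose own parent is not another nagios process; count_nagios_parents is a hypothetical helper name. On a healthy system it should print 1.

```shell
# Hypothetical helper: count Nagios parent processes from "pid ppid args" lines on stdin.
count_nagios_parents() {
    awk '
        # Keep numeric rows whose command is the nagios binary (bare name or full path).
        $1 ~ /^[0-9]+$/ && ($3 == "nagios" || $3 ~ /\/nagios$/) { ppid[$1] = $2 }
        END {
            n = 0
            # A parent is a nagios process whose own parent is not a nagios process.
            for (p in ppid) if (!(ppid[p] in ppid)) n++
            print n
        }
    '
}
```

For example: ps -eo pid,ppid,args | count_nagios_parents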

If your output has more than one parent process, execute the following commands:

RHEL 6 | CentOS 6 | Oracle Linux 6 | Ubuntu 14

service nagios stop
killall -9 nagios
service nagios start

RHEL 7 | CentOS 7 | Oracle Linux 7 | Debian | Ubuntu 16/18

systemctl stop nagios.service
killall -9 nagios
systemctl start nagios.service

Final Thoughts

For any support-related questions, please visit the Nagios Support Forums at:

http://support.nagios.com/forum/
