Not saving state retention data on shutdown or restart

Posted: Tue Jan 28, 2020 4:21 pm
by thebream
Nagios Core 4.3.4 on Raspbian GNU/Linux 10 (buster)

I have state retention enabled:
pi@tim-rpi3:/etc/nagios4 $ grep retention nagios.cfg | grep -v "^#"
state_retention_file=/var/lib/nagios4/retention.dat
retention_update_interval=60


And the state is saved every 60 minutes, as expected.
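A note for anyone repeating this check: `grep retention` cannot match the `retain_state_information` directive, which is what switches retention on, because its name does not contain the string "retention". The hourly auto-saves suggest it is enabled here. Below is a minimal sketch of a check that covers all three directives. The helper name `show_retention_cfg` is hypothetical, and the path in the usage comment is the one from this post.

```shell
#!/bin/sh
# Hypothetical helper: print the active (uncommented) state retention
# directives from a Nagios main config file passed as $1.
show_retention_cfg() {
    grep -E '^[[:space:]]*(retain_state_information|state_retention_file|retention_update_interval)[[:space:]]*=' "$1"
}

# Usage (path taken from the post above):
#   show_retention_cfg /etc/nagios4/nagios.cfg
```

Anchoring the pattern at the start of the line skips commented-out entries, so the separate `grep -v "^#"` is not needed.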

According to the docs, it should also save before shutting down:
"This is the file that Nagios will use for storing status, downtime, and comment information before it shuts down."
But on my system it does not save on shutdown, so if a state has changed since the last auto-save, the state is incorrect after a restart.

However, it does save when I reload the service.

Example: stop the exim4 service, restart nagios4 and check the status:

Code:

pi@tim-rpi3:~ $ date; sudo systemctl stop exim4
Wed 29 Jan 07:12:03 AEDT 2020

[rescheduled exim4 service check from web UI]
[status is critical, last check 2020-01-29 07:12:25]

pi@tim-rpi3:~ $ date; sudo ls -l /var/lib/nagios4/retention.dat
Wed 29 Jan 07:16:06 AEDT 2020
-rw------- 1 nagios nagios 42093 Jan 29 07:07 /var/lib/nagios4/retention.dat

pi@tim-rpi3:~ $ date; sudo systemctl stop nagios4
Wed 29 Jan 07:16:55 AEDT 2020

pi@tim-rpi3:~ $ date; sudo ls -l /var/lib/nagios4/retention.dat
Wed 29 Jan 07:18:14 AEDT 2020
-rw------- 1 nagios nagios 42093 Jan 29 07:07 /var/lib/nagios4/retention.dat
[has not been updated]

pi@tim-rpi3:~ $ date; sudo systemctl start nagios4
Wed 29 Jan 07:18:31 AEDT 2020

[service status reverted back to OK, last check 2020-01-29 06:29:13]

pi@tim-rpi3:~ $ date; sudo systemctl reload nagios4
Wed 29 Jan 07:21:07 AEDT 2020

pi@tim-rpi3:~ $ date; sudo ls -l /var/lib/nagios4/retention.dat
Wed 29 Jan 07:21:28 AEDT 2020
-rw------- 1 nagios nagios 42093 Jan 29 07:21 /var/lib/nagios4/retention.dat
[has now been updated]

Corresponding log file (timestamps converted to "nice" format):

Code:

-> retention data is being saved every 60 minutes, as per config
[Wed Jan 29 00:07:17 2020] Auto-save of retention data completed successfully.
[Wed Jan 29 01:07:17 2020] Auto-save of retention data completed successfully.
[Wed Jan 29 02:07:17 2020] Auto-save of retention data completed successfully.
[Wed Jan 29 03:07:17 2020] Auto-save of retention data completed successfully.
[Wed Jan 29 04:07:17 2020] Auto-save of retention data completed successfully.
[Wed Jan 29 05:07:17 2020] Auto-save of retention data completed successfully.
[Wed Jan 29 06:07:17 2020] Auto-save of retention data completed successfully.
[Wed Jan 29 07:07:17 2020] Auto-save of retention data completed successfully.

-> stop exim4 process and reschedule check
[Wed Jan 29 07:12:25 2020] EXTERNAL COMMAND: SCHEDULE_FORCED_SVC_CHECK;localhost;exim4 MTA;1580242342
[Wed Jan 29 07:12:25 2020] SERVICE ALERT: localhost;exim4 MTA;CRITICAL;SOFT;1;Active: inactive (dead) since Wed 2020-01-29 07:11:16 AEDT: 1min 9s ago
[Wed Jan 29 07:14:25 2020] SERVICE ALERT: localhost;exim4 MTA;CRITICAL;SOFT;2;Active: inactive (dead) since Wed 2020-01-29 07:11:16 AEDT: 3min 9s ago
[Wed Jan 29 07:16:25 2020] SERVICE ALERT: localhost;exim4 MTA;CRITICAL;HARD;3;Active: inactive (dead) since Wed 2020-01-29 07:11:16 AEDT: 5min ago
[Wed Jan 29 07:16:25 2020] SERVICE NOTIFICATION: emailtim;localhost;exim4 MTA;CRITICAL;mynotify-service-by-email;Active: inactive (dead) since Wed 2020-01-29 07:11:16 AEDT: 5min ago

-> stop, then start nagios4 service
[Wed Jan 29 07:17:11 2020] wproc: Socket to worker Core Worker 5861 broken, removing
[Wed Jan 29 07:18:31 2020] Nagios 4.3.4 starting... (PID=1624)
[Wed Jan 29 07:18:31 2020] Local time is Wed Jan 29 07:18:31 AEDT 2020
[Wed Jan 29 07:18:31 2020] LOG VERSION: 2.0
[Wed Jan 29 07:18:31 2020] qh: Socket '/var/lib/nagios4/rw/nagios.qh' successfully initialized
[Wed Jan 29 07:18:31 2020] qh: core query handler registered
[Wed Jan 29 07:18:31 2020] nerd: Channel hostchecks registered successfully
[Wed Jan 29 07:18:31 2020] nerd: Channel servicechecks registered successfully
[Wed Jan 29 07:18:31 2020] nerd: Channel opathchecks registered successfully
[Wed Jan 29 07:18:31 2020] nerd: Fully initialized and ready to rock!
[Wed Jan 29 07:18:31 2020] wproc: Successfully registered manager as @wproc with query handler
[Wed Jan 29 07:18:31 2020] wproc: Registry request: name=Core Worker 1625;pid=1625
[Wed Jan 29 07:18:31 2020] wproc: Registry request: name=Core Worker 1626;pid=1626
[Wed Jan 29 07:18:31 2020] wproc: Registry request: name=Core Worker 1629;pid=1629
[Wed Jan 29 07:18:31 2020] wproc: Registry request: name=Core Worker 1627;pid=1627
[Wed Jan 29 07:18:31 2020] wproc: Registry request: name=Core Worker 1628;pid=1628
[Wed Jan 29 07:18:31 2020] wproc: Registry request: name=Core Worker 1630;pid=1630
[Wed Jan 29 07:18:31 2020] Successfully launched command file worker with pid 1634

-> reload nagios4 service
[Wed Jan 29 07:21:07 2020] Caught SIGHUP, restarting...
[Wed Jan 29 07:21:07 2020] Event broker module 'NERD' deinitialized successfully.
[Wed Jan 29 07:21:07 2020] Nagios 4.3.4 starting... (PID=1800)
[Wed Jan 29 07:21:07 2020] Local time is Wed Jan 29 07:21:07 AEDT 2020
[Wed Jan 29 07:21:07 2020] LOG VERSION: 2.0
[Wed Jan 29 07:21:07 2020] qh: Socket '/var/lib/nagios4/rw/nagios.qh' successfully initialized
[Wed Jan 29 07:21:07 2020] qh: core query handler registered
[Wed Jan 29 07:21:07 2020] nerd: Channel hostchecks registered successfully
[Wed Jan 29 07:21:07 2020] nerd: Channel servicechecks registered successfully
[Wed Jan 29 07:21:07 2020] nerd: Channel opathchecks registered successfully
[Wed Jan 29 07:21:07 2020] nerd: Fully initialized and ready to rock!
[Wed Jan 29 07:21:07 2020] wproc: Successfully registered manager as @wproc with query handler
[Wed Jan 29 07:21:07 2020] wproc: Registry request: name=Core Worker 1864;pid=1864
[Wed Jan 29 07:21:07 2020] wproc: Registry request: name=Core Worker 1867;pid=1867
[Wed Jan 29 07:21:07 2020] wproc: Registry request: name=Core Worker 1865;pid=1865
[Wed Jan 29 07:21:07 2020] wproc: Registry request: name=Core Worker 1868;pid=1868
[Wed Jan 29 07:21:07 2020] wproc: Registry request: name=Core Worker 1866;pid=1866
[Wed Jan 29 07:21:07 2020] wproc: Registry request: name=Core Worker 1869;pid=1869
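
The manual before/after `ls -l` comparison in the transcript above can be wrapped in a small helper. This is only a sketch: `mtime_changed_by` is a hypothetical name, it assumes GNU `stat` (as shipped with Raspbian), and it compares modification times at one-second resolution, so a save that lands in the same second as the previous one would go unnoticed.

```shell
#!/bin/sh
# Hypothetical helper: run a command and report whether FILE's
# modification time changed as a result.
# Returns 0 if the mtime changed, 1 if it did not, 2 on error.
mtime_changed_by() {
    file=$1
    shift
    before=$(stat -c %Y "$file") || return 2   # mtime in epoch seconds (GNU stat)
    "$@" || return 2                           # run the action being tested
    after=$(stat -c %Y "$file") || return 2
    [ "$after" != "$before" ]
}

# Usage, from a root shell (path and unit name taken from the post above):
#   mtime_changed_by /var/lib/nagios4/retention.dat systemctl stop nagios4 \
#       && echo "retention.dat was saved on stop" \
#       || echo "retention.dat was NOT saved on stop"
```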

Re: Not saving state retention data on shutdown or restart

Posted: Tue Jan 28, 2020 8:45 pm
by Box293
We are going to need you to upgrade to the latest version of Nagios Core, as 4.3.4 is over 2 years old. Once you've done this, can you check whether it fixes your issue?

FYI this may be of use:
https://support.nagios.com/kb/article/n ... s-796.html

And so may this:
https://support.nagios.com/kb/article.php?id=797

Re: Not saving state retention data on shutdown or restart

Posted: Fri Feb 07, 2020 6:54 pm
by thebream
Thanks for the suggestion.

I can confirm my problem was fixed by building Nagios Core 4.4.5 from source and installing.

Re: Not saving state retention data on shutdown or restart

Posted: Mon Feb 10, 2020 8:08 am
by scottwilkerson
thebream wrote: Thanks for the suggestion.

I can confirm my problem was fixed by building Nagios Core 4.4.5 from source and installing.

Great!

Locking thread