Loss of data after restart.
Posted: Thu May 24, 2018 9:16 am
Hi,
We experienced an issue where the Nagios XI service restarted unexpectedly.
At 04:00 am monitoring stopped working. From the event log we found:
2018-05-08 04:00:02 Caught SIGTERM, shutting down...
2018-05-08 04:00:03 Event broker module '/usr/local/nagios/bin/ndomod.o' initialized successfully.
An automated backup is scheduled daily at 04:00.
Our first reaction was that the restart was related to the backup. However, we noticed that some graphs stopped populating around 03:40, even though they are polled every 5 minutes.
In /var/log/messages, we found no errors between 03:30 and 04:00, before the restart.
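To illustrate the kind of time-window check described above, here is a minimal sketch. It assumes log lines in the `YYYY-MM-DD HH:MM:SS message` format of the event log excerpt quoted earlier; `entries_in_window` is a hypothetical helper written for this post, not part of Nagios XI, and other logs (such as /var/log/messages) use a different timestamp format that would need its own parsing.

```python
from datetime import datetime

def entries_in_window(lines, start, end):
    """Return (timestamp, message) pairs whose timestamp lies in [start, end]."""
    fmt = "%Y-%m-%d %H:%M:%S"
    hits = []
    for line in lines:
        # The first 19 characters are assumed to hold the timestamp.
        stamp, message = line[:19], line[19:].strip()
        try:
            when = datetime.strptime(stamp, fmt)
        except ValueError:
            continue  # skip lines that do not start with a timestamp
        if start <= when <= end:
            hits.append((when, message))
    return hits

# The two event log lines quoted in this post.
sample = [
    "2018-05-08 04:00:02 Caught SIGTERM, shutting down...",
    "2018-05-08 04:00:03 Event broker module '/usr/local/nagios/bin/ndomod.o' initialized successfully.",
]
window = entries_in_window(
    sample, datetime(2018, 5, 8, 3, 30), datetime(2018, 5, 8, 4, 0, 2)
)
for when, message in window:
    print(when, message)
```

Widening or narrowing the `start` and `end` arguments makes it possible to compare what was logged just before and just after the restart.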
We also noticed that, for particular hosts, services lost data from the time of the restart until the admin account logged in after the issue was detected.
We found that the CCM configuration had not been applied, but our concern is: why were only some hosts and services affected?
And why did data start being populated again once the configuration in CCM was applied?
Could you also point us to any log files we can check to determine what happened and exactly which hosts and services were affected?