Whatever I did, Nagios did not like.
Hosts were being checked, but their up/down state changes would not trigger the correct actions. I believe this was because retention.dat was holding incorrect last-state-change times.
I found the file in the archives where I had mucked with the clock:
Code:
[916955650] SERVICE NOTIFICATION: admin-email;web-server;Web;OK;notify-by-email;HTTP OK: HTTP/1.1 200 OK - 218 bytes in 0.251 second response time
[1548107486] Warning: A system time change of 631151828 seconds (7304d 23h 57m 8s forwards in time) has been detected. Compensating...
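As a quick sanity check on those log lines, here is a small Python sketch (my own illustration, not anything Nagios ships) that decodes the two bracketed epoch timestamps and breaks the reported offset into days, hours, minutes and seconds. The helper name `format_duration` is made up for this example. The two timestamps should decode to dates in January 1999 and January 2019 (UTC), which matches a jump of roughly twenty years.

```python
from datetime import datetime, timezone

def format_duration(seconds):
    """Break a number of seconds into the d/h/m/s form used in the Nagios log."""
    days, rest = divmod(seconds, 86400)
    hours, rest = divmod(rest, 3600)
    minutes, secs = divmod(rest, 60)
    return f"{days}d {hours}h {minutes}m {secs}s"

# Epoch timestamps copied from the two log lines above.
before_jump = datetime.fromtimestamp(916955650, tz=timezone.utc)
after_jump = datetime.fromtimestamp(1548107486, tz=timezone.utc)

print(before_jump.isoformat())
print(after_jump.isoformat())
print(format_duration(631151828))  # offset reported by Nagios
```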
I saved my changes, quit the editor, put the file back, and restarted Nagios. But alas, everything was still broken. So I looked in retention.dat and found values like:
Code:
last_check=2179253410
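So using a bit of math, I found that if I subtracted 631151828 (the number of seconds my server had jumped ahead) from the "last_check", "last_state_change", and "last_hard_state_change" values in retention.dat, they came out much more sane. For example, 2179253410 - 631151828 = 1548101582, which is within a couple of hours of the timestamp on the time-change warning in the log.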
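For anyone who would rather script the subtraction than do it by hand, here is a minimal Python sketch of the idea. It is an illustration under assumptions, not an official Nagios tool: the function names `fix_line` and `fix_file` are made up, it only touches the three fields named above (other timestamp fields in your file may need the same treatment), and it treats any value later than a `now` cutoff as "shifted" so that zeros and already-correct times are left alone. Nagios normally rewrites retention.dat when it shuts down, so an edit like this is probably best done while Nagios is stopped.

```python
import re

# Offset reported by Nagios in the "system time change" log line.
OFFSET = 631151828

# The three fields mentioned in the post; this list is an assumption and
# may need extending for other timestamp fields in your retention.dat.
FIELDS = ("last_check", "last_state_change", "last_hard_state_change")

LINE_RE = re.compile(r"^(\s*)(%s)=(\d+)\s*$" % "|".join(FIELDS))

def fix_line(line, now, offset=OFFSET):
    """Return the line with its timestamp shifted back if it lies in the future."""
    m = LINE_RE.match(line)
    if not m:
        return line  # not one of the fields we care about
    indent, key, value = m.group(1), m.group(2), int(m.group(3))
    if value <= now:
        return line  # zero or already plausible: leave untouched
    return f"{indent}{key}={value - offset}\n"

def fix_file(src, dst, now, offset=OFFSET):
    """Read src, write a corrected copy to dst; the original is never modified."""
    with open(src) as fin, open(dst, "w") as fout:
        for line in fin:
            fout.write(fix_line(line, now, offset))
```

Something like `fix_file("retention.dat", "retention.dat.fixed", now=int(time.time()))` (after `import time`) would write the corrected copy next to the original; both file names here are placeholders. Writing to a separate file keeps the original intact as a backup.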
I made a backup of both the original and the newly generated retention.dat file, put the modified one in place, et voilà, things are working again!
Hope this will help someone else with a similar issue.