4.4.1 core: huge retention.dat and other files

Posted: Fri Oct 12, 2018 9:57 am
by mksmr
Good afternoon all,

I'm running 4.4.1 core on Devuan 2.0 ASCII (this time on an HP DL380 Gen6, 4 cores, 32 GB RAM, so plenty of power for Nagios in our environment).

This afternoon I noticed Nagios slowing down dramatically for no apparent reason, with the main process going up to 100% on one core and status.cgi taking up to 100% on another, which caused more and more processes to time out. Restarting Nagios and Apache didn't help.

The CPU graph shows that load went from normal to a sharp spike within three hours.

I finally checked /usr/local/nagios/var and found that retention.dat was 2.8G, along with a number of nagios.tmp<xxx> files ranging from 500M to about 5G in size. I removed them, restarted Nagios and it came back up as smooth as I'm used to.
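For anyone wanting to spot this earlier, here is a minimal sketch of a size check on the var directory. The function name `find_large_files` and the 100 MiB threshold are made up for illustration and are not part of Nagios; only the path is taken from the post above.

```python
import os

def find_large_files(directory, min_bytes=100 * 1024 * 1024):
    """Return (size, name) pairs for regular files of at least min_bytes, largest first.

    Hypothetical helper; the default threshold is an arbitrary assumption.
    """
    results = []
    with os.scandir(directory) as entries:
        for entry in entries:
            # Only look at regular files directly inside the directory.
            if entry.is_file(follow_symlinks=False):
                size = entry.stat(follow_symlinks=False).st_size
                if size >= min_bytes:
                    results.append((size, entry.name))
    return sorted(results, reverse=True)

if __name__ == "__main__":
    var_dir = "/usr/local/nagios/var"  # path from the post; adjust for your install
    if os.path.isdir(var_dir):
        for size, name in find_large_files(var_dir):
            print(f"{size / 1024**2:10.1f} MiB  {name}")
```

Run from cron or wrapped as a check, something like this would flag a growing retention.dat or leftover nagios.tmp files before the CPU load spikes.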

I've been running about a dozen Nagios installations now and I've never seen such big files in var. In particular, I don't think I've ever seen temporary files there, let alone files of about 5G in size. Any ideas what's been happening there?

TIA
Matthias

Re: 4.4.1 core: huge retention.dat and other files

Posted: Fri Oct 12, 2018 12:48 pm
by scottwilkerson
I believe what you are experiencing was fixed in 4.4.2. This changelog entry describes the bug that caused the retention.dat file to balloon:

* Fix comment data being duplicated after a `service nagios reload` or similar (#549) (Bryan Heden)
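If you want to confirm that duplicated comments are what is inflating the file, a rough sketch like the following could count them. It assumes retention.dat stores comments as `hostcomment {` and `servicecomment {` blocks of `key=value` lines closed by `}`, and that duplicated copies may differ only in `comment_id`. Both are assumptions about the file layout, so check against your own file; the function name is made up for illustration.

```python
from collections import Counter

# Assumed block names for comment entries in retention.dat.
COMMENT_BLOCKS = ("hostcomment", "servicecomment")

def count_duplicate_comments(lines):
    """Return (total_comment_blocks, duplicate_blocks) for an iterable of lines.

    Two blocks count as duplicates when all of their lines match,
    ignoring comment_id (an assumption about how duplicates look).
    """
    counts = Counter()
    block = None  # lines of the comment block currently being read
    for raw in lines:
        line = raw.strip()
        if block is None:
            # Start collecting only when a comment block opens.
            if line.endswith("{") and line[:-1].strip() in COMMENT_BLOCKS:
                block = [line[:-1].strip()]
        elif line == "}":
            counts[tuple(block)] += 1
            block = None
        elif not line.startswith("comment_id="):
            block.append(line)
    total = sum(counts.values())
    return total, total - len(counts)

if __name__ == "__main__":
    path = "/usr/local/nagios/var/retention.dat"  # adjust for your install
    try:
        with open(path, errors="replace") as f:
            total, dupes = count_duplicate_comments(f)
        print(f"{total} comment blocks, {dupes} duplicates")
    except FileNotFoundError:
        print(f"{path} not found")
```

The file is read line by line, so it can be pointed at a large retention.dat. This only helps diagnose the problem; it does not fix it.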

Re: 4.4.1 core: huge retention.dat and other files

Posted: Mon Oct 15, 2018 4:47 am
by mksmr
Is there a workaround or should I update?

Re: 4.4.1 core: huge retention.dat and other files

Posted: Mon Oct 15, 2018 6:59 am
by scottwilkerson
mksmr wrote: Is there a workaround or should I update?
You would need to upgrade.