Nagios log rotation quit working

benture
Posts: 7
Joined: Wed Jun 17, 2020 5:55 pm

Post by benture »

I noticed recently that our nagios.log rotation had stopped working. The first clue came when I viewed the alert history for a service check and clicked the back arrow ("Latest Archive"), which produced errors such as:
Error: Cannot open log file '/var/log/nagios/archives/nagios-12-02-2020-00.log' for reading!
I looked in the /var/log/nagios/archives directory, and, sure enough, the most recent log file was from September. The /var/log/nagios/nagios.log file was much larger than any of those in the archive directory, and I found that it contained entries from just after the most recent file in the archive directory.

Aha! It seems we have a log rotation issue!

I checked the /var/log/nagios/nagios.log and the /var/log/messages files and found errors similar to this each day at precisely midnight:
Error: Unable to rename file '/var/log/nagios/nagios.log' to '/var/log/nagios/archives/nagios-12-02-2020-00.log': Permission denied
After reading some online forums citing similar issues, I changed the permissions on the /var/log/nagios/archives directory to drwxrwxrwx (with the original owner and the group both being nagios). I rebooted and waited a day to see if that fixed the issue. It didn't.
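
For anyone following along: a rename needs write and search (execute) permission on both the directory the file leaves and the directory it lands in, so opening up the archives directory alone may not be enough. Below is a minimal sketch of that check, assuming a POSIX shell. `can_rename_between` is a made-up helper name, and it only reports what the shell's `-w` and `-x` tests see for the user running it.

```shell
# Sketch: can the current user rename a file from one directory into another?
# rename(2) needs write and search permission on BOTH parent directories.
# can_rename_between is a hypothetical helper, not part of Nagios.
can_rename_between() {
    src_dir=$1
    dst_dir=$2
    for d in "$src_dir" "$dst_dir"; do
        if [ ! -d "$d" ]; then
            echo "missing: $d"
            return 2
        fi
        if [ ! -w "$d" ] || [ ! -x "$d" ]; then
            echo "no write/search permission: $d"
            return 1
        fi
    done
    echo "ok"
}

# Example with the paths from this thread (run it as the account the daemon uses):
#   can_rename_between /var/log/nagios /var/log/nagios/archives
```

Running it as root is not informative, since root passes these tests on almost any directory.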

Some configuration information:

1. We are allowing Nagios to perform the log rotation (rather than creating our own cron/logrotate jobs to do so). From our nagios.cfg file:

Code: Select all

log_file=/var/log/nagios/nagios.log
log_rotation_method=d
log_archive_path=/var/log/nagios/archives
2. The nagios group (GID 973) has the nagios and apache users as members. From /etc/group:

Code: Select all

nagios:x:973:apache
Other troubleshooting steps I have performed:

1. I tried changing to the nagios user and manually calling the logrotate command (pointing to the default /etc/logrotate.d/nagios file after uncommenting the commands). This produced the same "permission denied" error that I found in the nagios.log and messages log files (see above). I tried this with the nagios service both stopped and running.

2. Thinking that maybe the current large nagios.log file was corrupted somehow, I manually moved it into the archive directory (using sudo) and re-created the /var/log/nagios/nagios.log with its original permissions. I then rebooted and waited a day to see if log rotation took place. It didn't. Same errors in the logs.

Does anyone have any ideas on how I can troubleshoot this further, or--better yet--have a solution to this problem? :D

Thanks in advance!
scottwilkerson
DevOps Engineer
Posts: 19396
Joined: Tue Nov 15, 2011 3:11 pm
Location: Nagios Enterprises

Re: Nagios log rotation quit working

Post by scottwilkerson »

What are the permissions here?

Code: Select all

ls -ld /var/log/nagios/archives
ls -ld /var/log/nagios
ls -ld /var/log
ls -ld /var
What nagios processes are running?

Code: Select all

ps -ef|grep nagios.cfg
Former Nagios employee
Creator:
Human Design Website
Get Your Human Design Chart
benture
Posts: 7
Joined: Wed Jun 17, 2020 5:55 pm

Re: Nagios log rotation quit working

Post by benture »

Thanks for the response!

Permissions (keep in mind that I opened up permissions on the archives directory for troubleshooting purposes--to no effect):

Code: Select all

drwxrwxrwx. 2 nagios nagios 4096 Dec  1 10:41 /var/log/nagios/archives
drwxr-xr-x. 3 nagios nagios 40 Dec  1 10:42 /var/log/nagios
drwxr-xr-x. 25 root root 4096 Dec  2 11:14 /var/log
drwxr-xr-x. 22 root root 4096 May 28  2020 /var
Nagios processes:

Code: Select all

nagios    362020       1  0 Dec02 ?        00:00:33 /usr/sbin/nagios -d /etc/nagios/nagios.cfg
nagios    362025  362020  0 Dec02 ?        00:00:02 /usr/sbin/nagios -d /etc/nagios/nagios.cfg
scottwilkerson
DevOps Engineer
Posts: 19396
Joined: Tue Nov 15, 2011 3:11 pm
Location: Nagios Enterprises

Re: Nagios log rotation quit working

Post by scottwilkerson »

What version of Nagios are you running? Have you added any neb_module lines to the config such as livestatus?
benture
Posts: 7
Joined: Wed Jun 17, 2020 5:55 pm

Re: Nagios log rotation quit working

Post by benture »

We are running Nagios version 4.4.5 (installed on a CentOS 8 server using dnf). While I see that there is an update available (from the Nagios home screen in the web UI), 4.4.5-1.el8 appears to be the latest version available in the CentOS 8 (epel) repository.

There are no Event Broker Modules configured in the nagios.cfg file.
benture
Posts: 7
Joined: Wed Jun 17, 2020 5:55 pm

Re: Nagios log rotation quit working

Post by benture »

Any thoughts on a troubleshooting path forward on this?

Thanks in advance!
scottwilkerson
DevOps Engineer
Posts: 19396
Joined: Tue Nov 15, 2011 3:11 pm
Location: Nagios Enterprises

Re: Nagios log rotation quit working

Post by scottwilkerson »

Is SELinux enabled on the server?
benture
Posts: 7
Joined: Wed Jun 17, 2020 5:55 pm

Re: Nagios log rotation quit working

Post by benture »

SELinux is set to "Permissive". Here are the contents of the /etc/selinux/config file:

Code: Select all

# This file controls the state of SELinux on the system.
# SELINUX= can take one of these three values:
#     enforcing - SELinux security policy is enforced.
#     permissive - SELinux prints warnings instead of enforcing.
#     disabled - No SELinux policy is loaded.
SELINUX=permissive
# SELINUXTYPE= can take one of these three values:
#     targeted - Targeted processes are protected,
#     minimum - Modification of targeted policy. Only selected processes are protected.
#     mls - Multi Level Security protection.
SELINUXTYPE=targeted
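
A side note on reading this file: /etc/selinux/config only holds the boot-time setting, while `getenforce` reports the mode that is actually running. In permissive mode denials are logged but not enforced, so SELinux on its own would not be expected to produce a "Permission denied" here. Below is a rough sketch for pulling the configured mode out of the file; `selinux_config_mode` is a made-up helper name.

```shell
# Sketch: print the value of the SELINUX= line in a config file.
# Comment lines start with '#', and SELINUXTYPE= does not match '^SELINUX='.
# selinux_config_mode is a hypothetical helper name.
selinux_config_mode() {
    sed -n 's/^SELINUX=\([a-z]*\).*/\1/p' "$1" | tail -n 1
}

# Example:
#   selinux_config_mode /etc/selinux/config   # configured (boot-time) mode
#   getenforce                                # mode currently in effect
#   ausearch -m avc -ts today                 # logged denials, assuming the audit tools are installed
```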
benture
Posts: 7
Joined: Wed Jun 17, 2020 5:55 pm

Re: Nagios log rotation quit working

Post by benture »

This seems to have stumped everybody! Does anyone out there have any further suggestions on how to proceed?

Thanks!
benture
Posts: 7
Joined: Wed Jun 17, 2020 5:55 pm

Re: Nagios log rotation quit working

Post by benture »

Follow up: I figured it out!

At some point we had created a nagios Active Directory account so that we could authenticate with Office 365 and send our notifications directly through O365 with that account (instead of using a dummy email address and sending via open relay on our internal Exchange server). Apparently, over time, this caused Nagios to create some new files and folders owned by the UID of the AD nagios account rather than the local one. That broke log rotation: the account trying to write the new file had a different UID than the one with permission to do so, even though both had the same name.

ls -n (which lists numeric UIDs and GIDs instead of names) was my friend in figuring this out.
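
Here is a minimal sketch of that kind of check, assuming a POSIX shell: it compares the numeric owner of a path with the UID that /etc/passwd (local accounts only) records for a name. The helper names `owner_uid`, `local_uid` and `check_owner` are made up for illustration.

```shell
# Print the numeric UID that owns a path (third field of `ls -nd`).
owner_uid() {
    ls -nd "$1" | awk '{ print $3 }'
}

# Print the UID recorded for a user in a passwd-format file
# (defaults to /etc/passwd, which lists local accounts only).
local_uid() {
    awk -F: -v u="$1" '$1 == u { print $3; exit }' "${2:-/etc/passwd}"
}

# Report whether a path is owned by the local account of that name.
check_owner() {
    path=$1
    user=$2
    passwd_file=${3:-/etc/passwd}
    want=$(local_uid "$user" "$passwd_file")
    have=$(owner_uid "$path")
    if [ "$want" = "$have" ]; then
        echo "match: $path uid=$have"
    else
        echo "MISMATCH: $path uid=$have, local $user uid=${want:-none}"
    fi
}

# Example with the paths from this thread:
#   check_owner /var/log/nagios nagios
#   check_owner /var/log/nagios/archives nagios
# `id -u nagios` shows which UID the name resolves to through NSS,
# which may be a directory account rather than the local one.
```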

I stumbled upon this solution when nagios.service wouldn't start on reboot, due to the inability to write a new .pid file (because of the same issue).

I saw two options to resolve this issue:
1. Rename or use a different AD account for sending notification emails
2. Change the nagios_user directive in the nagios.cfg file to use the UID instead of the nagios account name (nagios_user=###)

I chose the latter, and everything has been working fine ever since!
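
For option 2, a rough sketch of how the line could be generated, assuming the local account is listed in /etc/passwd. `nagios_user_line` is a made-up helper, and the resulting number is whatever your system has, not a value from this thread.

```shell
# Sketch: print a nagios_user line that pins the daemon to the LOCAL
# account's UID, looked up in a passwd-format file.
# nagios_user_line is a hypothetical helper name.
nagios_user_line() {
    user=${1:-nagios}
    passwd_file=${2:-/etc/passwd}
    uid=$(awk -F: -v u="$user" '$1 == u { print $3; exit }' "$passwd_file")
    if [ -z "$uid" ]; then
        echo "no local account named $user in $passwd_file" >&2
        return 1
    fi
    echo "nagios_user=$uid"
}

# Example:
#   nagios_user_line nagios                  # prints the line to put in nagios.cfg
#   grep '^nagios_user=' /etc/nagios/nagios.cfg
#   /usr/sbin/nagios -v /etc/nagios/nagios.cfg   # verify the config before restarting
```

Files already created under the other UID would still need their ownership corrected separately.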