check_nrpe+systemd wildcard issue

OTR · Post by **OTR** » Mon Apr 13, 2020 4:07 pm

(EDIT: Issue identified and resolved. Culprit was SELinux. Second page of this thread contains my resolution)

I have been experiencing an issue where check_file_age fails to expand wildcards, but only when the following conditions apply:

1) check_file_age is being called via check_nrpe from the Nagios server (not the same device)
2) the nrpe daemon is being run as a systemd service

So, for example, on the host, the command is defined to look for a file that is named "file_????.txt". When I run that command from the host's command line (including the '?'s), it returns "File OK: file_1234.txt is XX seconds old and blah blah blah" (all the normal details).

When the nrpe daemon is being run as a systemd service, running "check_nrpe -H host -c check_file_age_command", it tells me "File Critical - unable to find file_????.txt".

HOWEVER, when I stop the systemd nrpe daemon and instead kick it off from the command line, the same check_nrpe command on the server returns successfully ("File OK: file_1234.txt is XX seconds old and blah blah blah").

Online reading suggested that the issue might relate to the fact that systemd is not launching the process in a shell, and that modifying the nrpe.service file's ExecStart line to start with "/bin/sh -c '<command>'" might address the issue. It has not made any difference (of note, I did try it with a simple copy command - and that test did indeed fail if the command was not wrapped in a shell).

Interestingly, I have several other hosts that are handling this wildcard just fine. I have not found any configuration or version differences between the working and non-working hosts.

Any helpful suggestions would be appreciated. Thanks in advance!

ssax · Post by **ssax** » Tue Apr 14, 2020 1:22 pm

OTR wrote:HOWEVER, when I stop the systemd nrpe daemon and instead kick it off from the command line, the same check_nrpe command on the server returns successfully ("File OK: file_1234.txt is XX seconds old and blah blah blah").

Did you do a su - nagios before starting it from the command line? I believe you would need to do that in order to make it an apples-to-apples comparison.

What permissions do the directory and the file have?

OTR · Post by **OTR** » Tue Apr 14, 2020 2:09 pm

ssax wrote:Did you do a su - nagios before starting it from the command line? I believe you would need to do that in order to make it an apples-to-apples comparison.

I have su'd to nagios (I also tried it as my own userid, as root, and as sudo-me). In all cases of starting the daemon from the command line, everything works as expected.

However, I note that you specify "-" in the command. I did not use the "-". I'll give that a try just to be sure.

ssax wrote:What permissions do the directory and the file have?

I will have to check - I assumed since the command works fine from the command line, permissions were not an issue.

I'll be back with some updates.

OTR · Post by **OTR** » Tue Apr 14, 2020 2:18 pm

Some more exact details:

The command definition is as follows:

/usr/lib64/nagios/plugins/check_file_age -w 86400 -c 172800 -W 13000000 -C 100000000 -f /backup/gerrit/daily/`date "+%Y%m%d"`_????_mysqldb.sql.gz

When run from the command line on the host, or via check_nrpe when the nrpe daemon is started via CLI, the following is returned:

FILE_AGE OK: /backup/gerrit/daily/20200414_0010_mysqldb.sql.gz is 51991 seconds old and 37434817 bytes

When run via check_nrpe when the nrpe daemon is started via systemctl, the following is returned:

FILE_AGE CRITICAL: File not found - /backup/gerrit/daily/20200414_????_mysqldb.sql.gz

Note how the "date" command is properly expanded in the failed remote call.

To eliminate possible issues, I updated the command so that the date was hardcoded:

/usr/lib64/nagios/plugins/check_file_age -w 86400 -c 172800 -W 13000000 -C 100000000 -f /backup/gerrit/daily/20200414_????_mysqldb.sql.gz

...failure via check_nrpe when the nrpe daemon is started via systemctl still occurred.

I then hardcoded the entire file name:

/usr/lib64/nagios/plugins/check_file_age -w 86400 -c 172800 -W 13000000 -C 100000000 -f /backup/gerrit/daily/20200414_0010_mysqldb.sql.gz

...and the command succeeded via check_nrpe when the nrpe daemon is started via systemctl.

(Actually, ssax, I think this last bit shows that the file permissions must be okay - it works via check_nrpe so long as I remove the wildcards)
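The CRITICAL message above contains the literal `????`, which suggests the pattern reached the plugin unexpanded. A minimal sketch of the shell behavior that could produce this, assuming nrpe hands the command line to `/bin/sh` (the temporary directory and file names here are hypothetical stand-ins for /backup/gerrit/daily):

```shell
#!/bin/sh
# Sketch: an unquoted wildcard is expanded by the shell before the command runs.
# The directory below is a throwaway stand-in, not the real backup path.
tmp=$(mktemp -d)
touch "$tmp/20200414_0010_mysqldb.sql.gz"

# The pattern matches a file: the shell substitutes the real file name.
matched=$(printf '%s\n' "$tmp"/20200414_????_mysqldb.sql.gz)

# The pattern matches nothing: a POSIX shell passes the pattern through unchanged.
unmatched=$(printf '%s\n' "$tmp"/20200415_????_mysqldb.sql.gz)

echo "matched:   $matched"
echo "unmatched: $unmatched"

rm -rf "$tmp"
```

The first line printed shows the real file name and the second still contains `????`. A pattern is also left unexpanded when the shell cannot list the directory, whereas a fully spelled-out name only needs search permission on it. That would fit the observation that the hardcoded name works, though nothing in the tests so far confirms it.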

OTR · Post by **OTR** » Wed Apr 15, 2020 11:52 am

ssax wrote:Did you do a su - nagios before starting it from the command line? I believe you would need to do that in order to make it an apples-to-apples comparison.

I tried this, adding the "-" to the command. No change in behavior.

ssax wrote:What permissions do the directory and the file have?

555 /
755 backup
755 gerrit
755 daily
644 <files within daily directory>

So, read and execute on all directories for "other", and read for the files within. Should be good - backed up by the fact that hardcoding the filename resulted in successful command execution via check_nrpe.

Man, I am stumped!

ssax · Post by **ssax** » Wed Apr 15, 2020 2:31 pm

I think the shell may be interpreting the ????.

As a test, try hardcoding it to this:

/usr/lib64/nagios/plugins/check_file_age -w 86400 -c 172800 -W 13000000 -C 100000000 -f '/backup/gerrit/daily/20200414_????_mysqldb.sql.gz'

Does that work?

I'll lab it up on mine as well.

Edit: Yep, worked on mine with systemd unit file/systemctl.

OTR · Post by **OTR** » Wed Apr 15, 2020 3:38 pm

ssax wrote:I think the shell may be interpreting the ????.

As a test, try hardcoding it to this:
/usr/lib64/nagios/plugins/check_file_age -w 86400 -c 172800 -W 13000000 -C 100000000 -f '/backup/gerrit/daily/20200414_????_mysqldb.sql.gz'
Does that work?

I'll lab it up on mine as well.

Edit: Yep, worked on mine with systemd unit file/systemctl.

Unfortunately, no, it did not work for me when I tried it yesterday:

OTR wrote:To eliminate possible issues, I updated the command so that the date was hardcoded:

/usr/lib64/nagios/plugins/check_file_age -w 86400 -c 172800 -W 13000000 -C 100000000 -f /backup/gerrit/daily/20200414_????_mysqldb.sql.gz

...failure via check_nrpe when nrpe daemon is started via systemctl still occurred.

ssax · Post by **ssax** » Wed Apr 15, 2020 3:56 pm

This is what I had to set the command as to get it to work:

/usr/lib64/nagios/plugins/check_file_age -w 86400 -c 172800 -W 13000000 -C 100000000 -f "/backup/gerrit/daily/`date '+%Y%m%d'`_"'????_mysqldb.sql.gz'

Additionally, I don't see any of your commands with the -f part single quoted, please test again.

OTR · Post by **OTR** » Thu Apr 16, 2020 9:27 am

ssax wrote:This is what I had to set the command as to get it to work:
/usr/lib64/nagios/plugins/check_file_age -w 86400 -c 172800 -W 13000000 -C 100000000 -f "/backup/gerrit/daily/`date '+%Y%m%d'`_"'????_mysqldb.sql.gz'
Additionally, I don't see any of your commands with the -f part single quoted, please test again.

Oh, I totally missed those quotes. I'll report back shortly.

Edit: I tried both of the following. Neither worked, whether via check_nrpe or from the CLI on the host, which is at least consistent, just in the wrong direction.

/usr/lib64/nagios/plugins/check_file_age -w 86400 -c 172800 -W 13000000 -C 100000000 -f '/backup/gerrit/daily/20200414_????_mysqldb.sql.gz'

/usr/lib64/nagios/plugins/check_file_age -w 86400 -c 172800 -W 13000000 -C 100000000 -f "/backup/gerrit/daily/`date '+%Y%m%d'`_"'????_mysqldb.sql.gz'
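That both quoted forms fail even from the CLI is consistent with the shell, rather than the plugin, doing the wildcard expansion on this host. A small sketch of what the mixed quoting in the second command hands to the program, using a hypothetical temporary directory in place of the real backup path:

```shell
#!/bin/sh
# Sketch: double-quoted prefix with command substitution, followed by a
# single-quoted wildcard suffix. The directory is a throwaway stand-in.
tmp=$(mktemp -d)
touch "$tmp/$(date '+%Y%m%d')_0010_mysqldb.sql.gz"

# Build the argument exactly as the shell would before starting the command.
set -- "$tmp/`date '+%Y%m%d'`_"'????_mysqldb.sql.gz'
arg=$1

# The date is substituted, but the quoted ???? is kept literal even though
# a matching file exists.
echo "argument seen by the command: $arg"

rm -rf "$tmp"
```

With this quoting the program receives today's date followed by a literal `????`, so the check can only succeed if the plugin expands wildcards itself. Whether a given check_file_age build does that is not established here.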

OTR · Post by **OTR** » Thu Apr 16, 2020 11:25 am

A colleague of mine thinks this may relate to SELinux. I'll be investigating that today. If that's the issue, I'll be sure to come back and report.

For the heck of it, I'll share the other, different but similarly perplexing, issue.

The other issue is on a different host. The command being used is "check_disk -L". Run via the command line, all is well. Run via check_nrpe from the server, I get a response along the lines of "DISK CRITICAL - /proc/fs/nfsd is not accessible: permission denied".

I checked permissions from / on down, everything looked good. I also tried temporarily opening up permissions to 777 for the entire path down - issue persisted.

Much like the other issue, the command runs fine from the host's CLI. Also like the other issue, everything operates fine via check_nrpe when I stop the nrpe service and manually launch the daemon from the CLI, whether the user be me, sudo-me, root, or nagios.
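Both issues share the pattern "works when the daemon is started by hand, fails when systemd starts it", which is what one might see if the service runs in a confined SELinux domain. A rough sketch of first checks, assuming a RHEL-style host with the SELinux and audit userland tools installed (`getenforce`, `ausearch`); this is a starting point only, not the resolution from later in the thread:

```shell
#!/bin/sh
# Sketch: collect basic SELinux facts about the running nrpe daemon.
# Tool availability and read access to the audit log are assumptions.

# Current SELinux mode (Enforcing, Permissive or Disabled), if the tool exists.
mode=$(getenforce 2>/dev/null) || mode="unavailable"
echo "SELinux mode: $mode"

# Security context of the nrpe process; compare a systemd start with a manual start.
ps -eZ 2>/dev/null | grep '[n]rpe' \
    || echo "no nrpe process found (or ps -Z is not supported here)"

# Recent AVC denials; this usually has to be run as root.
if command -v ausearch >/dev/null 2>&1; then
    ausearch -m avc -ts recent 2>/dev/null \
        || echo "no recent AVC denials found or audit log not readable"
fi
```

Running this once with the service started via systemctl and once with the daemon launched from the CLI should show whether the process context differs between the two.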

Nagios Support Forum