check_nrpe+systemd wildcard issue
(EDIT: Issue identified and resolved. Culprit was SELinux. Second page of this thread contains my resolution)
I have been experiencing an issue where check_file_age fails to expand wildcards, but only when the following conditions apply:
1) check_file_age is being called via check_nrpe from the Nagios server (not the same device)
2) the nrpe daemon is being run as a systemd service
So, for example, on the host, the command is defined to look for a file that is named "file_????.txt". When I run that command from the host's command line (including the '?'s), it returns "File OK: file_1234.txt is XX seconds old and blah blah blah" (all the normal details).
When the nrpe daemon is being run as a systemd service, running "check_nrpe -H host -c check_file_age_command", it tells me "File Critical - unable to find file_????.txt".
HOWEVER, when I stop the systemd nrpe daemon and instead kick it off from the command line, the same check_nrpe command on the server returns successfully ("File OK: file_1234.txt is XX seconds old and blah blah blah").
Online reading suggested that the issue might relate to the fact that systemd is not launching the process in a shell, and that modifying the nrpe.service file's ExecStart line to start with "/bin/sh -c '<command>'" might address the issue. It has not made any difference (of note, I did try it with a simple copy command - and that test did indeed fail if the command was not wrapped in a shell).
Interestingly, I have several other hosts that are handling this wildcard just fine. I have not found any configuration or version differences between the working and non-working hosts.
Any helpful suggestions would be appreciated. Thanks in advance!
Last edited by OTR on Tue Apr 28, 2020 9:46 am, edited 1 time in total.
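The "not launched in a shell" theory from the first post can be illustrated with a small sketch. It assumes a throwaway directory and the hypothetical file name file_1234.txt from the example above; it does not involve nrpe or systemd. It only shows that a pattern handed to a program literally is not matched, while the same pattern given to /bin/sh -c is expanded by the shell first.

```shell
# Throwaway directory containing one file that matches file_????.txt
demo=$(mktemp -d)
touch "$demo/file_1234.txt"

# Quoted pattern: ls receives the literal string "file_????.txt",
# which is what a program sees when no shell expands the wildcard.
if ls "$demo/file_????.txt" >/dev/null 2>&1; then
  literal=found
else
  literal=missing
fi

# Same pattern, unquoted inside sh -c: the shell expands it before ls runs.
expanded=$(/bin/sh -c 'ls "$1"/file_????.txt' sh "$demo")

echo "literal pattern: $literal"
echo "via sh -c:       $expanded"
```

Since wrapping ExecStart in a shell made no difference on the affected host, this mechanism alone probably does not explain the failure there.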
Re: check_nrpe+systemd wildcard issue
OTR wrote: HOWEVER, when I stop the systemd nrpe daemon and instead kick it off from the command line, the same check_nrpe command on the server returns successfully ("File OK: file_1234.txt is XX seconds old and blah blah blah").
Did you do a su - nagios before starting it from the command line? I believe you would need to do that in order to make it an apples-to-apples comparison.
What permissions do the directory and the file have?
Re: check_nrpe+systemd wildcard issue
ssax wrote: Did you do a su - nagios before starting it from the command line? I believe you would need to do that in order to make it an apples-to-apples comparison.
I have su'd to nagios (I also tried it as my own userid, and as root, and as sudo-me) - in all cases of starting the daemon from the command line, everything works as expected.
However, I note that you specify "-" in the command. I did not use the "-". I'll give that a try just to be sure.
ssax wrote: What permissions do the directory and the file have?
I will have to check - I assumed since the command works fine from the command line, permissions were not an issue.
I'll be back with some updates.
Re: check_nrpe+systemd wildcard issue
Some more exact details:
The command definition is as follows:
/usr/lib64/nagios/plugins/check_file_age -w 86400 -c 172800 -W 13000000 -C 100000000 -f /backup/gerrit/daily/`date "+%Y%m%d"`_????_mysqldb.sql.gz
When run from the command line on the host, or via check_nrpe when the nrpe daemon is started via CLI, the following is returned:
FILE_AGE OK: /backup/gerrit/daily/20200414_0010_mysqldb.sql.gz is 51991 seconds old and 37434817 bytes
When run via check_nrpe when the nrpe daemon is started via systemctl, the following is returned:
FILE_AGE CRITICAL: File not found - /backup/gerrit/daily/20200414_????_mysqldb.sql.gz
Note how the "date" command is properly expanded in the failed remote call.
To eliminate possible issues, I updated the command so that the date was hardcoded:
/usr/lib64/nagios/plugins/check_file_age -w 86400 -c 172800 -W 13000000 -C 100000000 -f /backup/gerrit/daily/20200414_????_mysqldb.sql.gz
...failure via check_nrpe when nrpe daemon is started via systemctl still occurred.
I then hardcoded the entire file name:
/usr/lib64/nagios/plugins/check_file_age -w 86400 -c 172800 -W 13000000 -C 100000000 -f /backup/gerrit/daily/20200414_0010_mysqldb.sql.gz
...and the command succeeded via check_nrpe when the nrpe daemon is started via systemctl.
(Actually, ssax, I think this last bit shows that the file permissions must be okay - it works via check_nrpe so long as I remove the wildcards)
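One reading of the output above, offered as a sketch rather than a diagnosis: the date part was expanded, so a shell did process the command, but the wildcard came through unchanged. A POSIX shell leaves a pattern as-is when it finds nothing to match, for example because it cannot list the directory. The snippet below reproduces that behavior with two temporary directories (hypothetical stand-ins for /backup/gerrit/daily), one containing a matching file and one empty.

```shell
workdir=$(mktemp -d)
mkdir "$workdir/with_file" "$workdir/empty"

# Command substitution is expanded regardless of what the directory contains.
stamp=$(date '+%Y%m%d')
touch "$workdir/with_file/${stamp}_0010_mysqldb.sql.gz"

# Unquoted ???? is a glob: it expands only if the shell finds a match.
matched=$(echo "$workdir"/with_file/"$stamp"_????_mysqldb.sql.gz)
unmatched=$(echo "$workdir"/empty/"$stamp"_????_mysqldb.sql.gz)

echo "matched:   $matched"
echo "unmatched: $unmatched"
```

The "unmatched" line has the same shape as the CRITICAL message: date filled in, question marks left literal. That would also be consistent with a hardcoded file name working, since opening a known name does not require listing the directory.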
Re: check_nrpe+systemd wildcard issue
ssax wrote: Did you do a su - nagios before starting it from the command line? I believe you would need to do that in order to make it an apples-to-apples comparison.
I tried this, adding the "-" to the command. No change in behavior.
ssax wrote: What permissions do the directory and the file have?
555 /
755 backup
755 gerrit
755 daily
644 <files within daily directory>
So, read and execute on all directories for "other", and read for the files within. Should be good - backed up by the fact that hardcoding the filename resulted in successful command execution via check_nrpe.
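A listing like the one above can be gathered in one pass. The helper below is a minimal sketch; show_path_modes is a hypothetical name, and it assumes GNU coreutils stat (the -c option), which is what most Linux hosts ship. It prints the octal mode of the given path and of every parent directory up to /.

```shell
# Print "<octal mode> <path>" for the path and each parent directory, leaf first.
show_path_modes() {
  path=$1
  while :; do
    stat -c '%a %n' "$path"      # GNU stat: %a = octal mode, %n = file name
    case $path in
      /|.) break ;;              # stop at the root (or "." for relative paths)
    esac
    path=$(dirname "$path")
  done
}

# Example run against a temporary tree instead of the real backup path.
tree=$(mktemp -d)
mkdir -p "$tree/daily"
chmod 755 "$tree/daily"
show_path_modes "$tree/daily"
```

On hosts with util-linux, namei -l gives a similar view. Note that these classic mode bits say nothing about SELinux labels, which are checked separately.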
Man, I am stumped!
Re: check_nrpe+systemd wildcard issue
I think the shell may be interpreting the ????.
As a test, try hardcoding it to this:
/usr/lib64/nagios/plugins/check_file_age -w 86400 -c 172800 -W 13000000 -C 100000000 -f '/backup/gerrit/daily/20200414_????_mysqldb.sql.gz'
Does that work?
I'll lab it up on mine as well.
Edit: Yep, worked on mine with systemd unit file/systemctl.
Re: check_nrpe+systemd wildcard issue
ssax wrote: I think the shell may be interpreting the ????.
As a test, try hardcoding it to this:
/usr/lib64/nagios/plugins/check_file_age -w 86400 -c 172800 -W 13000000 -C 100000000 -f '/backup/gerrit/daily/20200414_????_mysqldb.sql.gz'
Does that work?
I'll lab it up on mine as well.
Edit: Yep, worked on mine with systemd unit file/systemctl.
Unfortunately, no, it did not work for me when I tried it yesterday:
OTR wrote: To eliminate possible issues, I updated the command so that the date was hardcoded:
/usr/lib64/nagios/plugins/check_file_age -w 86400 -c 172800 -W 13000000 -C 100000000 -f /backup/gerrit/daily/20200414_????_mysqldb.sql.gz
...failure via check_nrpe when nrpe daemon is started via systemctl still occurred.
Re: check_nrpe+systemd wildcard issue
This is what I had to set the command as to get it to work:
/usr/lib64/nagios/plugins/check_file_age -w 86400 -c 172800 -W 13000000 -C 100000000 -f "/backup/gerrit/daily/`date '+%Y%m%d'`_"'????_mysqldb.sql.gz'
Additionally, I don't see any of your commands with the -f part single quoted, please test again.
Re: check_nrpe+systemd wildcard issue
ssax wrote: This is what I had to set the command as to get it to work:
/usr/lib64/nagios/plugins/check_file_age -w 86400 -c 172800 -W 13000000 -C 100000000 -f "/backup/gerrit/daily/`date '+%Y%m%d'`_"'????_mysqldb.sql.gz'
Additionally, I don't see any of your commands with the -f part single quoted, please test again.
Oh, I totally missed those quotes. I'll report back shortly.
Edit: tried both of the following. Neither worked via check_nrpe or the CLI on the host. Which is consistent, at least, just in the wrong direction.
/usr/lib64/nagios/plugins/check_file_age -w 86400 -c 172800 -W 13000000 -C 100000000 -f '/backup/gerrit/daily/20200414_????_mysqldb.sql.gz'
/usr/lib64/nagios/plugins/check_file_age -w 86400 -c 172800 -W 13000000 -C 100000000 -f "/backup/gerrit/daily/`date '+%Y%m%d'`_"'????_mysqldb.sql.gz'
Re: check_nrpe+systemd wildcard issue
A colleague of mine thinks this may relate to SELinux. I'll be investigating that today, if that's the issue I'll be sure to come back and report.
For the heck of it, I'll share the other, different but similarly perplexing, issue.
The other issue is on a different host. The command being used is "check_disk -L". Run via the command line, all is well. Run via check_nrpe from the server, I get a response along the lines of "DISK CRITICAL - /proc/fs/nfsd is not accessible: permission denied"
I checked permissions from / on down, everything looked good. I also tried temporarily opening up permissions to 777 for the entire path down - issue persisted.
Much like the other issue, the command runs fine from the host's CLI. Also like the other issue, everything operates fine via check_nrpe when I stop the nrpe service and manually launch the daemon from the CLI, whether the user be me, sudo-me, root, or nagios.
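Both symptoms share the pattern "fails under the systemd service, works when the daemon is launched by hand", which fits the SELinux theory: a daemon started by systemd usually runs in a confined domain, while one started from a login shell usually inherits the less restricted context of that user. The commands below are a rough first-pass check, not a fix. selinux_mode is a hypothetical helper, and the snippet assumes the standard getenforce, ps -Z, and ausearch tools, falling back gracefully when they are absent. ausearch normally needs root and a running auditd.

```shell
# Report the SELinux mode, or "unavailable" if the SELinux tools are not installed.
selinux_mode() {
  if command -v getenforce >/dev/null 2>&1; then
    getenforce
  else
    echo "unavailable"
  fi
}

echo "SELinux mode: $(selinux_mode)"

# Security context of any running nrpe process; compare the output when the
# daemon is started via systemctl against when it is started by hand.
ps -eZ 2>/dev/null | grep '[n]rpe' || echo "no nrpe process found (or ps -Z unsupported)"

# Recent AVC denials that mention nrpe, if the audit tools are available.
if command -v ausearch >/dev/null 2>&1; then
  ausearch -m avc -ts recent 2>/dev/null | grep nrpe || echo "no recent nrpe denials found"
fi
```

If denials do show up right after a failing check_nrpe call, they would name the blocked operation and target, which is the information needed to decide on a policy change.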