Unable to execute a application specific cmd using check_nrp

diwakar0304
Posts: 28
Joined: Tue Nov 04, 2014 4:19 am

Unable to execute a application specific cmd using check_nrp

Post by diwakar0304 »

Hi,

I am having an issue with a custom check script that runs an application-specific command and processes its output.

Below are the details.

Logapp root:/usr/local/nagios/libexec 2$ /usr/local/nagios/libexec/check_nrpe -V

Code: Select all

NRPE Plugin for Nagios
Copyright (c) 1999-2008 Ethan Galstad (nagios@nagios.org)
Version: 3.0.1
Last Modified: 09-08-2016
License: GPL v2 with exemptions (-l for more info)
======================================================================================

check_watchque:

Code: Select all

#!/bin/bash

/<Path to application>/watchqueue 1 ##(--> this command is not executed while using check_nrpe )
     if [ `/bin/cat /tmp/watchqueue_1.txt | /bin/grep shm | /usr/bin/head -1 | /bin/awk '{print $7}' | /bin/sed 's/(//g;s/)//g'` -ge `/bin/date +%s` ]
        then
            MsgRate=`/bin/cat /tmp/watchqueue_1.txt  | /bin/grep 'Total UDP Messages' | /bin/awk '{print $9}' | /bin/sed 's/(//g'`
            /bin/echo "Ok: Message rate is $MsgRate | $MsgRate"
            exit 0
        else
            /bin/echo "Critical: File /tmp/watchqueue_1.txt is not generated with current time"
            exit 2
     fi
====================================================================================================
nrpe.cfg configuration:

Code: Select all

log_facility=daemon
debug=0
pid_file=/usr/local/nagios/var/nrpe.pid
server_port=5666
nrpe_user=nagios
nrpe_group=nagios
allowed_hosts=127.0.0.1,X.X.X.X
dont_blame_nrpe=1
allow_bash_command_substitution=1
command_timeout=60
connection_timeout=300
command[check_users]=/usr/local/nagios/libexec/check_users -w 5 -c 10
command[check_load]=/usr/local/nagios/libexec/check_load -w 15,10,5 -c 30,25,20
command[check_hda1]=/usr/local/nagios/libexec/check_disk -w 20% -c 10% -p /dev/hda1
command[check_zombie_procs]=/usr/local/nagios/libexec/check_procs -w 5 -c 10 -s Z
command[check_total_procs]=/usr/local/nagios/libexec/check_procs -w 150 -c 200
command[check_disk]=/usr/local/nagios/libexec/check_disk -w 20% -c 10%
command[check_watchque]=/usr/local/nagios/libexec/check_watchque
=======================================================================================

Command execution output:

From nagios user:

(The output below is correct.)

Code: Select all

Logapp nagios:~ 0$ /usr/local/nagios/libexec/check_watchque
Ok: Message rate is 0 | 0
(Below is the issue: only the else branch of the check script is reached.)

Code: Select all

Logapp nagios:~ 0$ /usr/local/nagios/libexec/check_nrpe -H localhost -c check_watchque
Critical: File /tmp/watchqueue_1.txt is not generated with current time
(Same output when run as root.)

Code: Select all

Logapp root:/usr/local/nagios/libexec 130$ /usr/local/nagios/libexec/check_nrpe -H localhost -c check_watchque
Critical: File /tmp/watchqueue_1.txt is not generated with current time
P.S.: All other check commands run fine. I am also attaching strace output (IP and hostname hidden).

Thanks for taking the time to look at this issue.

BR//
Attachments
strace.txt
strace output
Last edited by dwhitfield on Mon Apr 24, 2017 4:18 pm, edited 1 time in total.
Reason: code blocks FTW
mcapra
Posts: 3739
Joined: Thu May 05, 2016 3:54 pm

Re: Unable to execute a application specific cmd using check

Post by mcapra »

Can we see an strace for this command run as the nagios user also:

Code: Select all

/usr/local/nagios/libexec/check_watchque
It might be helpful to compare that with the strace of check_nrpe.

Can you tell us roughly what this line of the script is doing:

Code: Select all

/<Path to application>/watchqueue 1
I think the script above (watchqueue) is where the problem lies. Something about how NRPE executes it might not be favorable. Using your exact bash script's logic with some minor modifications, I am unable to reproduce this:

Code: Select all

[root@localhost etc]# /usr/local/nagios/libexec/check_nrpe -H 127.0.0.1 -c check_watchqueue
Ok: Message rate is 0 | 0
Former Nagios employee
https://www.mcapra.com/
jfrickson

Re: Unable to execute a application specific cmd using check

Post by jfrickson »

Off-hand, I would say it's a permission problem. Make sure the output is going into a folder that the nagios user has write access to.
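
One quick way to test the permission theory is to check, as the nagios user, whether the target directory and any existing output file are writable. The sketch below is only an illustration: the function name check_writable is made up, and the path in the example comment is the snapshot file named in the original post.

```shell
#!/bin/bash
# Hypothetical helper: report whether the current user can create or
# overwrite the given file. Run it as the nagios user to test the theory.
check_writable() {
    local file="$1"
    local dir
    dir=$(dirname "$file")
    # An existing file left behind by another user may not be overwritable.
    if [ -e "$file" ] && [ ! -w "$file" ]; then
        echo "not writable: $file exists but cannot be overwritten"
        return 1
    fi
    if [ ! -w "$dir" ]; then
        echo "not writable: directory $dir"
        return 1
    fi
    echo "writable: $file"
    return 0
}

# Example (path taken from the original post):
# check_writable /tmp/watchqueue_1.txt
```

A stale /tmp/watchqueue_1.txt created earlier by root would be one case where the directory is writable but the file itself is not.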
diwakar0304
Posts: 28
Joined: Tue Nov 04, 2014 4:19 am

Re: Unable to execute a application specific cmd using check

Post by diwakar0304 »

Hi, and thanks for the responses.

Regarding jfrickson's point: the nagios user has write permission wherever it is needed. My first post also includes the output of the check script run as the nagios user without check_nrpe.

Anyhow, mcapra's suspicion points in the right direction. The watchqueue command collects online application stats and runs continuously in its own shell. When a page number is appended (1 in our case), it generates a snapshot of that page in text format as /tmp/watchqueue_1.txt. Hence <Path to command>/watchqueue is the command and 1 is the number of the page to be printed.
I am attaching sample output for:
1) strace /usr/local/nagios/libexec/check_watchque (filename: strace without nrpe.txt)
2) watchqueue command output (filename: watchqueue command output.GIF)
3) cat /tmp/watchqueue_1.txt (filename: watchqueue_1.txt)
One more point I want to bring to your attention: I also tried an option other than check_nrpe, namely check_by_ssh.
Executing the command gave the following error:

Code: Select all

Logapp nagios:~ 0$ /usr/local/nagios/libexec/check_by_ssh -v -H localhost -C "/usr/local/nagios/libexec/check_watchque"
Command: /usr/bin/ssh
Argument 1: localhost
Argument 2: /usr/local/nagios/libexec/check_watchque
nagios@localhost's password:
Remote command execution failed: --------------------------------------------------------------------------------
Logapp nagios:~ 2$
When running the command with plain ssh:

Code: Select all

[nagios@nagiosserver ~]$ ssh -q nagios@X.X.X.X "/usr/local/nagios/libexec/check_watchque"
Error opening terminal: unknown.
Critical: File /tmp/watchqueue_1.txt is not generated with current time

Then I patched check_by_ssh by changing check_by_ssh.c and recompiled the plugins, which gave me an option to force TTY allocation.
Source for check_by_ssh.c (https://github.com/riskersen/monitoring ... k_by_ssh.c)

Code: Select all

[nagios@nagiosserver ~]$ /usr/local/nagios/libexec/check_by_ssh -z -E -t 20 -H 10.70.116.200 -C "/usr/local/nagios/libexec/check_watchque"
Ok: Message rate is 4 | 4
Voilà! I started getting output. But there is still an issue: the Nagios web UI shows the same error message mentioned at the start of the thread instead of the correct output.

Code: Select all

Critical: File /tmp/watchqueue_1.txt is not generated with current time
I cannot attach a screenshot of the Nagios page (the attachment limit is 3).

My configuration for the check_by_ssh option:

command.cfg

# 'check_by_ssh' command definition

Code: Select all

define command{
        command_name    check_by_ssh_watchque
        command_line    $USER1$/check_by_ssh -z -E -t 20 -H $HOSTADDRESS$ -C "/usr/local/nagios/libexec/check_watchque"
        }
Service definition in the host cfg file:

Code: Select all

define service{
        use                             local-service,srv-pnp         ; Name of service template to use
        host_name                       <hostname of remote client>
        service_description             Message Rate
        check_command                   check_by_ssh_watchque
        }
Is anything wrong with the check_by_ssh configuration?

BR//
tgriep
Madmin
Posts: 9177
Joined: Thu Oct 30, 2014 9:02 am

Re: Unable to execute a application specific cmd using check

Post by tgriep »

In your first post, it looks like part of the /usr/local/nagios/libexec/check_watchque plugin was cut off. Can you post it again?

How is the file /tmp/watchqueue_1.txt created on the system?

From what I can see, the check compares the timestamp in the file against the current time: if the timestamp in the file is the same or newer, the result is OK, otherwise it is Critical. Is that what you are looking for?
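
The comparison described above can be sketched as two small functions. This is only an illustration under assumptions: the field position (7th field of the first line containing shm) is copied from the posted script, the function names snapshot_epoch and is_fresh are invented, and the real layout of /tmp/watchqueue_1.txt is only available as an attachment.

```shell
#!/bin/bash
# Extract the epoch timestamp the same way the posted script does:
# first line containing "shm", 7th whitespace-separated field,
# parentheses stripped.
snapshot_epoch() {
    grep shm "$1" | head -1 | awk '{print $7}' | tr -d '()'
}

# Return 0 (fresh) if the snapshot timestamp is at or after the
# reference time, 1 otherwise. A missing file or a non-numeric
# timestamp counts as stale instead of causing a test error.
is_fresh() {
    local file="$1" reference="$2" stamp
    stamp=$(snapshot_epoch "$file" 2>/dev/null)
    case "$stamp" in
        ''|*[!0-9]*) return 1 ;;
    esac
    [ "$stamp" -ge "$reference" ]
}

# Example: record the time first, run the application command,
# then test the snapshot it should have written.
# before=$(date +%s)
# /<Path to application>/watchqueue 1
# is_fresh /tmp/watchqueue_1.txt "$before" && echo fresh || echo stale
```

Guarding against an empty timestamp matters here because an unquoted empty value inside `[ ... -ge ... ]` produces a shell error rather than a clean Critical result.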
Be sure to check out our Knowledgebase for helpful articles and solutions!
diwakar0304
Posts: 28
Joined: Tue Nov 04, 2014 4:19 am

Re: Unable to execute a application specific cmd using check

Post by diwakar0304 »

That code is complete. The only part hidden is the path to the 'watchqueue' command, which is not really necessary to mention because it is available in 'env'.

I have already shared the contents of /tmp/watchqueue_1.txt and how it is created. (Sorry, I am not able to post it again right now as I am replying from my mobile.) But I will try to explain the code line by line.

Code: Select all

#############################

#!/bin/bash

/<path to command>/watchqueue 1 
## To get /tmp/watchqueue_1.txt, the command above needs to be executed. It runs commands in the shm shell to collect application stats.

     if [ `/bin/cat /tmp/watchqueue_1.txt | /bin/grep shm | /usr/bin/head -1 | /bin/awk '{print $7}' | /bin/sed 's/(//g;s/)//g'` -ge `/bin/date +%s` ]

## As you stated, this compares the system time with the timestamp in /tmp/watchqueue_1.txt. If the epoch time in the file is the same as or later than the system epoch, print OK with a counter from the same file; otherwise the script falls through to the else branch.

        then
            MsgRate=`/bin/cat /tmp/watchqueue_1.txt  | /bin/grep 'Total UDP Messages' | /bin/awk '{print $9}' | /bin/sed 's/(//g'`
            /bin/echo "Ok: Message rate is $MsgRate | $MsgRate"
            exit 0
        else
            /bin/echo "Critical: File /tmp/watchqueue_1.txt is not generated with current time"
            exit 2
     fi

######################
The whole issue lies in executing the command '/<path to command>/watchqueue 1'. Below is the list of attempts with their results.

1) Executing only the check script as the nagios user succeeds.
2) Executing it via check_nrpe fails.
3) Using check_by_ssh to execute the check script from the Nagios server, on the nagios user's CLI, succeeds.
4) After integrating it into the command and service cfg files, the check fails.

In the points above, failure means that only the echo message of the else branch is printed, which means '/<path to command>/watchqueue 1' could not be executed.

I want to add a few more observations.
1) When check_by_ssh is executed by the Nagios application, the client end shows tty=unknown.
2) I tried using sudo with the requiretty Defaults parameter disabled; it still throws the same error.
3) This check_by_ssh has an option for forcing a tty, and it is already in use.

With sudo applied, it either asks for a terminal to execute the script or prints the else message. check_by_ssh does have a feature to force a tty, so why is it reporting the tty as unknown?

Pardon me for typos as sent from mobile.
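
To see how the execution context differs between the interactive shell and the daemon-driven runs, a few lines like the following could be added temporarily near the top of the check script. This is a sketch, not a fix: the function name log_context and the default log path are hypothetical. "Error opening terminal" is the message curses-based programs typically print when TERM is unset or unknown, so TERM is worth recording alongside the tty.

```shell
#!/bin/bash
# Hypothetical debugging helper: append the current execution context
# to a log file so CLI, NRPE and check_by_ssh runs can be compared.
log_context() {
    local log="${1:-/tmp/check_watchque_context.log}"
    {
        echo "--- $(date '+%F %T') ---"
        echo "user=$(id -un)"
        # "tty" prints "not a tty" when stdin is not a terminal
        echo "tty=$(tty 2>/dev/null || true)"
        echo "TERM=${TERM:-<unset>}"
        echo "PATH=$PATH"
    } >> "$log"
}

# Example: call once at the start of check_watchque, then run the check
# from the CLI, through check_nrpe, and from the Nagios UI.
# log_context
```

Comparing the entries written by a successful CLI run with those from a failing run should show whether the tty, TERM, or PATH differs between the two.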
tgriep
Madmin
Posts: 9177
Joined: Thu Oct 30, 2014 9:02 am

Re: Unable to execute a application specific cmd using check

Post by tgriep »

The NRPE agent runs the check as the nagios user, so it doesn't seem to be a permission issue.
Like you say, I think it is a TTY issue, and there is a setting in the /etc/sudoers file you can change so that the nagios user does not require a tty.

Add this line to the /etc/sudoers file to make sure the nagios account doesn't require a tty.

Code: Select all

Defaults:nagios !requiretty
Save it out and see if this makes the check run under the NRPE agent.
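
Before restarting NRPE, it can help to confirm which requiretty settings are actually active. A minimal sketch, assuming the sudoers file is readable by whoever runs it (normally root); the function name active_requiretty is invented for illustration, and files under /etc/sudoers.d would need to be checked separately.

```shell
#!/bin/bash
# Hypothetical helper: print every non-comment line in a sudoers-style
# file that mentions requiretty, so a global "Defaults requiretty" and a
# per-user "Defaults:nagios !requiretty" override can be seen together.
active_requiretty() {
    grep -v '^[[:space:]]*#' "$1" | grep 'requiretty'
}

# Example, as root:
# active_requiretty /etc/sudoers
# visudo -c    # syntax-check the sudoers file after editing
```

If the function prints nothing, no uncommented requiretty setting is present in that file.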
diwakar0304
Posts: 28
Joined: Tue Nov 04, 2014 4:19 am

Re: Unable to execute a application specific cmd using check

Post by diwakar0304 »

!!! NO Success !!!!

Below was the configuration on the remote server while testing.

OS details: RHEL 6.5


visudo

Code: Select all

#Defaults    requiretty  ---> As per suggestion I disabled this parameter
#Defaults   !visiblepw ---> As per suggestion I disabled this parameter
Defaults    always_set_home
Defaults    env_reset
Defaults    env_keep =  "COLORS DISPLAY HOSTNAME HISTSIZE INPUTRC KDEDIR LS_COLORS"
Defaults    env_keep += "MAIL PS1 PS2 QTDIR USERNAME LANG LC_ADDRESS LC_CTYPE"
Defaults    env_keep += "LC_COLLATE LC_IDENTIFICATION LC_MEASUREMENT LC_MESSAGES"
Defaults    env_keep += "LC_MONETARY LC_NAME LC_NUMERIC LC_PAPER LC_TELEPHONE"
Defaults    env_keep += "LC_TIME LC_ALL LANGUAGE LINGUAS _XKB_CHARSET XAUTHORITY"
Defaults    secure_path = /sbin:/bin:/usr/sbin:/usr/bin:
root    ALL=(ALL)       ALL
nagios  ALL=(ALL)       NOPASSWD:ALL
Stopped and started the nrpe daemon.

Check Script:

Code: Select all

#!/bin/bash

EPOC1=`/bin/date +%s`
sudo /<path to cmd>/watchqueue 1

EPOC2=`/bin/cat /tmp/watchqueue_1.txt | /bin/grep shm | /usr/bin/head -1 | /bin/awk '{print $7}' | /bin/sed 's/(//g;s/)//g'`

        if [ "$EPOC2" -ge "$EPOC1" ]
          then
               MsgRate=`/bin/cat /tmp/watchqueue_1.txt  | /bin/grep 'Total UDP Messages' | /bin/awk '{print $9}' | /bin/sed 's/(//g'`
                   /bin/echo "Ok: Message rate is $MsgRate | $MsgRate"
                   exit 0
           else
               /bin/echo "Critical: File /tmp/watchqueue_1.txt is not generated with current time"
                   exit 2
        fi
Command entry at nrpe.cfg:

Code: Select all

command[check_watchque]=sudo /usr/local/nagios/libexec/check_watchque
output:

Code: Select all

Logapp nagios:~ 0$ /usr/local/nagios/libexec/check_nrpe -H localhost -c check_watchque
Critical: File /tmp/watchqueue_1.txt is not generated with current time
LOGS:
Check_nrpe:

Code: Select all

<31>Apr 26 09:07:38 localhost nrpe[28861]: Connection from 127.0.0.1 port 54925
<31>Apr 26 09:07:38 localhost nrpe[28861]: Host address is in allowed_hosts
<31>Apr 26 09:07:38 localhost nrpe[28861]: Host 127.0.0.1 is asking for command 'check_watchque' to be run...
<31>Apr 26 09:07:38 localhost nrpe[28861]: Running command: sudo /usr/local/nagios/libexec/check_watchque
<85>Apr 26 09:07:38 localhost sudo:   nagios : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/usr/local/nagios/libexec/check_watchque
<85>Apr 26 09:07:38 localhost sudo:   nagios : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/<path to cmd>/watchqueue 1
<31>Apr 26 09:07:38 localhost nrpe[28861]: Command completed with return code 2 and output: Critical: File /tmp/watchqueue_1.txt is not generated with current time
<31>Apr 26 09:07:38 localhost nrpe[28861]: Return Code: 2, Output: Critical: File /tmp/watchqueue_1.txt is not generated with current time
<31>Apr 26 09:07:38 localhost nrpe[28861]: Connection from 127.0.0.1 closed.
<13>Apr 26 09:07:38 localhost check_nrpe: Remote 127.0.0.1 accepted a Version 3 Packet
Check_by_ssh (executed by nagios UI "Re-schedule the next check of this service"):

Code: Select all

<86>Apr 26 09:11:38 localhost sshd[31160]: Accepted publickey for nagios from X.X.X.X port 35743 ssh2
<86>Apr 26 09:11:38 localhost sshd[31160]: pam_unix(sshd:session): session opened for user nagios by (uid=0)
<85>Apr 26 09:11:38 localhost sudo:   nagios : TTY=unknown ; PWD=/usr/local/nagios ; USER=root ; COMMAND=/<Path to cmd>/watchqueue 1
<86>Apr 26 09:11:38 localhost sshd[31162]: Received disconnect from X.X.X.X: 11: disconnected by user
<86>Apr 26 09:11:38 localhost sshd[31160]: pam_unix(sshd:session): session closed for user nagios
After removing "sudo" from all commands in the check script and nrpe.cfg, but keeping the sudoers config file as it is:
Stopped and started the nrpe daemon.

Check_nrpe:

Code: Select all

<31>Apr 26 09:14:50 localhost nrpe[627]: Connection from 127.0.0.1 port 63117
<31>Apr 26 09:14:50 localhost nrpe[627]: Host address is in allowed_hosts
<31>Apr 26 09:14:50 localhost nrpe[627]: Host 127.0.0.1 is asking for command 'check_watchque' to be run...
<31>Apr 26 09:14:50 localhost nrpe[627]: Running command: /usr/local/nagios/libexec/check_watchque
<31>Apr 26 09:14:50 localhost nrpe[627]: Command completed with return code 2 and output: Critical: File /tmp/watchqueue_1.txt is not generated with current time
<31>Apr 26 09:14:50 localhost nrpe[627]: Return Code: 2, Output: Critical: File /tmp/watchqueue_1.txt is not generated with current time
<31>Apr 26 09:14:50 localhost nrpe[627]: Connection from 127.0.0.1 closed.
Check_by_ssh (executed by nagios UI "Re-schedule the next check of this service"):

Code: Select all

<86>Apr 26 09:15:31 localhost sshd[1025]: Accepted publickey for nagios from X.X.X.X port 35750 ssh2
<86>Apr 26 09:15:31 localhost sshd[1025]: pam_unix(sshd:session): session opened for user nagios by (uid=0)
<86>Apr 26 09:15:32 localhost sshd[1027]: Received disconnect from X.X.X.X: 11: disconnected by user
<86>Apr 26 09:15:32 localhost sshd[1025]: pam_unix(sshd:session): session closed for user nagios
Last edited by diwakar0304 on Tue Apr 25, 2017 11:25 pm, edited 2 times in total.
diwakar0304
Posts: 28
Joined: Tue Nov 04, 2014 4:19 am

Re: Unable to execute a application specific cmd using check

Post by diwakar0304 »

With the suggested sudo config still in place, this is the difference between running from the CLI and from the Nagios UI.

CLI command from nagios server:

Code: Select all

[nagios@Nagiosserver ~]$ /usr/local/nagios/libexec/check_by_ssh -z -E  -H <remote server IP> -C /usr/local/nagios/libexec/check_watchque
Ok: Message rate is 13 | 13
logs from remote server:

Code: Select all

<86>Apr 26 09:45:00 localhost sshd[17940]: Accepted publickey for nagios from X.X.X.X port 35793 ssh2
<86>Apr 26 09:45:00 localhost sshd[17940]: pam_unix(sshd:session): session opened for user nagios by (uid=0)
<85>Apr 26 09:45:00 localhost sudo:   nagios : TTY=pts/2 ; PWD=/usr/local/nagios ; USER=root ; COMMAND=/<Path to CMD>/watchqueue 1
<86>Apr 26 09:45:00 localhost sshd[17943]: Received disconnect from X.X.X.X: 11: disconnected by user
<86>Apr 26 09:45:00 localhost sshd[17940]: pam_unix(sshd:session): session closed for user nagios
Check_by_ssh (executed by nagios UI "Re-schedule the next check of this service"):
logs from remote server:

Code: Select all

<86>Apr 26 09:45:29 localhost sshd[18230]: Accepted publickey for nagios from X.X.X.X port 35794 ssh2
<86>Apr 26 09:45:29 localhost sshd[18230]: pam_unix(sshd:session): session opened for user nagios by (uid=0)
<85>Apr 26 09:45:29 localhost sudo:   nagios : TTY=unknown ; PWD=/usr/local/nagios ; USER=root ; COMMAND=/<Path to CMD>/watchqueue 1
<86>Apr 26 09:45:29 localhost sshd[18235]: Received disconnect from X.X.X.X: 11: disconnected by user
<86>Apr 26 09:45:29 localhost  sshd[18230]: pam_unix(sshd:session): session closed for user nagios
Last edited by diwakar0304 on Wed May 03, 2017 12:55 pm, edited 1 time in total.
tgriep
Madmin
Posts: 9177
Joined: Thu Oct 30, 2014 9:02 am

Re: Unable to execute a application specific cmd using check

Post by tgriep »

The only thing I can think of is that the watchqueue command that generates the /tmp/watchqueue_1.txt file is locking the file or it did not finish writing the timestamp and that is causing the error.

Can you verify that the script is getting the current time and the timestamp out of the file by adding the following lines to the /usr/local/nagios/libexec/check_watchque script?

Code: Select all

echo "$EPOC1" >>/tmp/time
echo "$EPOC2" >>/tmp/time
Then run the check and see what is in the /tmp/time file for those 2 variables.
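
A slightly more verbose variant of the same idea, as a sketch: it labels each value and records the difference, so successive runs from the CLI, NRPE, and the Nagios UI are easier to tell apart in /tmp/time. The function names is_int and log_epochs are hypothetical; EPOC1 and EPOC2 are the variables from the posted script.

```shell
#!/bin/bash
# Return 0 only for a non-empty string of digits.
is_int() {
    case "$1" in
        ''|*[!0-9]*) return 1 ;;
        *) return 0 ;;
    esac
}

# Hypothetical helper: append both epoch values and their difference
# to a log file (default /tmp/time, as suggested above).
log_epochs() {
    local before="$1" stamp="$2" log="${3:-/tmp/time}" delta="n/a"
    # Only compute the difference when both values are plain integers.
    if is_int "$before" && is_int "$stamp"; then
        delta=$((stamp - before))
    fi
    echo "$(date '+%F %T') EPOC1=${before:-<empty>} EPOC2=${stamp:-<empty>} delta=$delta" >> "$log"
}

# Example, placed after EPOC2 is assigned in check_watchque:
# log_epochs "$EPOC1" "$EPOC2"
```

An empty EPOC2 in the log would suggest the snapshot file was missing or had no matching line, while a negative delta would suggest a stale file from an earlier run.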