Unable to execute an application-specific cmd using check_nrpe

Postby diwakar0304 » Fri Apr 21, 2017 7:37 am

Hi,

I have a problem with a custom check script that runs an application-specific command and processes its output.

Below are the details.

Logapp root:/usr/local/nagios/libexec 2$ /usr/local/nagios/libexec/check_nrpe -V

Code: Select all
NRPE Plugin for Nagios
Copyright (c) 1999-2008 Ethan Galstad (nagios@nagios.org)
Version: 3.0.1
Last Modified: 09-08-2016
License: GPL v2 with exemptions (-l for more info)


======================================================================================

check_watchque:

Code: Select all
#!/bin/bash

/<Path to application>/watchqueue 1 ##(--> this command is not executed while using check_nrpe )
     if [ `/bin/cat /tmp/watchqueue_1.txt | /bin/grep shm | /usr/bin/head -1 | /bin/awk '{print $7}' | /bin/sed 's/(//g;s/)//g'` -ge `/bin/date +%s` ]
        then
            MsgRate=`/bin/cat /tmp/watchqueue_1.txt  | /bin/grep 'Total UDP Messages' | /bin/awk '{print $9}' | /bin/sed 's/(//g'`
            /bin/echo "Ok: Message rate is $MsgRate | $MsgRate"
            exit 0
        else
            /bin/echo "Critical: File /tmp/watchqueue_1.txt is not generated with current time"
            exit 2
     fi


====================================================================================================
nrpe.cfg configuration:
Code: Select all
log_facility=daemon
debug=0
pid_file=/usr/local/nagios/var/nrpe.pid
server_port=5666
nrpe_user=nagios
nrpe_group=nagios
allowed_hosts=127.0.0.1,X.X.X.X
dont_blame_nrpe=1
allow_bash_command_substitution=1
command_timeout=60
connection_timeout=300
command[check_users]=/usr/local/nagios/libexec/check_users -w 5 -c 10
command[check_load]=/usr/local/nagios/libexec/check_load -w 15,10,5 -c 30,25,20
command[check_hda1]=/usr/local/nagios/libexec/check_disk -w 20% -c 10% -p /dev/hda1
command[check_zombie_procs]=/usr/local/nagios/libexec/check_procs -w 5 -c 10 -s Z
command[check_total_procs]=/usr/local/nagios/libexec/check_procs -w 150 -c 200
command[check_disk]=/usr/local/nagios/libexec/check_disk -w 20% -c 10%
command[check_watchque]=/usr/local/nagios/libexec/check_watchque


=======================================================================================

Command execution output:

From nagios user:

(Below is appropriate output)
Code: Select all
Logapp nagios:~ 0$ /usr/local/nagios/libexec/check_watchque
Ok: Message rate is 0 | 0


(Below is the issue: it only reaches the else branch of the check script)
Code: Select all
Logapp nagios:~ 0$ /usr/local/nagios/libexec/check_nrpe -H localhost -c check_watchque
Critical: File /tmp/watchqueue_1.txt is not generated with current time


(Same output while running from root user)
Code: Select all
Logapp root:/usr/local/nagios/libexec 130$ /usr/local/nagios/libexec/check_nrpe -H localhost -c check_watchque
Critical: File /tmp/watchqueue_1.txt is not generated with current time


P.S.: All other check commands are running absolutely fine. I am also attaching the strace output (IP and hostname hidden).

Thanks for taking the time to look at this issue.

BR//
Attachments: strace.txt (strace output, 7.68 KiB)
Last edited by dwhitfield on Mon Apr 24, 2017 4:18 pm, edited 1 time in total.
Reason: code blocks FTW
diwakar0304
 
Posts: 11
Joined: Tue Nov 04, 2014 4:19 am

Re: Unable to execute a application specific cmd using check

Postby mcapra » Fri Apr 21, 2017 1:46 pm

Can we see an strace for this command run as the nagios user also:
Code: Select all
/usr/local/nagios/libexec/check_watchque


It might be helpful to compare that with the strace of check_nrpe.

Can you tell us roughly what this line of the script is doing:
Code: Select all
/<Path to application>/watchqueue 1


I think the script above (watchqueue) is where the problem lies. Something about how NRPE executes it might not be favorable. Using your exact bash script's logic with some minor modifications, I am unable to reproduce this:

Code: Select all
[root@localhost etc]# /usr/local/nagios/libexec/check_nrpe -H 127.0.0.1 -c check_watchqueue
Ok: Message rate is 0 | 0
Be sure to check out our Knowledgebase for helpful articles and solutions!

https://github.com/mcapra/
mcapra
Support Tech
 
Posts: 2271
Joined: Thu May 05, 2016 3:54 pm
Location: Nagios Enterprises

Re: Unable to execute a application specific cmd using check

Postby jfrickson » Fri Apr 21, 2017 2:18 pm

Off-hand, I would say it's a permission problem. Make sure the output is going into a folder that the nagios user has write access to.
Nagios Enterprises, LLC
Email: jfrickson@nagios.com
Web: www.nagios.com
jfrickson
Developer
 
Posts: 181
Joined: Wed Jul 22, 2015 9:58 am
Location: Nagios Enterprises

Re: Unable to execute a application specific cmd using check

Postby diwakar0304 » Sat Apr 22, 2017 12:53 am

Hi, and thanks for the responses.

Regarding jfrickson's point, the nagios user does have write permission wherever it is necessary. You can also see this in one of the command outputs above, where I executed the check script as the nagios user without using check_nrpe.

Anyhow, mcapra's suspicion looks to be on the right track. The watchqueue command collects live application stats. It runs continuously in its own shell. When a page number is appended (1 in our case), it generates a snapshot of that page in text format as /tmp/watchqueue_1.txt. Hence <Path to command>/watchqueue is the command and 1 is the first page number to be printed.
Attaching sample output for:
1) strace /usr/local/nagios/libexec/check_watchque (filename: strace without nrpe.txt, 11.59 KiB)
2) watchqueue command output (filename: watchqueue command output.GIF)
3) cat /tmp/watchqueue_1.txt (filename: watchqueue_1.txt, 2.63 KiB)


One more point I want to bring to your attention: I also tried an alternative to check_nrpe, namely check_by_ssh.
While executing the command, it gave the following error:
Code: Select all
Logapp nagios:~ 0$ /usr/local/nagios/libexec/check_by_ssh -v -H localhost -C "/usr/local/nagios/libexec/check_watchque"
Command: /usr/bin/ssh
Argument 1: localhost
Argument 2: /usr/local/nagios/libexec/check_watchque
nagios@localhost's password:
Remote command execution failed: --------------------------------------------------------------------------------
Logapp nagios:~ 2$


Running the command with plain ssh gives:

Code: Select all
[nagios@nagiosserver ~]$ ssh -q nagios@X.X.X.X "/usr/local/nagios/libexec/check_watchque"
Error opening terminal: unknown.
Critical: File /tmp/watchqueue_1.txt is not generated with current time
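
The message "Error opening terminal: unknown." is commonly printed by curses-based programs when the TERM variable is unset or unrecognized, which is usually the case when ssh runs a remote command without allocating a tty. A minimal sketch for checking this from inside the check script follows; tty_report is a hypothetical helper name, and whether watchqueue actually uses curses is an assumption the thread does not confirm.

```shell
#!/bin/bash
# Report whether stdin is a terminal and what TERM is set to.
# tty_report is a hypothetical helper, not part of NRPE or the Nagios plugins.
tty_report() {
    local stdin_tty=no
    if [ -t 0 ]; then
        stdin_tty=yes
    fi
    echo "stdin_tty=${stdin_tty} TERM=${TERM:-unset}"
}

tty_report
```

Calling the function at the top of check_watchque and sending its output to a log file would show how the environment differs between a direct run, a run through ssh, and a run through check_nrpe.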



Then I patched check_by_ssh by changing check_by_ssh.c, recompiled the plugins, and got an option to force tty allocation.
source for check_by_ssh.c (https://github.com/riskersen/monitoring-plugins/blob/2a24bc8e922d2780a842efbcef6fb6ead283fa44/plugins/check_by_ssh.c)

Code: Select all
[nagios@nagiosserver ~]$ /usr/local/nagios/libexec/check_by_ssh -z -E -t 20 -H 10.70.116.200 -C "/usr/local/nagios/libexec/check_watchque"
Ok: Message rate is 4 | 4


Voila! I started getting output. But there is still an issue: the Nagios web UI shows the same error message mentioned at the start of the thread instead of the correct output.
Code: Select all
Critical: File /tmp/watchqueue_1.txt is not generated with current time

I cannot attach a screenshot of the Nagios page (the attachment limit is 3).

My configuration for the check_by_ssh option:

command.cfg

# 'check_by_ssh' command definition
Code: Select all
define command{
        command_name    check_by_ssh_watchque
        command_line    $USER1$/check_by_ssh -z -E -t 20 -H $HOSTADDRESS$ -C "/usr/local/nagios/libexec/check_watchque"
        }

Service definition in the host cfg file:

Code: Select all
define service{
        use                             local-service,srv-pnp         ; Name of service template to use
        host_name                       <hostname of remote client>
        service_description             Message Rate
        check_command                   check_by_ssh_watchque
        }


Is anything wrong with my check_by_ssh configuration?

BR//

Re: Unable to execute a application specific cmd using check

Postby tgriep » Mon Apr 24, 2017 4:58 pm

In your first post, it looks like part of your /usr/local/nagios/libexec/check_watchque plugin was cut off. Can you post it again?

How is the file /tmp/watchqueue_1.txt created on the system?

From what I can see, the check compares the time stamp in the file against the current time. If the time in the file is newer, it is OK, otherwise it is Critical. Is that what you are looking for?
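
The time stamp comparison described here can be sketched as a small function. Note that in the script as posted, an empty or missing time stamp makes the `[ ... -ge ... ]` test fail with a syntax error, which also lands in the else branch, so the Critical message can hide a file that was never written. The sketch below rejects empty and non-numeric values explicitly; is_fresh is a hypothetical name, and the extraction pipeline in the usage comment is simply condensed from the posted script.

```shell
#!/bin/bash
# is_fresh FILE_TS NOW
# Succeeds only if FILE_TS is a non-negative integer that is >= NOW.
# is_fresh is a hypothetical helper name used for illustration.
is_fresh() {
    local file_ts=$1 now=$2
    case $file_ts in
        ''|*[!0-9]*) return 1 ;;   # empty or non-numeric: treat as not fresh
    esac
    [ "$file_ts" -ge "$now" ]
}

# Usage with the extraction logic from the posted script:
# ts=$(grep shm /tmp/watchqueue_1.txt | head -1 | awk '{print $7}' | sed 's/[()]//g')
# if is_fresh "$ts" "$(date +%s)"; then echo "Ok"; else echo "Critical"; fi
```

Separating the empty-value case from the stale-value case makes it easier to tell whether watchqueue ran at all or ran and produced an old snapshot.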
tgriep
Madmin
 
Posts: 4548
Joined: Thu Oct 30, 2014 9:02 am

Re: Unable to execute a application specific cmd using check

Postby diwakar0304 » Mon Apr 24, 2017 8:28 pm

That code is complete. The only part hidden is the path to the 'watchqueue' command, which is not really necessary to mention because it is available in 'env'.

I have already shared the content of /tmp/watchqueue_1.txt and how it is created. (Sorry, I can't post it again right now as I am replying from my mobile.) But I will try to explain the code line by line.

Code: Select all
#############################

#!/bin/bash

/<path to command>/watchqueue 1
## To generate /tmp/watchqueue_1.txt, the command above must be executed. It runs commands in the shm shell to collect application stats.

     if [ `/bin/cat /tmp/watchqueue_1.txt | /bin/grep shm | /usr/bin/head -1 | /bin/awk '{print $7}' | /bin/sed 's/(//g;s/)//g'` -ge `/bin/date +%s` ]

## As you stated, this compares the system time stamp with the one in /tmp/watchqueue_1.txt. If the epoch time in the file is at or after the system epoch time, print OK with a counter from the same file; otherwise the script takes the else branch.

        then
            MsgRate=`/bin/cat /tmp/watchqueue_1.txt  | /bin/grep 'Total UDP Messages' | /bin/awk '{print $9}' | /bin/sed 's/(//g'`
            /bin/echo "Ok: Message rate is $MsgRate | $MsgRate"
            exit 0
        else
            /bin/echo "Critical: File /tmp/watchqueue_1.txt is not generated with current time"
            exit 2
     fi

######################


The whole issue lies in executing the command '/<path to command>/watchqueue 1'. Below is the list of attempts and their results.

1) Executing only the check script as the nagios user succeeds.
2) Executing it through check_nrpe fails.
3) Using check_by_ssh to execute the check script from the Nagios server, on the nagios user's CLI, succeeds.
4) After integration into the command and service cfg files, the check fails.

In the 4 points above, failure means that only the else branch's echo message is printed, which means '/<path to command>/watchqueue 1' could not be executed.
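
The difference between the interactive runs and the runs started by a daemon can be approximated from the command line. The following is a rough sketch only: run_bare is a hypothetical helper name, and an emptied environment with stdin from /dev/null is an assumption about what the NRPE daemon and Nagios provide, not a description of their actual behavior.

```shell
#!/bin/bash
# run_bare CMD [ARGS...]
# Run CMD with an emptied environment (only PATH is kept) and with stdin
# redirected from /dev/null, so CMD sees no TERM variable and no terminal
# on stdin. run_bare is a hypothetical helper name.
run_bare() {
    env -i PATH="$PATH" "$@" </dev/null
}

# Example (path taken from the thread):
# run_bare /usr/local/nagios/libexec/check_watchque; echo "exit status: $?"
```

If the script prints the Critical message when started this way from the nagios user's shell, the failure is likely tied to the missing terminal or environment rather than to NRPE itself.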

I want to add a few more observations.
1) When check_by_ssh is executed by the Nagios application, the client end shows tty=unknown.
2) I tried using sudo with the requiretty Defaults parameter disabled; it still throws the same error.
3) This check_by_ssh has an option to force a tty, and that option is already in use.

With sudo, it either asks for a terminal to execute the script or prints the else branch message. check_by_ssh does have a feature to force a tty, so why does it report tty unknown?

Pardon the typos; this was sent from my mobile.

Re: Unable to execute a application specific cmd using check

Postby tgriep » Tue Apr 25, 2017 8:48 am

The NRPE agent runs the check as the nagios user, so it doesn't seem to be a permission issue.
As you say, I think it is a TTY issue, and there is a setting in the /etc/sudoers file you can change so that the nagios user does not require a tty.

Add this line to the /etc/sudoers file to make sure the nagios account doesn't require a tty.
Code: Select all
Defaults:nagios !requiretty

Save it out and see if this makes the check run under the NRPE agent.
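
A syntax error in sudoers can lock out sudo entirely, so it can help to validate the line before installing it. A minimal sketch, assuming the system's sudoers includes the /etc/sudoers.d directory; write_requiretty_dropin and the drop-in file name nagios_notty are hypothetical, and the commands in the usage comment must be run as root.

```shell
#!/bin/bash
# write_requiretty_dropin FILE
# Write the sudoers line from the post above into FILE.
# The function name is hypothetical and used only for illustration.
write_requiretty_dropin() {
    local target=$1
    printf '%s\n' 'Defaults:nagios !requiretty' > "$target"
}

# Usage as root (the drop-in file name is an assumption):
# tmp=$(mktemp)
# write_requiretty_dropin "$tmp"
# visudo -c -f "$tmp" && install -m 0440 "$tmp" /etc/sudoers.d/nagios_notty
# rm -f "$tmp"
```

Here visudo -c -f only checks the syntax of the given file, and the line is installed only if that check passes.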

